SongScript


Learn to sing songs in any language through interactive lyric practice. SongScript combines AI-powered transcription, vocal separation, and transliteration to help you master pronunciation and lyrics in Persian, Korean, Arabic, Hebrew, and more.

The Story Behind SongScript

SongScript was built to connect with heritage through music. By combining modern AI (WhisperX transcription, vocal separation, and transliteration), the app makes learning songs in languages like Persian approachable: it helps you understand pronunciation, practice singing, and deepen your connection to the culture and music that inspire you.

Features

  • Line-by-Line Learning - Follow along with lyrics displayed one line at a time
  • Transliteration - See phonetic spelling to help with pronunciation
  • Multiple Playback Modes:
    • Fluid - Video plays continuously with synchronized lyrics
    • Single - Play one line at a time, pause between lines
    • Loop - Repeat the current line until you've mastered it
  • Adjustable Speed - Slow playback to 0.5x or 0.75x, or return to normal 1x speed
  • Word-by-Word Breakdown - Tap any line to see detailed word meanings
  • Progress Tracking - Mark words and lines as "learned"
  • Practice Stats - Track your vocabulary, practice time, and streaks
  • Learning Dashboard - See your progress across languages and songs
  • Song Wishlist - Queue up songs you want to learn next
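
All three playback modes reduce to the same core operation: finding the lyric line that contains the current playback position. A minimal sketch of that lookup (the startTime/endTime field names match the pipeline JSON shown later; the function itself is illustrative, not taken from the codebase):

```python
from bisect import bisect_right

def current_line(lines, t):
    """Return the index of the lyric line active at time t, or None.

    lines: list of dicts with startTime/endTime, sorted by startTime.
    """
    starts = [ln["startTime"] for ln in lines]
    i = bisect_right(starts, t) - 1          # last line starting at or before t
    if i >= 0 and t <= lines[i]["endTime"]:
        return i
    return None                              # t falls in a gap between lines

lines = [
    {"startTime": 0.5, "endTime": 1.2},
    {"startTime": 1.5, "endTime": 3.0},
]
current_line(lines, 2.0)   # -> 1
current_line(lines, 1.3)   # -> None (instrumental gap between lines)
```

Loop mode then simply replays the [startTime, endTime] span of the current index until the user advances.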

Tech Stack

Frontend & Backend

  • TanStack Start - File-based routing and SSR
  • Convex - Database, auth, and real-time sync
  • Better Auth - Passwordless magic-link login
  • Tailwind CSS + shadcn/ui - Styling and UI components
  • Bun - Runtime and package manager

Audio Processing Pipeline (WhisperX)

  • Transcription: WhisperX (faster-whisper + alignment)
  • Vocal Separation: Demucs
  • Language Support: Persian (hazm), Korean (kiwipiepy), Arabic (pyarabic)
  • Translation: NLLB-200
  • Audio Download: yt-dlp

Word Pronunciation

  • Text-to-Speech: ElevenLabs API (optional, for word audio generation)
  • Forvo API: Alternative for crowdsourced pronunciation audio

Getting Started

Prerequisites

  • Bun (runtime and package manager)
  • Python 3 (for the WhisperX pipeline)
  • FFmpeg (required for audio download and processing)

Installation

  1. Clone the repository:

    git clone https://github.com/EtanHey/songscript.git
    cd songscript
  2. Install Node.js dependencies:

    bun install
  3. Set up Python environment for WhisperX:

    cd scripts/whisperx
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
  4. Set up environment variables:

    cp .env.example .env.local

    Edit .env.local with your values:

    # Convex
    VITE_CONVEX_URL=https://your-project.convex.cloud
    
    # Auth
    ADMIN_EMAIL=your-email@example.com
    VITE_ADMIN_EMAIL=your-email@example.com
    
    # Optional: Audio generation
    ELEVENLABS_API_KEY=your-elevenlabs-api-key
  5. Start the development server:

    # Terminal 1: Convex backend
    bunx convex dev
    
    # Terminal 2: Frontend (in a new terminal)
    bun run dev
  6. Open http://localhost:3001 in your browser

WhisperX Pipeline: Adding New Songs

The WhisperX pipeline automates transcription and transliteration. It takes a YouTube link and produces accurate lyrics with timestamps and transliteration.

Quick Start (30 minutes)

cd scripts/whisperx
source venv/bin/activate

# Run full pipeline: download → separate vocals → transcribe → transliterate
python3 pipeline.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  -l fa \
  --model large-v3 \
  -o output/
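
To queue up several songs, the same invocation can be scripted. A minimal sketch that builds the command shown above (the subprocess call is left commented out because it requires the activated venv):

```python
import subprocess

def pipeline_cmd(url, lang="fa", model="large-v3", out="output/"):
    # Same flags as the quick-start invocation of pipeline.py above
    return ["python3", "pipeline.py", url, "-l", lang, "--model", model, "-o", out]

for url in ["https://www.youtube.com/watch?v=VIDEO_ID"]:
    cmd = pipeline_cmd(url)
    # subprocess.run(cmd, check=True)   # uncomment inside the venv to run for real
    print(" ".join(cmd))
```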

Step-by-Step: Manual Pipeline

If you prefer more control, run each step separately:

1. Download Audio

python3 download.py "https://www.youtube.com/watch?v=VIDEO_ID" downloads/

This uses yt-dlp to grab the best quality audio (usually M4A) and extracts metadata (title, artist, duration).
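
download.py itself is not shown here, but a roughly equivalent yt-dlp command line looks like this (the flags are real yt-dlp options; whether download.py uses exactly these is an assumption):

```python
def ytdlp_cmd(url, out_dir="downloads/"):
    # Roughly what download.py does: best-quality audio plus a metadata sidecar
    return [
        "yt-dlp",
        "-f", "bestaudio",                      # best available audio stream
        "-o", f"{out_dir}%(title)s.%(ext)s",    # output filename template
        "--write-info-json",                    # save title/artist/duration metadata
        url,
    ]
```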

2. Separate Vocals (Optional)

Remove background music and focus on vocals for cleaner transcription:

python3 separate.py downloads/SONGNAME.m4a separated/

This uses Demucs (Meta's music source separation) to isolate vocals. Recommended for songs with heavy instrumentation.
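
Demucs can also be driven directly from its CLI; a sketch of the two-stem invocation that separate.py likely wraps (--two-stems is a standard Demucs option; whether separate.py uses exactly this form is an assumption):

```python
def demucs_cmd(audio_path, out_dir="separated/"):
    # --two-stems vocals produces vocals + accompaniment instead of four stems
    return ["demucs", "--two-stems", "vocals", "-o", out_dir, audio_path]
```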

3. Transcribe with WhisperX

python3 transcribe.py separated/SONGNAME.wav \
  -l fa \
  --model large-v3 \
  -o output/

Options:

  • -l, --language - Language code: fa (Persian), ko (Korean), ar (Arabic), en (English)
  • --model - Whisper model size: tiny, base, small, medium, large-v3 (larger = slower but more accurate)
  • --align - Enable WhisperX alignment (recommended for accuracy)
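
A CLI with these flags can be mirrored with argparse; this sketch shows how the documented options might be declared (illustrative only, not the actual transcribe.py source; "he" is included per the language table below):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="WhisperX transcription")
    p.add_argument("audio", help="input audio file")
    p.add_argument("-l", "--language", required=True,
                   choices=["fa", "ko", "ar", "he", "en"], help="language code")
    p.add_argument("--model", default="large-v3",
                   choices=["tiny", "base", "small", "medium", "large-v3"])
    p.add_argument("--align", action="store_true",
                   help="enable WhisperX alignment")
    p.add_argument("-o", "--output", default="output/")
    return p

args = build_parser().parse_args(
    ["separated/SONGNAME.wav", "-l", "fa", "--model", "large-v3"])
```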

4. Get Transliteration

python3 pipeline.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  -l fa \
  --translate \
  -o output/

Pipeline Output

The pipeline generates a JSON file with:

{
  "url": "https://www.youtube.com/watch?v=VIDEO_ID",
  "language": "fa",
  "videoInfo": {
    "title": "Song Title",
    "artist": "Artist Name",
    "duration": 240
  },
  "lines": [
    {
      "original": "برای",
      "transliteration": "barâye",
      "startTime": 0.5,
      "endTime": 1.2,
      "english": "for",
      "words": [
        {
          "word": "برای",
          "transliteration": "barâye",
          "startTime": 0.5,
          "endTime": 1.2
        }
      ]
    }
  ]
}
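
Before importing a pipeline output file, it is worth sanity-checking the timestamps: lines should be ordered and non-overlapping, and every word should fall inside its line's span. A minimal validator (the field names come from the JSON above; the script itself is illustrative):

```python
import json

def validate(doc):
    """Return a list of problems found in pipeline output; empty means OK."""
    problems = []
    prev_end = 0.0
    for i, line in enumerate(doc["lines"]):
        if line["startTime"] > line["endTime"]:
            problems.append(f"line {i}: start after end")
        if line["startTime"] < prev_end:
            problems.append(f"line {i}: overlaps previous line")
        prev_end = line["endTime"]
        for w in line.get("words", []):
            if not (line["startTime"] <= w["startTime"] <= w["endTime"] <= line["endTime"]):
                problems.append(f"line {i}: word outside line span")
    return problems

doc = json.loads("""{
  "language": "fa",
  "lines": [
    {"original": "...", "startTime": 0.5, "endTime": 1.2,
     "words": [{"word": "...", "startTime": 0.5, "endTime": 1.2}]}
  ]
}""")
validate(doc)   # -> []
```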

Language Support & Models

Language   Code   Alignment Model                                 Difficulty
Persian    fa     jonatasgrosman/wav2vec2-large-xlsr-53-persian   ⭐⭐⭐
Korean     ko     jonatasgrosman/wav2vec2-large-xlsr-53-korean    ⭐⭐⭐⭐
Arabic     ar     jonatasgrosman/wav2vec2-large-xlsr-53-arabic    ⭐⭐⭐⭐
Hebrew     he     imvladikon/wav2vec2-xls-r-300m-hebrew           ⭐⭐⭐⭐
English    en     (built-in)
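
In code, this table is naturally a dictionary. A sketch mapping language codes to the alignment models listed above (the contents come from the table; the dict and helper themselves are illustrative):

```python
# wav2vec2 alignment models per language, as in the table above
ALIGN_MODELS = {
    "fa": "jonatasgrosman/wav2vec2-large-xlsr-53-persian",
    "ko": "jonatasgrosman/wav2vec2-large-xlsr-53-korean",
    "ar": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic",
    "he": "imvladikon/wav2vec2-xls-r-300m-hebrew",
    "en": None,  # English uses WhisperX's built-in alignment model
}

def alignment_model(code):
    if code not in ALIGN_MODELS:
        raise ValueError(f"unsupported language: {code}")
    return ALIGN_MODELS[code]
```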

Word Pronunciation Audio

Generate audio pronunciations for individual words using ElevenLabs:

# Requires ELEVENLABS_API_KEY in .env.local
bun run scripts/generate-word-audio.ts

This creates MP3 files for each unique word, enabling users to click and hear native pronunciation.
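
The per-word step reduces to collecting unique words across all lines and deriving a stable filename for each MP3. A Python sketch of that logic (generate-word-audio.ts is TypeScript, and the codepoint-based filename scheme here is a hypothetical example, not its actual code):

```python
import unicodedata

def unique_words(lines):
    """Collect unique words in first-seen order from pipeline output lines."""
    seen, out = set(), []
    for line in lines:
        for w in line.get("words", []):
            word = w["word"]
            if word not in seen:
                seen.add(word)
                out.append(word)
    return out

def audio_filename(word):
    # Hypothetical scheme: NFC-normalize, then name the MP3 by hex codepoints
    # so non-Latin words map to safe, deterministic filenames
    norm = unicodedata.normalize("NFC", word)
    return "-".join(f"{ord(c):04x}" for c in norm) + ".mp3"

lines = [{"words": [{"word": "برای"}, {"word": "تو"}]},
         {"words": [{"word": "برای"}]}]
unique_words(lines)   # -> ["برای", "تو"]
```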

Integrating with the App

Once you have the transcription JSON:

  1. Update Song in Database:

    npx convex run lyrics:updateTimestamps '{
      "songId": "song-id",
      "unlockCode": "UNLOCK_TIMESTAMPS",
      "updates": [...]
    }'
  2. Upload to Convex Storage: Use the Convex dashboard to store audio files and video references.

  3. Add to Catalog: Update convex/seed.ts or the admin panel to make the song visible in the app.

Project Structure

songscript/
├── src/
│   ├── components/          # React components
│   ├── routes/              # TanStack file-based routes
│   ├── hooks/               # Custom React hooks
│   ├── lib/                 # Utility functions
│   └── styles.css           # Global CSS
├── convex/
│   ├── schema.ts            # Database schema
│   ├── songs.ts             # Song queries/mutations
│   └── auth.ts              # Auth configuration
├── public/
│   ├── audio/               # Generated word pronunciation audio
│   ├── video/               # Local video files
│   └── flags/               # Language flag assets
├── scripts/
│   ├── whisperx/
│   │   ├── pipeline.py      # Full transcription pipeline
│   │   ├── download.py      # YouTube audio download
│   │   ├── separate.py      # Vocal separation (Demucs)
│   │   ├── transcribe.py    # WhisperX transcription
│   │   └── requirements.txt # Python dependencies
│   ├── generate-word-audio.ts      # ElevenLabs word audio generation
│   ├── fetch-forvo-audio.ts        # Forvo pronunciation downloader
│   └── extract-snippets.sh         # Audio snippet extraction
└── package.json             # Node.js dependencies

Development

Running Tests

# Run all tests once
bun run test

# Run tests in watch mode
bun run test:watch

# Run E2E tests with Playwright
bun run test:e2e

Tests are automatically run on commit via Husky pre-commit hooks.

TypeScript & Linting

bun run typecheck      # Type checking
bun run lint           # ESLint

Building for Production

# Build the app
bun run build

# Preview production build
bun run preview

Troubleshooting

Convex Bundler Error

If you see: "Two output files share the same path but have different contents: out/*.js"

Solution:

rm -f convex/*.js
npx convex dev

This happens when compiled .js files conflict with TypeScript sources. Only .ts files should be in convex/.

WhisperX Installation Issues

On macOS with Apple Silicon:

# Install PyTorch for ARM64
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

CUDA Support (NVIDIA GPU):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Missing FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu
sudo apt-get install ffmpeg

# Windows
choco install ffmpeg

Architecture

Frontend

  • TanStack Start handles file-based routing and SSR
  • Convex React Query provides real-time data sync from the database
  • Tailwind + shadcn/ui components for consistent UI

Backend

  • Convex manages database, authentication, and real-time subscriptions
  • Better Auth handles passwordless login via magic links
  • Convex Storage stores audio and video files

Audio Processing

The WhisperX pipeline is a standalone Python application that:

  1. Downloads audio from YouTube
  2. Separates vocals using Demucs
  3. Transcribes with Whisper (OpenAI)
  4. Aligns words to timestamps using WhisperX
  5. Transliterates to Latin script
  6. Optionally translates to English

Output is a structured JSON that's then imported into the Convex database.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Run tests locally to ensure everything passes
  4. Commit your changes with a descriptive message
  5. Push to the branch and open a Pull Request

License

MIT


Built with love and a desire to connect with heritage through music. 🎵
