Learn to sing songs in any language through interactive lyric practice. SongScript combines AI-powered transcription, vocal separation, and transliteration to help you master pronunciation and lyrics in Persian, Korean, Arabic, Hebrew, and more.
SongScript was built to connect with heritage through music. By combining modern AI (WhisperX transcription, vocal separation, and transliteration), the app makes learning songs in languages like Persian accessible—helping you understand pronunciation, practice singing, and deepen your connection to the culture and music that inspires you.
- Line-by-Line Learning - Follow along with lyrics displayed one line at a time
- Transliteration - See phonetic spelling to help with pronunciation
- Multiple Playback Modes:
  - Fluid - Video plays continuously with synchronized lyrics
  - Single - Play one line at a time, pause between lines
  - Loop - Repeat the current line until you've mastered it
- Adjustable Speed - Slow down playback to 0.5x, 0.75x, or 1x
- Word-by-Word Breakdown - Tap any line to see detailed word meanings
- Progress Tracking - Mark words and lines as "learned"
- Practice Stats - Track your vocabulary, practice time, and streaks
- Learning Dashboard - See your progress across languages and songs
- Song Wishlist - Queue up songs you want to learn next
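The three playback modes above boil down to a small seek policy. Here is a minimal, purely illustrative sketch (the function and action names are hypothetical, not the app's actual implementation):

```python
# Illustrative sketch of the playback modes as a seek policy.
# Given the current playback time and the active line's time span,
# decide what the player should do next.

def next_action(mode: str, time: float, line_start: float, line_end: float) -> str:
    """Return the player action when playback reaches `time`."""
    if time < line_end:
        return "play"                     # still inside the current line
    if mode == "fluid":
        return "advance"                  # continue into the next line
    if mode == "single":
        return "pause"                    # stop and wait for the learner
    if mode == "loop":
        return "seek:%0.2f" % line_start  # jump back and repeat the line
    raise ValueError(f"unknown mode: {mode}")
```

For example, in loop mode a line spanning 0.5–1.2 s replays from 0.5 s once playback passes its end.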
- Framework: TanStack Start + Bun
- Database: Convex (real-time sync)
- Auth: Better Auth with Convex adapter
- Styling: Tailwind CSS + shadcn/ui
- State: @convex-dev/react-query + TanStack Query
- Transcription: WhisperX (faster-whisper + alignment)
- Vocal Separation: Demucs
- Language Support: Persian (hazm), Korean (kiwipiepy), Arabic (pyarabic)
- Translation: NLLB-200
- Audio Download: yt-dlp
- Text-to-Speech: ElevenLabs API (optional, for word audio generation)
- Forvo API: Alternative for crowdsourced pronunciation audio
- Bun (v1.0 or higher)
- Python 3.9+ (for WhisperX pipeline)
- Convex account (free tier available)
- FFmpeg (for audio processing)
- Clone the repository:

  ```bash
  git clone https://github.com/EtanHey/songscript.git
  cd songscript
  ```

- Install Node.js dependencies:

  ```bash
  bun install
  ```

- Set up the Python environment for WhisperX:

  ```bash
  cd scripts/whisperx
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env.local
  ```

  Edit `.env.local` with your values:

  ```bash
  # Convex
  VITE_CONVEX_URL=https://your-project.convex.cloud

  # Auth
  ADMIN_EMAIL=your-email@example.com
  VITE_ADMIN_EMAIL=your-email@example.com

  # Optional: Audio generation
  ELEVENLABS_API_KEY=your-elevenlabs-api-key
  ```

- Start the development server:

  ```bash
  # Terminal 1: Convex backend
  bunx convex dev

  # Terminal 2: Frontend (in a new terminal)
  bun run dev
  ```

- Open http://localhost:3001 in your browser
The WhisperX pipeline automates transcription and transliteration. It takes a YouTube link and produces accurate lyrics with timestamps and transliteration.
```bash
cd scripts/whisperx
source venv/bin/activate

# Run full pipeline: download → separate vocals → transcribe → transliterate
python3 pipeline.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  -l fa \
  --model large-v3 \
  -o output/
```

If you prefer more control, run each step separately:
```bash
python3 download.py "https://www.youtube.com/watch?v=VIDEO_ID" downloads/
```

This uses yt-dlp to grab the best-quality audio (usually M4A) and extracts metadata (title, artist, duration).
Remove background music and focus on vocals for cleaner transcription:
```bash
python3 separate.py downloads/SONGNAME.m4a separated/
```

This uses Demucs (Meta's music source separation model) to isolate vocals. Recommended for songs with heavy instrumentation.
```bash
python3 transcribe.py separated/SONGNAME.wav \
  -l fa \
  --model large-v3 \
  -o output/
```

Options:

- `-l, --language` - Language code: `fa` (Persian), `ko` (Korean), `ar` (Arabic), `en` (English)
- `--model` - Whisper model size: `tiny`, `base`, `small`, `medium`, `large-v3` (larger = slower but more accurate)
- `--align` - Enable WhisperX alignment (recommended for accuracy)
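For reference, the flags above can be modeled with `argparse` like this. This is a hypothetical reconstruction for illustration, not the script's actual argument parser:

```python
import argparse

# Hypothetical reconstruction of transcribe.py's CLI surface.
parser = argparse.ArgumentParser(prog="transcribe.py")
parser.add_argument("audio", help="Path to the (separated) vocal track")
parser.add_argument("-l", "--language", choices=["fa", "ko", "ar", "en"],
                    help="Language code for transcription")
parser.add_argument("--model", default="large-v3",
                    choices=["tiny", "base", "small", "medium", "large-v3"],
                    help="Whisper model size (larger = slower but more accurate)")
parser.add_argument("--align", action="store_true",
                    help="Enable WhisperX alignment")
parser.add_argument("-o", "--output", default="output/",
                    help="Output directory for the JSON result")

args = parser.parse_args(["separated/song.wav", "-l", "fa", "--model", "large-v3"])
```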
To also translate lyrics to English (via NLLB-200), add the `--translate` flag:

```bash
python3 pipeline.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  -l fa \
  --translate \
  -o output/
```

The pipeline generates a JSON file with:
```json
{
  "url": "https://www.youtube.com/watch?v=VIDEO_ID",
  "language": "fa",
  "videoInfo": {
    "title": "Song Title",
    "artist": "Artist Name",
    "duration": 240
  },
  "lines": [
    {
      "original": "برای",
      "transliteration": "barâye",
      "startTime": 0.5,
      "endTime": 1.2,
      "english": "for",
      "words": [
        {
          "word": "برای",
          "transliteration": "barâye",
          "startTime": 0.5,
          "endTime": 1.2
        }
      ]
    }
  ]
}
```

| Language | Code | Alignment Model | Difficulty |
|---|---|---|---|
| Persian | `fa` | `jonatasgrosman/wav2vec2-large-xlsr-53-persian` | ⭐⭐⭐ |
| Korean | `ko` | `jonatasgrosman/wav2vec2-large-xlsr-53-korean` | ⭐⭐⭐⭐ |
| Arabic | `ar` | `jonatasgrosman/wav2vec2-large-xlsr-53-arabic` | ⭐⭐⭐⭐ |
| Hebrew | `he` | `imvladikon/wav2vec2-xls-r-300m-hebrew` | ⭐⭐⭐⭐ |
| English | `en` | (built-in) | ⭐ |
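To show how this output can be consumed downstream, here is a minimal sketch that reads a pipeline result and lists each line's transliteration with its duration (the inline dict stands in for loading the real JSON file; only the schema above is assumed):

```python
import json

# Minimal example of consuming the pipeline's JSON output.
# In practice: data = json.load(open("output/song.json", encoding="utf-8"))
data = json.loads("""
{
  "language": "fa",
  "lines": [
    {"original": "برای", "transliteration": "barâye",
     "startTime": 0.5, "endTime": 1.2, "english": "for", "words": []}
  ]
}
""")

for line in data["lines"]:
    duration = line["endTime"] - line["startTime"]
    print(f'{line["transliteration"]:<12} ({duration:.1f}s)  {line["english"]}')
```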
Generate audio pronunciations for individual words using ElevenLabs:
```bash
# Requires ELEVENLABS_API_KEY in .env.local
bun run scripts/generate-word-audio.ts
```

This creates MP3 files for each unique word, enabling users to click and hear native pronunciation.
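The core idea — one audio file per unique word — can be sketched as follows. The filename scheme here is an assumption for illustration (hashing avoids escaping non-Latin scripts in filenames), not necessarily what `generate-word-audio.ts` actually does:

```python
import hashlib

def word_audio_path(word: str, lang: str) -> str:
    """Derive a stable, filesystem-safe MP3 path for a word.

    Hypothetical scheme: hash the (lang, word) pair so Persian,
    Korean, or Arabic words need no filename escaping.
    """
    digest = hashlib.sha1(f"{lang}:{word}".encode("utf-8")).hexdigest()[:12]
    return f"public/audio/{lang}/{digest}.mp3"

# Deduplicate words across all lines so each word is synthesized once.
words = ["barâye", "barâye", "raghsidan"]
unique = sorted(set(words))
```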
Once you have the transcription JSON:
- Update the song in the database:

  ```bash
  npx convex run lyrics:updateTimestamps '{ "songId": "song-id", "unlockCode": "UNLOCK_TIMESTAMPS", "updates": [...] }'
  ```

- Upload to Convex Storage: Use the Convex dashboard to store audio files and video references.

- Add to Catalog: Update `convex/seed.ts` or the admin panel to make the song visible in the app.
```
songscript/
├── src/
│   ├── components/            # React components
│   ├── routes/                # TanStack file-based routes
│   ├── hooks/                 # Custom React hooks
│   ├── lib/                   # Utility functions
│   └── styles.css             # Global CSS
├── convex/
│   ├── schema.ts              # Database schema
│   ├── songs.ts               # Song queries/mutations
│   └── auth.ts                # Auth configuration
├── public/
│   ├── audio/                 # Generated word pronunciation audio
│   ├── video/                 # Local video files
│   └── flags/                 # Language flag assets
├── scripts/
│   ├── whisperx/
│   │   ├── pipeline.py        # Full transcription pipeline
│   │   ├── download.py        # YouTube audio download
│   │   ├── separate.py        # Vocal separation (Demucs)
│   │   ├── transcribe.py      # WhisperX transcription
│   │   └── requirements.txt   # Python dependencies
│   ├── generate-word-audio.ts # ElevenLabs word audio generation
│   ├── fetch-forvo-audio.ts   # Forvo pronunciation downloader
│   └── extract-snippets.sh    # Audio snippet extraction
└── package.json               # Node.js dependencies
```
```bash
# Run all tests once
bun run test

# Run tests in watch mode
bun run test:watch

# Run E2E tests with Playwright
bun run test:e2e
```

Tests are automatically run on commit via Husky pre-commit hooks.
```bash
bun run typecheck  # Type checking
bun run lint       # ESLint
```

```bash
# Build the app
bun run build

# Preview production build
bun run preview
```

If you see: "Two output files share the same path but have different contents: out/*.js"
Solution:

```bash
rm -f convex/*.js
npx convex dev
```

This happens when compiled `.js` files conflict with TypeScript sources. Only `.ts` files should be in `convex/`.
On macOS with Apple Silicon:

```bash
# Install PyTorch for ARM64
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```

CUDA support (NVIDIA GPU):

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

Missing FFmpeg:

```bash
# macOS
brew install ffmpeg

# Ubuntu
sudo apt-get install ffmpeg

# Windows
choco install ffmpeg
```

- TanStack Start handles file-based routing and SSR
- Convex React Query provides real-time data sync from the database
- Tailwind + shadcn/ui components for consistent UI
- Convex manages database, authentication, and real-time subscriptions
- Better Auth handles passwordless login via magic links
- Convex Storage stores audio and video files
The WhisperX pipeline is a standalone Python application that:
- Downloads audio from YouTube
- Separates vocals using Demucs
- Transcribes with Whisper (OpenAI)
- Aligns words to timestamps using WhisperX
- Transliterates to Latin script
- Optionally translates to English
Output is a structured JSON that's then imported into the Convex database.
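The stage ordering above can be sketched as a sequence of the per-step commands documented earlier. This is an illustrative composition, not `pipeline.py` itself, and the intermediate file names are placeholders (`download.py` derives the real ones):

```python
# Illustrative composition of the pipeline stages as shell commands.
# pipeline.py automates this end to end; the paths here are placeholders.

def pipeline_commands(url: str, lang: str, out: str = "output/") -> list[list[str]]:
    audio = "downloads/song.m4a"    # placeholder: actual name comes from the download step
    vocals = "separated/song.wav"   # placeholder: Demucs output for the vocal stem
    return [
        ["python3", "download.py", url, "downloads/"],
        ["python3", "separate.py", audio, "separated/"],
        ["python3", "transcribe.py", vocals, "-l", lang, "--model", "large-v3", "-o", out],
    ]

# Each command could then be run in order with subprocess.run(cmd, check=True).
```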
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Run tests locally to ensure everything passes
- Commit your changes with a descriptive message
- Push to the branch and open a Pull Request
Built with love and a desire to connect with heritage through music. 🎵