Real-time LLM benchmarking tool — compare model speed, cost, and quality side-by-side.
ModelPulse is a desktop application that benchmarks Large Language Model providers head-to-head with real-time streaming. Select models, fire the same prompt at each, and instantly see which one is faster, cheaper, and better.
- Multi-provider support — OpenRouter, Groq, and OpenAI in one tool
- Real-time streaming — Watch responses arrive token-by-token with TTFT (time-to-first-token) tracking
- Side-by-side comparison — Benchmark 2 models simultaneously on the same prompt
- Cost tracking — Per-request USD cost calculated from provider pricing
- Performance metrics — TTFT, total latency, tokens/second, input/output token counts
- History — Browse and restore previous benchmark runs with full state
- Smart caching — 30-minute TTL cache for model listings (no redundant API calls)
- Persistent config — API keys and settings saved locally in TOML format
- Dark UI — Professional navy-black theme with purple-violet accents
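The TTFT and tokens-per-second figures above can be derived from stream timestamps. A minimal sketch (illustrative only; `StreamMetrics` and `measure_stream` are hypothetical names, not the project's API):

```python
import time
from dataclasses import dataclass

@dataclass
class StreamMetrics:
    ttft_s: float        # time to first token, in seconds
    total_s: float       # total latency, in seconds
    output_tokens: int

    @property
    def tokens_per_second(self) -> float:
        # Throughput over the generation window (after the first token)
        gen_window = self.total_s - self.ttft_s
        return self.output_tokens / gen_window if gen_window > 0 else 0.0

def measure_stream(chunks) -> StreamMetrics:
    """Consume an iterable of token chunks, timing with time.monotonic()."""
    start = time.monotonic()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = time.monotonic()  # first token arrived
        count += 1
    end = time.monotonic()
    return StreamMetrics(ttft_s=(first or end) - start,
                         total_s=end - start,
                         output_tokens=count)
```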
- Python 3.11+
- At least one API key: OpenRouter, Groq, or OpenAI
```bash
git clone https://github.com/DevStrategist/ModelPulse.git
cd ModelPulse/llm-benchmark
pip install -r requirements.txt
python main.py
```

On first launch, click Settings to enter your API key(s). Select models in each panel, type a prompt, and hit Run Benchmark (or Ctrl+Enter).
```
llm-benchmark/
├── main.py                      # Entry point
├── src/
│   ├── benchmark_runner.py      # Orchestrates concurrent benchmark runs
│   ├── clients/                 # API client implementations
│   │   ├── base_client.py       # Abstract base with streaming logic
│   │   ├── openrouter_client.py
│   │   ├── groq_client.py
│   │   └── openai_client.py
│   ├── gui/                     # PySide6 user interface
│   │   ├── main_window_clean_dark.py  # Main application window
│   │   ├── settings_dialog.py   # API key management
│   │   ├── history_widget.py    # Run history sidebar
│   │   ├── dark_design_system.py  # Colors, typography, spacing
│   │   └── styles/              # Qt stylesheets
│   ├── models/                  # Data classes (RunResult, ModelInfo, etc.)
│   └── utils/                   # Config (TOML), cache (TTL), logger (JSONL)
└── tests/                       # Unit and integration tests
```
- User selects models and enters a prompt
- `BenchmarkRunner` fires concurrent async requests via `httpx`
- Each client streams the response, tracking TTFT and latency with `time.monotonic()`
- Results are displayed in real time, with the fastest model highlighted
- Run data is logged to JSONL and stored in history for later comparison
- Async streaming — `httpx.AsyncClient.stream()` for true streaming with accurate TTFT measurement
- Thread isolation — async event loops run in `QThread` workers to keep the GUI responsive
- Provider abstraction — `BaseClient` handles all streaming/timing logic; subclasses only define endpoints and headers
- TTL cache — thread-safe cache prevents redundant model-listing API calls within 30 minutes
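A thread-safe TTL cache of the kind described might look like this sketch (illustrative; the names `TTLCache` and `get_or_fetch` are assumptions, and the real implementation may differ):

```python
import threading
import time
from typing import Any, Callable

class TTLCache:
    """Thread-safe cache that expires entries after `ttl` seconds."""

    def __init__(self, ttl: float = 30 * 60):
        self.ttl = ttl
        self._lock = threading.Lock()
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = time.monotonic()
        with self._lock:
            hit = self._store.get(key)
            if hit and now - hit[0] < self.ttl:
                return hit[1]          # fresh entry: skip the API call
        value = fetch()                # fetch outside the lock to avoid blocking
        with self._lock:
            self._store[key] = (time.monotonic(), value)
        return value
```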
Settings are saved to `~/.openrouter-bench/config.toml`:

```toml
[api_keys]
openrouter = "sk-or-..."
groq = "gsk_..."
openai = "sk-..."

[settings]
temperature = 0.7
max_tokens = 1000
```

Benchmark logs are appended to `~/.openrouter-bench/benchmark.jsonl`.
```bash
# Unit tests
pytest tests/ -v

# Integration tests (requires API keys as env vars)
export OPENROUTER_API_KEY=your_key
export GROQ_API_KEY=your_key
pytest tests/test_integration.py -v -m integration
```

See CONTRIBUTING.md for a guide on extending ModelPulse with additional LLM providers.
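As a rough illustration of the provider-abstraction pattern, a new client might only need to declare an endpoint and headers while the base class keeps the streaming and timing logic. Class and method names here are hypothetical, not ModelPulse's actual API:

```python
from abc import ABC, abstractmethod

class BaseClient(ABC):
    """Owns streaming/timing; subclasses supply endpoint and headers."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    @property
    @abstractmethod
    def endpoint(self) -> str:
        """Full URL of the provider's chat-completions endpoint."""

    def headers(self) -> dict[str, str]:
        # Most OpenAI-compatible providers use a Bearer token
        return {"Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"}

class GroqClient(BaseClient):
    @property
    def endpoint(self) -> str:
        return "https://api.groq.com/openai/v1/chat/completions"
```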
- Python 3.11+ with async/await
- PySide6 for the desktop GUI
- httpx for async HTTP streaming
- Pydantic for data validation
- TOML for configuration persistence