Real-time LLM benchmarking tool — compare model speed, cost, and quality side-by-side.
ModelPulse is a desktop application that benchmarks Large Language Model providers head-to-head with real-time streaming. Select models, fire the same prompt at each, and instantly see which one is faster, cheaper, and better.
- Multi-provider support — OpenRouter, Groq, and OpenAI in one tool
- Real-time streaming — Watch responses arrive token-by-token with TTFT (time-to-first-token) tracking
- Side-by-side comparison — Benchmark 2 models simultaneously on the same prompt
- Cost tracking — Per-request USD cost calculated from provider pricing
- Performance metrics — TTFT, total latency, tokens/second, input/output token counts
- History — Browse and restore previous benchmark runs with full state
- Smart caching — 30-minute TTL cache for model listings (no redundant API calls)
- Persistent config — API keys and settings saved locally in TOML format
- Dark UI — Professional navy-black theme with purple-violet accents
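The TTFT and tokens-per-second figures above can be derived from stream timestamps. A minimal sketch (illustrative only; `StreamMetrics` and `measure_stream` are hypothetical names, not the project's API):

```python
import time
from dataclasses import dataclass

@dataclass
class StreamMetrics:
    ttft_s: float        # time to first token, in seconds
    total_s: float       # total latency, in seconds
    output_tokens: int

    @property
    def tokens_per_second(self) -> float:
        # Throughput over the generation window (after the first token)
        gen_window = self.total_s - self.ttft_s
        return self.output_tokens / gen_window if gen_window > 0 else 0.0

def measure_stream(chunks) -> StreamMetrics:
    """Consume an iterable of token chunks, timing with time.monotonic()."""
    start = time.monotonic()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = time.monotonic()  # first token arrived
        count += 1
    end = time.monotonic()
    return StreamMetrics(ttft_s=(first or end) - start,
                         total_s=end - start,
                         output_tokens=count)
```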
- Python 3.11+
- At least one API key: OpenRouter, Groq, or OpenAI
```bash
git clone https://github.com/DevStrategist/ModelPulse.git
cd ModelPulse/llm-benchmark
pip install -r requirements.txt
python main.py
```

On first launch, click Settings to enter your API key(s). Select models in each panel, type a prompt, and hit Run Benchmark (or Ctrl+Enter).
```
llm-benchmark/
├── main.py                      # Entry point
├── src/
│   ├── benchmark_runner.py      # Orchestrates concurrent benchmark runs
│   ├── clients/                 # API client implementations
│   │   ├── base_client.py       # Abstract base with streaming logic
│   │   ├── openrouter_client.py
│   │   ├── groq_client.py
│   │   └── openai_client.py
│   ├── gui/                     # PySide6 user interface
│   │   ├── main_window_clean_dark.py  # Main application window
│   │   ├── settings_dialog.py   # API key management
│   │   ├── history_widget.py    # Run history sidebar
│   │   ├── dark_design_system.py  # Colors, typography, spacing
│   │   └── styles/              # Qt stylesheets
│   ├── models/                  # Data classes (RunResult, ModelInfo, etc.)
│   └── utils/                   # Config (TOML), cache (TTL), logger (JSONL)
└── tests/                       # Unit and integration tests
```
- User selects models and enters a prompt
- `BenchmarkRunner` fires concurrent async requests via `httpx`
- Each client streams the response, tracking TTFT and latency with `time.monotonic()`
- Results are displayed in real time, with the fastest model highlighted
- Run data is logged to JSONL and stored in history for later comparison
- Async streaming — `httpx.AsyncClient.stream()` for true streaming with accurate TTFT measurement
- Thread isolation — async event loops run in `QThread` workers to keep the GUI responsive
- Provider abstraction — `BaseClient` handles all streaming/timing logic; subclasses only define endpoints and headers
- TTL cache — thread-safe cache prevents redundant model-listing API calls within 30 minutes
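A thread-safe TTL cache of the kind described might look like this sketch (illustrative; the names `TTLCache` and `get_or_fetch` are assumptions, and the real implementation may differ):

```python
import threading
import time
from typing import Any, Callable

class TTLCache:
    """Thread-safe cache that expires entries after `ttl` seconds."""

    def __init__(self, ttl: float = 30 * 60):
        self.ttl = ttl
        self._lock = threading.Lock()
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = time.monotonic()
        with self._lock:
            hit = self._store.get(key)
            if hit and now - hit[0] < self.ttl:
                return hit[1]          # fresh entry: skip the API call
        value = fetch()                # fetch outside the lock to avoid blocking
        with self._lock:
            self._store[key] = (time.monotonic(), value)
        return value
```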
Settings are saved to `~/.openrouter-bench/config.toml`:

```toml
[api_keys]
openrouter = "sk-or-..."
groq = "gsk_..."
openai = "sk-..."

[settings]
temperature = 0.7
max_tokens = 1000
```

Benchmark logs are appended to `~/.openrouter-bench/benchmark.jsonl`.
```bash
# Unit tests
pytest tests/ -v

# Integration tests (requires API keys as env vars)
export OPENROUTER_API_KEY=your_key
export GROQ_API_KEY=your_key
pytest tests/test_integration.py -v -m integration
```

See CONTRIBUTING.md for a guide on extending ModelPulse with additional LLM providers.
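As a rough illustration of the provider-abstraction pattern, a new client might only need to declare an endpoint and headers while the base class keeps the streaming and timing logic. Class and method names here are hypothetical, not ModelPulse's actual API:

```python
from abc import ABC, abstractmethod

class BaseClient(ABC):
    """Owns streaming/timing; subclasses supply endpoint and headers."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    @property
    @abstractmethod
    def endpoint(self) -> str:
        """Full URL of the provider's chat-completions endpoint."""

    def headers(self) -> dict[str, str]:
        # Most OpenAI-compatible providers use a Bearer token
        return {"Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"}

class GroqClient(BaseClient):
    @property
    def endpoint(self) -> str:
        return "https://api.groq.com/openai/v1/chat/completions"
```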
- Python 3.11+ with async/await
- PySide6 for the desktop GUI
- httpx for async HTTP streaming
- Pydantic for data validation
- TOML for configuration persistence