
ModelPulse

Real-time LLM benchmarking tool — compare model speed, cost, and quality side-by-side.

Python 3.11+ · License: MIT · PySide6

ModelPulse is a desktop application that benchmarks Large Language Model providers head-to-head with real-time streaming. Select models, fire the same prompt at each, and instantly see which one is faster, cheaper, and better.

Features

  • Multi-provider support — OpenRouter, Groq, and OpenAI in one tool
  • Real-time streaming — Watch responses arrive token-by-token with TTFT (time-to-first-token) tracking
  • Side-by-side comparison — Benchmark 2 models simultaneously on the same prompt
  • Cost tracking — Per-request USD cost calculated from provider pricing
  • Performance metrics — TTFT, total latency, tokens/second, input/output token counts
  • History — Browse and restore previous benchmark runs with full state
  • Smart caching — 30-minute TTL cache for model listings (no redundant API calls)
  • Persistent config — API keys and settings saved locally in TOML format
  • Dark UI — Professional navy-black theme with purple-violet accents
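The performance metrics listed above reduce to simple arithmetic over three recorded values. A minimal sketch with an illustrative dataclass (field names here are hypothetical, not the repo's actual RunResult):

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    ttft_s: float        # time to first token, seconds
    total_s: float       # total request latency, seconds
    output_tokens: int   # completion tokens generated

    @property
    def tokens_per_second(self) -> float:
        # Throughput is usually measured over the generation phase only,
        # i.e. the time after the first token arrives.
        gen_time = self.total_s - self.ttft_s
        return self.output_tokens / gen_time if gen_time > 0 else 0.0

m = RunMetrics(ttft_s=0.4, total_s=2.4, output_tokens=100)
print(round(m.tokens_per_second, 1))  # 50.0
```

Measuring throughput from first-token onward avoids penalizing a model for queueing or network delay that TTFT already captures separately.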

Quick Start

Prerequisites

  • Python 3.11 or newer
  • An API key for at least one supported provider (OpenRouter, Groq, or OpenAI)

Install

git clone https://github.com/DevStrategist/ModelPulse.git
cd ModelPulse/llm-benchmark
pip install -r requirements.txt

Run

python main.py

On first launch, click Settings to enter your API key(s). Select models in each panel, type a prompt, and hit Run Benchmark (or Ctrl+Enter).

Architecture

llm-benchmark/
├── main.py                    # Entry point
├── src/
│   ├── benchmark_runner.py    # Orchestrates concurrent benchmark runs
│   ├── clients/               # API client implementations
│   │   ├── base_client.py     # Abstract base with streaming logic
│   │   ├── openrouter_client.py
│   │   ├── groq_client.py
│   │   └── openai_client.py
│   ├── gui/                   # PySide6 user interface
│   │   ├── main_window_clean_dark.py   # Main application window
│   │   ├── settings_dialog.py          # API key management
│   │   ├── history_widget.py           # Run history sidebar
│   │   ├── dark_design_system.py       # Colors, typography, spacing
│   │   └── styles/                     # Qt stylesheets
│   ├── models/                # Data classes (RunResult, ModelInfo, etc.)
│   └── utils/                 # Config (TOML), cache (TTL), logger (JSONL)
└── tests/                     # Unit and integration tests

How It Works

  1. User selects models and enters a prompt
  2. BenchmarkRunner fires concurrent async requests via httpx
  3. Each Client streams the response, tracking TTFT and latency with time.monotonic()
  5. Results are displayed in real time, with the fastest model highlighted
  5. Run data is logged to JSONL and stored in history for later comparison

Key Design Decisions

  • Async streaming — httpx.AsyncClient.stream() for true streaming with accurate TTFT measurement
  • Thread isolation — Async event loops run in QThread workers to keep the GUI responsive
  • Provider abstraction — BaseClient handles all streaming/timing logic; subclasses only define endpoints and headers
  • TTL cache — Thread-safe cache prevents redundant model-listing API calls within 30 minutes
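The TTL-cache decision can be illustrated with a minimal thread-safe sketch; the real implementation in src/utils/ may differ in naming and detail:

```python
import threading
import time

class TTLCache:
    """Minimal thread-safe cache with per-entry expiry (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 30 * 60):
        self._ttl = ttl_seconds
        self._data: dict[str, tuple[float, object]] = {}
        self._lock = threading.Lock()

    def get(self, key: str):
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if time.monotonic() >= expires_at:
                del self._data[key]  # expired: drop the entry and miss
                return None
            return value

    def set(self, key: str, value) -> None:
        with self._lock:
            self._data[key] = (time.monotonic() + self._ttl, value)
```

The lock matters because model listings are fetched from QThread workers; time.monotonic() is used for expiry so wall-clock adjustments cannot resurrect or kill entries.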

Configuration

Settings are saved to ~/.openrouter-bench/config.toml:

[api_keys]
openrouter = "sk-or-..."
groq = "gsk_..."
openai = "sk-..."

[settings]
temperature = 0.7
max_tokens = 1000

Benchmark logs are appended to ~/.openrouter-bench/benchmark.jsonl.

Testing

# Unit tests
pytest tests/ -v

# Integration tests (requires API keys as env vars)
export OPENROUTER_API_KEY=your_key
export GROQ_API_KEY=your_key
pytest tests/test_integration.py -v -m integration

Adding a New Provider

See CONTRIBUTING.md for a guide on extending ModelPulse with additional LLM providers.
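Per the provider-abstraction design above, a new provider mostly amounts to an endpoint URL and auth headers. An illustrative sketch (method names are hypothetical, not the repo's exact interface; the URL shown is Groq's OpenAI-compatible endpoint):

```python
from abc import ABC, abstractmethod

class BaseClient(ABC):
    """Base class owns streaming/timing (omitted here); subclasses
    supply only the endpoint and request headers."""

    @abstractmethod
    def endpoint(self) -> str: ...

    @abstractmethod
    def headers(self, api_key: str) -> dict: ...

class GroqClient(BaseClient):
    def endpoint(self) -> str:
        return "https://api.groq.com/openai/v1/chat/completions"

    def headers(self, api_key: str) -> dict:
        return {"Authorization": f"Bearer {api_key}"}
```

Keeping timing in the base class means every provider is measured identically, so TTFT and latency numbers stay comparable across panels.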

Tech Stack

  • Python 3.11+ with async/await
  • PySide6 for the desktop GUI
  • httpx for async HTTP streaming
  • Pydantic for data validation
  • TOML for configuration persistence

License

MIT
