Status: Archived
A simple console-based chat interface for running local LLMs. Built November 2023.
Loads a local GGUF/GGML model and runs an interactive chat session in the terminal. The user types messages and the model streams responses back in real time. Uses a "chat-with-bob" system prompt format with a helpful assistant persona.
- Streams token-by-token output to the console
- Runs inference on GPU via CUDA 12
- Color-coded user input (green) vs. assistant output (white)
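The loop described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual `Program.cs`: it assumes the LLamaSharp 0.7-era API surface (`ModelParams`, `LLamaWeights.LoadFromFile`, `InteractiveExecutor`, `ChatSession.ChatAsync`), and the model path and `GpuLayerCount` value are placeholders you would adjust for your own setup.

```csharp
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

// Placeholder path -- point this at your own GGUF model file.
var modelPath = @"models/wizardlm-7b.Q4_K_M.gguf";

var parameters = new ModelParams(modelPath)
{
    ContextSize = 1024,
    GpuLayerCount = 35 // layers offloaded to the GPU via the CUDA backend
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var session = new ChatSession(new InteractiveExecutor(context));

while (true)
{
    // Green for user input, white for the assistant's streamed reply.
    Console.ForegroundColor = ConsoleColor.Green;
    var input = Console.ReadLine() ?? "";

    Console.ForegroundColor = ConsoleColor.White;
    await foreach (var token in session.ChatAsync(input,
        new InferenceParams { AntiPrompts = new List<string> { "User:" } }))
    {
        Console.Write(token); // stream token-by-token as it is generated
    }
    Console.WriteLine();
}
```

The `AntiPrompts` entry stops generation when the model emits the next `User:` turn marker, which is what keeps the "chat-with-bob"-style prompt format turn-based.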
- C# / .NET 8
- LLamaSharp 0.7.0 (C# bindings for llama.cpp)
- LLamaSharp.Backend.Cuda12 for GPU acceleration
- Tested with WizardLM-7B and Orca-2-7B models
- Place a GGUF model file somewhere on disk
- Update `modelPath` in `Program.cs` to point to your model
- Run `dotnet run`
This was an early experiment in local LLM integration -- one of my first attempts at building something with language models. It's minimal by design: a single `Program.cs` file, no abstractions, just a direct chat loop.
MIT