Skip to content

A Personal AI Assistant that lives inside your machine and works 24/7 for you.

License

Notifications You must be signed in to change notification settings

zfoong/CraftBot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

340 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CraftOS Banner

CraftBot Logo

日本語版はこちら | 中文版README

🚀 Overview

CraftBot is your Personal AI Assistant that lives inside your machine and works 24/7 for you.

It autonomously interprets tasks, plans actions, and executes them to achieve your goals.

Set up CraftBot on your machine or a separate environment. Interact with it via the TUI or from anywhere through your favorite messaging apps. Extend the agent's capabilities with MCPs and Skills, and connect to tools like Google Workspace, Slack, Notion, and Telegram to expand its reach. CraftBot intelligently switches between CLI mode for standard tasks and GUI mode when screen interaction is required (GUI mode runs in an isolated environment so it won't disturb your work).

CraftBot awaits your orders, set up your own CraftBot now.


✨ Features

  • CLI/GUI Mode — Agent intelligently switches between CLI and GUI mode based on task complexity. GUI mode enables full desktop automation with screen capture, mouse/keyboard control, and window management.
  • Multi-LLM Support — Flexible LLM provider system supporting OpenAI, Google Gemini, Anthropic Claude, BytePlus, and local Ollama models. Easily switch between providers.
  • 37+ Built-in Actions — Comprehensive action library including:
    • File Operations: Find, read, write, grep, and convert files
    • Web Capabilities: HTTP requests, web search, PDF generation, image generation
    • GUI Automation: Mouse clicks, keyboard input, screenshots, window control
    • Application Control: Open apps, manage windows, clipboard operations
  • Persistent Memory — RAG-based semantic memory system powered by ChromaDB. The agent remembers context across sessions with intelligent retrieval and incremental updates.
  • External tools integration — Connect to Google Workspace, Slack, Notion, Zoom, LinkedIn, Discord, and Telegram (more to come!) with embedded credentials and OAuth support.
  • MCP — Model Context Protocol integration for extending agent capabilities with external tools and services.
  • Skills — Extensible skill framework with built-in skills for task planning, research, code review, git operations, and more.
  • Cross-Platform — Full support for Windows and Linux with platform-specific code variants and Docker containerization.

Important

Note for GUI mode: The GUI mode is still in experimental phase. This means you may encounter issues when the agent switches to GUI mode. We are actively improving this feature.


🧩 Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    User Interface Layer                      │
│  ┌──────────────────┐  ┌──────────────────────────────────┐ │
│  │  TUI (Textual)   │  │  GUI Module (Docker + Gradio)    │ │
│  └──────────────────┘  └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      Agent Base                              │
│           (Task orchestration & lifecycle management)        │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────┬──────────┼──────────┬─────────┐
        ▼         ▼          ▼          ▼         ▼
   ┌─────────┐ ┌────────┐ ┌────────┐ ┌───────┐ ┌────────┐
   │   LLM   │ │Context │ │ Action │ │ Event │ │ Memory │
   │Interface│ │ Engine │ │  Mgmt  │ │Stream │ │  (RAG) │
   └─────────┘ └────────┘ └────────┘ └───────┘ └────────┘
Component Description
Agent Base Core orchestration layer that manages task lifecycle, coordinates between components, and handles the main agentic loop.
LLM Interface Unified interface supporting multiple LLM providers (OpenAI, Gemini, Anthropic, BytePlus, Ollama).
Context Engine Generates optimized prompts with KV-cache support.
Action Manager Retrieves and executes actions from the library. Custom action is easy to extend
Action Router Intelligently selects the best matching action based on task requirements and resolves input parameters via LLM when needed.
Event Stream Real-time event publishing system for task progress tracking, UI updates, and execution monitoring.
Memory Manager RAG-based semantic memory using ChromaDB. Handles memory chunking, embedding, retrieval, and incremental updates.
State Manager Global state management for tracking agent execution context, conversation history, and runtime configuration.
Task Manager Manages task definitions, enable simple and complex tasks bode, create todos, and multi-step workflow tracking.
Skill Manager Loads and injects pluggable skills into the agent context.
MCP Adapter Model Context Protocol integration that converts MCP tools into native actions.
TUI Interface Terminal user interface built with Textual framework for interactive command-line operation.
GUI Module Experimental GUI automation using Docker containers, OmniParser for UI element detection, and Gradio client.

🔜 Roadmap

  • Memory Module — Done.
  • External Tool integration — Still adding more!
  • MCP Layer — Done.
  • Skill Layer — Done.
  • Proactive Behaviour — Pending

🧰 Getting Started

Prerequisites

  • Python 3.9+
  • git, conda, and pip
  • An API key for your chosen LLM provider (e.g., OpenAI or Gemini)

Installation

git clone https://github.com/zfoong/CraftBot.git
cd CraftBot
conda env create -f environment.yml

⚡ Quick Start

Export your API key:

export OPENAI_API_KEY=<YOUR_KEY_HERE>
or
export GOOGLE_API_KEY=<YOUR_KEY_HERE>

Run:

python start.py

This executes the built-in CraftBot, that you can communicate to:

  1. Talk to the agent
  2. Ask it to perform complex series of tasks
  3. Run command /help to seek help
  4. Get along with the AI agent
  5. Do advanced computer-use tasks with a dedicated but lightweight WebRTC Linux VM

Terminal Arguments

Argument Description
--only-cpu Run the agent on CPU mode
--fast Skip unecessary update checks and launch agent faster.
NOTE: You have to run without --fast the first time you launch
--no-omniparser Disable the use of microsoft omniparser to analyse UI - will greatly reduce GUI action accuracy. It is highly encouraged to use omniparser
--no-conda Installs all packages globally instead of inside a conda environment
--no-gui Disable GUI mode. The agent will run in CLI-only mode and cannot switch to GUI mode. This setting is persisted across restarts. OmniParser is also automatically disabled
--enable-gui Re-enable GUI mode if it was previously disabled with --no-gui. This setting is persisted across restarts

EXAMPLE

python start.py --only-cpu --fast

[!HINT] Onboarding: Launching CraftBot for the first time will trigger an onboarding sequence where you set up API keys, the agent's name, MCPs, and Skills. Then, chatting with your CraftBot for the first time will prompt an interview session so it can update its USER.md and AGENT.md for future reference.


🔐 OAuth Setup (Optional)

The agent can connect to various services using OAuth. Release builds come with embedded credentials, but you can also use your own.

Quick Start

For release builds with embedded credentials:

/google login    # Connect Google Workspace
/zoom login      # Connect Zoom
/slack invite    # Connect Slack
/notion invite   # Connect Notion
/linkedin login  # Connect LinkedIn

Service Details

Service Auth Type Command Requires Secret?
Google PKCE /google login No (PKCE)
Zoom PKCE /zoom login No (PKCE)
Slack OAuth 2.0 /slack invite Yes
Notion OAuth 2.0 /notion invite Yes
LinkedIn OAuth 2.0 /linkedin login Yes

Using Your Own Credentials

If you prefer to use your own OAuth credentials, add them to your .env file:

Google (PKCE - only Client ID needed)

GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
  1. Go to Google Cloud Console
  2. Enable Gmail, Calendar, Drive, and People APIs
  3. Create OAuth credentials as Desktop app type
  4. Copy the Client ID (secret not required for PKCE)

Zoom (PKCE - only Client ID needed)

ZOOM_CLIENT_ID=your-zoom-client-id
  1. Go to Zoom Marketplace
  2. Create an OAuth app
  3. Copy the Client ID

Slack (Requires both)

SLACK_SHARED_CLIENT_ID=your-slack-client-id
SLACK_SHARED_CLIENT_SECRET=your-slack-client-secret
  1. Go to Slack API
  2. Create a new app
  3. Add OAuth scopes: chat:write, channels:read, users:read, etc.
  4. Copy Client ID and Client Secret

Notion (Requires both)

NOTION_SHARED_CLIENT_ID=your-notion-client-id
NOTION_SHARED_CLIENT_SECRET=your-notion-client-secret
  1. Go to Notion Developers
  2. Create a new integration (Public integration)
  3. Copy OAuth Client ID and Secret

LinkedIn (Requires both)

LINKEDIN_CLIENT_ID=your-linkedin-client-id
LINKEDIN_CLIENT_SECRET=your-linkedin-client-secret
  1. Go to LinkedIn Developers
  2. Create an app
  3. Add OAuth 2.0 scopes
  4. Copy Client ID and Client Secret

Run with container

The repository root included a Docker configuration with Python 3.10, key system packages (including Tesseract for OCR), and all Python dependencies defined in environment.yml/requirements.txt so the agent can run consistently in isolated environments.

Below are the setup instruction of running our agent with container.

Build the image

From the repository root:

docker build -t craftbot .

Run the container

The image is configured to launch the agent with python -m core.main by default. To run it interactively:

docker run --rm -it craftbot

If you need to supply environment variables, pass an env file (for example, based on .env.example):

docker run --rm -it --env-file .env craftbot

Mount any directories that should persist outside the container (such as data or cache folders) using -v, and adjust ports or additional flags as needed for your deployment. The container ships with system dependencies for OCR (tesseract), screen automation (pyautogui, mss, X11 utilities, and a virtual framebuffer), and common HTTP clients so the agent can work with files, network APIs, and GUI automation inside the container.

Enabling GUI/screen automation

GUI actions (mouse/keyboard events, screenshots) require an X11 server. You can either attach to your host display or run headless with xvfb:

  • Use the host display (requires Linux with X11):

    docker run --rm -it 
      -e DISPLAY=$DISPLAY \
      -v /tmp/.X11-unix:/tmp/.X11-unix \
      -v $(pwd)/data:/app/core/data \
      craftbot

    Add extra -v mounts for any folders the agent should read/write.

  • Run headlessly with a virtual display:

      docker run --rm -it --env-file .env craftbot bash -lc "Xvfb :99 -screen 0 1920x1080x24 & export DISPLAY=:99 && exec python -m core.main"

By default the image uses Python 3.10 and bundles the Python dependencies from environment.yml/requirements.txt, so python -m core.main works out of the box.


🤝 How to Contribute

Contributions and suggestions are welcome! You can contact @zfoong @ thamyikfoong(at)craftos.net. We currently don't have checks set up, so we can't allow direct contributions but we appreciate any suggestions and feedback.

🧾 License

This project is licensed under the MIT License. You are free to use, host, and monetize this project (you must credit this project in case of distribution and monetization).


⭐ Acknowledgements

Developed and maintained by CraftOS and contributors @zfoong and @ahmad-ajmal.
If you find CraftBot useful, please ⭐ the repository and share it with others!

About

A Personal AI Assistant that lives inside your machine and works 24/7 for you.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Contributors 2

  •  
  •  

Languages