This repository implements MDriveBench, a multi-agent driving benchmark.
MDriveBench was originally built on top of CoLMDriver; CoLMDriver now ships as one of several models in the repository, with the benchmark infrastructure layered around it.
MDriveBench provides:
- Benchmark infrastructure (CARLA integration, scenarios, evaluation, analysis)
- Multiple baseline and LLM-based driving models (TCP, CoDriving, LMDrive, UniAD, CoLMDriver, and VAD)
- Training code for CoLMDriver components
Contents:
- Quickstart
- Results Analysis and Visualization
- Challenge Submission Instructions
- Baseline Evaluation Setup
- Full Benchmark Evaluation (Internal)
Use the setup script below. It applies compatibility fixes, so start CARLA from this install.
./download_and_setup_carla.sh
export CARLA_ROOT=$PWD/carla912
conda env create -f model_envs/run_custom_eval_baseline.yaml --solver libmamba
conda activate run_custom_eval_baseline
# terminal A
$CARLA_ROOT/CarlaUE4.sh --world-port=2014 -RenderOffScreen
Run LLM-Generated Scenarios:
# terminal B
python tools/run_custom_eval.py \
--routes-dir scenarioset/llmgen \
--agent /abs/path/to/agents.py \
--agent-config /abs/path/to/agent_config.yaml
Run V2X-PnP Real-to-Sim Scenarios:
# terminal B
python tools/run_custom_eval.py \
--routes-dir scenarioset/v2xpnp \
--agent /abs/path/to/agents.py \
--agent-config /abs/path/to/agent_config.yaml \
--custom-actor-control-mode replay \
--log-replay-actors
Warmup outputs are written to results/results_driving_custom/warmupscenarios/<scenario_name>/.
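CARLA can take a while to finish loading, so the client in terminal B may fail to connect if started too early. A small helper along these lines (illustrative, not part of the repo) blocks until the world port accepts connections:

```shell
# Illustrative helper (not part of the repo): block until a TCP port accepts
# connections, or give up after a timeout. Relies on bash's /dev/tcp device.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} elapsed=0
  while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    sleep 1
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
  done
}
```

For example, `wait_for_port 127.0.0.1 2014 120 && python tools/run_custom_eval.py ...` would only launch the client once the simulator on port 2014 is reachable.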
Use visualization/results_analysis.py on any results folder, not just CoLMDriver outputs.
# Single run folder
python visualization/results_analysis.py \
results/results_driving_custom/<run_tag> \
--output-dir report/<run_tag>
# Compare multiple run folders and export one markdown summary
python visualization/results_analysis.py \
results/results_driving_custom/<run_tag_a> \
results/results_driving_custom/<run_tag_b> \
--output-dir report/compare \
--markdown report/compare/summary.md
The script generates markdown/CSV summaries and plots (driving score, success rate, infractions, and negotiation stats when available).
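For a quick command-line sanity check of a run folder before generating the full report, something like the following works; note that the per-scenario file layout and the "driving_score" field name are assumptions here, not a documented schema:

```shell
# Hedged sketch: average a "driving_score" field across per-scenario
# summary.json files under one run folder. The layout
# <run_tag>/<scenario>/summary.json and the field name are assumptions.
average_ds() {
  grep -h '"driving_score"' "$1"/*/summary.json \
    | awk -F: '{ gsub(/[ ,}]/, "", $2); sum += $2; n++ }
               END { if (n) printf "%.2f\n", sum / n }'
}
```

Usage: `average_ds results/results_driving_custom/<run_tag>` prints the mean driving score across scenarios.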
# Build a video from one scenario result folder
python visualization/gen_video.py \
results/results_driving_custom/<run_tag>/<scenario_name>/<route_run_dir> \
--output <scenario_name>.mp4
Optional flags include --fps, --width, --height, and --font-scale.
To ensure your model is evaluated accurately, you must submit a single .zip file containing your model and code.
Your ZIP file must be organized as follows:
team_name.zip
├── agents.py # Main agent class (must inherit from BaseAgent)
├── config/ # Folder containing all .yaml or .py configs
├── src/ # Folder containing model architecture & utilities
├── weights/ # Folder containing all trained checkpoints (.pth/.ckpt)
└── model_env.yaml # Conda environment specification
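A quick structural check along these lines (a sketch, assuming only the layout listed above) can catch missing files before you upload:

```shell
# Sketch: verify an unzipped submission folder contains the required entries.
# Entry names follow the required ZIP layout shown above.
check_submission() {
  local dir=$1 missing=0
  for entry in agents.py model_env.yaml config src weights; do
    if [ ! -e "${dir}/${entry}" ]; then
      echo "missing: ${entry}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_submission team_name/ && zip -r team_name.zip team_name/`.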
MDriveBench supports two methods of environment provisioning. To ensure 100% reproducibility, we strongly recommend providing a Dockerfile.
- Docker (Primary): Your Dockerfile should be based on a stable CUDA image (e.g., nvidia/cuda:11.3.1-devel-ubuntu20.04). It must install all necessary libraries so that the agent can run immediately upon container launch.
- Conda (Fallback): If no Dockerfile is provided, we will build a dedicated environment using your model_env.yaml. Note: Your code must be compatible with Python 3.7 to interface with the CARLA 0.9.12 API. Do not include CARLA in your environment files; the evaluation server will automatically link the standardized CARLA 0.9.12 build.
Our team will manually verify your submission using the following pipeline:
- Env Build: The evaluator prioritizes the Dockerfile. If missing, it builds the Conda environment from model_env.yaml.
- Path Injection: Standardized CARLA 0.9.12 PythonAPI will be appended to your PYTHONPATH.
- Execution: Your agent will be run through a batch of closed-loop scenarios (OpenCDA, InterDrive, and Safety-critical).
- Scoring: We will record the Driving Score (DS) and Success Rate (SR) as the official leaderboard metrics.
Set up each baseline and download its checkpoints.
| Methods | TCP | CoDriving |
|---|---|---|
| Installation Guide | github | github |
| Checkpoints | google drive | google drive |
The downloaded checkpoints should follow this structure:
|--CoLMDriver
|--ckpt
|--codriving
|--perception
|--planning
|--TCP
|--new.ckpt
- Create TCP conda environment
cd CoLMDriver
conda env create -f model_envs/tcp_codriving.yaml -n tcp_codriving
conda activate tcp_codriving
- Set CARLA path environment variables
export CARLA_ROOT=<path_to_repo_root>/CoLMDriver/carla912
export PYTHONPATH=$CARLA_ROOT/PythonAPI:$CARLA_ROOT/PythonAPI/carla:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.12-py3.7-linux-x86_64.egg
- Create CoDriving conda environment
cd CoLMDriver
conda env create -f model_envs/tcp_codriving.yaml -n tcp_codriving
conda activate tcp_codriving
- Set CARLA path environment variables
export CARLA_ROOT=<path_to_repo_root>/CoLMDriver/carla912
export PYTHONPATH=$CARLA_ROOT/PythonAPI:$CARLA_ROOT/PythonAPI/carla:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.12-py3.7-linux-x86_64.egg
- Clone LMDrive into the assets directory
git clone https://github.com/opendilab/LMDrive simulation/assets/LMDrive
- Prepare LMDrive checkpoints
cd simulation/assets/LMDrive
mkdir -p ckpt
Download and place the following into simulation/assets/LMDrive/ckpt:
- Vision encoder: https://huggingface.co/OpenDILabCommunity/LMDrive-vision-encoder-r50-v1.0
- LMDrive LLaVA weights: https://huggingface.co/OpenDILabCommunity/LMDrive-llava-v1.5-7b-v1.0
Download and place the following into CoLMDriver/ckpt/llava-v1.5-7b:
- Base LLaVA model: https://huggingface.co/liuhaotian/llava-v1.5-7b
- Create environment and install dependencies
cd CoLMDriver
conda env create -f model_envs/lmdrive.yaml -n lmdrive
conda activate lmdrive
pip install carla-birdeye-view==1.1.1 --no-deps
pip install -e simulation/assets/LMDrive/vision_encoder
- Set CARLA path environment variables
export CARLA_ROOT=<path_to_repo_root>/CoLMDriver/carla912
export PYTHONPATH=$CARLA_ROOT/PythonAPI:$CARLA_ROOT/PythonAPI/carla:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.12-py3.7-linux-x86_64.egg
UniAD is a unified perception-prediction-planning autonomous driving model.
We evaluate it on the InterDrive benchmark using its official pretrained weights. To keep results consistent and reproducible, the setup uses a pre-built conda environment, which avoids dependency conflicts and lets anyone run UniAD without rebuilding from scratch.
The YAML file for the UniAD environment is located in:
model_envs/uniad_env.yaml
To create and activate the environment:
conda env create -f model_envs/uniad_env.yaml -n uniad_env
conda activate uniad_env
The uniad_env environment contains all required CUDA, PyTorch, CARLA, and UniAD dependencies.
Create a ckpt/UniAD directory if it does not exist:
mkdir -p CoLMDriver/ckpt/UniAD
Download the UniAD checkpoint from https://huggingface.co/rethinklab/Bench2DriveZoo/blob/main/uniad_base_b2d.pth and place it here:
CoLMDriver/ckpt/UniAD/uniad_base_b2d.pth
Download the UniAD config file from https://github.com/Thinklab-SJTU/Bench2DriveZoo/blob/uniad/vad/adzoo/uniad/configs/stage2_e2e/base_e2e_b2d.py and place it in:
simulation/assets/UniAD/base_e2e_b2d.py
The YAML file for the VAD environment is located in:
model_envs/vad_env.yaml
- Create VAD conda environment
cd CoLMDriver
conda env create -f model_envs/vad_env.yaml -n vad
conda activate vad
- Launch CARLA
CUDA_VISIBLE_DEVICES=0 $CARLA_ROOT/CarlaUE4.sh --world-port=2000 -prefer-nvidia
- Run VAD on Interdrive
# CARLA must already be running on port 2000
bash scripts/eval/eval_mode.sh 0 2000 vad ideal Interdrive_all
Use this section only for CoLMDriver-specific workflows.
conda create -n vllm python=3.10
conda activate vllm
pip install vllm
conda create --name colmdriver python=3.7 cmake=3.22.1
conda activate colmdriver
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda install cudnn -c conda-forge
pip install -r opencood/requirements.txt
pip install -r simulation/requirements.txt
pip install openai
We use spconv 1.2.1 to generate voxel features in the CoLMDriver perception stack.
conda activate colmdriver
conda install -y cmake=3.22.1 ninja boost ccache -c conda-forge
pip install pybind11 numpy
git clone -b v1.2.1 --recursive https://github.com/traveller59/spconv.git
cd spconv
python setup.py bdist_wheel
pip install dist/spconv-1.2.1-*.whl
cd ..
conda activate colmdriver
python setup.py develop
python opencood/utils/setup.py build_ext --inplace
git clone https://github.com/klintan/pypcd.git
cd pypcd
pip install python-lzf
python setup.py install
cd ..
Step 1: Download checkpoints from Google Drive. The downloaded CoLMDriver checkpoints should follow this structure:
|--CoLMDriver
|--ckpt
|--colmdriver
|--LLM
|--perception
|--VLM
|--waypoints_planner
To download the checkpoints from the command line and move them into the correct directories (no GUI required):
# In CoLMDriver repository directory, with colmdriver conda env activated
pip install gdown
gdown 1z3poGdoomhujCNQtoQ80-BCO34GTOLb-
mkdir ckpt
mv colmdriver.zip ckpt
cd ckpt
unzip colmdriver.zip
rm colmdriver.zip
# Fix obsolete dataset dependency bug
sed -i "s|root_dir: .*|root_dir: $(pwd)|; s|test_dir: .*|test_dir: $(pwd)|; s|validate_dir: .*|validate_dir: $(pwd)|" colmdriver/perception/config.yaml
touch dataset_index.txt
Step 2: Running VLM, LLM (from repository root)
# Enter conda env
conda activate vllm
# VLM on call
CUDA_VISIBLE_DEVICES=6 vllm serve ckpt/colmdriver/VLM --port 1111 --max-model-len 8192 --trust-remote-code --enable-prefix-caching
# LLM on call (in new terminal, with vllm env activated)
CUDA_VISIBLE_DEVICES=7 vllm serve ckpt/colmdriver/LLM --port 8888 --max-model-len 4096 --trust-remote-code --enable-prefix-caching
Make sure CUDA_VISIBLE_DEVICES points to an available GPU (check with nvidia-smi).
Note: make sure the selected ports (1111, 8888) are not occupied by other services. If you use different ports, update the comm_client and vlm_client keys in simulation/leaderboard/team_code/agent_config/colmdriver_config.yaml accordingly.
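The port check in the note above can be scripted. This is an illustrative bash-only probe (it reports a port as busy only if something accepts a connection on it):

```shell
# Illustrative probe: succeed if nothing is listening on the given local port.
# Relies on bash's /dev/tcp pseudo-device.
port_free() {
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}
```

Usage: `port_free 1111 && port_free 8888 || echo "pick different ports"`.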
This section is for internal/lab benchmark operations (manual evaluation workflow and submission verification).
MDriveBench Leaderboard evaluates on two metrics:
- Driving Score (DS): Route completion score scaled down by infraction penalties.
- Success Rate (SR): The percentage of routes completed without failure.
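As a rough illustration of how DS behaves, route completion is scaled by one multiplicative factor per infraction; the actual penalty values and scheme are defined by the benchmark, not by this snippet:

```shell
# Hypothetical illustration: DS = route completion (%) times one multiplicative
# penalty factor per infraction. The factors passed in are made-up examples,
# not the benchmark's official penalty table.
compute_ds() {
  local route_completion=$1; shift
  awk -v rc="$route_completion" 'BEGIN {
    p = 1.0
    for (i = 1; i < ARGC; i++) p *= ARGV[i]   # one factor per infraction
    printf "%.1f\n", rc * p
  }' "$@"
}

compute_ds 80 0.5   # 80% completion with one 0.5 penalty -> prints 40.0
```

With no infractions the penalty product is 1, so DS equals route completion.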
A full evaluation consists of three distinct benchmarks:
- OpenCDA (12 Scenarios): Uses ZIP-based scenario loading. Ensure all 12 ZIPs (including Scenes A, D, G, J) are in the opencdascenarios/ folder.
- InterDrive (Full Suite): Cooperative driving evaluated via the Interdrive_all set.
- Safety-Critical: Pre-crash scenarios.
Evaluation consists of 3 main phases: Submission Retrieval, Environment Setup, and Checkpoint Evaluation.
Before internal evaluation, ensure CARLA and required model-specific environments are prepared (see Quickstart and Baseline Evaluation Setup).
- Verify CARLA 0.9.12 is installed and the egg is linked.
- Ensure model-specific environments are functional (for CoLMDriver: vllm for inference and colmdriver for simulation).
- Confirm model-specific dependencies are installed where required (for CoLMDriver/TCP/CoDriving stacks: spconv and pypcd).
To transfer participant submissions from Hugging Face to the lab's local evaluation server:
Step A: Download and unzip the participant's .zip file from the submission portal into the submissions/ directory.
unzip Team-A_submission.zip -d submissions/Team-A
Step B: Verify structure. Ensure the unzipped folder contains the following files:
agents.py
config/
src/
weights/
model_env.yaml
Step C: Symbolic linking. Point the evaluation suite to the new submission.
# Remove previous link and point to the current team
rm -rf simulation/leaderboard/team_code
ln -s ${PWD}/submissions/Team-A simulation/leaderboard/team_code
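The relinking step can be wrapped with a guard so that a real team_code directory (not a symlink) is never deleted by accident; this is a sketch around the commands above, not part of the repo:

```shell
# Sketch: point team_code at a submission, refusing to delete a real directory.
# Paths mirror the symbolic-linking step above; the guard itself is illustrative.
link_submission() {
  local team_dir=$1 link=simulation/leaderboard/team_code
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    echo "refusing to remove non-symlink: $link" >&2
    return 1
  fi
  rm -f "$link"
  ln -s "$team_dir" "$link"
}
```

Usage: `link_submission "${PWD}/submissions/Team-A"`.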
To prevent discrepancies caused by library version mismatches, build a fresh environment for every team.
# Build the team's specific environment
conda env create -f submissions/Team-A/model_env.yaml -n mdrive_eval_team_a
conda activate mdrive_eval_team_a
Step A: Inject the standardized CARLA paths into the active team environment.
export CARLA_ROOT=${CARLA_ROOT:-$PWD/carla912}
export PYTHONPATH=$PYTHONPATH:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.12-py3.7-linux-x86_64.egg
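A defensive variant of Step A (a sketch using the same paths as above) fails loudly when the egg is missing instead of silently leaving PYTHONPATH pointing at nothing:

```shell
# Sketch: append the standardized CARLA egg to PYTHONPATH only if it exists.
# Same CARLA_ROOT layout as Step A; the existence guard is the only addition.
link_carla_egg() {
  local egg="$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.12-py3.7-linux-x86_64.egg"
  if [ ! -f "$egg" ]; then
    echo "CARLA egg not found: $egg" >&2
    return 1
  fi
  export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$egg"
}
```

Running `link_carla_egg` after setting CARLA_ROOT surfaces a missing or misplaced CARLA install before the agent ever starts.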
Step B: Running VLM, LLM (from repository root)
# Enter conda ENV
conda activate vllm
# VLM on call
CUDA_VISIBLE_DEVICES=6 vllm serve ckpt/colmdriver/VLM --port 1111 --max-model-len 8192 --trust-remote-code --enable-prefix-caching
# LLM on call (in new terminal, with vllm env activated)
CUDA_VISIBLE_DEVICES=7 vllm serve ckpt/colmdriver/LLM --port 8888 --max-model-len 4096 --trust-remote-code --enable-prefix-caching
Make sure CUDA_VISIBLE_DEVICES is set to an available GPU (nvidia-smi).
If you use ports other than 1111/8888, update comm_client and vlm_client in simulation/leaderboard/team_code/agent_config/colmdriver_config.yaml.
Step C: Run evaluation
# ==============================================================================
# BATCH 1: OpenCDA Scenarios (12 ZIPs)
# ==============================================================================
echo ">>> [BATCH 1/3] Running OpenCDA Scenarios..."
SCENARIO_DIR="opencdascenarios"
for zipfile in "$SCENARIO_DIR"/*.zip; do
name=$(basename "$zipfile" .zip)
$RUN_CMD tools/run_custom_eval.py \
--zip "$zipfile" \
--scenario-name "$name" \
--results-tag "${name}_${TEAM_NAME}" \
--agent "$SUB_DIR/agents.py" \
--agent-config "$SUB_DIR/config/submission_config.yaml" \
--port $PORT
done
# ==============================================================================
# BATCH 2: InterDrive Benchmark (Full Suite)
# ==============================================================================
echo ">>> [BATCH 2/3] Running InterDrive All..."
# Note: eval_mode.sh must be present in your scripts/eval directory
bash scripts/eval/eval_mode.sh $GPU $PORT $TEAM_NAME ideal Interdrive_all
# ==============================================================================
# BATCH 3: Warmup Scenarios
# ==============================================================================
echo ">>> [BATCH 3/3] Running Warmup Scenarios..."
$RUN_CMD tools/run_custom_eval.py \
--routes-dir "warmupscenarios" \
--agent "$SUB_DIR/agents.py" \
--agent-config "$SUB_DIR/config/submission_config.yaml" \
--port $PORT \
--results-tag "warmup_${TEAM_NAME}"
echo "Evaluation Complete for $TEAM_NAME."
Step D: Record DS and SR. Extract the Driving Score (DS) and Success Rate (SR) from the generated summary.json. Verify logs manually if the score is unexpectedly low to ensure no simulator glitches occurred.
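The extraction in Step D can be scripted; the field names "driving_score" and "success_rate" below are assumptions about the summary.json layout, not a documented schema:

```shell
# Sketch: pull the two leaderboard metrics out of a summary.json file.
# Field names are assumed, not taken from a documented schema.
extract_metrics() {
  python3 - "$1" <<'EOF'
import json, sys

with open(sys.argv[1]) as f:
    summary = json.load(f)
# Print the two official leaderboard metrics on one line.
print(f"DS={summary['driving_score']:.2f} SR={summary['success_rate']:.2f}")
EOF
}
```

Usage: `extract_metrics results/results_driving_custom/<run_tag>/summary.json`.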