Fork of emlearn/emlearn — Machine learning inference for microcontrollers. This fork adds comprehensive GradientBoosting (GBT) vs RandomForest (RF) benchmarks for embedded inference. For installation, usage, and API docs, see the upstream repository.
Benchmarked on Renode nRF52840 (64 MHz Cortex-M4F, hardware FPU). Cycle counts use instruction-level simulation via DWT. Results are preliminary and may vary with different configurations, datasets, or hardware.
- GBT excels on complex tasks: 90-92% accuracy on digits (10-class) where RF plateaus at 78-92%
- RF wins on simple datasets: 100% accuracy on iris/wine with minimal flash (<20 KB)
- LUT-optimized activations: 1.05-1.72x speedup for GBT predict_proba via sigmoid/softmax LUT approximations
- GBT calibration advantage: better Brier scores on larger/harder datasets (digits, sonar, embedded_synth)
- Flash tradeoffs: GBT uses more flash for classification (float leaves); for regression, GBT is typically smaller
Raw data: examples/mcu_benchmark/data/
predict_proba mode: RF, GBT (standard), and GBT+LUT variants. Bar height = CPU cycles (log scale), labels show accuracy and flash size. GBT+LUT achieves 1.10-1.72x speedup over standard GBT.
Flash vs accuracy trade-off. Solid = max_depth 3, dashed = max_depth 5. Each line sweeps n_estimators from 3 to 40 trees.
predict_proba mode: GBT+LUT achieves 1.05-1.54x speedup over standard GBT.
Flash vs accuracy curves. Iris/Wine: RF reaches 100% with minimal flash. Digits: GBT leads at 90-92%.
| Dataset | GBT Brier | RF Brier |
|---|---|---|
| digits (10-class) | 0.005 | 0.028 |
| embedded_synth (3-class) | 0.055 | 0.090 |
| sonar (binary) | 0.127 | 0.139 |
| wine (3-class) | 0.025 | 0.015 |
| iris (3-class) | 0.005 | 0.006 |
| breast_cancer (binary) | 0.031 | 0.031 |
Brier score (lower = better). GBT shows superior calibration on complex datasets; RF matches or wins on simpler ones.
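The Brier scores above measure probability calibration as the mean squared error between predicted class probabilities and the one-hot true labels. A minimal sketch using one common multi-class convention (summing over classes per sample, then averaging over samples); the benchmark's exact convention may differ:

```python
import numpy as np

def brier_score(y_true, y_proba, n_classes):
    """Multi-class Brier score: mean over samples of the squared error
    between predicted probabilities and one-hot labels (lower = better)."""
    onehot = np.eye(n_classes)[y_true]
    return float(np.mean(np.sum((y_proba - onehot) ** 2, axis=1)))

# Perfectly confident correct predictions contribute 0; the third sample
# contributes 0.1^2 + 0.2^2 + 0.1^2 = 0.06, giving a mean of 0.02.
y_true = np.array([0, 2, 1])
y_proba = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.1, 0.8, 0.1]])
print(round(brier_score(y_true, y_proba, 3), 6))  # → 0.02
```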
| Dataset | n (trees) | d (depth) | Type | Cycles | Flash | R² |
|---|---|---|---|---|---|---|
| additive_synth | 40 | 5 | GBT | 45,362 | 198 KB | 0.97 |
| additive_synth | 40 | 5 | RF | 45,542 | 219 KB | 0.91 |
| california | 40 | 5 | GBT | 45,274 | 184 KB | 0.75 |
| california | 40 | 5 | RF | 45,233 | 193 KB | 0.69 |
| diabetes | 40 | 5 | GBT | 45,366 | 173 KB | 0.37 |
| diabetes | 40 | 5 | RF | 45,425 | 192 KB | 0.48 |
GBT shows advantage on additive regression (0.97 vs 0.91 R²). RF wins on diabetes (0.48 vs 0.37 R²).
GBT reaches 90%+ of peak accuracy with n=2-4 trees on most datasets. RF saturates faster on simple datasets.
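One way to observe how quickly GBT accuracy saturates with tree count is scikit-learn's `staged_predict`, which yields predictions after each boosting stage. A sketch on the wine dataset, assuming hyperparameters from the sweep grid (the benchmark's own scripts may measure this differently):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=42)

gbt = GradientBoostingClassifier(
    n_estimators=10, max_depth=3, learning_rate=0.5,
    random_state=42).fit(X_tr, y_tr)

# Accuracy after each boosting stage; early stages typically capture
# most of the final accuracy.
for i, y_pred in enumerate(gbt.staged_predict(X_te), start=1):
    print(i, round(accuracy_score(y_te, y_pred), 3))
```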
Best accuracy/R² at fixed flash budgets (2-64 KB):
| Dataset | 2 KB | 8 KB | 32 KB | 64 KB |
|---|---|---|---|---|
| embedded_synth | -- | RF ~0.58 | RF ~0.81 | GBT ~0.84 |
| sonar | -- | GBT ~0.77 | RF ~0.80 | RF ~0.81 |
| iris | -- | RF 1.00 | GBT 1.00 | GBT 1.00 |
| wine | -- | RF ~0.90 | RF 1.00 | RF 1.00 |
| breast_cancer | -- | GBT ~0.94 | GBT ~0.96 | GBT ~0.96 |
| digits | -- | RF ~0.45 | RF ~0.77 | RF ~0.89 |
| additive_synth | -- | RF ~0.70 | RF ~0.87 | GBT ~0.96 |
| california | -- | RF ~0.54 | GBT ~0.63 | GBT ~0.72 |
| diabetes | -- | RF ~0.45 | RF ~0.47 | RF ~0.48 |
GBT predict_proba uses expensive activation functions (sigmoid, softmax). LUT approximations trade minimal accuracy loss for significant speedup:
| Activation | Classes | LUT Size | Flash Cost | Speedup Range |
|---|---|---|---|---|
| Sigmoid | Binary | 17 floats | 68 bytes | 1.10-1.72x |
| Softmax | Multi-class | 33 floats | 132 bytes | 1.05-1.54x |
Small ensembles (n=3) benefit most: 1.28-1.72x speedup. Large ensembles (n=40): ~1.05-1.14x as tree traversal dominates.
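The LUT idea can be sketched as a small precomputed table with clamping and linear interpolation. This is a hypothetical illustration assuming a 17-entry sigmoid table over [-8, 8]; the actual emlearn table range, spacing, and interpolation scheme may differ:

```python
import numpy as np

# Hypothetical 17-entry sigmoid table over [-8, 8].
# Stored as float32 this is 17 * 4 = 68 bytes of flash.
LUT_N = 17
LUT_LO, LUT_HI = -8.0, 8.0
_xs = np.linspace(LUT_LO, LUT_HI, LUT_N)
SIGMOID_LUT = 1.0 / (1.0 + np.exp(-_xs))

def sigmoid_lut(x):
    """Approximate sigmoid(x) by table lookup + linear interpolation,
    avoiding a runtime exp() call."""
    x = min(max(x, LUT_LO), LUT_HI)                 # clamp to table range
    pos = (x - LUT_LO) / (LUT_HI - LUT_LO) * (LUT_N - 1)
    i = int(pos)
    if i >= LUT_N - 1:                              # at the upper edge
        return float(SIGMOID_LUT[-1])
    frac = pos - i
    return float(SIGMOID_LUT[i] * (1 - frac) + SIGMOID_LUT[i + 1] * frac)

print(sigmoid_lut(0.0))  # x = 0 is a table node, so this is exactly 0.5
```

On a Cortex-M4F this replaces a libm `expf` call with a clamp, a multiply, and one interpolation, which is where the speedup comes from.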
See examples/mcu_benchmark/README.md for full setup, CLI reference, sweep parameters, dataset details, and platform configuration.
# Quick validation (host only)
.venv/bin/python examples/mcu_benchmark/run_all.py --benchmark latency --quick --host-only
# Renode benchmarks (requires Zephyr environment)
source .env.local
.venv/bin/python examples/mcu_benchmark/run_all.py --benchmark all --quick --renode-only
# Generate figures
.venv/bin/python examples/mcu_benchmark/generate_figures.py runs/<timestamp>_sweep

Sweep parameters:

| Parameter | Values |
|---|---|
| n_estimators | 3, 10, 20, 40 |
| max_depth | 3, 5 |
| learning_rate (GBT) | 0.1, 0.2, 0.5 |
9 datasets: 6 classification (binary + multi-class) and 3 regression. All results use test_size=0.33, random_state=42.
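The sweep grid and split can be sketched with plain scikit-learn. This is a minimal illustration of the parameter combinations above, assuming the benchmark trains standard sklearn estimators; it uses iris as a stand-in for the full dataset list:

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=42)

results = []
for n, d in product([3, 10, 20, 40], [3, 5]):
    rf = RandomForestClassifier(n_estimators=n, max_depth=d, random_state=42)
    results.append(("RF", n, d, None, rf.fit(X_tr, y_tr).score(X_te, y_te)))
    # learning_rate applies to GBT only
    for lr in [0.1, 0.2, 0.5]:
        gbt = GradientBoostingClassifier(n_estimators=n, max_depth=d,
                                         learning_rate=lr, random_state=42)
        results.append(("GBT", n, d, lr,
                        gbt.fit(X_tr, y_tr).score(X_te, y_te)))

print(len(results))  # 8 RF + 24 GBT = 32 configurations
```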
| Platform | Timing | Use |
|---|---|---|
| Host (CFFI) | Wall clock (includes Python overhead) | Functional validation |
| Renode nRF52840 | DWT cycles | Instruction-level timing |
| Hardware nRF52 DK | DWT cycles | Ground truth |