Merged
50 changes: 23 additions & 27 deletions README.md
@@ -1,48 +1,44 @@
# Microimpute

Microimpute enables variable imputation through a variety of statistical methods. By providing a consistent interface across different imputation techniques, it allows researchers and data scientists to easily compare and benchmark different approaches using quantile loss and log loss calculations to determine the method providing most accurate results.
Microimpute is a Python package for imputing variables from one survey dataset onto another. It wraps five imputation methods behind a common interface so you can benchmark them on your data and pick the one that works best, rather than defaulting to a single approach.

## Features
## Methods

### Multiple imputation methods
- **Statistical Matching**: Distance-based matching for finding similar observations
- **Ordinary Least Squares (OLS)**: Linear regression-based imputation
- **Quantile Regression**: Distribution-aware regression imputation
- **Quantile Random Forests (QRF)**: Non-parametric forest-based approach
- **Mixture Density Networks (MDN)**: Neural network with Gaussian mixture approximation head
- **Statistical Matching**: distance-based matching to find similar donor observations
- **Ordinary Least Squares (OLS)**: linear regression imputation
- **Quantile Regression**: models conditional quantiles instead of the conditional mean
- **Quantile Random Forests (QRF)**: non-parametric, tree-based quantile estimation
- **Mixture Density Networks (MDN)**: neural network with a Gaussian mixture output

### Automated method selection
- **AutoImpute**: Automatically compares and selects the best imputation method for your data
- **Cross-validation**: Built-in evaluation using quantile loss (numerical) and log loss (categorical)
- **Variable type support**: Handles numerical, categorical, and boolean variables
## Autoimpute

### Developer-friendly design
- **Consistent API**: Standardized `fit()` and `predict()` interface across all models
- **Extensible architecture**: Easy to implement custom imputation methods
- **Weighted data handling**: Preserve data distributions with sample weights
- **Input validation**: Automatic parameter and data validation
The `autoimpute` function tunes hyperparameters, runs cross-validation across all five methods, and selects the best performer based on quantile loss (for numerical targets) or log loss (for categorical targets). It handles numerical, categorical, and boolean variables.
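
As a rough illustration of the two metrics named above, here is a minimal pure-Python sketch of quantile (pinball) loss and log loss. The function names, signatures, and the dict-based probability format are assumptions made for this example; they are not microimpute's own implementation or API.

```python
import math


def quantile_loss(y_true, y_pred, q):
    """Mean pinball loss for a quantile level q strictly between 0 and 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction (diff > 0) is weighted by q, over-prediction by 1 - q.
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels.

    probs[i] is a dict mapping each class label to its predicted probability
    (an illustrative format chosen for this sketch).
    """
    if len(y_true) == 0 or len(y_true) != len(probs):
        raise ValueError("y_true and probs must be non-empty and equal in length")
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip so that a predicted probability of 0 does not produce log(0).
        p_label = min(max(p.get(label, 0.0), eps), 1.0 - eps)
        total -= math.log(p_label)
    return total / len(y_true)
```

With `q = 0.9`, under-predicting a true value of 10 by 2 costs about 1.8, while over-predicting by 2 costs about 0.2, so a model is pushed toward the upper part of the conditional distribution. Averaging this loss over several quantile levels is one common way to compare methods on numerical targets.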

### Interactive dashboard
- **Visual exploration**: Analyze imputation results through interactive charts at https://microimpute-dashboard.vercel.app/
- **GitHub integration**: Load artifacts directly from CI/CD workflows
- **Multiple data sources**: File upload, URL loading and sample data
## API

All models follow a `fit()` / `predict()` interface. The package supports sample weights to account for survey design, and validates inputs automatically. Adding a custom imputation method is straightforward since new models just need to implement the same interface.
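
To make the `fit()` / `predict()` pattern concrete, the following is a hypothetical toy model written in plain Python. The class name, argument names, and list-of-dicts data format are assumptions for illustration only; the package's actual base class and signatures are described in its documentation and may differ.

```python
class WeightedMeanImputer:
    """Hypothetical toy model: imputes the weighted mean of each target variable."""

    def fit(self, rows, targets, weight_key=None):
        # rows: list of dicts from the donor survey.
        # targets: names of the variables to impute.
        # weight_key: optional name of the sample-weight column.
        self.means_ = {}
        weights = [row[weight_key] if weight_key else 1.0 for row in rows]
        total_weight = sum(weights)
        if total_weight <= 0:
            raise ValueError("sample weights must sum to a positive value")
        for target in targets:
            weighted_sum = sum(w * row[target] for w, row in zip(weights, rows))
            self.means_[target] = weighted_sum / total_weight
        return self

    def predict(self, rows):
        # Return one dict of imputed values per receiver row.
        return [dict(self.means_) for _ in rows]


# Usage: donor records carry the target variable, receiver records do not.
donors = [
    {"net_worth": 100, "weight": 1},
    {"net_worth": 300, "weight": 3},
]
receivers = [{"age": 40}, {"age": 65}]

model = WeightedMeanImputer().fit(donors, ["net_worth"], weight_key="weight")
imputed = model.predict(receivers)
```

Here the weighted mean is (100 × 1 + 300 × 3) / 4 = 250, so every receiver row gets a `net_worth` of 250. A real method would condition on predictors; the point of the sketch is only that a shared `fit()` / `predict()` shape lets different methods be swapped in and benchmarked the same way.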

## Documentation and paper

- [Documentation](https://policyengine.github.io/microimpute/) with examples and interactive notebooks
- [Paper](https://github.com/PolicyEngine/microimpute/blob/main/paper/main.pdf) presenting microimpute and demonstrating it for SCF-to-CPS net worth imputation

## Dashboard

An interactive dashboard for exploring imputation results is available at https://microimpute-dashboard.vercel.app/. It supports file upload, URL loading, direct GitHub artifact integration, and sample data.

## Installation

```bash
pip install microimpute
```

For image export functionality (PNG/JPG), install with:
For image export (PNG/JPG):

```bash
pip install microimpute[images]
```

## Examples and documentation

For detailed examples and interactive notebooks, see the [documentation](https://policyengine.github.io/microimpute/).

## Contributing

Contributions are welcome to the project. Please feel free to submit a Pull Request with your improvements.
Pull requests are welcome. If you find a bug or have a feature idea, open an issue or submit a PR.
1 change: 1 addition & 0 deletions changelog.d/maria-paper_review.fixed.md
@@ -0,0 +1 @@
Updated paper and package documentation with latest changes. Fix pandas 2.x compatibility for Arrow string types and dtype checks.
2 changes: 0 additions & 2 deletions docs/_toc.yml
@@ -38,5 +38,3 @@ parts:
- caption: Use cases
chapters:
- file: use_cases/index
sections:
- file: use_cases/scf_to_cps/imputing-from-scf-to-cps