MITE (Minimum Information about a Tailoring Enzyme) is a community-driven database for the characterization of tailoring enzymes. These enzymes play crucial roles in the biosynthesis of secondary or specialized metabolites. Such naturally occurring molecules often show strong biological activities, and many drugs (e.g. antibiotics) derive from them.
This repository contains the single source of truth of the Minimum Information about a Tailoring Enzyme (MITE) database.
For more information, visit the MITE Data Standard Organization page or read our publication.
For feature requests and suggestions, please refer to the MITE Discussion forum.
For simple data submissions, please refer to the MITE web portal. For more complex or large-scale submissions, please get in touch with us, e.g. by opening an Issue.
You can reserve MITE Accession IDs for your to-be-published manuscript. Please read more about it in this discussion.
This repository contains the single source of truth of the MITE database, as well as derived data artifacts.
This data is in the form of JSON files controlled by mite_schema. These files are created from user submissions via the MITE web portal. Upon submission, entries are automatically validated with the mite_extras library, and a new pull request is created.
After user submission, our domain expert reviewers check the entries and approve the pull requests. Next, automated checks validate the entries and create the derived artifacts.
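The automated entry check can be pictured with a minimal sketch. Note that the required field names below are illustrative assumptions, not the actual mite_schema definition, and the real validation is performed by the mite_extras library:

```python
import json

# Hypothetical minimal checks illustrating the kind of validation performed
# on submitted entries; the field names are assumptions, not the MITE schema.
REQUIRED_FIELDS = {"accession", "enzyme", "reactions"}

def validate_entry(raw: str) -> list[str]:
    """Return a list of human-readable problems (an empty list means valid)."""
    try:
        entry = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    missing = REQUIRED_FIELDS - entry.keys()
    return [f"missing field: {name}" for name in sorted(missing)]
```

For example, `validate_entry('{}')` reports all three required fields as missing, while a complete entry yields an empty problem list.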
Upon release of a new version, data is automatically backed up in its Zenodo repository, from where it is used by other sources (e.g. MITE Web).
All code and data in mite_data is released to the public domain under the CC0 license (see LICENSE).
See CITATION.cff or MITE online for information on citing MITE.
This work was supported by the Netherlands Organization for Scientific Research (NWO) KIC grant KICH1.LWV04.21.013.
Workflow for release creation (for details, see below):
- Update version in pyproject.toml file (major is reserved to manuscript publications)
- On new release, fill in the tag (as `version` in `pyproject.toml`), add `v<version>` as the release title, and add release notes (identical to the changelog) - IMPORTANT: SAVE AS DRAFT - this will automatically trigger the release workflow that also performs the CI/CD checks
- DO NOT PUBLISH RELEASE MANUALLY
The repo consists of a data part (mite_data) and an associated validation library (mite_data_lib).
A number of pipelines are available to perform fully automated data validation as well as artifact validation and generation.
Additionally, a JSON file tracks reserved MITE accessions.
repo/
├ mite_data/ <-- source of truth
| ├ data/ <-- MITE JSON entries
| ├ fasta/ <-- FASTA files related to MITE entries
| ├ metadata/ <-- Artifacts created from MITE entries
| └ mibig/ <-- Artifacts created from MIBiG dataset
├ reserved/ <-- Reserved MITE accessions
├ mite_data_lib/ <-- Validation library
| ├ config/ <-- Library-wide configuration settings
| ├ models/ <-- (Pydantic) data contracts
| ├ rules/ <-- Validation rules
| └ services/ <-- Artifact generation
└ pipeline/ <-- Data processing pipelines
To preserve data integrity, this repository implements several stages of CI/CD (continuous integration/continuous deployment) using GitHub Actions. These actions are triggered automatically and perform validation and artifact generation in a stepwise manner, as described below.
Pull request (affecting mite_data/data) <-- User contribution
├ pipeline/validate_mibig.py <-- Reference dataset validation
├ pipeline/validate_entry.py <-- MITE entry validation
Commit to main (affecting mite_data/data) <-- PR merge by maintainer
├ pipeline/create_artifacts.py <-- Artifact creation
New release <-- By maintainer
└ pipeline/validate_artifacts.py <-- Validate artifacts + entries
Every PR affecting the mite_data/data directory automatically triggers data validation functions.
Only if these pass, the PR may be merged into main.
Every commit to main affecting the mite_data/data directory automatically triggers artifact creation.
These artifacts are automatically committed to main to reflect the updated data.
Every new release triggers the artifact and entry validation pipeline. This step is computationally expensive but provides a sanity check.
If the MIBiG validation check does not pass, the MIBiG dataset needs to be updated manually (see below).
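The PR-stage trigger described above could be expressed roughly as follows in a GitHub Actions workflow. This is an illustrative sketch only; the file name, job name, and action versions are assumptions, not the repository's actual workflow:

```yaml
# .github/workflows/validate-pr.yml (illustrative sketch, not the real workflow)
name: validate-entries
on:
  pull_request:
    paths:
      - "mite_data/data/**"   # only runs when MITE entries change
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run python pipeline/validate_mibig.py
      - run: uv run python pipeline/validate_entry.py mite_data/data/*.json
```

The `paths` filter mirrors the rule that only PRs touching `mite_data/data` trigger validation.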
For development purposes, pipelines can also be run manually. This requires a local installation, as follows:
Nota bene: local installation was only tested on Ubuntu Linux 20.04 and 22.04. It also assumes that uv is installed locally - see the methods described here.
- Download and install
git clone https://github.com/mite-standard/mite_data
uv sync
uv run pre-commit install
- Run tests
uv run pytest --download # includes more time-consuming tests with network calls
- Run pipelines
uv run python pipeline/validate_mibig.py # Checks if MIBIG Ref is valid
uv run python pipeline/create_mibig.py # Downloads MIBiG Ref dataset
uv run python pipeline/validate_entry.py entry1.json ... # Checks entries
uv run python pipeline/create_artifacts_single.py entry1.json ... # Creates artifacts in single entry mode
uv run python pipeline/create_artifacts_all.py # Re-creates all artifacts (expensive!)
uv run python pipeline/validate_artifacts.py # Validates artifacts
All rules follow a standardized API.
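As an illustration only - the class and method names below are assumptions, not the actual mite_data_lib interface - a standardized rule API along these lines lets pipelines apply any set of rules uniformly:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class RuleResult:
    """Outcome of a single validation rule (hypothetical shape)."""
    passed: bool
    messages: list[str] = field(default_factory=list)

class Rule(ABC):
    """Common interface that every validation rule implements (hypothetical)."""

    @abstractmethod
    def check(self, entry: dict) -> RuleResult:
        """Validate one MITE entry and report any problems."""

class HasAccession(Rule):
    """Example rule: every entry must carry an accession string."""

    def check(self, entry: dict) -> RuleResult:
        if isinstance(entry.get("accession"), str):
            return RuleResult(passed=True)
        return RuleResult(passed=False, messages=["accession missing or not a string"])

def run_rules(entry: dict, rules: list[Rule]) -> list[RuleResult]:
    """Apply all rules to an entry; a pipeline can then aggregate the results."""
    return [rule.check(entry) for rule in rules]
```

Because every rule exposes the same `check` signature, new rules can be added without touching the pipeline code that runs them.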