MITE (Minimum Information about a Tailoring Enzyme) is a community-driven database for the characterization of tailoring enzymes. These enzymes play crucial roles in the biosynthesis of secondary or specialized metabolites. Such naturally occurring molecules often show strong biological activities, and many drugs (e.g. antibiotics) derive from them.
This repository contains the single source of truth of the Minimum Information about a Tailoring Enzyme (MITE) database.
For more information, visit the MITE Data Standard Organization page or read our publication.
For feature requests and suggestions, please refer to the MITE Discussion forum.
For simple data submissions, please refer to the MITE web portal. For more complex or large-scale submissions, please get in touch with us, e.g. by opening an Issue.
You can reserve MITE Accession IDs for your to-be-published manuscript. Please read more about it in this discussion.
This repository contains the single source of truth of the MITE database, as well as derived data artifacts.
This data is in the form of JSON files controlled by mite_schema. These files are created from user submissions via the MITE web portal. Upon submission, entries are automatically validated with the mite_extras library, and a new pull request is created.
After user submission, our domain expert reviewers check the entries and approve the pull requests. Next, automated checks validate the entries and create the derived artifacts.
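The automated entry check can be pictured with a minimal sketch. Note that the required field names below are illustrative assumptions, not the actual mite_schema definition, and the real validation is performed by the mite_extras library:

```python
import json

# Hypothetical minimal checks illustrating the kind of validation performed
# on submitted entries; the field names are assumptions, not the MITE schema.
REQUIRED_FIELDS = {"accession", "enzyme", "reactions"}

def validate_entry(raw: str) -> list[str]:
    """Return a list of human-readable problems (an empty list means valid)."""
    try:
        entry = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    missing = REQUIRED_FIELDS - entry.keys()
    return [f"missing field: {name}" for name in sorted(missing)]
```

For example, `validate_entry('{}')` reports all three required fields as missing, while a complete entry yields an empty problem list.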
Upon release of a new version, data is automatically backed up in its Zenodo repository, from where it is used by other sources (e.g. MITE Web).
All code and data in mite_data is released to the public domain under the CC0 license (see LICENSE).
See CITATION.cff or MITE online for information on citing MITE.
This work was supported by the Netherlands Organization for Scientific Research (NWO) KIC grant KICH1.LWV04.21.013.
Workflow for release creation (for details, see below):
- Update version in pyproject.toml file (major is reserved to manuscript publications)
- On new release, fill in the tag (as `version` in `pyproject.toml`), add `v<version>` as the release title, and add release notes (identical to the changelog) - IMPORTANT: SAVE AS DRAFT - this will automatically trigger the release workflow that also performs the CI/CD checks
- DO NOT PUBLISH RELEASE MANUALLY
The repo consists of a data part (mite_data) and an associated validation library (mite_data_lib).
A number of pipelines are available to perform fully automated data validation as well as artifact validation and generation.
Additionally, a JSON file tracks reserved MITE accessions.
repo/
├ mite_data/ <-- source of truth
| ├ data/ <-- MITE JSON entries
| ├ fasta/ <-- FASTA files related to MITE entries
| ├ metadata/ <-- Artifacts created from MITE entries
| └ mibig/ <-- Artifacts created from MIBiG dataset
├ reserved/ <-- Reserved MITE accessions
├ mite_data_lib/ <-- Validation library
| ├ config/ <-- Library-wide configuration settings
| ├ models/ <-- (Pydantic) data contracts
| ├ rules/ <-- Validation rules
| └ services/ <-- Artifact generation
└ pipeline/ <-- Data processing pipelines
To preserve data integrity, this repository implements several stages of CI/CD (continuous integration/continuous deployment) using GitHub Actions. These actions are triggered automatically and perform validation and artifact generation in a stepwise manner, as described below.
Pull request (affecting mite_data/data) <-- User contribution
├ pipeline/validate_mibig.py <-- Reference dataset validation
├ pipeline/validate_entry.py <-- MITE entry validation
Commit to main (affecting mite_data/data) <-- PR merge by maintainer
├ pipeline/create_artifacts.py <-- Artifact creation
New release <-- By maintainer
└ pipeline/validate_artifacts.py <-- Validate artifacts + entries
Every PR affecting the mite_data/data directory automatically triggers data validation functions.
Only if these pass, the PR may be merged into main.
Every commit to main affecting the mite_data/data directory automatically triggers artifact creation.
These artifacts are automatically committed to main to reflect the updated data.
Every new release triggers the artifact and entry validation pipeline. This step is computationally expensive but provides a sanity check.
If the MIBiG validation check does not pass, the MIBiG dataset needs to be updated manually (see below).
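The PR-stage trigger described above could be expressed roughly as follows in a GitHub Actions workflow. This is an illustrative sketch only; the file name, job name, and action versions are assumptions, not the repository's actual workflow:

```yaml
# .github/workflows/validate-pr.yml (illustrative sketch, not the real workflow)
name: validate-entries
on:
  pull_request:
    paths:
      - "mite_data/data/**"   # only runs when MITE entries change
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run python pipeline/validate_mibig.py
      - run: uv run python pipeline/validate_entry.py mite_data/data/*.json
```

The `paths` filter mirrors the rule that only PRs touching `mite_data/data` trigger validation.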
For development purposes, pipelines can also be run manually. This requires a local installation, as follows:
Nota bene: local installation was only tested on Ubuntu Linux 20.04 and 22.04. It also assumes that uv is installed locally - see the methods described here.
- Download and install
git clone https://github.com/mite-standard/mite_data
uv sync
uv run pre-commit install
- Run tests
uv run pytest --download # includes more time-consuming tests with network calls
- Run pipelines
uv run python pipeline/validate_mibig.py # Checks if MIBIG Ref is valid
uv run python pipeline/create_mibig.py # Downloads MIBiG Ref dataset
uv run python pipeline/validate_entry.py entry1.json ... # Checks entries
uv run python pipeline/create_artifacts_single.py entry1.json ... # Creates artifacts in single entry mode
uv run python pipeline/create_artifacts_all.py # Re-creates all artifacts (expensive!)
uv run python pipeline/validate_artifacts.py # Validates artifacts
All rules follow a standardized API.
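As an illustration only - the class and method names below are assumptions, not the actual mite_data_lib interface - a standardized rule API along these lines lets pipelines apply any set of rules uniformly:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class RuleResult:
    """Outcome of a single validation rule (hypothetical shape)."""
    passed: bool
    messages: list[str] = field(default_factory=list)

class Rule(ABC):
    """Common interface that every validation rule implements (hypothetical)."""

    @abstractmethod
    def check(self, entry: dict) -> RuleResult:
        """Validate one MITE entry and report any problems."""

class HasAccession(Rule):
    """Example rule: every entry must carry an accession string."""

    def check(self, entry: dict) -> RuleResult:
        if isinstance(entry.get("accession"), str):
            return RuleResult(passed=True)
        return RuleResult(passed=False, messages=["accession missing or not a string"])

def run_rules(entry: dict, rules: list[Rule]) -> list[RuleResult]:
    """Apply all rules to an entry; a pipeline can then aggregate the results."""
    return [rule.check(entry) for rule in rules]
```

Because every rule exposes the same `check` signature, new rules can be added without touching the pipeline code that runs them.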