Merged
24 changes: 14 additions & 10 deletions medcat-v2/.readthedocs.yaml
@@ -5,15 +5,19 @@
 version: 2

 build:
-  os: ubuntu-20.04
+  os: "ubuntu-24.04"
   tools:
-    python: "3.10"
+    python: "3.13"
+  jobs:
+    pre_create_environment:
+      - cd medcat-v2/docs
+      - asdf plugin add uv
+      - asdf install uv latest
+      - asdf global uv latest
+    create_environment:
+      - uv venv "${READTHEDOCS_VIRTUALENV_PATH}"
+    install:
+      - cd medcat-v2/docs && UV_PROJECT_ENVIRONMENT="${READTHEDOCS_VIRTUALENV_PATH}" uv sync --frozen

-sphinx:
-  configuration: medcat-v2/docs/conf.py
-
-python:
-  install:
-    - requirements: medcat-v2/docs/requirements.txt
-    - method: pip
-      path: medcat-v2/
+mkdocs:
+  configuration: medcat-v2/mkdocs.yml
23 changes: 0 additions & 23 deletions medcat-v2/docs/Makefile

This file was deleted.

25 changes: 0 additions & 25 deletions medcat-v2/docs/_static/css/overrides.css

This file was deleted.

Binary file added medcat-v2/docs/_static/img/cat-logo.png
73 changes: 73 additions & 0 deletions medcat-v2/docs/_static/img/cat-logo.svg
21 changes: 12 additions & 9 deletions medcat-v2/docs/architecture.md
@@ -10,17 +10,18 @@ MedCAT is built on a flexible, registry-based architecture that allows you to cu
**Components** are the building blocks of MedCAT. They fall into two categories:

- **Core components**: Essential components that provide entity recognition and linking
-  - **NER** (Named Entity Recognition): Identifies medical entities in text
-  - **Linker**: Links identified entities to concepts in your medical database (CDB)
-  - Also: Token normalizers and taggers
+    - **NER** (Named Entity Recognition): Identifies medical entities in text
+    - **Linker**: Links identified entities to concepts in your medical database (CDB)
+    - Also: Token normalizers and taggers

- **Addon components**: Optional components that add functionality beyond NER and linking
-  - **MetaCAT**: Adds meta-annotation (e.g., experiencer, negation, temporality)
-  - **RelCAT**: Extracts relationships between entities
-  - Custom addons for domain-specific tasks
+    - **MetaCAT**: Adds meta-annotation (e.g., experiencer, negation, temporality)
+    - **RelCAT**: Extracts relationships between entities
+    - Custom addons for domain-specific tasks

### Registry System
All components are registered in a central registry. This means you can:
+
- Swap out default implementations with your own
- Choose between multiple NER or linking strategies
- Add custom processing stages to the pipeline
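
The registry idea described in this hunk can be illustrated with a minimal, self-contained sketch. Everything here (`ComponentRegistry`, `register`, `create`, `DictNER`, `UppercaseNER`) is a hypothetical name chosen for illustration and is not MedCAT's actual API; the point is only to show how registering factories under a name lets one implementation be swapped for another.

```python
from typing import Callable, Dict, List


class ComponentRegistry:
    """Hypothetical registry mapping a component name to a factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Registering an existing name replaces the earlier implementation.
        self._factories[name] = factory

    def create(self, name: str) -> object:
        if name not in self._factories:
            raise KeyError(f"No component registered under {name!r}")
        return self._factories[name]()


class DictNER:
    """Toy NER: matches lower-cased tokens against a fixed vocabulary."""

    def find(self, text: str) -> List[str]:
        vocab = {"diabetes", "asthma"}
        return [tok for tok in text.lower().split() if tok in vocab]


class UppercaseNER:
    """Toy alternative NER: treats fully upper-case tokens as entities."""

    def find(self, text: str) -> List[str]:
        return [tok for tok in text.split() if tok.isupper()]


registry = ComponentRegistry()
registry.register("ner", DictNER)
default_hits = registry.create("ner").find("Patient has asthma")

# Swap the default implementation for a custom one under the same name.
registry.register("ner", UppercaseNER)
custom_hits = registry.create("ner").find("Patient has COPD")
```

Callers only ever ask the registry for `"ner"`, so replacing the implementation does not require changing the code that builds the pipeline.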
@@ -32,9 +33,9 @@ MedCAT v2 also includes a **curated plugin catalog** and an **installer**:

- `medcat.plugins.catalog.PluginCatalog` maintains a list of known plugins, their metadata, and MedCAT compatibility rules (e.g. “this plugin supports `>=2.5.0,<3.0.0`”).
- `medcat.plugins.installer.PluginInstallationManager` uses that catalog to select a compatible version and install it (currently via `pip`), with support for:
-  - PyPI packages
-  - Git repositories (including subdirectories such as monorepo layouts)
-  - Direct URLs (e.g. wheels or tarballs)
+    - PyPI packages
+    - Git repositories (including subdirectories such as monorepo layouts)
+    - Direct URLs (e.g. wheels or tarballs)

The curated catalog can be updated from a remote JSON file, and plugins can be installed either programmatically or via the `python -m medcat plugins install ...` CLI.
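
To make the compatibility rule quoted above (`>=2.5.0,<3.0.0`) concrete, here is a rough standard-library sketch of how such a rule could be evaluated. The helper names `parse_version` and `is_compatible` are hypothetical and this is not how `PluginCatalog` is known to work internally; it also only handles plain dotted integer versions. Real code would more likely rely on `packaging.specifiers.SpecifierSet`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.5.0' into (2, 5, 0). Assumes plain dotted integers."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(medcat_version: str, spec: str) -> bool:
    """Check a version against comma-separated clauses such as '>=2.5.0,<3.0.0'."""
    current = parse_version(medcat_version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                bound = parse_version(clause[len(op):].strip())
                satisfied = {
                    ">=": current >= bound,
                    "<=": current <= bound,
                    "==": current == bound,
                    ">": current > bound,
                    "<": current < bound,
                }[op]
                if not satisfied:
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


supported = is_compatible("2.6.1", ">=2.5.0,<3.0.0")
too_new = is_compatible("3.0.0", ">=2.5.0,<3.0.0")
too_old = is_compatible("2.4.9", ">=2.5.0,<3.0.0")
```

Every clause must hold for the version to count as compatible, which matches the usual reading of a comma-separated specifier.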

@@ -375,6 +376,7 @@ This is **not** required when creating a new model pack from scratch.
### Component Dependencies

Components can depend on each other:
+
- **Linkers** receive entities from NER as input
- **Addons** receive fully annotated documents from NER + Linker
- All components receive the tokenizer, CDB, and vocab
@@ -394,6 +396,7 @@ class MyNER(AbstractEntityProvidingComponent):
### Error Handling

Components should handle errors gracefully:
+
- Return empty lists rather than raising exceptions when no entities are found
- Log warnings for configuration issues
- Validate inputs in `create_new_component()`
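
The three error-handling guidelines in this hunk can be sketched with a toy component. `KeywordNER`, its `find_entities` method, and the dictionary-based config are assumptions made for this example; only the method name `create_new_component()` comes from the documentation, and its real signature in MedCAT may differ.

```python
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)


class KeywordNER:
    """Hypothetical stand-in for an NER component (not MedCAT's base class)."""

    def __init__(self, keywords: List[str]) -> None:
        self.keywords = [kw.lower() for kw in keywords]

    @classmethod
    def create_new_component(cls, config: Dict[str, object]) -> "KeywordNER":
        # Validate inputs up front so bad config fails at construction time.
        keywords = config.get("keywords")
        if not isinstance(keywords, list) or not all(
            isinstance(kw, str) for kw in keywords
        ):
            raise ValueError("'keywords' must be a list of strings")
        if not keywords:
            # Questionable but usable configuration: warn instead of failing.
            logger.warning("KeywordNER created with no keywords; nothing will match")
        return cls(keywords)

    def find_entities(self, text: str) -> List[Tuple[int, int, str]]:
        # Return (start, end, keyword) for the first hit of each keyword.
        # No match is a normal outcome, so the result is simply an empty list.
        lowered = text.lower()
        found = []
        for kw in self.keywords:
            start = lowered.find(kw)
            if start != -1:
                found.append((start, start + len(kw), kw))
        return found


ner = KeywordNER.create_new_component({"keywords": ["asthma"]})
no_hits = ner.find_entities("No findings")
one_hit = ner.find_entities("History of asthma")
```

Returning an empty list keeps downstream stages (linker, addons) free of special-case exception handling for documents that contain no entities.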
1 change: 1 addition & 0 deletions medcat-v2/docs/breaking_changes.md
@@ -11,6 +11,7 @@ Though do note, that only the major API-level changes will be listed.

Training is now separated from the main `CAT` class into its own class (`Trainer`) and module (`trainer.py`).
This affects the following methods (assumption is that `cat` is an instance of `CAT`):
+
| v1 method | v2 method |
| --------------------------- | ---------------------------------- |
| `cat.train` | `cat.trainer.train_unsupervised` |
106 changes: 0 additions & 106 deletions medcat-v2/docs/conf.py

This file was deleted.

5 changes: 3 additions & 2 deletions medcat-v2/docs/main.md → medcat-v2/docs/index.md
@@ -3,7 +3,7 @@
**There's a number of breaking changes in MedCAT v2 compared to v1.**
Details are outlined [here](breaking_changes.md).

-[![Build Status](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-v2_main.yml)](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-v2_main.yml)
+[![Build Status](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-v2_main.yml/badge.svg?branch=main)](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-v2_main.yml/badge.svg?branch=main)
[![Documentation Status](https://readthedocs.org/projects/cogstack-nlp/badge/?version=latest)](https://readthedocs.org/projects/cogstack-nlp/badge/?version=latest)
[![Latest release](https://img.shields.io/github/v/release/CogStack/cogstack-nlp?filter=medcat/*)](https://github.com/CogStack/cogstack-nlp/releases/latest)
[![pypi Version](https://img.shields.io/pypi/v/medcat.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/medcat/)
@@ -83,9 +83,11 @@ Access to v2 models is upcoming. They will initially (probably) be converted mod
A basic trained model is made public. It contains ~ 35K concepts available in `MedMentions`. This was compiled from MedMentions and does not have any data from [NLM](https://www.nlm.nih.gov/research/umls/) as that data is not publicly available.

Model packs:
+
- MedMentions with Status (Is Concept Affirmed or Negated/Hypothetical) [Download](https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/medmen_wstatus_2021_oct.zip)

Separate models:
+
- Vocabulary [Download](https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/vocab.dat) - Built from MedMentions
- CDB [Download](https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/cdb-medmen-v1.dat) - Built from MedMentions
- MetaCAT Status [Download](https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/mc_status.zip) - Built from a sample from MIMIC-III, detects whether an annotation is Affirmed (Positive) or Other (Negated or Hypothetical) -->
@@ -95,7 +97,6 @@ Entity extraction was trained on [MedMentions](https://github.com/chanzuckerberg

The vocabulary was compiled from [Wiktionary](https://en.wiktionary.org/wiki/Wiktionary:Main_Page). In total, it contains ~ 800K unique words.


## Powered By
A big thank you goes to [spaCy](https://spacy.io/) and [Hugging Face](https://huggingface.co/) - who made life a million times easier.

20 changes: 0 additions & 20 deletions medcat-v2/docs/index.rst

This file was deleted.
