feat(medcat-trainer): Startup Provisioning feature to initialize from config by alhendrickson · Pull Request #358 · CogStack/cogstack-nlp

alhendrickson · 2026-03-02T17:02:54Z

What does this do

You can now provide a YAML file with details of the projects to create on startup. "Provisioning" appears to be the term for this.
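Here is a minimal sketch of how such a file might be parsed once it has been loaded, for example with PyYAML's `yaml.safe_load`. The field names (`projects`, `name`, `model_pack_url`, `dataset_url`) and the `ProjectEntry` / `parse_provisioning` names are hypothetical and not taken from the PR. The real schema is the Pydantic models in `medcat-trainer/webapp/scripts/provisioning/model.py`. Standard-library dataclasses stand in for those models here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProjectEntry:
    """Hypothetical stand-in for one project in the provisioning file."""

    name: str
    model_pack_url: str
    dataset_url: str


REQUIRED_FIELDS = ("name", "model_pack_url", "dataset_url")


def parse_provisioning(data: dict[str, Any]) -> list[ProjectEntry]:
    """Turn an already-loaded YAML mapping into ProjectEntry objects.

    Raises ValueError if any project is missing a required field.
    """
    entries = []
    for index, raw in enumerate(data.get("projects", [])):
        missing = [key for key in REQUIRED_FIELDS if key not in raw]
        if missing:
            raise ValueError(f"project #{index} is missing {missing}")
        entries.append(ProjectEntry(**{key: raw[key] for key in REQUIRED_FIELDS}))
    return entries


# Illustrative value of what yaml.safe_load might return for a two-project file.
loaded = {
    "projects": [
        {"name": "demo-a", "model_pack_url": "s3://bucket/a.zip", "dataset_url": "s3://bucket/a.csv"},
        {"name": "demo-b", "model_pack_url": "s3://bucket/b.zip", "dataset_url": "s3://bucket/b.csv"},
    ]
}
projects = parse_provisioning(loaded)
print([p.name for p in projects])  # ['demo-a', 'demo-b']
```

Validating up front means a typo in the file fails at startup with a clear message, rather than partway through creating projects.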

This is a reworking of the existing load_examples.py. Most of the code is the same, just moved around.

By default, it should behave exactly as it does today, pulling files from S3.
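The default-versus-config choice could be expressed as something like the sketch below. `choose_source` and `DEFAULT_SOURCE` are hypothetical names. The actual S3 download logic lives in the refactored load_examples.py code and is not reproduced here. One design choice in this sketch is that a path that is given but missing raises an error instead of silently falling back to the default.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical marker meaning "use the existing S3-backed example files".
DEFAULT_SOURCE = "default-s3-examples"


def choose_source(config_path: Optional[str]) -> Union[str, Path]:
    """Pick where provisioning data comes from.

    No path (None or empty) keeps today's behaviour; a path that is given
    but does not exist is treated as a mistake rather than ignored.
    """
    if not config_path:
        return DEFAULT_SOURCE
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"provisioning file not found: {path}")
    return path


print(choose_source(None))  # default-s3-examples
```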

Why do this?

This allows trainer to be set up declaratively instead of needing manual steps. The primary aim is using it in Kubernetes: I can have a one-line install that runs trainer and sets up projects already linked to medcat services. The YAML for the projects will be defined in a values.yaml file.

Early days for this, though. Next steps may be to also provision users, and to support all the parameters that projects etc. can use. The plan would be to repeat this pattern in other apps/services too.

If things get out of hand, making an "operator" is probably the end goal here: create the YAML and it would then create the projects automatically.
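At its core, an operator runs a reconcile step. It compares the desired state (the YAML) with the actual state (what already exists) and acts only on the difference. The sketch below shows that comparison, assuming projects are identified by name. `Plan` and `reconcile` are hypothetical, and the sketch only plans creations, leaving updates and deletions aside.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    to_create: list
    already_present: list
    unmanaged: list


def reconcile(desired: list[str], existing: list[str]) -> Plan:
    """Compare desired project names with the ones that already exist.

    Only creation is planned; existing and unmanaged projects are left alone.
    """
    existing_set = set(existing)
    desired_set = set(desired)
    return Plan(
        to_create=[name for name in desired if name not in existing_set],
        already_present=[name for name in desired if name in existing_set],
        unmanaged=[name for name in existing if name not in desired_set],
    )


plan = reconcile(["demo-a", "demo-b"], ["demo-a", "manual-project"])
print(plan.to_create)  # ['demo-b']
```

Because the step acts only on the difference, running it again against an unchanged config plans no new creations.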


mart-r

Nothing wrong with it as far as I can tell.

But traditionally, projects have consisted of datasets and model packs (and now links to services that do the same thing), and the datasets and model packs have been independent of the projects. That is, a trainer instance normally has a small number of model packs and a large number of datasets, and each project uses a model pack (or now a URL) plus a dataset.
Maybe there's a reason to tie them together for provisioning, but it just seems to run counter to my usual understanding of how the trainer is used.

EDIT:
For reference, normally "we" provide a set of model packs (or, back in the day, CDBs), and a clinician separately provides the dataset(s). The clinician (or someone else on their instruction) then creates the project(s) to be annotated.
EDIT2:
But I'm not working with trainer super tightly, so it's possible my understanding of these workflows is flawed.
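The decoupled layout described here could look roughly like the sketch below. Model packs and datasets are declared once, and each project refers to them by name. `ProvisioningConfig`, `unresolved_references` and all keys are hypothetical and not the PR's schema. The check catches a project that points at something never declared.

```python
from dataclasses import dataclass, field


@dataclass
class ProvisioningConfig:
    """Hypothetical decoupled layout: shared resources plus projects referencing them."""

    model_packs: dict[str, str] = field(default_factory=dict)  # name -> location (file or service URL)
    datasets: dict[str, str] = field(default_factory=dict)     # name -> location
    projects: list[dict] = field(default_factory=list)        # each names one model pack and one dataset


def unresolved_references(cfg: ProvisioningConfig) -> list[str]:
    """List every project reference to an undeclared model pack or dataset."""
    problems = []
    for project in cfg.projects:
        if project["model_pack"] not in cfg.model_packs:
            problems.append(f"{project['name']}: unknown model pack {project['model_pack']!r}")
        if project["dataset"] not in cfg.datasets:
            problems.append(f"{project['name']}: unknown dataset {project['dataset']!r}")
    return problems


cfg = ProvisioningConfig(
    model_packs={"pack-a": "s3://bucket/pack_a.zip"},
    datasets={"notes-1": "s3://bucket/notes1.csv", "notes-2": "s3://bucket/notes2.csv"},
    projects=[
        # One model pack reused by two projects, each over a different dataset.
        {"name": "proj-1", "model_pack": "pack-a", "dataset": "notes-1"},
        {"name": "proj-2", "model_pack": "pack-a", "dataset": "notes-2"},
        {"name": "proj-3", "model_pack": "pack-x", "dataset": "notes-1"},
    ],
)
print(unresolved_references(cfg))  # ["proj-3: unknown model pack 'pack-x'"]
```

Declaring each model pack once also means it only needs to be uploaded once, however many projects use it.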

mart-r · 2026-03-03T09:55:16Z

medcat-trainer/webapp/scripts/provisioning/model.py

+    )
+
+
+class ProvisioningProjectSpec(BaseModel):


In principle, there is no real reason to tie these things together, as far as I can tell.
In fact, normally you would upload one model pack and link it to multiple annotation projects (effectively datasets). And you can also have multiple projects use the same dataset(s), though I don't know if this is common.

That makes a lot of sense! I think this will be good to have; it will make it much more useful. https://app.clickup.com/t/869cba2h9

mart-r · 2026-03-03T09:59:35Z

medcat-trainer/webapp/scripts/provisioning/model.py

+class ProjectSpec(BaseModel):
+    """Project to create via project-annotate-entities/."""
+
+    model_config = _common_config


I think this will work because Pydantic does some magic and makes different instances of the common config.
However, at first glance it would seem that this ties all the models to the same config instance (which would suggest that changing one config changes them all).

Just a note, really.
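The concern is ordinary Python aliasing, shown below with plain classes rather than Pydantic models. When two classes assign the same dict object, a change made through one is visible through the other. Copying per class avoids that. According to the comment above, Pydantic makes this kind of per-model copy for `model_config`. The class names are illustrative only.

```python
# Plain-Python illustration; the classes are not Pydantic models.
_common_config = {"alias_generator": "to_camel"}


class SharedA:
    config = _common_config  # same dict object...


class SharedB:
    config = _common_config  # ...as this one


class CopiedA:
    config = dict(_common_config)  # independent shallow copy


class CopiedB:
    config = dict(_common_config)  # another independent copy


SharedA.config["extra"] = "forbid"
CopiedA.config["extra"] = "allow"

print(SharedB.config.get("extra"))  # forbid (same object as SharedA.config)
print(CopiedB.config.get("extra"))  # None (its own copy was never touched)
```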

Will fix this up shortly. The aim was essentially just to avoid repeating the to_camel part. https://app.clickup.com/t/869cba2m6
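In Pydantic v2, one common way to avoid repeating the alias setup is a shared base model. It declares `model_config` once (for example with `pydantic.alias_generators.to_camel`), and every spec inherits from it. The standard-library sketch below mirrors that idea with a mixin that defines the camelCase mapping in one place. `to_camel` here is a simple reimplementation for plain snake_case names. `CamelInput`, `from_camel` and `ExampleSpec` are hypothetical and not the PR's code.

```python
from dataclasses import dataclass, fields
from typing import Any


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. 'model_pack_id' -> 'modelPackId'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


class CamelInput:
    """Hypothetical mixin: the camelCase mapping is defined once and reused by every spec."""

    @classmethod
    def from_camel(cls, data: dict[str, Any]):
        # Map each camelCase key back to the subclass's snake_case dataclass field.
        aliases = {to_camel(f.name): f.name for f in fields(cls)}
        return cls(**{aliases.get(key, key): value for key, value in data.items()})


@dataclass
class ExampleSpec(CamelInput):
    name: str
    model_pack_id: int


spec = ExampleSpec.from_camel({"name": "demo", "modelPackId": 3})
print(spec)  # ExampleSpec(name='demo', model_pack_id=3)
```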

alhendrickson added 10 commits March 2, 2026 15:51

38f154a feat(medcat-trainer): Startup Provisioning feature to initialize from config - refactor load_examples & test
81e9104 feat(medcat-trainer): Startup Provisioning feature to initialize from config
fa828eb feat(medcat-trainer): Startup Provisioning feature to initialize from config - add
14e9a94 feat(medcat-trainer): Startup Provisioning feature to initialize from config - add docs
91c7392 feat(medcat-trainer): Startup Provisioning feature to initialize from config - add docs
2f9656b feat(medcat-trainer): Startup Provisioning feature to initialize from config - fix test
62a0260 feat(medcat-trainer): Startup Provisioning feature to initialize from config - fix test
edf448b feat(medcat-trainer): Startup Provisioning feature to initialize from config - docs
fe39432 feat(medcat-trainer): Startup Provisioning feature to initialize from config - fix run.sh
1895e38 feat(medcat-trainer): Startup Provisioning feature to initialize from config - fix test

mart-r approved these changes Mar 3, 2026


alhendrickson merged commit be9825f into main Mar 3, 2026
10 checks passed

alhendrickson deleted the feat/medcat-trainer/startup-provisioning branch March 3, 2026 13:42

alhendrickson merged 10 commits into main from feat/medcat-trainer/startup-provisioning

2 participants
