fix: Fix the sync issue between DataMate & LabelStudio; Fix the tag update issue in DataMate by JasonW404 · Pull Request #399 · ModelEngine-Group/DataMate

JasonW404 · 2026-03-03T02:02:22Z

No description provided.

Copilot

Pull request overview

This PR addresses tag-format inconsistencies and synchronization edge cases between DataMate (DM) and Label Studio (LS). It aims to make tag updates and annotation sync more deterministic and compatible with the values-based internal tag schema.

Changes:

Normalize/merge DM file tags more robustly during partial updates (handle value→values, type/value key normalization, semantic matching).
Improve LS↔DM annotation sync orchestration by optionally syncing files first and adding payload-based forcing logic for equal timestamps.
Align schemas/tests/utilities to use values consistently and improve key/type normalization.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
runtime/datamate-python/app/module/dataset/service/service.py	Enhances tag partial-update merge logic with normalization and semantic matching.
runtime/datamate-python/app/module/dataset/schema/dataset_file.py	Enables alias population and normalizes `values`/`type` lookup in tag parsing.
runtime/datamate-python/app/module/annotation/utils/test_tag_converter.py	Updates tests to assert `values` instead of `value`.
runtime/datamate-python/app/module/annotation/utils/tag_converter.py	Emits `values` wrapper and fixes simplified/full format detection.
runtime/datamate-python/app/module/annotation/service/sync.py	Adds deterministic LS payload comparison and pre-sync file sync hooks for annotation sync paths.
runtime/datamate-python/app/module/annotation/service/ls_annotation_sync.py	Normalizes LS result keys/types and includes `to_name` in extracted tags.
runtime/datamate-python/app/module/annotation/interface/task.py	Threads `sync_files_first` through API endpoints.
runtime/datamate-python/app/module/annotation/interface/project.py	Switches manual sync-db flow to the SyncService orchestrator.
runtime/datamate-python/app/module/annotation/interface/auto.py	Switches auto sync-db flow to file-diff sync + bidirectional annotation sync.

Copilot · 2026-03-03T02:07:24Z

runtime/datamate-python/app/module/annotation/service/sync.py

+                                and ls_annotations
+                            ):
+                                latest_ls_result_payload = latest_ls_annotation.get("result", []) if latest_ls_annotation else []
+                                if not self._ls_result_payload_equal(ls_result, latest_ls_result_payload):


In the equal-timestamp conflict path, ls_result is referenced before it is constructed. The payload comparison calls _ls_result_payload_equal(ls_result, latest_ls_result_payload) before ls_result is assigned later in the function, which raises UnboundLocalError when timestamps are equal and overwrite_ls is true. Build the DM→LS ls_result (or a normalized equivalent) before this comparison, or compare using the raw dm_tags or a helper that doesn’t depend on ls_result being defined yet.

Suggested change

-                                if not self._ls_result_payload_equal(ls_result, latest_ls_result_payload):
+                                if "ls_result" in locals() and not self._ls_result_payload_equal(ls_result, latest_ls_result_payload):
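
The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `compare_before_build` and `build_then_compare` are hypothetical names, and plain `!=` stands in for `_ls_result_payload_equal`, whose actual logic is not shown in the excerpt.

```python
from typing import Any, Dict, List

Tag = Dict[str, Any]


def compare_before_build(dm_tags: List[Tag], latest: List[Tag]) -> bool:
    # Bug pattern: ls_result is assigned later in this function, so Python
    # treats it as a local name and this read raises UnboundLocalError.
    differs = ls_result != latest
    ls_result = [dict(t) for t in dm_tags]
    return differs


def build_then_compare(dm_tags: List[Tag], latest: List[Tag]) -> bool:
    # Fix pattern: construct the DM->LS payload first, then compare it.
    ls_result = [dict(t) for t in dm_tags]
    return ls_result != latest
```

Note that a guard such as `"ls_result" in locals()` avoids the exception but evaluates to false whenever `ls_result` has not been assigned yet, so the comparison would simply be skipped on that path. Building the payload before the comparison, as the comment recommends, keeps the check active.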

Copilot · 2026-03-03T02:07:25Z

runtime/datamate-python/app/module/dataset/service/service.py

+            def _normalize_tag(raw_tag: Dict[str, Any]) -> Dict[str, Any]:
+                normalized = dict(raw_tag)
+                if "values" not in normalized and isinstance(normalized.get("value"), dict):
+                    normalized["values"] = normalized.get("value")
+                normalized.pop("value", None)
+


_normalize_tag normalizes value→values and lower-cases type, but it doesn’t normalize camelCase field names (fromName/toName) into the canonical snake_case (from_name/to_name). Later merge/backfill logic only checks from_name/to_name, which can lead to tags being stored with both key variants (or missing the snake_case keys), making matching and downstream consumers inconsistent. Consider normalizing/remapping fromName→from_name and toName→to_name inside _normalize_tag (and dropping the camelCase keys).
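
One way the suggested remapping might look is sketched below. This is an illustration under assumptions, not the PR's implementation: `normalize_tag` is a standalone stand-in for the nested `_normalize_tag`, `_KEY_ALIASES` is a hypothetical name, and letting an existing snake_case key win over its camelCase twin is a design choice made here for the example.

```python
from typing import Any, Dict

# Hypothetical alias table: camelCase key -> canonical snake_case key.
_KEY_ALIASES = {"fromName": "from_name", "toName": "to_name"}


def normalize_tag(raw_tag: Dict[str, Any]) -> Dict[str, Any]:
    normalized = dict(raw_tag)

    # value -> values, mirroring the diff excerpt above.
    if "values" not in normalized and isinstance(normalized.get("value"), dict):
        normalized["values"] = normalized.get("value")
    normalized.pop("value", None)

    # Drop camelCase keys; keep an already-present snake_case key as-is.
    for camel, snake in _KEY_ALIASES.items():
        if camel in normalized:
            camel_value = normalized.pop(camel)
            normalized.setdefault(snake, camel_value)

    # Lower-case the control type when it is a string.
    if isinstance(normalized.get("type"), str):
        normalized["type"] = normalized["type"].lower()

    return normalized
```

With this shape, later merge and backfill logic only ever sees `from_name` and `to_name`, so a tag cannot end up stored with both key variants.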

Copilot · 2026-03-03T02:07:25Z

runtime/datamate-python/app/module/annotation/utils/tag_converter.py

            full_tag = {
                "id": tag_id,
                "from_name": from_name,
                "to_name": to_name,
                "type": control_type,
                "values": {
                    control_type: values
-                }
+                },
            }


The module-level docstring still describes the internal tag format as using a value wrapper, but the converter now emits/stores values (and the rest of the codebase expects values). Please update the docstring/examples to match the current internal schema to avoid misleading future readers.
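
A docstring along the following lines would match the emitted shape. This is only a sketch: `build_full_tag` is a hypothetical helper that wraps the dictionary from the diff excerpt above, and the example values are made up.

```python
from typing import Any, Dict, List


def build_full_tag(
    tag_id: str,
    from_name: str,
    to_name: str,
    control_type: str,
    values: List[Any],
) -> Dict[str, Any]:
    """Build an internal tag using the ``values`` wrapper.

    Example of the resulting shape::

        {"id": "t1", "from_name": "label", "to_name": "text",
         "type": "choices", "values": {"choices": ["cat"]}}
    """
    return {
        "id": tag_id,
        "from_name": from_name,
        "to_name": to_name,
        "type": control_type,
        # The payload is keyed by control type inside "values" (not "value").
        "values": {control_type: values},
    }
```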

Copilot · 2026-03-03T02:07:25Z

runtime/datamate-python/app/module/annotation/service/ls_annotation_sync.py

            tag = {
                "id": r.get("id"),
                "type": r_type,
                "from_name": from_name,
+                "to_name": to_name,
                "values": values,
            }


_extract_tags_from_results now includes to_name in the normalized tag payload, but the docstring above still lists the output fields as id/type/from_name/values only. Update the docstring to include to_name so callers and future maintainers have an accurate contract.
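
The updated contract could be documented roughly as follows. This is a hedged sketch rather than the PR's code: `extract_tags_from_results` is a simplified standalone stand-in for `_extract_tags_from_results`, and passing the Label Studio `value` dictionary through unchanged as `values` is an assumption, since the excerpt does not show how `values` is derived.

```python
from typing import Any, Dict, List, Optional


def extract_tags_from_results(
    results: Optional[List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Convert Label Studio result items into normalized tags.

    Each returned tag has the keys:
    id, type, from_name, to_name, values.
    """
    tags: List[Dict[str, Any]] = []
    for r in results or []:
        tags.append(
            {
                "id": r.get("id"),
                "type": str(r.get("type") or "").lower(),
                "from_name": r.get("from_name"),
                "to_name": r.get("to_name"),
                # Assumption: the LS "value" dict is passed through as-is.
                "values": r.get("value") or {},
            }
        )
    return tags
```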

Copilot · 2026-03-03T02:07:25Z

runtime/datamate-python/app/module/annotation/interface/project.py

+    dm_client = DatasetManagementService(db)
+    sync_orchestrator = SyncService(dm_client, ls_client, mapping_service)
+    file_sync_result = await sync_orchestrator.sync_files(mapping, batch_size=50)


This endpoint now uses SyncService for manual sync, so LSAnnotationSyncService appears unused in this module (imported at the top but not referenced). Please remove the unused import to avoid lint/type-check failures and keep dependencies clear.

Copilot · 2026-03-03T02:07:26Z

runtime/datamate-python/app/module/annotation/interface/auto.py

+    # 3. Run the file-diff sync first so the LS project's task set matches the file set of the current auto-annotation task
+    dm_client = DatasetManagementService(db)
+    sync_orchestrator = SyncService(dm_client, ls_client, mapping_service)
+


LSAnnotationSyncService is imported in this module but is no longer referenced after switching the /sync-db flow to SyncService orchestration. Please remove the unused import to avoid lint/type-check failures.

fix: Fix the sync issue between DataMate & LabelStudio; Fix the tag u…

7929061

…pdate issue in DataMate

Copilot AI review requested due to automatic review settings March 3, 2026 02:02

Copilot started reviewing on behalf of JasonW404 March 3, 2026 02:02

Merge branch 'main' into fix/tagSync&Update

e7ec7b4

Copilot AI reviewed Mar 3, 2026

JasonW404 merged commit c84ad5d into main Mar 3, 2026
8 checks passed

JasonW404 merged 2 commits into main from fix/tagSync&Update
