fix: Fix the sync issue between DataMate & LabelStudio; Fix the tag update issue in DataMate#399

Merged
JasonW404 merged 2 commits into main from fix/tagSync&Update
Mar 3, 2026

Conversation

@JasonW404
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings March 3, 2026 02:02

Copilot AI left a comment

Pull request overview

This PR addresses tag-format inconsistencies and synchronization edge-cases between DataMate (DM) and Label Studio (LS), aiming to make tag updates and annotation sync more deterministic and compatible with the values-based internal tag schema.

Changes:

  • Normalize/merge DM file tags more robustly during partial updates (handle value → values, type/value key normalization, semantic matching).
  • Improve LS↔DM annotation sync orchestration by optionally syncing files first and adding payload-based forcing logic for equal timestamps.
  • Align schemas/tests/utilities to use values consistently and improve key/type normalization.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| runtime/datamate-python/app/module/dataset/service/service.py | Enhances tag partial-update merge logic with normalization and semantic matching. |
| runtime/datamate-python/app/module/dataset/schema/dataset_file.py | Enables alias population and normalizes values/type lookup in tag parsing. |
| runtime/datamate-python/app/module/annotation/utils/test_tag_converter.py | Updates tests to assert values instead of value. |
| runtime/datamate-python/app/module/annotation/utils/tag_converter.py | Emits values wrapper and fixes simplified/full format detection. |
| runtime/datamate-python/app/module/annotation/service/sync.py | Adds deterministic LS payload comparison and pre-sync file sync hooks for annotation sync paths. |
| runtime/datamate-python/app/module/annotation/service/ls_annotation_sync.py | Normalizes LS result keys/types and includes to_name in extracted tags. |
| runtime/datamate-python/app/module/annotation/interface/task.py | Threads sync_files_first through API endpoints. |
| runtime/datamate-python/app/module/annotation/interface/project.py | Switches manual sync-db flow to the SyncService orchestrator. |
| runtime/datamate-python/app/module/annotation/interface/auto.py | Switches auto sync-db flow to file-diff sync + bidirectional annotation sync. |

    and ls_annotations
):
    latest_ls_result_payload = latest_ls_annotation.get("result", []) if latest_ls_annotation else []
    if not self._ls_result_payload_equal(ls_result, latest_ls_result_payload):

Copilot AI Mar 3, 2026

In the equal-timestamp conflict path, ls_result is referenced before it is constructed. Specifically, the payload comparison calls _ls_result_payload_equal(ls_result, latest_ls_result_payload) before ls_result is assigned later in the function, which will raise UnboundLocalError when timestamps are equal and overwrite_ls is true. Build the DM→LS ls_result (or a normalized equivalent) before this comparison, or compare using the raw dm_tags or a helper that does not depend on ls_result being defined yet.

Suggested change
- if not self._ls_result_payload_equal(ls_result, latest_ls_result_payload):
+ if "ls_result" in locals() and not self._ls_result_payload_equal(ls_result, latest_ls_result_payload):
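
As an alternative to guarding with locals(), here is a minimal sketch of the "build first, then compare" ordering. The helper names build_ls_result, payload_equal, and should_force_overwrite are hypothetical and not taken from this PR, and the canonical-JSON comparison is only one assumed way to get a deterministic check; the real _ls_result_payload_equal may work differently.

```python
import json
from typing import Any, Dict, List, Optional


def build_ls_result(dm_tags: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical DM -> LS conversion (the real converter lives elsewhere)."""
    return [
        {
            "id": tag.get("id"),
            "from_name": tag.get("from_name"),
            "to_name": tag.get("to_name"),
            "type": tag.get("type"),
            "value": tag.get("values", {}),
        }
        for tag in dm_tags
    ]


def payload_equal(a: List[Dict[str, Any]], b: List[Dict[str, Any]]) -> bool:
    """Order-insensitive comparison via canonical JSON (an assumed strategy)."""
    def canon(items: List[Dict[str, Any]]) -> List[str]:
        return sorted(json.dumps(item, sort_keys=True, default=str) for item in items)

    return canon(a) == canon(b)


def should_force_overwrite(
    dm_tags: List[Dict[str, Any]],
    latest_ls_annotation: Optional[Dict[str, Any]],
) -> bool:
    # Build ls_result first so the name is bound before any comparison uses it.
    ls_result = build_ls_result(dm_tags)
    latest = latest_ls_annotation.get("result", []) if latest_ls_annotation else []
    return not payload_equal(ls_result, latest)
```

With this ordering the equal-timestamp branch can call the comparison unconditionally, since ls_result always exists by the time it is read.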

Comment on lines +318 to +323
def _normalize_tag(raw_tag: Dict[str, Any]) -> Dict[str, Any]:
normalized = dict(raw_tag)
if "values" not in normalized and isinstance(normalized.get("value"), dict):
normalized["values"] = normalized.get("value")
normalized.pop("value", None)

Copilot AI Mar 3, 2026

_normalize_tag normalizes value → values and lower-cases type, but it doesn’t normalize camelCase field names (fromName/toName) into the canonical snake_case (from_name/to_name). Later merge/backfill logic only checks from_name/to_name, which can lead to tags being stored with both key variants (or missing the snake_case keys), making matching and downstream consumers inconsistent. Consider normalizing/remapping fromName → from_name and toName → to_name inside _normalize_tag (and dropping the camelCase keys).
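
A minimal sketch of what that remapping could look like. The function name normalize_tag and the _KEY_ALIASES table are illustrative, not the PR's code; the sketch also assumes the value key is dropped only once it has been copied into values, and that an existing snake_case key should win over its camelCase alias.

```python
from typing import Any, Dict

# Assumed mapping of camelCase aliases to the canonical snake_case keys.
_KEY_ALIASES = {"fromName": "from_name", "toName": "to_name"}


def normalize_tag(raw_tag: Dict[str, Any]) -> Dict[str, Any]:
    normalized = dict(raw_tag)  # shallow copy; the caller's dict is not mutated

    # Legacy "value" wrapper -> "values" (assumption: pop only after copying).
    if "values" not in normalized and isinstance(normalized.get("value"), dict):
        normalized["values"] = normalized["value"]
        normalized.pop("value", None)

    # Remap camelCase keys and drop them; an existing snake_case key wins.
    for camel, snake in _KEY_ALIASES.items():
        if camel in normalized:
            alias_value = normalized.pop(camel)
            normalized.setdefault(snake, alias_value)

    # Lower-case the control type when it is a string.
    if isinstance(normalized.get("type"), str):
        normalized["type"] = normalized["type"].lower()
    return normalized
```

After this step, merge and backfill code only ever needs to look at from_name and to_name.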

Comment on lines 141 to 149
full_tag = {
    "id": tag_id,
    "from_name": from_name,
    "to_name": to_name,
    "type": control_type,
    "values": {
        control_type: values
    },
}

Copilot AI Mar 3, 2026

The module-level docstring still describes the internal tag format as using a value wrapper, but the converter now emits/stores values (and the rest of the codebase expects values). Please update the docstring/examples to match the current internal schema to avoid misleading future readers.
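
For illustration, a docstring example in the values-based shape might read as follows. The wrapper function build_full_tag and the sample field values are hypothetical; only the dict layout is taken from the snippet under review.

```python
from typing import Any, Dict, List


def build_full_tag(
    tag_id: str,
    from_name: str,
    to_name: str,
    control_type: str,
    values: List[Any],
) -> Dict[str, Any]:
    """Build an internal tag in the ``values``-wrapped format.

    Example output::

        {
            "id": "t1",
            "from_name": "sentiment",
            "to_name": "text",
            "type": "choices",
            "values": {"choices": ["positive"]},
        }
    """
    return {
        "id": tag_id,
        "from_name": from_name,
        "to_name": to_name,
        "type": control_type,
        # The payload is keyed by control type under "values", not "value".
        "values": {control_type: values},
    }
```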

Comment on lines 220 to 226
tag = {
    "id": r.get("id"),
    "type": r_type,
    "from_name": from_name,
    "to_name": to_name,
    "values": values,
}

Copilot AI Mar 3, 2026

_extract_tags_from_results now includes to_name in the normalized tag payload, but the docstring above still lists the output fields as id/type/from_name/values only. Update the docstring to include to_name so callers and future maintainers have an accurate contract.
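
A sketch of a docstring that states the full output contract. This standalone extract_tags_from_results is a simplified stand-in, not the PR's method, and its handling of camelCase keys and missing fields is an assumption.

```python
from typing import Any, Dict, List


def extract_tags_from_results(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert LS result items into normalized tags.

    Each returned tag carries exactly these fields:
    id, type, from_name, to_name, values.
    """
    tags: List[Dict[str, Any]] = []
    for r in results:
        # Accept either snake_case or camelCase keys (assumed input variants).
        from_name = r.get("from_name") or r.get("fromName")
        to_name = r.get("to_name") or r.get("toName")
        r_type = str(r.get("type") or "").lower()
        values = r.get("value") or {}
        tags.append(
            {
                "id": r.get("id"),
                "type": r_type,
                "from_name": from_name,
                "to_name": to_name,
                "values": values,
            }
        )
    return tags
```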

Comment on lines +634 to +636
dm_client = DatasetManagementService(db)
sync_orchestrator = SyncService(dm_client, ls_client, mapping_service)
file_sync_result = await sync_orchestrator.sync_files(mapping, batch_size=50)

Copilot AI Mar 3, 2026

This endpoint now uses SyncService for manual sync, so LSAnnotationSyncService appears unused in this module (imported at the top but not referenced). Please remove the unused import to avoid lint/type-check failures and keep dependencies clear.

Comment on lines +1285 to +1288
# 3. Run the file-diff sync first so that the LS project's task set matches the file set of the current auto-annotation task
dm_client = DatasetManagementService(db)
sync_orchestrator = SyncService(dm_client, ls_client, mapping_service)

Copilot AI Mar 3, 2026

LSAnnotationSyncService is imported in this module but is no longer referenced after switching the /sync-db flow to SyncService orchestration. Please remove the unused import to avoid lint/type-check failures.

@JasonW404 merged commit c84ad5d into main on Mar 3, 2026
8 checks passed

2 participants