fix: preserve created_at/updated_at on annotations imported via storage sync #9798
Open
mpesavento wants to merge 1 commit into
…ge sync

ImportStorage.add_task passes each annotation dict through AnnotationSerializer(...).save(). The serializer treats created_at and updated_at as read-only because of auto_now_add / auto_now on the Annotation model, so any timestamps supplied by the source JSON are silently dropped and Django overwrites both fields with now() on save.

For users importing historical or migrated annotation data through a storage sync (S3, GCS, Azure, local files), this silently destroys the real annotation timestamps (the audit trail, time-based analytics, and any downstream reporting keyed on annotation time).

Add an opt-in Django setting sourced from the environment variable LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS (default false, current behavior unchanged). When true:
- ImportStorage.add_task uses a private _ImportAnnotationSerializer subclass that makes created_at / updated_at writable so the source values are not dropped.
- Missing timestamps in the source JSON fall back to now() so suppress_autotime never writes a NULL value.
- serializer.save() runs inside a suppress_autotime(Annotation, ...) context manager so auto_now_add / auto_now cannot overwrite the supplied values at the model layer.
- The serializer flow is preserved end to end: AnnotationDuplicateError handling, FSM state initialisation, and feature-flag validation gates still fire.

Also extract the pre-existing suppress_autotime helper out of core/old_ls_migration.py (a legacy module new code should not couple to) and into core/utils/db.py alongside the other database utilities. No behaviour change to the helper itself.

Acceptance criteria:
- When a user sets LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS=true and triggers a storage sync on a task JSON containing an annotation with an explicit created_at / updated_at, then the resulting Annotation row's created_at / updated_at equal the source values.
- When a user sets LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS=true and the source JSON omits created_at / updated_at, then both fields on the resulting Annotation row are populated with the sync time (not NULL).
- When LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS is unset or false and the source JSON contains created_at / updated_at, then both fields on the resulting Annotation row are populated with the sync time (existing behaviour, unchanged).

Tests in label_studio/io_storages/tests/test_preserve_import_timestamps.py cover all three paths.

Closes HumanSignal#9797
55d08ae to 981a408
Add a LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS env var (default false)
that changes ImportStorage.add_task to preserve created_at / updated_at
values supplied in the source annotation JSON, instead of overwriting
them with now() via auto_now_add / auto_now on the Annotation model.
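As a rough illustration of an opt-in flag with a false default, the sketch below parses a boolean from the environment. The helper name `env_flag`, the module-level constant, and the set of accepted truthy strings are assumptions made for this example; the PR's settings code may read the variable differently.

```python
import os

# Hypothetical helper (not taken from the PR): parse a boolean env var.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Return True when the variable holds a truthy string, else the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        # Unset variable: keep the default, so current behavior is unchanged.
        return default
    return raw.strip().lower() in _TRUTHY


# Hypothetical setting name; evaluated once at import time, as settings usually are.
PRESERVE_IMPORT_TIMESTAMPS = env_flag("LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS")
```

Passing `environ` explicitly keeps the helper easy to test without mutating the real process environment.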
Motivation: importing historical or migrated annotation data through a
storage sync (S3, GCS, Azure, local files) silently destroys the real
annotation timestamps — the audit trail, time-based analytics, and any
downstream reporting keyed on annotation time. Users importing legacy
data currently have no way to preserve those values.
Implementation:
- Extracts the pre-existing suppress_autotime helper out of
  core/old_ls_migration.py (a legacy module new code shouldn't couple
  to) and into core/utils/db.py alongside the other database utilities.
  No behavior change to the helper itself; old_ls_migration.py now
  imports it from the new location.
- Adds an opt-in Django setting sourced from
  the LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS env var.
- Adds a private _ImportAnnotationSerializer
  subclass that makes created_at and updated_at writable. When the
  setting is on, ImportStorage.add_task uses this subclass, pre-fills
  any missing timestamps with now() to avoid NULL writes, and wraps
  serializer.save() in suppress_autotime(Annotation, [...]) so
  auto_now_add / auto_now don't clobber the values at the model layer.
When the setting is off, the code path is unchanged — same serializer,
same save, same validation and duplicate-detection behavior.
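The idea behind a suppress_autotime-style context manager can be sketched without Django. The version below is an approximation, not the helper from core/utils/db.py: it assumes the model exposes `_meta.local_fields` with `name`, `auto_now`, and `auto_now_add` attributes (as Django models do), and `FakeAnnotation` is a hypothetical stand-in built from `SimpleNamespace` so the snippet runs on its own.

```python
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def suppress_autotime(model, fields):
    """Temporarily switch off auto_now / auto_now_add on the named fields."""
    saved = {}
    for field in model._meta.local_fields:
        if field.name in fields:
            saved[field.name] = (field.auto_now, field.auto_now_add)
            field.auto_now = False
            field.auto_now_add = False
    try:
        yield
    finally:
        # Restore the original flags even if the save inside the block raises.
        for field in model._meta.local_fields:
            if field.name in saved:
                field.auto_now, field.auto_now_add = saved[field.name]


# Hypothetical stand-in for the Annotation model: only what the helper touches.
created = SimpleNamespace(name="created_at", auto_now=False, auto_now_add=True)
updated = SimpleNamespace(name="updated_at", auto_now=True, auto_now_add=False)
FakeAnnotation = SimpleNamespace(
    _meta=SimpleNamespace(local_fields=[created, updated])
)

with suppress_autotime(FakeAnnotation, ["created_at", "updated_at"]):
    # Inside the block a save would keep whatever values were supplied.
    flags_inside = (created.auto_now_add, updated.auto_now)
flags_after = (created.auto_now_add, updated.auto_now)
```

Because this pattern mutates flags on field objects shared by the whole process, any other save of the same model that runs while the block is active would also skip the automatic timestamps; keeping the wrapped region as small as a single `serializer.save()` call limits that window.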
Adds tests covering the three code paths: setting on + timestamped
source → preserved; setting off + timestamped source → default
overwrite behavior (regression guard); setting on + missing source
timestamps → falls back to now(), never NULL.
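The three paths can be summarized as a small pure function. This is only a model of the expected outcomes, written for illustration; `expected_timestamps` and the sample dates are hypothetical, and the tests in the PR exercise the real storage sync rather than a function like this.

```python
from datetime import datetime, timezone


def expected_timestamps(source, preserve, sync_time):
    """Return the (created_at, updated_at) pair an imported annotation should get."""
    if not preserve:
        # Setting off: auto_now_add / auto_now win, source values are ignored.
        return sync_time, sync_time
    # Setting on: keep source values, falling back to sync time (never NULL).
    return (
        source.get("created_at") or sync_time,
        source.get("updated_at") or sync_time,
    )


SYNC = datetime(2025, 1, 1, tzinfo=timezone.utc)  # illustrative sync time
OLD = datetime(2020, 6, 15, tzinfo=timezone.utc)  # illustrative source time
stamped = {"created_at": OLD, "updated_at": OLD}

preserved = expected_timestamps(stamped, preserve=True, sync_time=SYNC)
overwritten = expected_timestamps(stamped, preserve=False, sync_time=SYNC)
fallback = expected_timestamps({}, preserve=True, sync_time=SYNC)
```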
Closes #9797