
fix: preserve created_at/updated_at on annotations imported via storage sync #9798

Open
mpesavento wants to merge 1 commit into HumanSignal:develop from mpesavento:preserve-import-timestamps
@mpesavento

Add a LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS env var (default false)
that changes ImportStorage.add_task to preserve created_at / updated_at
values supplied in the source annotation JSON, instead of overwriting
them with now() via auto_now_add / auto_now on the Annotation model.
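
The PR text does not show how the env var is parsed. A minimal sketch of one way to read such an opt-in flag with a false default follows; `env_flag` and the accepted truthy strings are hypothetical and are not Label Studio's own settings helper.

```python
import os

# Hypothetical set of strings treated as "on"; the real parser may differ.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Return True when the variable holds a truthy string, else False.

    `environ` can be injected for testing; it defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        # Unset variable: keep the default (current behavior unchanged).
        return default
    return raw.strip().lower() in _TRUE_VALUES


# Assumed shape of the settings entry; the name comes from the PR text.
PRESERVE_IMPORT_TIMESTAMPS = env_flag("LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS")
```

With this sketch, leaving the variable unset or setting it to any unrecognized string keeps the overwrite behavior.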

Motivation: importing historical or migrated annotation data through a
storage sync (S3, GCS, Azure, local files) silently destroys the real
annotation timestamps — the audit trail, time-based analytics, and any
downstream reporting keyed on annotation time. Users importing legacy
data currently have no way to preserve those values.

Implementation:

  • Extract the existing suppress_autotime helper out of
    core/old_ls_migration.py (a legacy module new code shouldn't couple
    to) and into core/utils/db.py alongside the other database utilities.
    No behavior change to the helper itself; old_ls_migration.py now
    imports it from the new location.
  • Add PRESERVE_IMPORT_TIMESTAMPS to core/settings/base.py, sourced from
    the LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS env var.
  • In io_storages/base_models.py, add a private _ImportAnnotationSerializer
    subclass that makes created_at and updated_at writable. When the
    setting is on, ImportStorage.add_task uses this subclass, pre-fills
    any missing timestamps with now() to avoid NULL writes, and wraps
    serializer.save() in suppress_autotime(Annotation, [...]) so
    auto_now_add / auto_now don't clobber the values at the model layer.
    When the setting is off, the code path is unchanged — same serializer,
    same save, same validation and duplicate-detection behavior.
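
The helper's source is not included in the PR text. The sketch below shows how a context manager like suppress_autotime could work, assuming it toggles the `auto_now` / `auto_now_add` flags on the fields in `model._meta.local_fields`; the actual helper may differ. `make_fake_model` is a hypothetical stand-in so the example runs without Django.

```python
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def suppress_autotime(model, fields):
    """Temporarily switch off auto_now / auto_now_add on the named fields."""
    saved = {}
    for field in model._meta.local_fields:
        if field.name in fields:
            saved[field.name] = (field.auto_now, field.auto_now_add)
            field.auto_now = False
            field.auto_now_add = False
    try:
        yield
    finally:
        # Restore the original flags even if the wrapped save raises.
        for field in model._meta.local_fields:
            if field.name in saved:
                field.auto_now, field.auto_now_add = saved[field.name]


def make_fake_model():
    """Hypothetical stand-in mimicking the attributes the helper touches."""
    created = SimpleNamespace(name="created_at", auto_now=False, auto_now_add=True)
    updated = SimpleNamespace(name="updated_at", auto_now=True, auto_now_add=False)
    return SimpleNamespace(_meta=SimpleNamespace(local_fields=[created, updated]))


FakeAnnotation = make_fake_model()
with suppress_autotime(FakeAnnotation, ["created_at", "updated_at"]):
    # Inside the block both flags are off, so a save would keep supplied values.
    inside = [(f.auto_now, f.auto_now_add) for f in FakeAnnotation._meta.local_fields]
# After the block the original flags are back.
after = [(f.auto_now, f.auto_now_add) for f in FakeAnnotation._meta.local_fields]
```

In this sketch the flags live on shared field objects, so the change is visible to any other code saving the same model while the block is active.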

Adds tests covering the three code paths: setting on + timestamped
source → preserved; setting off + timestamped source → default
overwrite behavior (regression guard); setting on + missing source
timestamps → falls back to now(), never NULL.

Closes #9797

@netlify

netlify Bot commented Jul 2, 2026

👷 Deploy request for label-studio-docs-new-theme pending review.

Visit the deploys page to approve it

🔨 Latest commit: 981a408

@netlify

netlify Bot commented Jul 2, 2026

👷 Deploy request for heartex-docs pending review.

Visit the deploys page to approve it

🔨 Latest commit: 981a408

@netlify

netlify Bot commented Jul 2, 2026

Deploy Preview for label-studio-playground canceled.

🔨 Latest commit: 981a408
🔍 Latest deploy log: https://app.netlify.com/projects/label-studio-playground/deploys/6a46f1e34afd0900082b3a2d

@netlify

netlify Bot commented Jul 2, 2026

Deploy Preview for label-studio-storybook canceled.

🔨 Latest commit: 981a408
🔍 Latest deploy log: https://app.netlify.com/projects/label-studio-storybook/deploys/6a46f1e3606ad600083abba5

@github-actions github-actions Bot added the feat label Jul 2, 2026
@mpesavento mpesavento changed the title from "feat: opt-in preservation of imported annotation timestamps" to "fix: preserve created_at/updated_at on annotations imported via storage sync" Jul 2, 2026
@github-actions github-actions Bot added the fix label Jul 2, 2026
…ge sync

ImportStorage.add_task passes each annotation dict through
AnnotationSerializer(...).save(). The serializer treats created_at and
updated_at as read-only because of auto_now_add / auto_now on the
Annotation model, so any timestamps supplied by the source JSON are
silently dropped and Django overwrites both fields with now() on save.

For users importing historical or migrated annotation data through a
storage sync (S3, GCS, Azure, local files), this silently destroys the
real annotation timestamps — the audit trail, time-based analytics,
and any downstream reporting keyed on annotation time.

Add an opt-in Django setting sourced from the environment variable
LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS (default false, current
behavior unchanged). When true:

- ImportStorage.add_task uses a private _ImportAnnotationSerializer
  subclass that makes created_at / updated_at writable so the source
  values are not dropped.
- Missing timestamps in the source JSON fall back to now(), so a save
  performed under suppress_autotime never writes a NULL value.
- serializer.save() runs inside a suppress_autotime(Annotation, ...)
  context manager so auto_now_add / auto_now cannot overwrite the
  supplied values at the model layer.
- The serializer flow is preserved end to end — AnnotationDuplicateError
  handling, FSM state initialisation, and feature-flag validation gates
  still fire.

Also extract the pre-existing suppress_autotime helper out of
core/old_ls_migration.py (a legacy module new code should not couple
to) and into core/utils/db.py alongside the other database utilities.
No behaviour change to the helper itself.

Acceptance criteria:
- When a user sets LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS=true and
  triggers a storage sync on a task JSON containing an annotation with
  an explicit created_at / updated_at, then the resulting Annotation
  row's created_at / updated_at equal the source values.
- When a user sets LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS=true and
  the source JSON omits created_at / updated_at, then both fields on
  the resulting Annotation row are populated with the sync time (not
  NULL).
- When LABEL_STUDIO_PRESERVE_IMPORT_TIMESTAMPS is unset or false and
  the source JSON contains created_at / updated_at, then both fields
  on the resulting Annotation row are populated with the sync time
  (existing behaviour, unchanged).
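
The three criteria above can be modeled as a small pure function. This is a sketch of the expected outcome only, not the code in the PR; `expected_timestamps`, the sample annotation dict, and its timestamp values are made up for illustration, and the ISO 8601 format with an explicit `+00:00` offset is an assumption about the source JSON.

```python
from datetime import datetime, timezone

TIMESTAMP_FIELDS = ("created_at", "updated_at")


def expected_timestamps(annotation, preserve, sync_time):
    """Model the timestamps the acceptance criteria expect on the saved row.

    `annotation` is the annotation dict from the task JSON, `preserve` mirrors
    the setting, and `sync_time` stands in for now() at import time.
    """
    result = {}
    for name in TIMESTAMP_FIELDS:
        raw = annotation.get(name)
        if preserve and raw is not None:
            # Setting on and value supplied: the source value wins.
            result[name] = datetime.fromisoformat(raw)
        else:
            # Setting off, or value missing: fall back to the sync time.
            result[name] = sync_time
    return result


# Illustrative values only.
sync_time = datetime(2026, 7, 2, 23, 18, tzinfo=timezone.utc)
legacy = {
    "result": [],
    "created_at": "2021-03-04T10:15:00+00:00",
    "updated_at": "2021-03-05T08:00:00+00:00",
}

preserved = expected_timestamps(legacy, preserve=True, sync_time=sync_time)
filled = expected_timestamps({"result": []}, preserve=True, sync_time=sync_time)
overwritten = expected_timestamps(legacy, preserve=False, sync_time=sync_time)
```

Here `preserved` carries the 2021 values, while `filled` and `overwritten` both carry `sync_time` in both fields.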

Tests in label_studio/io_storages/tests/test_preserve_import_timestamps.py
cover all three paths.

Closes HumanSignal#9797
@mpesavento mpesavento force-pushed the preserve-import-timestamps branch from 55d08ae to 981a408 Compare July 2, 2026 23:18
Development

Successfully merging this pull request may close these issues.

Preserve created_at / updated_at on annotations imported via storage sync