RUM-767: Improve reading-errors telemetry for dropped events#3598
RUM-767: Improve reading-errors telemetry for dropped events#3598satween wants to merge 2 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #3598 +/- ##
===========================================
- Coverage 72.85% 72.78% -0.07%
===========================================
Files 975 975
Lines 35277 35308 +31
Branches 5972 5983 +11
===========================================
- Hits 25700 25697 -3
- Misses 7915 7931 +16
- Partials 1662 1680 +18
🚀 New features to boost your workflow:
|
|
@codex review |
There was a problem hiding this comment.
Pull request overview
This PR improves internal telemetry for persistence-layer drops and read/write errors by introducing consistent attribute keys (e.g., feature.name, event.dropped_bytes) and threading feature context through the persistence stack so telemetry is easier to query across features and code paths.
Changes:
- Added
TelemetryAttributes(internal module) and updated API surface files accordingly. - Propagated
featureName/size context through core persistence components (FileEventBatchWriter,BatchFileReaderWriter/PlainBatchFileReaderWriter,SdkFeature,CoreFeature) and updated unit tests. - Added a max-item-size drop path in
RumDataWriterwith telemetry/user logging and expanded RUM unit tests for size-exceeded scenarios.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| features/dd-sdk-android-rum/src/test/kotlin/com/datadog/android/rum/internal/domain/RumDataWriterTest.kt | Adds unit coverage for RUM events dropped when serialized payload exceeds maxItemSize. |
| features/dd-sdk-android-rum/src/main/kotlin/com/datadog/android/rum/internal/RumFeature.kt | Wires storageConfiguration.maxItemSize into RumDataWriter. |
| features/dd-sdk-android-rum/src/main/kotlin/com/datadog/android/rum/internal/domain/RumDataWriter.kt | Adds size-limit drop handling with user+telemetry logging and dropped-bytes attribute. |
| dd-sdk-android-internal/src/main/java/com/datadog/android/internal/telemetry/TelemetryAttributes.kt | Introduces shared telemetry attribute keys for persistence diagnostics. |
| dd-sdk-android-internal/api/dd-sdk-android-internal.api | API dump updated for the new TelemetryAttributes public type. |
| dd-sdk-android-internal/api/apiSurface | API surface updated to include TelemetryAttributes. |
| dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlockTest.kt | Updates expectations for TLV serialization oversize logging targets/level. |
| dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlockFileReaderTest.kt | Adds/updates tests for invalid TLV declared length handling and logging. |
| dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/FileEventBatchWriterTest.kt | Updates tests for new telemetry targets and added dropped-bytes/feature attributes. |
| dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/file/batch/PlainBatchFileReaderWriterTest.kt | Updates tests for new constructor signature and feature-name telemetry attributes. |
| dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/file/batch/BatchFileReaderWriterTest.kt | Updates tests for create(..., featureName) signature. |
| dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/CoreFeatureTest.kt | Updates usage of BatchFileReaderWriter.create to include feature name. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/SdkFeature.kt | Threads featureName into persistence reader/writer creation. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlockFileReader.kt | Adds declared-length validation with telemetry/user error logging. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlock.kt | Changes oversize TLV serialization log to ERROR and USER+TELEMETRY; exposes size constant internally. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/FileEventBatchWriter.kt | Adds featureName and attaches feature.name + event.dropped_bytes on oversize drops. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/file/batch/PlainBatchFileReaderWriter.kt | Adds featureName and attaches it to read-side telemetry errors/warnings. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/file/batch/BatchFileReaderWriter.kt | Extends factory to require featureName and passes it down. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/ConsentAwareStorage.kt | Passes featureName into FileEventBatchWriter. |
| dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/CoreFeature.kt | Ensures core-created batch reader/writers include Feature.RUM_FEATURE_NAME. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| additionalProperties = mapOf( | ||
| RUM_EVENT_TYPE to rumEventType, | ||
| TelemetryAttributes.EVENT_DROPPED_BYTES to byteArray.size | ||
| ) |
| } | ||
|
|
||
| @Test | ||
| fun `M return empty array and warn W read() { declared block length exceeds limit }`( |
| } | ||
|
|
||
| @Test | ||
| fun `M return empty array and warn W read() { declared block length is negative }`( |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cfc0a6226a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| additionalProperties = mapOf( | ||
| RUM_EVENT_TYPE to rumEventType, | ||
| TelemetryAttributes.EVENT_DROPPED_BYTES to byteArray.size | ||
| ) |
There was a problem hiding this comment.
Add feature name to RUM drop telemetry
When a serialized RUM event exceeds maxItemSize, this branch returns before FileEventBatchWriter.write() runs, so the telemetry emitted here is the only drop signal for RUM. Unlike the shared storage path inspected in FileEventBatchWriter, this map omits TelemetryAttributes.FEATURE_NAME, so dashboards or alerts querying feature.name will miss RUM oversized-event drops even though logs/traces include that attribute.
Useful? React with 👍 / 👎.
| level = InternalLogger.Level.ERROR, | ||
| targets = listOf(InternalLogger.Target.USER, InternalLogger.Target.TELEMETRY), | ||
| messageBuilder = { WARNING_NOT_ALL_DATA_READ.format(Locale.US, file.path) }, | ||
| additionalProperties = mapOf(TelemetryAttributes.FEATURE_NAME to featureName) |
There was a problem hiding this comment.
Report dropped bytes for corrupted batch reads
When a batch file is truncated or corrupt and this branch logs WARNING_NOT_ALL_DATA_READ, the unread bytes represented by remaining are the data that will be lost once the partially read batch is confirmed/deleted. The new write-side oversized-event telemetry includes event.dropped_bytes, but this read-side drop signal only has feature.name, so read corruption remains unqueryable by loss size despite the value being available here.
Useful? React with 👍 / 👎.
| targets = listOf(InternalLogger.Target.USER, InternalLogger.Target.TELEMETRY), | ||
| messageBuilder = { | ||
| CORRUPT_DATA_LENGTH_ERROR.format(Locale.US, maxEntrySize, lengthData) | ||
| } |
There was a problem hiding this comment.
Include feature name on TLV datastore telemetry
When a feature's DataStore file has an invalid declared TLV length, this new telemetry error is emitted without TelemetryAttributes.FEATURE_NAME, even though SdkFeature.prepareDataStoreHandler constructs the reader per wrappedFeature.name. As a result, corrupt datastore-read drops cannot be attributed to RUM/logs/tracing/session-replay in telemetry queries, unlike the batch read/write telemetry added in this change.
Useful? React with 👍 / 👎.
What does this PR do?
Improves internal telemetry for events dropped during file read/write on the persistence layer:
TelemetryAttributes(feature.name,event.dropped_bytes) so dropped-event error telemetry carries consistent, queryable attributes across features (RUM, logs, tracing) and across read/write paths.RumDataWriter,FileEventBatchWriter,BatchFileReaderWriter/PlainBatchFileReaderWriter,TLVBlock/TLVBlockFileReader,CoreFeature, andSdkFeatureso telemetry errors report which feature and how many bytes were involved when an event is dropped.Motivation
When events are dropped during persistence (e.g. oversized batches), the existing telemetry lacked enough context to diagnose which feature was affected and how large the dropped payload was. RUM-767 tracks improving this visibility.
Additional Notes
TLVBlockFileReader,RumDataWriter,FileEventBatchWriter,BatchFileReaderWriter/PlainBatchFileReaderWriter,TLVBlock.dd-sdk-android-internalpublic API surface updated (apiSurface/.apifiles) for the newTelemetryAttributesobject.Review checklist (to be filled by reviewers)