 DESCRIPTION >
- - `activities_deduplicated_ds` contains deduplicated raw activity events without relationship data.
+ - `activities_deduplicated_ds` contains deduplicated raw activity events without relationship data.
  - Created via copy pipe from `activities` datasource with deduplication and field selection for performance.
  - Since aggregations are mainly done on relationships, `activityRelations_deduplicated_cleaned_ds` should be used for reporting purposes instead.
  - Optimized subset of activity fields focused on core analytics needs.
@@ -9,11 +9,12 @@ DESCRIPTION >
  - `type` specifies the activity type (issues-opened, pull-request-opened, etc.) using LowCardinality.
  - `channel` contains the repository, channel, or forum where activity occurred.
  - `sourceId` is the unique identifier from the source platform.
- - `score` is the computed importance/impact score for the activity.
+ - `score` is the computed importance/impact score for the activity.
  - `attributes` contains additional JSON metadata specific to the activity type.
  - `body` contains the activity’s textual body/content when applicable; empty string if not applicable.
  - `title` contains the activity’s title/subject when applicable; empty string if not applicable.
  - `url` is the direct link to the activity on the source platform; empty string if not available.
+
 TAGS "Activity preprocessing pipeline"
 
 SCHEMA >
@@ -31,11 +32,10 @@ SCHEMA >
  `url` String DEFAULT '',
  `updatedAt` DateTime64(3)
 
- ENGINE "MergeTree"
- ENGINE_PARTITION_KEY "toYear(timestamp)"
- ENGINE_SORTING_KEY "id, platform, channel"
-
-
 INDEXES >
- idx_body_ngram3 body TYPE ngrambf_v1(3, 2048, 6, 0) GRANULARITY 64,
- idx_title_ngram3 title TYPE ngrambf_v1(3, 512, 6, 0) GRANULARITY 64
+ idx_body_ngram3 body TYPE ngrambf_v1(3, 2048, 6, 0) GRANULARITY 64
+ idx_title_ngram3 title TYPE ngrambf_v1(3, 512, 6, 0) GRANULARITY 64
+
+ ENGINE MergeTree
+ ENGINE_PARTITION_KEY toYear(timestamp)
+ ENGINE_SORTING_KEY id, platform, channel