services/libs/tinybird/README.md
Lines changed: 6 additions & 58 deletions
@@ -3,18 +3,7 @@
3
3
[This image](https://uploads.linear.app/aebec7ad-5649-4758-9bed-061f7228a879/b72d9f55-8f27-4c57-81fe-729807c12ffb/36c116c2-0f88-4735-a932-0c3e6bf8ea45) shows how data flows from CM to Insights.
4
4
5
5
## Activity Preprocessing Pipeline
6
-
7
-
1. **New activities land** on `activities` and `activityRelations` datasources
8
-
2. **Deduplication** of activities via copy pipe:
9
-
- `activities_deduplicated_copy_pipe (every hour at minute 0)`
10
-
2.1. `activities` → `activities_deduplicated_ds`
11
-
3. **Preprocessing pipeline for activityRelations - Deduplicates, filters and sorts data for performant queries**:
12
-
- `activityRelations (every hour at minute 0)` → `activityRelations_deduplicated_cleaned_ds`
13
-
14
-
## Other Copy Pipes
15
-
16
-
1. **pull_request_analysis_copy_pipe (every hour at minute 15)**: Compacts activities from same PR into one, keeping state change times in the same row. Helps with serving PR related metrics
17
-
2. **issue_analysis_copy_pipe (every hour at minute 15)**: Similar to pr analysis, this time we compact issue related information into one row.
6
+
See LAMBDA_ARCHITECTURE.md for details
18
7
19
8
---
20
9
@@ -63,18 +52,6 @@ Since `activities` **don’t exist in Postgres**, schema iteration must be done
63
52
64
53
### Iterating on Datasources Replicated by Sequin
65
54
66
-
These sources exist in Postgres (i.e., all Tinybird datasources **except `activities`**):
3. (only for PROD) u need to create the topic in oracle kafka, it doesn't get created automaticly
74
+
3. (only for PROD) You need to create the topic in oracle kafka; it doesn't get created automatically
99
75
4. Update the tinybird kafka connect plugin env (it's under crowd-kube/lf-prod-oracle(lf-staging-oracle)/kafka-connect/tinybird-sink.properties.enc); the decrypted file contains a list of tracked files.
100
76
5. Restart kafka-connect
101
77
6. Create tinybird datasource schema and push it to tinybird
@@ -111,11 +87,11 @@ GRANT SELECT ON "tableName" to sequin;
111
87
112
88
### Downtime Consideration
113
89
114
-
Switching between old and new datasources can lead to **temporary downtime**, but only for **endpoint pipes that consume raw datasources directly**.
90
+
Switching between old and new datasources can lead to **temporary downtime**, but only for **endpoint pipes that consume raw datasources directly**.
115
91
116
-
**No Downtime** if the endpoint pipe uses a **deduplication copy pipe**:
117
-
- You can safely remove the raw datasource
118
-
- The deduplicated datasource will continue to serve data
92
+
**No Downtime** if the endpoint pipe uses a **copy pipe result**:
93
+
- You can safely remove the raw datasource after stopping the copy job
94
+
- The copy pipe result datasource will continue to serve data
119
95
- New fields will be included in the **next copy run**
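
The distinction above (endpoint pipes reading a raw datasource directly versus reading a copy pipe result) can be checked before a switch. The following is only a minimal sketch: the `EndpointPipe` class, the `pipes_at_risk` helper, and all pipe and datasource names are hypothetical and not part of Tinybird or this repository. It assumes you already have, from some other source, the list of datasources each endpoint pipe reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointPipe:
    """Hypothetical description of an endpoint pipe and the datasources it reads."""
    name: str
    sources: frozenset


def pipes_at_risk(pipes, raw_datasource):
    """Return the names of endpoint pipes that read the raw datasource directly.

    These are the pipes that may see temporary downtime during a switch;
    pipes that only read a copy pipe result are not listed.
    """
    return sorted(p.name for p in pipes if raw_datasource in p.sources)


# Illustrative data only: one pipe reads the raw datasource,
# the other reads a copy pipe result datasource.
pipes = [
    EndpointPipe("members_endpoint", frozenset({"members"})),
    EndpointPipe("members_summary_endpoint", frozenset({"members_copy_result_ds"})),
]

at_risk = pipes_at_risk(pipes, "members")
print(at_risk)
```

Pipes returned by the helper are the ones the tips that follow apply to; an empty result suggests the raw datasource is only consumed through copy pipe results.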
120
96
121
97
**Only consider the following tips if your pipe is consuming raw datasources directly**:
@@ -127,34 +103,6 @@ Switching between old and new datasources can lead to **temporary downtime**, bu
127
103
128
104
---
129
105
130
-
### Alternative Way to Handle Datasource Iterations
131
-
132
-
You can avoid downtime entirely by **not deleting the old datasource**.
133
-
134
-
Instead of renaming the new datasource to the old one,
135
-
**Update each endpoint pipe to use the new datasource directly**
136
-
137
-
This allows your pipelines to stay active without interruption.
138
-
139
-
#### Pros:
140
-
- No downtime at all
141
-
- Safer testing of the new datasource before retiring the old one
142
-
143
-
#### Cons:
144
-
- Every pipe using the old datasource must be updated manually
145
-
- Easy to miss a reference if not done carefully
146
-
147
-
---
148
-
149
-
### Choosing the Right Approach
150
-
151
-
Until we move fully to **Tinybird Forward** (which will support migration scripts), the best practice is to **find a balance** between these two approaches:
152
-
153
-
1. **Quick rename strategy** is best when the raw datasource is only consumed by deduplication copy pipes, but no endpoints
154
-
2. **Pipe-by-pipe updates** for zero downtime where #1 is not enough
155
-
156
-
Pick the method that best fits your workflow and datasource complexity.
157
-
158
106
# Testing Tinybird Pipes Locally
159
107
160
108
This guide explains how to test a Tinybird data pipeline ("pipe") on your local Tinybird environment. We will fetch sample data (fixtures) from a staging Tinybird workspace and use it to run and verify a pipe locally. The steps below are written for a developer who may not be familiar with Tinybird, and they are organized in a clear, numbered format for easy follow-up.