Commit e26c61b
feat(github): Add incremental data collection (#8858)
* fix(github): use RFC 3339 format for since= incremental API parameters
The since= query parameter passed to the GitHub API in four collectors
was formatted using Go's time.Time.String(), which produces a
human-readable string (e.g. "2024-01-15 10:30:00 +0000 UTC") rather
than the ISO 8601 / RFC 3339 format the GitHub API requires
(e.g. "2024-01-15T10:30:00Z").
The GitHub API silently ignores malformed date strings, causing these
collectors to perform full re-scans on every incremental run despite
appearing to filter correctly. Fix by using .UTC().Format(time.RFC3339)
in all four affected collectors:
- comment_collector.go
- issue_collector.go
- commit_collector.go
- pr_review_comment_collector.go
(cherry picked from commit cf3cb621462d8ae661df5fcb8e1b47c70564cd60)
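The difference between the two formats can be reproduced with the standard library alone. The sketch below is illustrative rather than the collectors' actual code; `sinceParam` and `exampleSince` are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"time"
)

// exampleSince is a hypothetical checkpoint timestamp used only for illustration.
var exampleSince = time.Date(2024, 1, 15, 10, 30, 0, 0, time.UTC)

// sinceParam formats a timestamp for a since= query parameter
// as RFC 3339 in UTC, mirroring .UTC().Format(time.RFC3339).
func sinceParam(t time.Time) string {
	return t.UTC().Format(time.RFC3339)
}

func main() {
	// String() is a human-readable debugging format, not a wire format.
	fmt.Println(exampleSince.String()) // 2024-01-15 10:30:00 +0000 UTC

	// RFC 3339 is the machine-readable form.
	fmt.Println(sinceParam(exampleSince)) // 2024-01-15T10:30:00Z
}
```

Calling `.UTC()` before formatting keeps the output in the `Z` form even when the stored timestamp carries a local offset.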
* feat(github): migrate event extractor to StatefulApiExtractor
Switch Extract Events from the legacy full-scan NewApiExtractor to
NewStatefulApiExtractor, which filters _raw_github_api_events by
created_at >= last_run_start on incremental syncs. This table had
13,828 rows in a representative production run and took ~117s to
process on every run regardless of how few new events were collected.
After this change incremental runs process only newly collected rows.
(cherry picked from commit 84f324f65e8576cd2ff0ca8b74a2588ee2600e12)
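The created_at >= last_run_start filter can be pictured with an in-memory sketch. This is not the NewStatefulApiExtractor implementation; `rawRow`, `selectRows`, and the sample helpers are hypothetical names, and the real filter runs as a database query.

```go
package main

import (
	"fmt"
	"time"
)

// rawRow is a hypothetical stand-in for a row in a raw API table.
type rawRow struct {
	ID        int
	CreatedAt time.Time
}

// selectRows returns the rows an extractor should process.
// A nil lastRunStart means there is no prior state, so every row is processed.
func selectRows(rows []rawRow, lastRunStart *time.Time) []rawRow {
	if lastRunStart == nil {
		return rows
	}
	var out []rawRow
	for _, r := range rows {
		// Equivalent to created_at >= last_run_start.
		if !r.CreatedAt.Before(*lastRunStart) {
			out = append(out, r)
		}
	}
	return out
}

// sampleRows builds three rows collected on consecutive days.
func sampleRows() []rawRow {
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return []rawRow{
		{ID: 1, CreatedAt: base},
		{ID: 2, CreatedAt: base.AddDate(0, 0, 1)},
		{ID: 3, CreatedAt: base.AddDate(0, 0, 2)},
	}
}

// sampleLastRun returns a checkpoint that falls on the third day.
func sampleLastRun() *time.Time {
	t := time.Date(2024, 1, 3, 0, 0, 0, 0, time.UTC)
	return &t
}

func main() {
	fmt.Println(len(selectRows(sampleRows(), nil)))             // full sync: 3
	fmt.Println(len(selectRows(sampleRows(), sampleLastRun()))) // incremental: 1
}
```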
* feat(github): migrate PR extractor to StatefulApiExtractor
Switch Extract Pull Requests from legacy full-scan NewApiExtractor to
NewStatefulApiExtractor. The _raw_github_api_pull_requests table had
12,448 rows in a representative production run, taking ~108s on every
incremental sync. After this change only newly collected PR rows are
processed.
SubtaskConfig captures prType and prComponent regex strings so that a
scope config change automatically triggers a full re-extract. BeforeExtract
deletes GithubPrLabel rows for the current PR before re-inserting them
in incremental mode, preventing stale labels from persisting when labels
are removed upstream.
(cherry picked from commit 80dc76d4e5f1ad6c65d48d42f60f50b87c9dad2d)
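One way to picture the config-sensitivity described above is to serialize the config and compare it with the value saved by the previous run. This is a sketch under that assumption, not the helper's actual mechanism; `prExtractConfig` and `needsFullExtract` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prExtractConfig is a hypothetical stand-in for the SubtaskConfig
// that captures the prType and prComponent regex strings.
type prExtractConfig struct {
	PrType      string `json:"prType"`
	PrComponent string `json:"prComponent"`
}

// needsFullExtract reports whether every raw row must be reprocessed.
// storedConfig is the serialized config saved by the previous run ("" if none).
// It also returns the serialized current config so the caller can save it.
func needsFullExtract(storedConfig string, current prExtractConfig) (bool, string, error) {
	encoded, err := json.Marshal(current)
	if err != nil {
		return false, "", err
	}
	serialized := string(encoded)
	// No prior state, or changed regexes: earlier extraction results are stale.
	return storedConfig != serialized, serialized, nil
}

func main() {
	cfg := prExtractConfig{PrType: "type/(.*)$", PrComponent: "component/(.*)$"}

	full, saved, _ := needsFullExtract("", cfg)
	fmt.Println(full) // true: no previous state

	full, saved, _ = needsFullExtract(saved, cfg)
	fmt.Println(full) // false: config unchanged, incremental run

	cfg.PrType = "kind/(.*)$"
	full, _, _ = needsFullExtract(saved, cfg)
	fmt.Println(full) // true: regex changed, full re-extract
}
```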
* feat(github): migrate workflow run and PR commit extractors to StatefulApiExtractor
Switch Extract Workflow Runs (~7,364 raw rows, ~65s) and Extract PR
Commits (~6,982 raw rows, ~60s) from legacy full-scan NewApiExtractor
to NewStatefulApiExtractor. Both extractors are simple mappings with
no scope-config dependency, so neither SubtaskConfig nor BeforeExtract is needed.
(cherry picked from commit ea4f158a65ad0f7a2c23ba7bbc9932059e2ca408)
* feat(github): migrate remaining high-volume extractors to StatefulApiExtractor
Migrate Extract Jobs (~5,369 rows, ~48s), Extract PR Reviews (~3,073
rows, ~27s), and Extract PR Review Comments (~1,820 rows, ~16s) from
legacy full-scan NewApiExtractor to NewStatefulApiExtractor.
Also moves prUrlRegex compilation in pr_review_comment_extractor.go
from inside the Extract closure (recompiled on every raw row) to before
the extractor is created, eliminating redundant regexp compilation.
(cherry picked from commit 54d601587bc74ebd0f1103345c545571146739b5)
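The regexp hoisting follows a common Go pattern: compile once, then let the per-row closure reuse the compiled value. The sketch below assumes a made-up pattern and the hypothetical name `newPullNumberExtractor`; the real prUrlRegex pattern is not shown in this commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// newPullNumberExtractor compiles the pattern once, before the closure is
// created, so the closure can be called per row without recompiling.
// The pattern here is a hypothetical example.
func newPullNumberExtractor() func(pullRequestURL string) (string, bool) {
	prURLRegex := regexp.MustCompile(`/pulls/(\d+)$`)
	return func(pullRequestURL string) (string, bool) {
		match := prURLRegex.FindStringSubmatch(pullRequestURL)
		if match == nil {
			return "", false
		}
		return match[1], true
	}
}

func main() {
	extract := newPullNumberExtractor()
	for _, u := range []string{
		"https://api.github.com/repos/octo/demo/pulls/42",
		"https://api.github.com/repos/octo/demo/issues/7",
	} {
		number, ok := extract(u)
		fmt.Println(number, ok)
	}
}
```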
* feat(github): migrate remaining low-volume extractors to StatefulApiExtractor
Migrate the final seven GitHub extractors to NewStatefulApiExtractor:
issue, comment, account, account_org, milestone, commit, commit_stats.
issue_extractor gains SubtaskConfig (issue classification regex strings)
so scope config changes trigger automatic full re-extraction, and
BeforeExtract cleanup for GithubIssueLabel and GithubIssueAssignee rows
in incremental mode to prevent stale labels/assignees persisting after
upstream removal.
All other extractors in this commit are simple migrations with no
config-sensitivity or child record cleanup needed.
With this commit all 14 GitHub plugin extractors are now incremental.
Combined with the collector fixes in earlier commits, incremental
collection runs that previously took 9+ minutes in the extract phase
will now complete in seconds when few or no new records were collected.
(cherry picked from commit 606acea88caef63662733162faa47a6c6d3155cc)
* feat(github): migrate CICD converters to StatefulDataConverter
Convert Workflow Runs and Convert Jobs now use NewStatefulDataConverter,
skipping records unchanged since last run on incremental pipelines.
Jobs are filtered via JOIN on _tool_github_runs.github_updated_at.
(cherry picked from commit 75a909efccab1d04bfae4058f4674663b00762a6)
* feat(github): migrate PR supporting data converters to StatefulDataConverter
Convert PR Commits, Convert PR Comments, and Convert PR Reviews now use
NewStatefulDataConverter. Child-of-PR records are filtered incrementally
via JOIN on _tool_github_pull_requests.github_updated_at; PR comments
additionally filter on their own github_updated_at.
(cherry picked from commit 17cbc2e4caff7349fa2d3389d7cafcc5d0edee71)
* feat(github): migrate PR main and cross converters to StatefulDataConverter
Convert Pull Requests filters by GithubPullRequest.github_updated_at.
Convert Reviews and Convert PR Issues filter via JOIN on
pull_requests.github_updated_at, since reviewers and pr_issues have no timestamp of their own.
(cherry picked from commit 47437ae49d30b967b443c964a4b1baca34344acc)
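Filtering child records through the parent's timestamp can be sketched in memory as follows. The converters do this with a SQL JOIN; the types and function names below (`pullRequest`, `reviewer`, `changedReviewers`, `sampleData`) are hypothetical and exist only to show the selection rule.

```go
package main

import (
	"fmt"
	"time"
)

// pullRequest and reviewer are hypothetical stand-ins for tool-layer rows.
// reviewer has no timestamp of its own, so it is selected via its parent PR.
type pullRequest struct {
	ID        int
	UpdatedAt time.Time
}

type reviewer struct {
	PullRequestID int
	Login         string
}

// changedReviewers keeps reviewers whose parent PR was updated at or after since.
func changedReviewers(prs []pullRequest, reviewers []reviewer, since time.Time) []reviewer {
	changed := make(map[int]bool, len(prs))
	for _, pr := range prs {
		if !pr.UpdatedAt.Before(since) {
			changed[pr.ID] = true
		}
	}
	var out []reviewer
	for _, r := range reviewers {
		if changed[r.PullRequestID] {
			out = append(out, r)
		}
	}
	return out
}

// sampleData returns two PRs, one reviewer each, and a checkpoint between them.
func sampleData() ([]pullRequest, []reviewer, time.Time) {
	since := time.Date(2024, 1, 10, 0, 0, 0, 0, time.UTC)
	prs := []pullRequest{
		{ID: 1, UpdatedAt: since.AddDate(0, 0, -5)}, // unchanged since last run
		{ID: 2, UpdatedAt: since.AddDate(0, 0, 1)},  // updated after last run
	}
	reviewers := []reviewer{
		{PullRequestID: 1, Login: "alice"},
		{PullRequestID: 2, Login: "bob"},
	}
	return prs, reviewers, since
}

func main() {
	prs, reviewers, since := sampleData()
	for _, r := range changedReviewers(prs, reviewers, since) {
		fmt.Println(r.PullRequestID, r.Login)
	}
}
```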
* feat(github): migrate remaining data converters to StatefulDataConverter
Migrate the last 9 converters from DataConverter to StatefulDataConverter
so they skip already-processed records on incremental runs:
- issue_convertor: filter on github_updated_at
- issue_comment_convertor: filter on github_updated_at
- issue_label_convertor: JOIN to issues, filter on issues.github_updated_at
- issue_assignee_convertor: JOIN to issues, filter on issues.github_updated_at
- pr_label_convertor: JOIN to pull_requests, filter on pr.github_updated_at
- account_convertor: filter on updated_at
- release_convertor: filter on updated_at
- repo_convertor: filter on updated_at; retain GithubApiRepo struct (used by pr_extractor)
- commit_convertor: filter on authored_date
(cherry picked from commit 7bcc9da4676d34d9cd24e41a68765cf85723ac81)
* fix(github): keep incremental commit and PR-issue processing accurate
* fix(github): bootstrap workflow runs incremental window from tool state
* feat(runner): include processed record count in subtask finish logs
* fix(stateful): bootstrap subtask state from collector checkpoints
* fix(github): optimize incremental Convert Jobs query
* fix(github): resolve lint and staticcheck issues in converters
* fix(stateful): tolerate missing collector state table in bootstrap
* fix(github): address CI failures in state bootstrap and run tests
* fix(server): use interface{} for store handler swag annotations (#8859)
Signed-off-by: yamoyamoto <yamo7yamoto@gmail.com>
* fix(github): stabilize e2e state bootstrap and issue assignee join
* fix(stateful): skip collector bootstrap lookup when state table is absent
* test(stateful): use generic scope params in bootstrap tests
* fix(github): keep issue assignee conversion full-sync safe
---------
Signed-off-by: yamoyamoto <yamo7yamoto@gmail.com>
Co-authored-by: Tomoya Kawaguchi <68677002+yamoyamoto@users.noreply.github.com>
1 parent ac73e27 commit e26c61b
44 files changed
Lines changed: 955 additions & 763 deletions
File tree
- backend
  - core
    - plugin
    - runner
  - helpers/pluginhelper/api
  - impls/context
  - plugins/github/tasks
  - server/services/remote/bridge
Per-file diff content is not preserved in this capture. Two per-file summaries survive without their file names:
- Lines changed: 55 additions & 0 deletions
- Lines changed: 169 additions & 0 deletions