Commit e232766
Harden bounded authority crawl: atomic dequeue + per-corpus Analysis reuse
Addresses the code review in issue #2027 of the Phase-5 BFS crawl engine.
Issue #1 (atomic dequeue): AuthorityFrontierService.dequeue_queued() was a
plain filter(discovery_state="queued") read — the in_progress transition only
happened later in discover_and_bootstrap, leaving a window where two concurrent
crawl_authorities tasks could dequeue and bootstrap the same frontier row
(wasted provider calls, distorted counters). It now claims the rows it returns
inside a single SELECT ... FOR UPDATE SKIP LOCKED transaction, flipping them to
in_progress; a second worker skips locked rows and grabs the next ones.
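
The claim-on-read invariant can be sketched in a self-contained way. This is an illustration only, not the project's code: it uses Python's built-in sqlite3, which has no SKIP LOCKED, so a BEGIN IMMEDIATE write transaction stands in for the row locks. The table layout and the make_frontier helper are assumptions; only the discovery_state values come from the description above. On PostgreSQL under Django, the same shape would presumably be a select_for_update(skip_locked=True) query inside transaction.atomic().

```python
import sqlite3


def make_frontier(n):
    """Hypothetical helper: an in-memory frontier table with n queued rows."""
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE frontier ("
        "id INTEGER PRIMARY KEY, discovery_state TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO frontier (discovery_state) VALUES (?)",
        [("queued",)] * n,
    )
    return conn


def dequeue_queued(conn, limit):
    """Select queued rows and flip them to in_progress in one transaction."""
    # BEGIN IMMEDIATE takes the write lock up front, so the read and the
    # state flip cannot be interleaved with another writer's claim.
    conn.execute("BEGIN IMMEDIATE")
    try:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM frontier WHERE discovery_state = 'queued' "
                "ORDER BY id LIMIT ?",
                (limit,),
            )
        ]
        conn.executemany(
            "UPDATE frontier SET discovery_state = 'in_progress' WHERE id = ?",
            [(i,) for i in ids],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids
```

Two successive calls then return disjoint batches, because the first call has already moved its rows out of the queued state by the time the second one reads. Unlike SKIP LOCKED, this stand-in serializes writers rather than letting them proceed in parallel on different rows.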
Issue #2 (Analysis-per-section bloat): every section of an authority
bootstraps into ONE corpus (the provider title is a constant, so all usc-*
sections land in the single "United States Code" corpus), so the BFS calls
apply() on that corpus once per ingested section. Each call previously minted a
fresh Analysis via _get_analysis. crawl() now caches the Analysis the first
apply creates per corpus and threads it back through apply(analysis=...),
capping it at one provenance row per corpus.
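
A minimal sketch of the per-corpus reuse, under stated assumptions: the Analysis dataclass, the minted list, and the signatures of apply() and crawl() below are hypothetical stand-ins, not the real service API. Only the idea (cache the Analysis the first apply creates and pass it back via analysis=...) comes from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Analysis:
    """Hypothetical stand-in for a provenance row."""
    corpus_id: int


# Records every Analysis created, so the effect of the cache is observable.
minted: List[Analysis] = []


def apply(corpus_id: int, section: str,
          analysis: Optional[Analysis] = None) -> Analysis:
    """Create an Analysis only when the caller did not supply one."""
    if analysis is None:
        analysis = Analysis(corpus_id)
        minted.append(analysis)
    # ... section-level work would happen here ...
    return analysis


def crawl(sections: Iterable[Tuple[int, str]]) -> Dict[int, Analysis]:
    """Apply each (corpus_id, section) pair, reusing one Analysis per corpus."""
    cache: Dict[int, Analysis] = {}
    for corpus_id, section in sections:
        # cache.get() is None on the first section of a corpus, so apply()
        # mints once; later sections receive the cached instance.
        cache[corpus_id] = apply(corpus_id, section,
                                 analysis=cache.get(corpus_id))
    return cache
```

With three sections landing in one corpus and one section in another, this creates two Analysis objects rather than four.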
Issue #3 (blocked_by_bound): clarified in comments that
min_demand_or_depth is populated only on the frontier_drained stop — where
every residual queued row is provably bound-excluded — and intentionally not on
the max_authorities / token_budget early stops, whose unreached-but-eligible
rows are accounted for by the frontier_residual census instead.
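
The rule can be illustrated with a small sketch. The function name, its arguments, and the use of None for "not populated" are assumptions made for illustration; only the stop reasons and the min_demand_or_depth key appear in the description above.

```python
from typing import Dict, Optional


def blocked_by_bound(stop_reason: str,
                     residual_queued: int) -> Dict[str, Optional[int]]:
    """Report bound-excluded rows only when the frontier was drained."""
    if stop_reason == "frontier_drained":
        # Every row still queued was skipped because of a bound, so the
        # residual count is an exact figure here.
        return {"min_demand_or_depth": residual_queued}
    # On max_authorities / token_budget the residual mixes bound-excluded
    # rows with eligible rows that were never reached, so nothing is
    # attributed to the bound.
    return {"min_demand_or_depth": None}
```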
Minor: the crawl_authorities / acrawl_authorities tool params now default to the
C.CRAWL_DEFAULT_* constants for all five, instead of using None sentinels for two of them.
Regression tests: dequeue atomically claims returned rows / leaves filtered-out
rows queued (test_authority_frontier.py); a crawl ingesting multiple sections
of one authority reuses a single provenance Analysis (test_crawl_authorities.py
ApplyAnalysisReuseTests).
Closes #2027
1 parent 6df5dc0 commit e232766
6 files changed
Lines changed: 290 additions & 21 deletions
File tree
- changelog.d
- opencontractserver
- enrichment/services
- llms/tools/core_tools
- tests
Lines changed: 31 additions & 2 deletions
Lines changed: 41 additions & 7 deletions
Lines changed: 10 additions & 12 deletions