Commit 6987f64
fix(operator): un-gate readiness on needs-reindex-all, REINDEX DATABASE CONCURRENTLY
The amcheck-driven smart pass introduced in #68 hits the same family
of postgres-internal pathology that wedges other vanilla DDL on the
prod dataset — bt_index_check itself burns 100% CPU forever on
specific indexes (observed via pg_stat_activity: state=active,
empty wait_event, query_start growing linearly with wall clock for
4+ minutes on a tiny system catalog index). The smart pass is
unusable on this data.
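
The symptom described above (state=active, empty wait_event, query_start falling ever further behind wall clock) can be turned into a simple check. The sketch below is illustrative only: `Activity` and `looks_wedged` are hypothetical names, not part of the operator. It assumes the rows were already fetched from pg_stat_activity, and it borrows the 4-minute threshold from the observation above. A legitimately long CPU-bound query matches the same pattern, so treat this as a heuristic rather than proof of a wedge.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Activity:
    """One pg_stat_activity row, reduced to the columns used here."""
    pid: int
    state: str
    wait_event: Optional[str]
    query_start: datetime
    query: str


def looks_wedged(row: Activity, now: datetime,
                 threshold: timedelta = timedelta(minutes=4)) -> bool:
    """True for an active backend with no wait event whose current query
    has been running for at least `threshold` (a heuristic, not proof)."""
    on_cpu = row.state == "active" and not row.wait_event
    return on_cpu and (now - row.query_start) >= threshold


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical sample row standing in for a real pg_stat_activity result.
    sample = Activity(pid=4242, state="active", wait_event=None,
                      query_start=now - timedelta(minutes=5),
                      query="SELECT 1")
    print(sample.pid, looks_wedged(sample, now))
```
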
Two changes:
1. Drop bt_index_check; revert the needs-reindex-all branch to a
blind REINDEX DATABASE per user DB, CONCURRENTLY on PG ≥ 12 so
clients can keep using the old indexes during the rebuild and
the atomic swap makes corruption disappear without downtime.
REINDEX reads the heap and rebuilds the index from scratch — a
different code path from amcheck (which reads corrupt index
pages directly) — so it isn't subject to the same wedge. Slow
on prod-sized DBs (hours) but makes progress.
2. Drop the readiness probe's gate on /pgdata/needs-reindex-all.
With the reindex taking hours, gating readiness here trips the
operator's 30-minute deployment_ready_timeout and the restore is
marked Failed before postgres even has a chance to come up. The
probe still gates on /pgdata/needs-reindex (locale-only,
finishes in seconds) since that's small and proven.
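
The two changes above can be sketched as follows. This is a minimal illustration, not the operator's actual code (the diff content is not reproduced here): `reindex_statement`, `quote_ident` and `ready` are hypothetical helpers, and `accepting_connections` stands in for whatever liveness check the real probe performs. Only the marker paths and the PG ≥ 12 cut-off come from the message above.

```python
from pathlib import Path

PG12_VERSION_NUM = 120000  # server_version_num reported by PostgreSQL 12.0


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statement(dbname: str, server_version_num: int) -> str:
    """Blind REINDEX DATABASE for one user DB, CONCURRENTLY on PG >= 12.

    The statement has to be issued while connected to that database and
    outside a transaction block.
    """
    concurrently = "CONCURRENTLY " if server_version_num >= PG12_VERSION_NUM else ""
    return f"REINDEX DATABASE {concurrently}{quote_ident(dbname)}"


def ready(pgdata: Path, accepting_connections: bool) -> bool:
    """Readiness gate after this commit.

    Only the locale-only marker (needs-reindex) blocks readiness.
    needs-reindex-all is deliberately not checked, because that reindex
    can run for hours and would outlast the 30-minute timeout.
    """
    if not accepting_connections:
        return False
    return not (pgdata / "needs-reindex").exists()


if __name__ == "__main__":
    print(reindex_statement("app", 150004))
    print(reindex_statement("app", 110022))
    print(ready(Path("/pgdata"), accepting_connections=True))
```
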
Trade-off: clients hitting a not-yet-reindexed corrupt index in the
window between pod-Ready and reindex-complete get the explicit
"unexpected zero page" error from postgres. With CONCURRENTLY the
window is narrow (queries hit the old index until the atomic swap)
and clients can retry. Strictly better than the alternative
(permanently failed restore, replica stuck indefinitely on the
previous Active).
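
For the "clients can retry" part, a client-side wrapper might look like the sketch below. It is an assumption about how a caller could react, not something this commit ships: `retry_on_zero_page` is a hypothetical helper, `run` is any callable that executes the query, and the match is a plain substring test on the error text quoted above. The injectable `sleep` only exists to make the backoff testable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

ZERO_PAGE_MARKER = "unexpected zero page"


def retry_on_zero_page(run: Callable[[], T], attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call run(); retry with exponential backoff while it fails with the
    zero-page error. Any other error, or the final failure, is re-raised."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return run()
        except Exception as exc:
            is_last = attempt == attempts - 1
            if ZERO_PAGE_MARKER not in str(exc) or is_last:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The delays grow as base_delay, 2 × base_delay, 4 × base_delay and so on, so a caller riding out a multi-hour reindex would want a cap or a longer base delay than the default.
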
The pre-#68 behaviour for the locale-only path is unchanged.
1 parent 98e1000 commit 6987f64
2 files changed
Lines changed: 74 additions & 98 deletions