Commit 4a0c58b
[fix](cloud) Drain txn lazy committer workers before destruction (#63876)
## What
Fix shutdown ordering in `TxnLazyCommitter` by explicitly stopping
worker pools before member destruction can invalidate state used by
worker callbacks.
## Why
Lazy commit worker jobs keep a back pointer to `TxnLazyCommitter` and
call back into `remove()`. They can also access the parallel commit pool
and resource manager during `commit()`. With the default destructor,
`running_tasks_`, `mutex_`, and `parallel_commit_pool_` are destroyed
before `worker_pool_` is joined, which can lead to shutdown-time
use-after-destruction.
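C++ destroys non-static data members in the reverse order of their declaration. That is how an implicitly generated destructor can tear down task-tracking state while `worker_pool_` still has live workers. The sketch below only illustrates that rule. `Tracer`, `CommitterLayout`, and `destruction_order()` are hypothetical helpers. The declaration order, with `worker_pool_` first, is an assumption consistent with the description and is not copied from `txn_lazy_committer.h`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a member: records its name when destroyed.
struct Tracer {
    std::string name;
    std::vector<std::string>* log;
    Tracer(std::string n, std::vector<std::string>* l) : name(std::move(n)), log(l) {}
    ~Tracer() { log->push_back(name); }
};

// Hypothetical layout (assumed order): worker_pool_ is declared first, so the
// implicit destructor destroys it last, after the state its workers depend on.
struct CommitterLayout {
    explicit CommitterLayout(std::vector<std::string>* log)
        : worker_pool_("worker_pool_", log),
          parallel_commit_pool_("parallel_commit_pool_", log),
          mutex_("mutex_", log),
          running_tasks_("running_tasks_", log) {}
    Tracer worker_pool_;
    Tracer parallel_commit_pool_;
    Tracer mutex_;
    Tracer running_tasks_;
};

// Returns member names in the order the implicit destructor destroys them.
std::vector<std::string> destruction_order() {
    std::vector<std::string> log;
    { CommitterLayout layout(&log); }  // implicit destructor runs at end of scope
    return log;
}
```

Calling `destruction_order()` yields `running_tasks_`, `mutex_`, `parallel_commit_pool_`, `worker_pool_`. The member whose destruction would join the workers is the last one to go.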
## How
- Add an explicit `TxnLazyCommitter` destructor.
- Mark the committer as stopped before draining workers.
- Stop and join the lazy commit worker pool before destroying task-tracking state.
- Stop the parallel commit pool after lazy workers are quiesced.
- Make failed or post-shutdown submissions complete with an error
instead of leaving waiters blocked.
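The shutdown steps listed above can be sketched roughly as follows. This is a minimal, hypothetical model and not the actual `TxnLazyCommitter` code. `SimplePool`, `LazyCommitter`, the public `stop()`, a `submit()` that returns `std::future<bool>`, and integer transaction IDs are all assumptions made for illustration. Only `worker_pool_`, `parallel_commit_pool_`, `mutex_`, `running_tasks_`, and `remove()` are names taken from the description. The explicit destructor marks the committer stopped, drains and joins the lazy workers, and only then stops the helper pool. A submission that arrives after shutdown, or that loses the race with it, completes its future with `false` instead of leaving the caller waiting.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical minimal pool: stop() rejects new work, drains queued tasks, joins.
class SimplePool {
public:
    explicit SimplePool(int n) {
        for (int i = 0; i < n; ++i) threads_.emplace_back([this] { run(); });
    }
    ~SimplePool() { stop(); }
    bool submit(std::function<void()> f) {  // false once stop() has begun
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (stopping_) return false;
            q_.push_back(std::move(f));
        }
        cv_.notify_one();
        return true;
    }
    void stop() {  // safe to repeat from the owning thread
        { std::lock_guard<std::mutex> lk(mu_); stopping_ = true; }
        cv_.notify_all();
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return stopping_ || !q_.empty(); });
                if (q_.empty()) return;  // stopping and fully drained
                f = std::move(q_.front());
                q_.pop_front();
            }
            f();
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Hypothetical committer; worker_pool_ is declared first, as the description implies.
class LazyCommitter {
public:
    LazyCommitter() : worker_pool_(2), parallel_commit_pool_(2) {}
    ~LazyCommitter() { stop(); }  // explicit shutdown before members are destroyed
    void stop() {
        stopped_.store(true);          // 1. reject new submissions
        worker_pool_.stop();           // 2. drain and join lazy commit workers
        parallel_commit_pool_.stop();  // 3. then stop the helper pool
    }
    std::future<bool> submit(int txn_id) {  // always becomes ready; false = rejected
        auto done = std::make_shared<std::promise<bool>>();
        std::future<bool> fut = done->get_future();
        if (stopped_.load()) {
            done->set_value(false);
            return fut;
        }
        { std::lock_guard<std::mutex> lk(mutex_); running_tasks_.insert(txn_id); }
        bool queued = worker_pool_.submit([this, txn_id, done] {
            // The worker uses the helper pool, then calls back into remove().
            auto step = std::make_shared<std::promise<void>>();
            std::future<void> step_done = step->get_future();
            bool ok = parallel_commit_pool_.submit([step] { step->set_value(); });
            if (ok) step_done.wait();
            remove(txn_id);
            done->set_value(ok);
        });
        if (!queued) {  // lost the race with stop(): complete with an error
            remove(txn_id);
            done->set_value(false);
        }
        return fut;
    }
private:
    void remove(int txn_id) {
        std::lock_guard<std::mutex> lk(mutex_);
        running_tasks_.erase(txn_id);
    }
    SimplePool worker_pool_;
    SimplePool parallel_commit_pool_;
    std::atomic<bool> stopped_{false};
    std::mutex mutex_;
    std::unordered_set<int> running_tasks_;
};
```

In this sketch, `SimplePool::stop()` runs every queued task before joining. As a result, each future handed out before shutdown still becomes ready, and `parallel_commit_pool_` stays available until the last worker that might use it has exited. Calling `submit()` concurrently with the destructor itself remains undefined behavior in C++. The flag and the pool's rejection path only cover callers that race with `stop()`.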
## Tests
- `sh format_code.sh cloud/src/meta-service/txn_lazy_committer.h`
- `sh format_code.sh cloud/src/meta-service/txn_lazy_committer.cpp`
- `sh run-cloud-ut.sh --run --fdb "fdb_cluster0:cluster0@10.26.20.4:4500"`
  - Build passed.
  - `txn_lazy_commit_test` passed 24/24 in the full run.
  - The full run had unrelated storage vault/HDFS failures in `meta_service_test`.
- After tightening the submit/shutdown race:
  - `sh run-cloud-ut.sh --run --fdb "fdb_cluster0:cluster0@10.26.20.4:4500" --filter "txn_lazy_commit_test:*.*"`
  - Build passed; 22/24 tests passed, and 2 failed with FDB `Timeout` errors while committing setup transactions.
Co-authored-by: gavinchou <gavinchou@apache.org>

Parent: e072997. 2 files changed, 54 additions and 12 deletions.