[ace6tao2] Linux Proactor Improvements#2524
[ace6tao2] Linux Proactor Improvements#2524simpsont-oci wants to merge 32 commits intoDOCGroup:ace6tao2from
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ACE/tests/Proactor_Scatter_Gather_Test.cpp (1)
1338-1365:⚠️ Potential issue | 🟡 MinorDocument the new
-toption inprint_usage().The parser accepts backend selection now, but the usage text still omits it, so the error path doesn't tell users how to invoke the new option.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp` around lines 1338 - 1365, The usage text is missing documentation for the new -t option; update the print_usage() function to include a concise description of "-t <backend>" (or similar) and explain accepted backend values (as parsed by Proactor_Test_Backend::parse_type) so users see how to select a backend when the parser hits the error path; add the flag and a short example/valid values list to the existing usage string returned/printed by print_usage().
🧹 Nitpick comments (2)
ACE/ace/Uring_Asynch_IO.h (1)
145-148: Consider ACE container alternatives tostd::set.Using
std::setforpending_results_is functional, but ACE traditionally uses its own containers likeACE_Unbounded_Setfor consistency. However,std::set's O(log n) operations for insert/erase/find may be preferable here for tracking pending async operations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ACE/ace/Uring_Asynch_IO.h` around lines 145 - 148, The declaration of pending_results_ uses std::set<ACE_Uring_Asynch_Result*> which diverges from ACE container conventions; replace it with ACE_Unbounded_Set<ACE_Uring_Asynch_Result*> (and add the corresponding `#include` <ace/Unbounded_Set.h>) and update all usages of pending_results_ (insert/erase/find iterations) to the ACE_Unbounded_Set API (e.g., insert, remove, find/seek as appropriate), or if you intentionally need ordered/O(log n) semantics retain std::set but add a comment near pending_results_ and in ACE_Uring_Asynch_Result usage explaining the performance/order requirement to justify keeping the STL container.ACE/ace/Uring_Asynch_IO.cpp (1)
51-55: Consider adding null-safety tohandler()method.If
handler_proxy_is ever an emptyProxy_Ptr, callingget()returnsnullptrand the subsequent->handler()dereference would crash. While this may follow existing ACE patterns assuming valid proxies, adding a defensive check would improve robustness.Suggested defensive check
ACE_Handler * ACE_Uring_Asynch_Result::handler (void) const { - return this->handler_proxy_.get ()->handler (); + ACE_Handler::Proxy *proxy = this->handler_proxy_.get (); + return proxy ? proxy->handler () : 0; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ACE/ace/Uring_Asynch_IO.cpp` around lines 51 - 55, The ACE_Uring_Asynch_Result::handler() currently dereferences this->handler_proxy_.get() without checking for null; modify ACE_Uring_Asynch_Result::handler() to first obtain a pointer from handler_proxy_.get(), check if it's null, and if so return nullptr (or an appropriate default), otherwise call and return ptr->handler(); reference the method ACE_Uring_Asynch_Result::handler and the member handler_proxy_ (and its get()) when locating the code to update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ACE/ace/POSIX_Proactor.cpp`:
- Around line 1083-1085: The no-arg handle_events() was changed to use a 1000ms
timeout causing busy polling; restore its original blocking behavior so
ACE_Proactor::proactor_run_event_loop() sleeps until work arrives. Locate the
call to handle_events_i in ACE_Proactor::handle_events() (currently invoked as
handle_events_i(1000)) and change it to use the blocking/infinite-wait variant
(i.e. pass the infinite/blocking timeout or call the blocking overload) so idle
proactor threads do not poll for work.
- Around line 1093-1103: post_completion added lazy init via
ensure_notify_manager(), but cancel_aio() and start_deferred_aio() still call
putq_result() directly so notify_completion() can return success without writing
to the pipe; ensure the notify-manager is created before any enqueue. Modify
cancel_aio() and start_deferred_aio() to call ensure_notify_manager() and handle
non-zero failure before calling putq_result(), or move the
ensure_notify_manager() call into putq_result()/enqueue path so
aiocb_notify_pipe_manager_ is guaranteed initialized whenever a result is
queued; keep notify_completion() behavior consistent and return error if
ensure_notify_manager() fails.
- Around line 545-551: The change replaces the real handler proxy with a
null_handler_proxy and turns the POSIX wakeup completion into a no-op, which
prevents dispatching ACE_Handler::handle_wakeup() and diverges from the WIN32
behavior; restore the original proxy usage by constructing
ACE_POSIX_Wakeup_Completion with the actual handler proxy (not
null_handler_proxy) so wakeup_completion will invoke handle_wakeup as before
(update the same pattern at the other occurrence noted around lines 2119-2121),
i.e., pass the real ACE_Handler::Proxy_Ptr instance used for the target handler
into ACE_POSIX_Wakeup_Completion instead of the null proxy.
In `@ACE/ace/Uring_Asynch_IO.cpp`:
- Around line 1172-1185: The send path sets result->msg()->msg_name to
remote_addr.get_addr() which can point at a temporary/stack buffer and cause a
use-after-free when the io_uring operation completes; modify
ACE_Uring_Asynch_Write_Dgram_Result to own a sockaddr buffer (e.g. add a member
like remote_addr_ and remote_addr_len_), copy remote_addr.get_addr() into that
buffer in the ACE_Uring_Asynch_Write_Dgram_Result constructor, and in
ACE_Uring_Asynch_Write_Dgram::send set result->msg()->msg_name =
result->remote_addr_ and result->msg()->msg_namelen = result->remote_addr_len_
before submitting with submit_result so the async io_uring sendmsg uses the
owned copy instead of the caller's buffer.
- Around line 952-996: The error log prefixes incorrectly reference
ACE_POSIX_Asynch_Connect::connect_i; update the ACELIB_ERROR messages in this
Uring implementation to use ACE_Uring_Asynch_Connect::connect_i instead —
specifically change the three occurrences around the setsockopt (near the
result->set_error(errno) after setsockopt), the bind (after ACE_OS::bind
failure), and the set_flags (after ACE::set_flags failure) so the log lines
reflect the ACE_Uring_Asynch_Connect::connect_i prefix while keeping the
existing message format and errno handling intact.
In `@ACE/ace/Uring_Proactor.cpp`:
- Around line 65-75: The two ACE_Uring_Proactor::handle_events overloads
currently return the raw batch size from process_cqes (e.g., 2..32) which breaks
the documented contract; change both handle_events() and the other call sites
that directly return process_cqes (e.g., the variants at the other noted
locations) to map the process_cqes return value into the documented semantic:
return 1 if process_cqes returns a positive count, 0 if it returns 0 (no
completions), and propagate -1 on error; locate the calls to process_cqes in
ACE_Uring_Proactor::handle_events(Time_Value&),
ACE_Uring_Proactor::handle_events(), and the other occurrences mentioned and
replace the direct return with this normalized mapping so callers receive only
1/0/-1.
- Around line 27-38: The init path leaves a partially initialized
ACE_Uring_Proactor when ::io_uring_queue_init fails; ensure the object is not
usable by either cleaning up and marking it fully uninitialized or by failing
construction: when io_uring_queue_init returns <0, call the appropriate cleanup
for ring_ (e.g., io_uring_queue_exit or equivalent), clear/zero ring_, set
is_initialized_ to false, and propagate the failure (return an error/throw) so
callers (which call process_cqes(), post_wakeup_completions(), etc.) never
receive a half-initialized instance; update
ACE_Uring_Proactor::io_uring_queue_init error path and any factory/creation code
to handle the propagated failure.
- Around line 116-177: The code incorrectly treats two completions from the same
handler as a duplicate and returns before calling io_uring_cqe_seen(), which can
leave a CQE at the head of the ring and starve later completions; change the
duplicate-detection to track ACE_Uring_Asynch_Result* (the actual pending key)
instead of result->handler(), and ensure io_uring_cqe_seen(&this->ring_, cqe) is
always called before any early return or continue. Concretely: replace
dispatched_handlers (std::set<const void *>) with a set of
ACE_Uring_Asynch_Result* (or equivalent name), check
dispatched_results.find(result) instead of dispatched_handlers.find(handler),
call io_uring_cqe_seen(&this->ring_, cqe) before skipping/returning, and remove
the handler-based logic that returns when the same handler is seen so that
multiple CQEs for the same handler are treated as independent completions.
In `@ACE/include/makeinclude/platform_linux.GNU`:
- Around line 72-74: When $(uring) is 1 the makefile currently adds -luring but
does not define the ACE_HAS_IO_URING macro required by guarded code (e.g., files
Uring_Proactor.h and Uring_Asynch_IO.h); modify the conditional that appends
LIBS so that it also adds a preprocessor definition (e.g., append
-DACE_HAS_IO_URING to the compiler flags such as CPPFLAGS or CXXFLAGS) inside
the same ifeq ($(uring),1) block so the guarded symbols are compiled when
linking liburing.
In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp`:
- Around line 875-879: The extra call to initiate_write_file() in the
final-flush branch can fail and leave the writer alive with receiver_count_ == 0
and io_count_ == 0; after calling initiate_write_file() you must check its
return value and, on failure, explicitly perform the same completion/cleanup
path so the writer can terminate. Modify the branch that contains the
initiate_write_file() call to capture its return value and, if non-zero, invoke
the object's completion/cleanup helper (e.g., call the existing
handle_write_complete() or the code path that decrements
receiver_count_/io_count_ and triggers shutdown) so the event loop is not left
without a terminator.
In `@ACE/tests/Proactor_Stress_Test.cpp`:
- Around line 177-190: schedule_one() currently increments pending_ after
calling proactor_.schedule_timer(), which lets a zero-delay timer fire before
bookkeeping and causes wait_for_idle() races; move the bookkeeping ahead of
arming the timer: under the existing lock (ACE_GUARD_RETURN on this->lock_),
increment this->pending_ before calling this->proactor_.schedule_timer(...),
then call schedule_timer; if schedule_timer returns -1, decrement this->pending_
and increment this->schedule_failures_ to roll back, otherwise return success as
before; references: schedule_one(), proactor_.schedule_timer(), pending_,
schedule_failures_, wait_for_idle().
In `@ACE/tests/Proactor_Test_IPV6.cpp`:
- Around line 1733-1735: The current set_proactor_type uses only
Proactor_Test_Backend::parse_type(ptype, proactor_type) which merely parses
names and can allow selecting a backend that isn't build-available, causing
task1.start() to fail and run_main() to hang; restore the previous availability
gate here by checking the backend availability after parsing (e.g., call the
backend availability helper for proactor_type or
Proactor_Test_Backend::is_available/proactor_available(proactor_type)) and
return false (or perform a hard exit) when the chosen backend cannot be
instantiated so tests fail fast instead of waiting forever.
In `@ACE/tests/Proactor_UDP_Test.cpp`:
- Around line 1955-1957: set_proactor_type currently calls
Proactor_Test_Backend::parse_type(ptype, proactor_type) which only validates the
token and allows platform-incompatible backends through; update
set_proactor_type to also check backend availability (reuse the previous
availability check in Proactor_Test_Backend or call a new
is_available()/validate_availability method) and return false if the backend is
unsupported so the test fails fast, or alternatively ensure run_main() detects
startup failure (e.g., task1.start() failure) and returns immediately; reference
Proactor_Test_Backend::parse_type, set_proactor_type, run_main, and task1.start
when applying the change.
---
Outside diff comments:
In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp`:
- Around line 1338-1365: The usage text is missing documentation for the new -t
option; update the print_usage() function to include a concise description of
"-t <backend>" (or similar) and explain accepted backend values (as parsed by
Proactor_Test_Backend::parse_type) so users see how to select a backend when the
parser hits the error path; add the flag and a short example/valid values list
to the existing usage string returned/printed by print_usage().
---
Nitpick comments:
In `@ACE/ace/Uring_Asynch_IO.cpp`:
- Around line 51-55: The ACE_Uring_Asynch_Result::handler() currently
dereferences this->handler_proxy_.get() without checking for null; modify
ACE_Uring_Asynch_Result::handler() to first obtain a pointer from
handler_proxy_.get(), check if it's null, and if so return nullptr (or an
appropriate default), otherwise call and return ptr->handler(); reference the
method ACE_Uring_Asynch_Result::handler and the member handler_proxy_ (and its
get()) when locating the code to update.
In `@ACE/ace/Uring_Asynch_IO.h`:
- Around line 145-148: The declaration of pending_results_ uses
std::set<ACE_Uring_Asynch_Result*> which diverges from ACE container
conventions; replace it with ACE_Unbounded_Set<ACE_Uring_Asynch_Result*> (and
add the corresponding `#include` <ace/Unbounded_Set.h>) and update all usages of
pending_results_ (insert/erase/find iterations) to the ACE_Unbounded_Set API
(e.g., insert, remove, find/seek as appropriate), or if you intentionally need
ordered/O(log n) semantics retain std::set but add a comment near
pending_results_ and in ACE_Uring_Asynch_Result usage explaining the
performance/order requirement to justify keeping the STL container.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 60d270fc-a2dd-4cab-b7ad-bac479ebb317
📒 Files selected for processing (23)
ACE/ace/Asynch_IO.hACE/ace/POSIX_Asynch_IO.cppACE/ace/POSIX_Proactor.cppACE/ace/POSIX_Proactor.hACE/ace/POSIX_Proactor.inlACE/ace/Proactor.cppACE/ace/Uring_Asynch_IO.cppACE/ace/Uring_Asynch_IO.hACE/ace/Uring_Proactor.cppACE/ace/Uring_Proactor.hACE/ace/ace.mpcACE/include/makeinclude/platform_linux.GNUACE/tests/Proactor_Contract_Test.cppACE/tests/Proactor_File_Test.cppACE/tests/Proactor_Scatter_Gather_Test.cppACE/tests/Proactor_Stress_Test.cppACE/tests/Proactor_Test.cppACE/tests/Proactor_Test_Backend.hACE/tests/Proactor_Test_IPV6.cppACE/tests/Proactor_Timer_Test.cppACE/tests/Proactor_UDP_Test.cppACE/tests/run_test.lstACE/tests/tests.mpc
|
Does this fixed any of the open proactor bugzilla issues? Is #2479 something related? |
jwillemsen
left a comment
There was a problem hiding this comment.
did just a quick scan, not a detailed review, for master there are several places where C++17 features should be used
I'm guessing this will address a couple of the Bugzilla issues, but the fixes were primarily gear at getting the new and existing tests passing for all supported Proactor implementations. Also ran the proactor tests with ASAN, TSAN, and UBSAN enabled and fixed the issues I found there, so many of the concurrency issues are likely resolved. Though I will say that on my system TSAN and aiocb did not play nicely, causing a SEGV in the aio implementation (non-user-code) at thread startup. But the other combinations should all be clean. #2479 seems worthwhile but does not overlap with the fixes in my PR... perhaps the stress test should also have attempted to stress the cancellation pathways. |
There was a problem hiding this comment.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ACE/tests/Proactor_Test_IPV6.cpp (1)
1855-1879:⚠️ Potential issue | 🟠 MajorDon't wait forever when no IPv6 sessions come up.
Line 1876 has the same hang mode as the IPv4 test: if
acceptor->open()/connector->start()never produce a live session,testing_done()never becomes true and the test spins forever. Add a timeout or abort when setup fails before entering the drain loop.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ACE/tests/Proactor_Test_IPV6.cpp` around lines 1855 - 1879, The drain loop can hang forever if no sessions start; before sleeping to wait for sessions, check the setup result (rc) and/or implement a timeout guard: after calling acceptor->open(...) and connector->start(...), record a start timestamp and replace the while (!test.testing_done()) loop with a timed wait that breaks after a reasonable timeout (or aborts immediately if rc indicates setup failed), then call test.stop_all(); reference acceptor->open, connector->start, testing_done(), and test.stop_all() when adding the timeout check/early-abort logic.ACE/tests/Proactor_Test.cpp (1)
1798-1823:⚠️ Potential issue | 🟠 MajorBound the wait for sessions to appear.
Line 1819 can block forever now.
TestData::testing_done()stays false until at least one client or server reports*_up_, so startup/connect failures, an unreachable peer, or server-only runs never escape this loop. Fail fast or add a timeout before waiting for drain.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ACE/tests/Proactor_Test.cpp` around lines 1798 - 1823, The waiting loop can block forever because test.testing_done() may never become true; modify the loop that currently calls ACE_OS::sleep(1) (around test.testing_done() and before test.stop_all()) to enforce a maximum timeout: record a start time (e.g., via ACE_OS::gettimeofday or ACE_High_Res_Timer), loop checking test.testing_done() and elapsed time, sleep briefly between checks (ACE_OS::sleep), and if the timeout is exceeded log an error/debug message and break to call test.stop_all(); ensure you reference the existing symbols test.testing_done(), test.stop_all(), ACE_OS::sleep and the time API you choose so the change is localized and cleanup still occurs.
♻️ Duplicate comments (1)
ACE/ace/Uring_Proactor.cpp (1)
575-588:⚠️ Potential issue | 🟠 MajorMissing
is_initialized_check before accessing the ring.This method acquires only
sq_mutex_but doesn't verifyis_initialized_. Afterclose()has destroyed the ring, a subsequent call topost_wakeup_completions()will access the destroyed ring, causing undefined behavior.🔒 Proposed fix
int ACE_Uring_Proactor::post_wakeup_completions (int count) { ACE_GUARD_RETURN (ACE_Thread_Mutex, guard, this->sq_mutex_, -1); + + if (!this->is_initialized_) + return -1; + for (int i = 0; i < count; ++i) { struct io_uring_sqe *sqe = ::io_uring_get_sqe (&this->ring_);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ACE/ace/Uring_Proactor.cpp` around lines 575 - 588, Add an is_initialized_ check to ACE_Uring_Proactor::post_wakeup_completions to avoid accessing ring_ after close(): acquire the sq_mutex_ as currently done, then immediately return an error (e.g. -1) if is_initialized_ is false; ensure you check the same boolean used by close() and protect accesses to ::io_uring_get_sqe, ::io_uring_prep_nop and ::io_uring_submit accordingly so no ring_ operations occur when is_initialized_ is cleared.
🧹 Nitpick comments (1)
ACE/ace/Uring_Proactor.cpp (1)
172-178: Unnecessaryconst_cast.The
timeoutvariable is already non-const, so theconst_castis redundant.♻️ Suggested fix
struct __kernel_timespec timeout; timeout.tv_sec = local_wait_time.sec (); timeout.tv_nsec = local_wait_time.usec () * 1000; ret = ::io_uring_wait_cqe_timeout (&this->ring_, &cqe, - const_cast<__kernel_timespec *> (&timeout)); + &timeout);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ACE/ace/Uring_Proactor.cpp` around lines 172 - 178, The call to ::io_uring_wait_cqe_timeout is using a redundant const_cast on &timeout; since timeout is a non-const __kernel_timespec, remove the const_cast and pass its address directly (replace const_cast<__kernel_timespec *>(&timeout) with &timeout) in the call from Uring_Proactor::ring_/cqe handling where ACE_Time_Value local_wait_time and timeout are constructed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ACE/ace/POSIX_CB_Proactor.cpp`:
- Around line 265-280: The race occurs because
notification_state_->add_pending() is only called after start_aio_i(result) and
a fast completion may use notification_state_ before it's incremented; move the
pending-count increment so add_pending() (or equivalent increment on
notification_state_) is performed immediately after allocating the slot (after
allocate_aio_slot(result) and before calling start_aio_i(result)), and ensure
you roll it back if start_aio_i(result) fails or returns the deferred path
(i.e., decrement or call remove_pending/undo on the same notification_state_
when ret_val != 0 and on error paths), while keeping assignments to
result_list_, aiocb_list_, and aiocb_list_cur_size_ consistent with this new
ordering.
- Around line 79-103: ACE_POSIX_CB_Proactor::close currently can return while
Notification_State still references this->sema_, allowing late SIGEV_THREAD
callbacks (complete_one) to dereference a freed semaphore; fix by making the
callback state self-contained and owned until pending()==0: change the lifetime
model so notification_state_ is transferred to a heap-owned, refcounted object
(e.g., wrap Notification_State in a shared ownership unit or add an internal
refcount and ensure callbacks capture that ownership) and ensure the proactor
clears notification_state_ only after ownership is released and pending()==0;
update places that create/dispatch SIGEV_THREAD callbacks (complete_one and any
timer/notification registration) to capture/hold the Notification_State owner
rather than raw this->sema_ so callbacks become no-ops or safe when the proactor
is destroyed.
- Around line 127-133: The override ACE_POSIX_CB_Proactor::post_completion
currently only enqueues the result via putq_result(result) so threads blocked in
handle_events()/handle_events_i() (which wait on sema_.acquire() before calling
process_result_queue()) never wake; after successfully putting the result on the
queue (putq_result) you must release the semaphore (sema_.release() or
equivalent) while holding the mutex protection to wake the event loop; update
post_completion to call sema_.release() (or signal the same semaphore used by
handle_events_i()) after enqueueing the result so process_result_queue() will
run.
In `@ACE/ace/Uring_Proactor.cpp`:
- Around line 161-205: Recheck the is_initialized_ flag after acquiring
cq_mutex_ in process_cqes to avoid operating on a destroyed ring: inside the
ACE_GUARD_RETURN (ACE_Thread_Mutex, guard, this->cq_mutex_, -1) block (the
section that calls ::io_uring_peek_cqe/::io_uring_wait_cqe* and processes cqe),
add an immediate check if (!this->is_initialized_) return processed (or
appropriate sentinel) so that if close() cleared is_initialized_ and called
io_uring_queue_exit() between the earlier check and acquiring cq_mutex_,
process_cqes will bail out instead of calling io_uring on a torn-down ring; keep
the existing outer check but make this inner recheck the authoritative guard.
In `@ACE/tests/Proactor_Network_Performance_Test.cpp`:
- Around line 284-310: The UDP branch exits the wait loop on progress timeout
without marking the run as failed, allowing wait_for_completion()/validate_i()
to accept runs with incomplete endpoints; modify the branch in the loop that
currently checks if (this->config_.transport == TRANSPORT_UDP) break; to instead
set this->failed_ = true and this->stalled_ = true (the same as the non-UDP
path) before breaking, so incomplete endpoints incrementing completed_endpoints_
cannot cause validate_i()/wait_for_completion() to report success and
task.stop() to tear down active reads.
- Around line 791-802: The completion handler incorrectly advances the
ACE_Message_Block read pointer a second time by calling
mb.rd_ptr(result.bytes_transferred()), which drops data on a partial write;
remove that extra rd_ptr() call so the retry uses the existing read pointer that
ACE_Asynch_Write_Stream already advanced, and keep the subsequent write retry
and error handling (the mb.release(), report_error, and writes_inflight_
decrement) unchanged; look for the block handling result.bytes_transferred() vs
result.bytes_to_write() in the completion path to locate the offending
mb.rd_ptr(...) call.
- Around line 1311-1343: The numeric flag parsing currently casts ACE_OS::atoi()
results directly into unsigned types (cases handling config.max_aio_operations,
config.payload_size, config.messages_per_endpoint, config.sessions,
config.listen_port, config.thread_count, config.write_depth), which lets
negative or malformed inputs wrap to huge positive values; change each case to
robustly parse and validate the argument (e.g., use a signed parse like
ACE_OS::strtol or similar with errno checking), verify the value is numeric,
non-negative, and within the target type's range (for listen_port ensure it fits
u_short, for size_t fields ensure value >= 0 and below a sane limit), and return
-1 on invalid input instead of casting and proceeding. Ensure these checks are
applied in the switch cases for 'a','b','m','n','p','T','w' (and keep the
existing backend parsing for 't') so bad flags fail fast.
In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp`:
- Around line 885-896: The code deletes the Writer (delete this) while Receiver
instances may still call Receiver::writer_->on_delete_receiver(), causing null
deref; instead remove the immediate delete this in the failure paths inside
Writer (e.g., in the handle_write_file failure blocks) and only call
ACE_Proactor::instance()->end_event_loop(); set a safe flag or schedule Writer
deletion to occur after all Receivers are torn down (or from the proactor
shutdown callback) so receivers never race against destruction; additionally
make Receiver::~Receiver() defensive by checking Receiver::writer_ for nullptr
before calling on_delete_receiver() to avoid crashes if writer is already
cleared.
In `@ACE/tests/Proactor_Stress_Test.cpp`:
- Around line 256-267: The test currently calls handler.wait_for_idle(...)
unconditionally, so even when immediate_shutdown is true the code waits up to
10s for timers to drain; to fix this, make the wait_for_idle() call conditional
on !immediate_shutdown (i.e. only call handler.wait_for_idle when
immediate_shutdown is false), compute reached_idle only in that branch, and
otherwise skip waiting and proceed to the abrupt shutdown path that calls
task.stop(); update the logic around handler.stop_dispatching(),
handler.wait_for_idle(), immediate_shutdown, and task.stop() accordingly (use
the existing symbols handler.wait_for_idle, immediate_shutdown, task.stop,
Proactor_Test_Backend::name to locate the code).
In `@ACE/tests/Proactor_Test_Backend.h`:
- Around line 418-424: create_impl has allocated implementation before
ACE_NEW_RETURN; if ACE_Proactor allocation fails that implementation is leaked.
After the ACE_NEW_RETURN attempt, detect the failure path and delete or release
the implementation (e.g., delete implementation or call its appropriate cleanup
method) before returning -1; update the block around create_impl, implementation
and ACE_Proactor (symbols: create_impl, implementation, ACE_NEW_RETURN,
ACE_Proactor) so the implementation is freed whenever ACE_Proactor construction
fails.
In `@ACE/tests/Proactor_Test.cpp`:
- Around line 1682-1684: set_proactor_type currently only calls
Proactor_Test_Backend::parse_type which validates the token but does not reject
backends that are not available at build/runtime; change set_proactor_type (and
related parsing flow) to first call Proactor_Test_Backend::parse_type(ptype,
proactor_type) and then call the backend availability check (e.g., a
Proactor_Test_Backend::is_available(proactor_type) or equivalent used in
Proactor_Test_IPV6/Proactor_UDP tests) and return false if the backend is
unavailable; this prevents run_main from treating a non-instantiable backend as
a false pass when task1.start() fails and ensures proactor_type is only accepted
when both valid and instantiable.
In `@ACE/tests/Proactor_UDP_Test.cpp`:
- Around line 2105-2123: The wait loop that blocks on test.testing_done() can
hang forever; modify the logic around the while (!test.testing_done()) loop to
implement a fail-fast timeout: record the current time before the loop, loop
until either test.testing_done() returns true or a configurable timeout (e.g.,
max_wait_seconds) elapses, and on timeout log a clear error, call
test.stop_all(), shutdown master and connector as currently done, and
return/exit the test with a failure status so the hang is surfaced; reference
the existing symbols testing_done(), test.stop_all(), master->shutdown(),
connector->start() and expand the flow to bail out on timeout instead of looping
forever.
- Around line 752-760: Master::shutdown currently cancels an in-progress receive
via rd_.cancel() and immediately calls sock_.close(), which can wedge on some
POSIX platforms; modify shutdown to avoid closing the socket while
recv_in_progress_ is non-zero: after calling rd_.cancel() wait for
recv_in_progress_ to reach zero (e.g., poll/sleep or use a
condition/notification that the receive handler clears recv_in_progress_) before
calling sock_.close(), or alternatively gate the immediate sock_.close() behind
a platform check so only platforms that safely support close-after-cancel
perform it; update references in Master::shutdown, recv_in_progress_,
rd_.cancel(), and sock_.close() accordingly.
In `@ACE/tests/run_proactor_correctness_matrix.pl`:
- Around line 179-192: The default `@backends` list is hardcoded and may include
backends not built on the current system; change the initialization so `@backends`
is derived from the actual configured/available backends before applying
include_default and `@requested_backends` filtering. Specifically, replace the
static qw(aiocb sig cb uring) with a call or filter that queries the
build/runtime availability (e.g., a helper like get_available_backends() or
checking an existing availability map) then prepend 'default' if include_default
is true, and then validate `@requested_backends` against that availability using
contains_value as currently done so unsupported backends are skipped rather than
causing avoidable failures.
In `@ACE/tests/run_proactor_performance_matrix.pl`:
- Around line 433-444: The default `@backends` list (aiocb, sig, cb, uring) must
first be intersected with the build-configured available backends before running
the matrix so unavailable backends are skipped; obtain the configured
availability (e.g., from the existing availability data structure or a helper
like available_backends()), set `@backends` = grep { contains_value($_,
`@available_backends`) } `@backends`, and then when `@requested_backends` is present
validate against that filtered `@backends` (use contains_value against the
filtered list) — do not treat unavailable defaults as hard failures (remove the
unconditional exit for defaults), only error if the user explicitly requested a
backend not present in the filtered `@backends`.
In `@ACE/tests/run_test.lst`:
- Line 206: The Proactor_Timer_Test entry removed Win32 coverage by specifying
only "-t d" and excluding Win32; update the test matrix for Proactor_Timer_Test
to include the Win32 timer configuration (add "-t w" or remove the "!Win32"
exclusion) so the Win32 proactor timer path is exercised again—locate the
Proactor_Timer_Test line in ACE/tests/run_test.lst and modify the timer flags to
match neighboring proactor suites that explicitly include the "-t w" Win32 case.
---
Outside diff comments:
In `@ACE/tests/Proactor_Test_IPV6.cpp`:
- Around line 1855-1879: The drain loop can hang forever if no sessions start;
before sleeping to wait for sessions, check the setup result (rc) and/or
implement a timeout guard: after calling acceptor->open(...) and
connector->start(...), record a start timestamp and replace the while
(!test.testing_done()) loop with a timed wait that breaks after a reasonable
timeout (or aborts immediately if rc indicates setup failed), then call
test.stop_all(); reference acceptor->open, connector->start, testing_done(), and
test.stop_all() when adding the timeout check/early-abort logic.
In `@ACE/tests/Proactor_Test.cpp`:
- Around line 1798-1823: The waiting loop can block forever because
test.testing_done() may never become true; modify the loop that currently calls
ACE_OS::sleep(1) (around test.testing_done() and before test.stop_all()) to
enforce a maximum timeout: record a start time (e.g., via ACE_OS::gettimeofday
or ACE_High_Res_Timer), loop checking test.testing_done() and elapsed time,
sleep briefly between checks (ACE_OS::sleep), and if the timeout is exceeded log
an error/debug message and break to call test.stop_all(); ensure you reference
the existing symbols test.testing_done(), test.stop_all(), ACE_OS::sleep and the
time API you choose so the change is localized and cleanup still occurs.
---
Duplicate comments:
In `@ACE/ace/Uring_Proactor.cpp`:
- Around line 575-588: Add an is_initialized_ check to
ACE_Uring_Proactor::post_wakeup_completions to avoid accessing ring_ after
close(): acquire the sq_mutex_ as currently done, then immediately return an
error (e.g. -1) if is_initialized_ is false; ensure you check the same boolean
used by close() and protect accesses to ::io_uring_get_sqe, ::io_uring_prep_nop
and ::io_uring_submit accordingly so no ring_ operations occur when
is_initialized_ is cleared.
---
Nitpick comments:
In `@ACE/ace/Uring_Proactor.cpp`:
- Around line 172-178: The call to ::io_uring_wait_cqe_timeout is using a
redundant const_cast on &timeout; since timeout is a non-const
__kernel_timespec, remove the const_cast and pass its address directly (replace
const_cast<__kernel_timespec *>(&timeout) with &timeout) in the call from
Uring_Proactor::ring_/cqe handling where ACE_Time_Value local_wait_time and
timeout are constructed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 99588d15-505b-468c-86ee-4d656530780a
📒 Files selected for processing (36)
.github/workflows/linux.yml.github/workflows/windows.ymlACE/ace/Asynch_IO.cppACE/ace/Asynch_IO.hACE/ace/Asynch_IO_Impl.cppACE/ace/Asynch_IO_Impl.hACE/ace/Asynch_Pseudo_Task.cppACE/ace/Asynch_Pseudo_Task.hACE/ace/Framework_Component.cppACE/ace/INET_Addr.hACE/ace/INET_Addr.inlACE/ace/POSIX_Asynch_IO.cppACE/ace/POSIX_CB_Proactor.cppACE/ace/POSIX_CB_Proactor.hACE/ace/POSIX_Proactor.cppACE/ace/POSIX_Proactor.hACE/ace/Proactor.cppACE/ace/Thread_Exit.cppACE/ace/Uring_Asynch_IO.cppACE/ace/Uring_Asynch_IO.hACE/ace/Uring_Proactor.cppACE/ace/Uring_Proactor.hACE/tests/.gitignoreACE/tests/Proactor_File_Test.cppACE/tests/Proactor_Network_Performance_Test.cppACE/tests/Proactor_Scatter_Gather_Test.cppACE/tests/Proactor_Stress_Test.cppACE/tests/Proactor_Test.cppACE/tests/Proactor_Test_Backend.hACE/tests/Proactor_Test_IPV6.cppACE/tests/Proactor_Timer_Test.cppACE/tests/Proactor_UDP_Test.cppACE/tests/run_proactor_correctness_matrix.plACE/tests/run_proactor_performance_matrix.plACE/tests/run_test.lstACE/tests/tests.mpc
💤 Files with no reviewable changes (1)
- ACE/ace/Asynch_IO.cpp
✅ Files skipped from review due to trivial changes (5)
- ACE/ace/Framework_Component.cpp
- ACE/tests/.gitignore
- .github/workflows/linux.yml
- ACE/tests/tests.mpc
- ACE/ace/POSIX_Asynch_IO.cpp
🚧 Files skipped from review as they are similar to previous changes (3)
- ACE/ace/Uring_Asynch_IO.h
- ACE/ace/Uring_Asynch_IO.cpp
- ACE/ace/Uring_Proactor.h
|
|
||
| ACE_TRACE ("ACE_POSIX_AIOCB_Proactor::cancel_aio"); | ||
|
|
||
| if (this->get_impl_type () != ACE_POSIX_Proactor::PROACTOR_CB |
There was a problem hiding this comment.
Don't have checked the full class hierarchy, but shouldn't this be moved to a derived class?
There was a problem hiding this comment.
ACE_POSIX_CB_Proactor and ACE_POSIX_SIG_Proactor both inherit from ACE_POSIX_AIOCB_Proactor, which in turn inherits from ACE_POSIX_Proactor. I agree it looks odd for the base class to check the impl, but it's using it to determine how to control base class functionality. Perhaps if we had a boolean virtual method which only returned false from that derived class? That would look cleaner, but basically amounts to the same thing... the virtual method returns a different value for certain derived classes which influence how the base class behaves.
|
Needs a news entry, any docu extensions? |
…X Proactor Bug Fixes and Improvements
21f9cc8 to
249bee5
Compare
A few things:
Proactor_Test_Backendparsing and instantiation support and having all Proactor tests make use of it.Summary by CodeRabbit
Release Notes
New Features
Improvements
Tests
Build & CI