artdaq event builder times out at the end of the run? #254

@pavel1murat

Description

While taking data in MC2, I observe that an artdaq event builder receiving input from two boardreaders regularly reports a failure to communicate over XML-RPC at the end of the run and then times out. The event builder is labeled eb07 in the TRACE printout below (most recent entries first).
The failure is less likely at low rates and in very short runs.

I'm using artdaq v4_03_00.

3892 02-21 13:55:00.281330    22821748 2404450 2422898  14          eb07_CommandableInterface:119 NFO stop: Stop transition complete
3893 02-21 13:55:00.281295          35 2404450 2422898  14              eb07_DataReceiverCore:170 NFO stop: Completed the Stop transition for run 120386
3894 02-21 13:55:00.281251          44 2404450 2422898  14      eb07_SharedMemoryEventManager:942 NFO endOfData: EndOfData Complete. There were 1610804 buffers processed.
3895 02-21 13:55:00.281220          31 2404450 2404832  15      eb07_SharedMemoryEventManager:603 ERR RunArt: art process 2404833 was killed with signal 9 after running for 843.92 seconds, not restarting
3896 02-21 13:55:00.281213           7 2404450 2422898  14      eb07_SharedMemoryEventManager:800 NFO ShutdownArtProcesses: All art processes exited after 15 ms (SIGKILL).
3897 02-21 13:55:00.280999         214 2404450 2404832  15      eb07_SharedMemoryEventManager:570 NFO RunArt: Removing PID 2404833 from process list
3898 02-21 13:55:00.265383       15616 2404450 2422898  14     eb07_SharedMemoryEventManager:1531 ERR broadcastFragments_: Broadcast attempted but broadcast shared memory is unavailable!
3899 02-21 13:55:00.265345          38 2404450 2422898  14     eb07_SharedMemoryEventManager:1531 ERR broadcastFragments_: Broadcast attempted but broadcast shared memory is unavailable!
3999 02-21 13:54:54.264243        4218 2404450 2423329   6              eb07_xmlrpc_commander:792 ERR execute: Unable to get lock while trying to report the current state, returning busy
4231 02-21 13:54:24.043630         727 2404450 2422898  13     eb07_SharedMemoryEventManager:1006 NFO endRun: Run 120386 has ended. There were 1610802 events in this run.
4232 02-21 13:54:24.043604          26 2404450 2422898  13                  eb07_RequestSender:59 NFO ~RequestSender: Shutting down RequestSender: request_socket_: -1
4233 02-21 13:54:24.043581          23 2404450 2422898  13                  eb07_RequestSender:47 NFO ~RequestSender: Shutting down RequestSender: Waiting for 0 requests to be sent (total sent: 0)
4234 02-21 13:54:24.043543          38 2404450 2422898  13     eb07_SharedMemoryEventManager:1000 NFO endRun: Ending run 120386
4317 02-21 13:54:22.525290      576571 2404450 2422898  13              eb07_DataReceiverCore:132 NFO stop: Stopping run 120386
4318 02-21 13:54:22.525245          45 2404450 2422898  13          eb07_CommandableInterface:101 NFO stop: Stop transition started
4321 02-21 13:54:22.522512         112 2404450 2404853   8             eb07_TCPSocketTransfer:407 WRN disconnect_receive_socket_: transfer_between_114_and_207_RECV: disconnect_receive_socket_: Stop Message received. Closing socket 11 for rank 114
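
To put a number on how long the Stop transition takes, the timestamps in the printout can be subtracted directly. Below is a minimal sketch, not an artdaq or TRACE tool: the helper names `parse_timestamp` and `stop_transition_seconds` are hypothetical, the field layout is assumed from the excerpt above (index, MM-DD date, HH:MM:SS.ffffff time, ...), and the year 2000 is an arbitrary placeholder because the printout carries no year.

```python
from datetime import datetime

def parse_timestamp(line):
    """Return the timestamp of one TRACE printout line.

    Assumes the layout seen in the excerpt: field 0 is the entry index,
    field 1 the date as MM-DD, field 2 the time as HH:MM:SS.ffffff.
    """
    fields = line.split()
    # Placeholder year (assumption): only differences within one run are used.
    return datetime.strptime("2000-" + fields[1] + " " + fields[2],
                             "%Y-%m-%d %H:%M:%S.%f")

def stop_transition_seconds(lines):
    """Seconds between 'Stop transition started' and 'Stop transition complete'."""
    start = end = None
    for line in lines:
        if "Stop transition started" in line:
            start = parse_timestamp(line)
        elif "Stop transition complete" in line:
            end = parse_timestamp(line)
    if start is None or end is None:
        raise ValueError("both Stop transition messages are required")
    return (end - start).total_seconds()

# Two lines copied from the eb07 printout above.
excerpt = [
    "3892 02-21 13:55:00.281330    22821748 2404450 2422898  14          "
    "eb07_CommandableInterface:119 NFO stop: Stop transition complete",
    "4318 02-21 13:54:22.525245          45 2404450 2422898  13          "
    "eb07_CommandableInterface:101 NFO stop: Stop transition started",
]

print(f"Stop transition took {stop_transition_seconds(excerpt):.3f} s")
```

For the two lines shown, the difference is about 37.76 s, and the XML-RPC "Unable to get lock" error at 13:54:54 falls inside that window.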
