Skip to content

Commit 8574a2e

Browse files
committed
drivers/libusb1.c: use explicit context; drop per-close libusb_exit() [#598]
The original code called libusb_init(NULL) on every nut_libusb_open() and libusb_exit(NULL) on every nut_libusb_close(), bumping and decrementing the default context's refcount. That violates the documented contract for libusb_exit, which "should be called after closing all open devices and before your application terminates", and on certain firmware it wedges indefinitely: when libusb_exit hits the 1->0 transition it tears the context down, which waits on libusb's internal sync primitives for outstanding URBs to drain. URBs orphaned by libusb_reset_device or by an unexpected device disconnect never drain, so the wait never returns. The deadlock is reachable from any reconnect path. It was first recognized by 4f84b7f ("don't libusb_exit() when closing a previously opened device"), which dropped the call in nut_libusb_open()'s rematch loop. The fallthrough reconnect path in qx_command and the rest of the driver lifecycle still hit it. With the companion change in nutdrv_qx.c (a4813f868) that escalates persistent LIBUSB_ERROR_OVERFLOW to libusb_reset_device, the deadlock window opens on every escalation and was hit reproducibly on the 0665:5161 Cypress USB-serial bridge family (Salicru SPS, Ippon, ViewPower, Voltronic Power UPSes; see #598, #993, #2453). Switch to an explicit libusb context owned end-to-end: - Initialize once on the first nut_libusb_open() and register a matching one-shot atexit handler. - Remove libusb_exit() from nut_libusb_close(); the close path now only calls libusb_close(), which is non-blocking. - Replace libusb_get_device_list(NULL, ...) with the explicit context so the default context is never touched. - Bound the atexit libusb_exit() with a SIGALRM-based 2s timeout (POSIX) so a deadlock-prone teardown at shutdown can't wedge supervisor stop sequences (e.g. systemctl stop). Sub-millisecond on a healthy context. Bump USB_DRIVER_VERSION from 0.53 to 0.54. Validated on a Salicru SPS 1500 ONE BL (0665:5161) by authorize-toggle stress reproduction (8 cycles, clean reconnect each time, same driver PID throughout) and by a natural OVERFLOW event captured in soak (counter-gated logic at 1/3, next poll succeeded, no escalation; driver PID stable for 22h+). Signed-off-by: Pedro Cunha <pedroagracio+nut@gmail.com>
1 parent 0d2c7a1 commit 8574a2e

1 file changed

Lines changed: 95 additions & 7 deletions

File tree

drivers/libusb1.c

Lines changed: 95 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -40,9 +40,14 @@
4040
#include "usb-common.h"
4141
#include "nut_libusb.h"
4242
#include "nut_stdint.h"
43+
#include "nut_bool.h"
44+
45+
#ifndef WIN32
46+
# include <signal.h> /* sigaction(), SIGALRM for the atexit watchdog */
47+
#endif
4348

4449
#define USB_DRIVER_NAME "USB communication driver (libusb 1.0)"
45-
#define USB_DRIVER_VERSION "0.53"
50+
#define USB_DRIVER_VERSION "0.54"
4651

4752
/* driver description structure */
4853
upsdrv_info_t comm_upsdrv_info = {
@@ -55,6 +60,76 @@ upsdrv_info_t comm_upsdrv_info = {
5560

5661
#define MAX_REPORT_SIZE 0x1800
5762

63+
/* Explicit libusb context, initialized once on the first call to
64+
* nut_libusb_open() and released exactly once at process shutdown via
65+
* the atexit handler below.
66+
*
67+
* Background: the original code called libusb_init(NULL) on every open and
68+
* libusb_exit(NULL) on every close, bumping/decrementing the default
69+
* context's refcount. That pattern violates the documented contract for
70+
* libusb_exit (which "should be called after closing all open devices and
71+
* before your application terminates"), and on certain firmware can wedge
72+
* indefinitely: when libusb_exit hits the 1->0 transition it tears the
73+
* context down, which waits on its internal sync primitives for URBs to
74+
* drain -- but URBs orphaned by libusb_reset_device or by an unexpected
75+
* USB device disconnect never drain, so the wait never returns. That
76+
* deadlock has been observed on the 0665:5161 Cypress USB-serial bridge
77+
* family (Salicru SPS, Ippon, ViewPower, various Voltronic Power UPSes;
78+
* see NUT issues #598, #993, #2453) but the underlying bug class is
79+
* generic and reachable via any reconnect path.
80+
*
81+
* The fix is to use an explicit context that we own end-to-end:
82+
* initialize it once at first open, never call libusb_exit during runtime
83+
* (close_dev no longer does), and release it exactly once at process
84+
* shutdown via atexit. A SIGALRM-bounded timeout on the atexit libusb_exit
85+
* keeps a deadlock-prone teardown from wedging supervisor shutdowns
86+
* (e.g. systemctl stop) if it ever happens at exit time. Generalizes the
87+
* 2018 partial fix in commit 4f84b7f92 ("don't libusb_exit() when closing
88+
* a previously opened device").
89+
*/
90+
static libusb_context *nut_usb_ctx = NULL;
91+
static nut_bool_t nut_usb_ctx_initialized = false;
92+
93+
#ifndef WIN32
94+
static void nut_libusb_cleanup_alarm(int sig)
95+
{
96+
NUT_UNUSED_VARIABLE(sig);
97+
/* libusb_exit is taking too long, almost certainly waiting on URBs
98+
* orphaned by a recent device reset / disconnect that will never
99+
* drain. Force the process down so supervisor shutdown isn't blocked.
100+
*/
101+
_exit(0);
102+
}
103+
#endif /* !WIN32 */
104+
105+
/* Release the libusb context that nut_libusb_open() initialized.
106+
* Bounded by a SIGALRM (POSIX) so deadlock-prone teardown can't wedge
107+
* process shutdown indefinitely.
108+
*/
109+
static void nut_libusb_cleanup_atexit(void)
110+
{
111+
if (!nut_usb_ctx_initialized || !nut_usb_ctx) {
112+
return;
113+
}
114+
#ifndef WIN32
115+
{
116+
struct sigaction sa, prev;
117+
memset(&sa, 0, sizeof(sa));
118+
sa.sa_handler = nut_libusb_cleanup_alarm;
119+
sigemptyset(&sa.sa_mask);
120+
sigaction(SIGALRM, &sa, &prev);
121+
alarm(2); /* sub-ms on healthy context; 2s is generous */
122+
libusb_exit(nut_usb_ctx);
123+
alarm(0);
124+
sigaction(SIGALRM, &prev, NULL);
125+
}
126+
#else /* WIN32 */
127+
libusb_exit(nut_usb_ctx);
128+
#endif /* !WIN32 */
129+
nut_usb_ctx = NULL;
130+
nut_usb_ctx_initialized = false;
131+
}
132+
58133
static void nut_libusb_close(libusb_device_handle *udev);
59134

60135
/*! Add USB-related driver variables with addvar() and dstate_setinfo().
@@ -291,10 +366,18 @@ static int nut_libusb_open(libusb_device_handle **udevp,
291366
usb_hid_number_opts_parsed = 1;
292367
}
293368

294-
/* libusb base init */
295-
if (libusb_init(NULL) < 0) {
296-
libusb_exit(NULL);
297-
fatal_with_errno(EXIT_FAILURE, "Failed to init libusb 1.0");
369+
/* libusb base init: initialize our explicit context on the first call
370+
* here, and register the matching one-shot teardown via atexit. See the
371+
* comment near the file-scope declaration of nut_usb_ctx for the design
372+
* rationale (avoids the libusb_exit-per-reconnect misuse and the
373+
* teardown-deadlock it enables on some firmware).
374+
*/
375+
if (!nut_usb_ctx_initialized) {
376+
if (libusb_init(&nut_usb_ctx) < 0) {
377+
fatal_with_errno(EXIT_FAILURE, "Failed to init libusb 1.0");
378+
}
379+
nut_usb_ctx_initialized = true;
380+
atexit(nut_libusb_cleanup_atexit);
298381
}
299382

300383
/* TODO: Find a place for this, from Windows branch made for libusb0.c */
@@ -312,7 +395,7 @@ static int nut_libusb_open(libusb_device_handle **udevp,
312395
libusb_close(*udevp);
313396
#endif
314397

315-
devcount = libusb_get_device_list(NULL, &devlist);
398+
devcount = libusb_get_device_list(nut_usb_ctx, &devlist);
316399

317400
/* devcount may be < 0, loop will get skipped;
318401
* its SSIZE_MAX < SIZE_MAX for devnum */
@@ -1166,7 +1249,12 @@ static void nut_libusb_close(libusb_device_handle *udev)
11661249
*/
11671250
/* libusb_release_interface(udev, usb_subdriver.hid_rep_index); */
11681251
libusb_close(udev);
1169-
libusb_exit(NULL);
1252+
/* libusb_exit() is no longer called here. The libusb context is owned
1253+
* by nut_libusb_open() and released exactly once at process shutdown
1254+
* via the atexit handler; calling it per-close was a longstanding
1255+
* misuse that deadlocked on orphaned URBs after a device reset or
1256+
* disconnect (NUT #598). See the comment near nut_usb_ctx above.
1257+
*/
11701258
}
11711259

11721260
usb_communication_subdriver_t usb_subdriver = {

0 commit comments

Comments
 (0)