Commit 6ea5fce

drivers/nutdrv_qx.c: qx_command(): reset USB device after persistent LIBUSB_ERROR_OVERFLOW [#598]

LIBUSB_ERROR_OVERFLOW has been handled as a benign, retry-on-next-poll condition (grouped with LIBUSB_ERROR_TIMEOUT and the default case) ever since the original blazer import. For most devices a one-off oversized interrupt-IN frame is indeed transient. But some Cypress USB-serial bridge firmware (VID:PID 0665:5161; Salicru SPS, Ippon, ViewPower and various Voltronic Power UPSes) wedges the endpoint once it overruns: every subsequent interrupt read returns OVERFLOW and the driver spins in the stale-data loop until an external USB-level reset. This is the chronic hang reported in issue #598.

Distinguish the two cases with a small consecutive-overflow counter: the first QX_USB_OVERFLOW_RESET_TRIES-1 overflows are retried on the next poll (so genuine transients cost nothing but a skipped cycle), and a sustained run escalates to usb_reset() + reconnect, reusing the recovery path already used for PIPE/ETIME/IO errors. Any clean read resets the counter.

Captured on a Salicru SPS 1500 ONE BL: an isolated overflow recovers on the next poll, while a sustained lockup (dozens of consecutive overflows within seconds) is cleared only by the device reset, matching the behaviour of an external `usb_resetter --reset-device`.

Signed-off-by: Pedro Cunha <pedroagracio+nut@gmail.com>

1 parent 8574a2e commit 6ea5fce
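
The escalation rule described in the commit message can be sketched in isolation as below. This is a minimal illustration only, not the driver code: `poll_result`, `poll_action`, `classify_poll()` and `OVERFLOW_RESET_TRIES` are hypothetical names introduced here, and the real change keeps its counter in a function-local static inside qx_command() and acts on libusb return codes.

```c
/* Hypothetical stand-ins for this sketch; the driver itself switches on
 * libusb return codes such as LIBUSB_ERROR_OVERFLOW. */
enum poll_result { POLL_OK, POLL_OVERFLOW };
enum poll_action { ACT_NONE, ACT_RETRY_NEXT_POLL, ACT_RESET_DEVICE };

/* Assumed to mirror QX_USB_OVERFLOW_RESET_TRIES from the commit. */
#define OVERFLOW_RESET_TRIES 3

/* Decide what to do after one poll, tracking the overflow streak in *tries.
 * The caller owns the counter and must keep it alive between polls. */
static enum poll_action classify_poll(enum poll_result res, int *tries)
{
	if (res == POLL_OK) {
		*tries = 0;	/* clean read: forget any overflow streak */
		return ACT_NONE;
	}

	/* Overflow: tolerate the first OVERFLOW_RESET_TRIES-1 in a row. */
	if (++(*tries) < OVERFLOW_RESET_TRIES)
		return ACT_RETRY_NEXT_POLL;

	*tries = 0;	/* start a fresh streak after the reset */
	return ACT_RESET_DEVICE;
}
```

With a threshold of 3, feeding this helper the sequence overflow, clean read, overflow, overflow, overflow yields retry, nothing, retry, retry, reset: the isolated overflow costs one skipped cycle, and only the third overflow in a row escalates.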

1 file changed

Lines changed: 33 additions & 1 deletion

drivers/nutdrv_qx.c

@@ -3881,10 +3881,23 @@ void upsdrv_cleanup(void)
 
 /* Generic command processing function: send a command and read a reply.
  * Returns < 0 on error, 0 on timeout and the number of bytes read on success. */
+/* Consecutive LIBUSB_ERROR_OVERFLOW results on the interrupt-IN endpoint
+ * tolerated before escalating from "retry on the next poll" to a USB-level
+ * device reset. A one-off oversized frame is transient; a sustained run means
+ * the bridge firmware has wedged the endpoint and only re-enumeration clears
+ * it (e.g. the 0665:5161 Cypress USB-serial family: Salicru SPS, Ippon,
+ * ViewPower, various Voltronic Power). See NUT issue #598. */
+#define QX_USB_OVERFLOW_RESET_TRIES 3
+
 static ssize_t qx_command(const char *cmd, size_t cmdlen, char *buf, size_t buflen)
 {
 #ifndef TESTING
 	ssize_t ret = -1;
+# ifdef QX_USB
+	/* Persists across calls; only consecutive overflows accumulate (any clean
+	 * read zeroes it, see the switch on `ret` below). */
+	static int overflow_tries = 0;
+# endif
 #endif
 
 /* NOTE: Could not find in which ifdef-ed codepath, but clang complained
@@ -3918,6 +3931,7 @@ static ssize_t qx_command(const char *cmd, size_t cmdlen, char *buf, size_t bufl
 		ret = (*subdriver_command)(cmd, cmdlen, buf, buflen);
 
 		if (ret >= 0) {
+			overflow_tries = 0; /* clean read: forget any overflow streak */
 			return ret;
 		}
 
@@ -3979,8 +3993,26 @@ static ssize_t qx_command(const char *cmd, size_t cmdlen, char *buf, size_t bufl
 			udev = NULL;
 			break;
 
+		case LIBUSB_ERROR_OVERFLOW: /* Value too large for defined data type:
+		                             * an oversized interrupt-IN frame. A one-off is
+		                             * transient (retry on the next poll); a sustained
+		                             * run means the bridge firmware has wedged the
+		                             * endpoint and only a USB-level reset recovers it.
+		                             * See NUT issue #598. */
+			if (++overflow_tries < QX_USB_OVERFLOW_RESET_TRIES) {
+				upsdebugx(2, "Got OVERFLOW on EP 0x81 (%d/%d), retrying on next poll",
+					overflow_tries, QX_USB_OVERFLOW_RESET_TRIES);
+				break;
+			}
+			upsdebugx(1, "OVERFLOW on EP 0x81 persisted for %d polls; resetting device",
+				overflow_tries);
+			overflow_tries = 0;
+			if (usb_reset(udev) == 0) {
+				upsdebugx(1, "Device reset handled");
+			}
+			goto fallthrough_case_reconnect;
+
 		case LIBUSB_ERROR_TIMEOUT: /* Connection timed out */
-		case LIBUSB_ERROR_OVERFLOW: /* Value too large for defined data type */
 #if EPROTO && WITH_LIBUSB_0_1 /* limit to libusb 0.1 implementation */
 		case -EPROTO: /* Protocol error */
 #endif
