Skip to content

Remove doomed attempts to reforward pinned requests#2449

Closed
rousskov wants to merge 2 commits into
squid-cache:masterfrom
measurement-factory:OSD-7-survive-ftp-Relay-sendCommand-throw
Closed

Remove doomed attempts to reforward pinned requests#2449
rousskov wants to merge 2 commits into
squid-cache:masterfrom
measurement-factory:OSD-7-survive-ftp-Relay-sendCommand-throw

Conversation

@rousskov

Copy link
Copy Markdown
Contributor
AsyncJob.cc(100) mustStop: Ftp::Relay will stop, reason: exception
...
FATAL: check failed: !request->pinnedConnection()
    exception location: FwdState.cc(1119) connectStart

Since February 2019 commit 3dde9e5, retries of "pinned" requests1
using a new Squid-to-server connections are meant for FTP and
connection-based authentication traffic only. Back then, we agreed that
such re-pinning is wrong2, and different mechanisms should be used to
handle forwarding errors in those cases, but we preserved the
corresponding functionality in fear of breaking "working" cases3. On
that preserved code path, request->pinnedConnection() is true at
pinnedCanRetry() check time.

Since September 2019 commit daf8070, connectStart() throws if
request->pinnedConnection() is true, effectively breaking retries for
nearly all preserved cases mentioned above! The only pinned retries that
could hypothetically continue working after that commit are those where
ConnStateData job had been destroyed between pinnedCanRetry() and
connectStart(), and those rare cases are not interesting since
ConnStateData destruction ought to abort request forwarding anyway.

Both 2019 commits are present in v5.0.1 and beyond. We are not aware of
any associated problems. Thus, our fears were excessive, and we can
remove code that preserves problematic functionality. If we discover a
need to retry pinned FTP or connection-authenticated requests via a new
connection, we will add mechanisms to support that functionality.

Removing this code also helps avoid FATAL errors observed during recent
FTP work.

CONNECT tunnels

Existing tunnel.cc code ignores request->flags.pinned when deciding
whether to retry a failed tunneling attempt. This change does not affect
the corresponding "check pinned connections" TODO in that tunneling
code. FWIW, that TODO is limited to tunnelStart(); switchToTunnel()
cases already do not retry due to a "committed to server" retry ban on
their code path. It is likely that pinned retries should be disabled in
tunneled cases as well, but that adjustment deserves a dedicated
analysis/change.

Footnotes

  1. Here, a "pinned" request is, roughly speaking, a request sent by
    Squid over a Squid-to-server connection that was kept/provided by a
    client-to-Squid connection manager (ConnStateData). See usePinned().

  2. "automatically re-opening a PINNED connection from the middleware
    is invalid in todays networks" at
    https://github.com/squid-cache/squid/pull/351#discussion_r250015937

  3. "keep re-pinning as is" at
    https://github.com/squid-cache/squid/pull/351#discussion_r253757986

rousskov added 2 commits June 22, 2026 15:41
... to better tolerate Ftp::Relay::sendCommand() and similar exceptions.

    ERROR: Squid BUG: assurance failed: ...
        exception location: FtpClient.cc(835) writeCommand
    AsyncJob.cc(145) callException: assurance failed ...
    AsyncJob.cc(100) mustStop: Ftp::Relay will stop, reason: exception
    ...
    FATAL: check failed: !request->pinnedConnection()
        exception location: FwdState.cc(1119) connectStart

When branch-added Ftp::Client::writeCommand() assertions throw, the
worker dies instead of just killing the affected ftp_port transaction.

`JobDialer::dial()` catches the BUG exception, as expected, but the
worker still dies because `Client::swanSong()` cleanup code calls
`FwdState::handleUnregisteredServerEnd()` which attempts to retry the
failed FTP transaction despite true `request->pinnedConnection()`, which
triggers another, this time uncaught, exception in `connectStart()`.

Also documented suspicions that `FwdState::pinnedCanRetry()` should
_always_ return false and, hence, should be removed (instead of adding
more and more code/conditions when that method returns false).

----

Cherry-picked osd-7-sanitize-ftp-command commit becd5b4 that was
reverted on that branch, so that we can use a dedicated branch to finish
this work.
Since February 2019 commit 3dde9e5, retries of "pinned" requests[^1]
using a new Squid-to-server connections are meant for FTP and
connection-authentication traffic only. Back then, we agreed that such
re-pinning is wrong[^2], and different mechanisms should be used to
handle forwarding errors in those cases, but we preserved the
corresponding functionality in fear of breaking "working" cases[^3]. On
that preserved code path, `request->pinnedConnection()` is true at
pinnedCanRetry() check time.

Since September 2019 commit daf8070, connectStart() throws if
`request->pinnedConnection()` is true, effectively breaking retries for
nearly all preserved cases mentioned above. The only pinned retries that
could hypothetically continue working after that commit are those where
ConnStateData job had been destroyed between pinnedCanRetry() and
connectStart(), and those rare cases are not interesting since
ConnStateData destruction ought to abort request forwarding anyway.

Both 2019 commits are present in v5.0.1 and beyond. We are not aware of
any associated problems. Thus, our fears were excessive, and we can
remove code that preserves problematic functionality. If we discover a
serious need to retry FTP or connection-authenticated requests via a new
connection, we will add proper mechanisms to support that functionality.

Removing this code also helps avoid FATAL errors observed during recent
FTP work.

CONNECT tunnels
---------------

Existing tunnel.cc code ignores `request->flags.pinned` when deciding
whether to retry a failed tunneling attempt. This change does not affect
the corresponding "check pinned connections" TODO in that tunneling
code. FWIW, that TODO is limited to tunnelStart(); switchToTunnel()
cases already do not retry due to a "committed to server" retry ban on
their code path. It is likely that pinned retries should be disabled in
tunneled cases as well, but that adjustment deserves a dedicated
analysis/change.

[^1]: Here, a "pinned" request is, roughly speaking, a request sent by
Squid over a Squid-to-server connection that was kept/provided by a
client-to-Squid connection manager (ConnStateData). See usePinned().

[^2]: "automatically re-opening a PINNED connection from the middleware
is invalid in todays networks" at
squid-cache#351 (comment)

[^3]: "keep re-pinning as is" at
squid-cache#351 (comment)

@rousskov rousskov left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR stems from recent commit 51d1ddb work, but it is not required for that commit changes to function properly: That commit does not cause the FATAL error quoted in this PR description. That error was observed in an earlier variation of that code.

Said that, I suspect that similar errors will sooner or later resurface as more code throws on errors. And even if that never happens, it is a good idea to remove this not-working and complex/dangerous code anyway.

Disclaimer: I tested FTP but not connection-based authentication cases. Removed code does not distinguishes the two categories though -- it ought to hit the same !request->pinnedConnection() assertion in FwdState::connectStart() AFAICT.

P.S. We are working on one more 51d1ddb followup PR.

@rousskov rousskov added S-waiting-for-more-reviewers needs a reviewer and/or a second opinion S-could-use-an-approval An approval may speed this PR merger (but is not required) labels Jun 22, 2026

@yadij yadij left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hitting "wall of text" issues reading the PR description, so doubtful it will be as helpful as intended for others less knowing of the basics involved.

The logic however is good. So approving anyway.

@yadij yadij added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Jun 24, 2026
@yadij

yadij commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@rousskov I have not marked this for v7 due to the "surprise" behaviour change it might cause. However, if those FTP changes you refer to are to go back for any reason, then please mark this one for backport as well.

@yadij yadij removed the M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels label Jun 24, 2026
@yadij

yadij commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Oops, unmarking the merge due to Anubis issue. Re-mark later when appropriate.

@kinkie kinkie left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kinkie

kinkie commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Github / Anubis integration issue seems solved. Remarking for merge

@kinkie kinkie added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed S-could-use-an-approval An approval may speed this PR merger (but is not required) labels Jun 24, 2026
squid-anubis pushed a commit that referenced this pull request Jun 24, 2026
    AsyncJob.cc(100) mustStop: Ftp::Relay will stop, reason: exception
    ...
    FATAL: check failed: !request->pinnedConnection()
        exception location: FwdState.cc(1119) connectStart

Since February 2019 commit 3dde9e5, retries of "pinned" requests[^1]
using a new Squid-to-server connections are meant for FTP and
connection-based authentication traffic only. Back then, we agreed that
such re-pinning is wrong[^2], and different mechanisms should be used to
handle forwarding errors in those cases, but we preserved the
corresponding functionality in fear of breaking "working" cases[^3]. On
that preserved code path, `request->pinnedConnection()` is true at
pinnedCanRetry() check time.

Since September 2019 commit daf8070, connectStart() throws if
`request->pinnedConnection()` is true, effectively breaking retries for
nearly all preserved cases mentioned above! The only pinned retries that
could hypothetically continue working after that commit are those where
ConnStateData job had been destroyed between pinnedCanRetry() and
connectStart(), and those rare cases are not interesting since
ConnStateData destruction ought to abort request forwarding anyway.

Both 2019 commits are present in v5.0.1 and beyond. We are not aware of
any associated problems. Thus, our fears were excessive, and we can
remove code that preserves problematic functionality. If we discover a
need to retry pinned FTP or connection-authenticated requests via a new
connection, we will add mechanisms to support that functionality.

Removing this code also helps avoid FATAL errors observed during recent
FTP work.

CONNECT tunnels
---------------

Existing tunnel.cc code ignores `request->flags.pinned` when deciding
whether to retry a failed tunneling attempt. This change does not affect
the corresponding "check pinned connections" TODO in that tunneling
code. FWIW, that TODO is limited to tunnelStart(); switchToTunnel()
cases already do not retry due to a "committed to server" retry ban on
their code path. It is likely that pinned retries should be disabled in
tunneled cases as well, but that adjustment deserves a dedicated
analysis/change.

[^1]: Here, a "pinned" request is, roughly speaking, a request sent by
Squid over a Squid-to-server connection that was kept/provided by a
client-to-Squid connection manager (ConnStateData). See usePinned().

[^2]: "automatically re-opening a PINNED connection from the middleware
is invalid in todays networks" at
#351 (comment)

[^3]: "keep re-pinning as is" at
#351 (comment)
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Jun 24, 2026
@kinkie

kinkie commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@rousskov highlighting that the decision whether to backport to v7 is in your hands

@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels labels Jun 24, 2026
@rousskov

rousskov commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Amos: @rousskov I have not marked this for v7 due to the "surprise" behaviour change it might cause. However, if those FTP changes you refer to are to go back for any reason, then please mark this one for backport as well.

The temporary FTP changes I was referring to made it trivial to trigger the bug this PR fixes. I speculate that it might also be possible to trigger the same bug in existing code, and that we will add more triggers in the foreseeable future -- any HTTP or FTP code that terminates a forwarding attempt on a previously pinned connection before sending the request is a prime candidate.

Francesco: @rousskov highlighting that the decision whether to backport to v7 is in your hands

I recommend backporting this fix, but this is a weak recommendation:

  • I am not aware of any specific use cases that can trigger the fixed bug today.
  • I do not know what "behaviour change" Amos is talking about in his comment (quoted above).
  • I have not tested connection-based authentication -- one of the two primary cases that removed code was supposed to help with.

If you can confirm that connection-based authentication continues to work as expected, I would be more confident in recommending a backport.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

M-merged https://github.com/measurement-factory/anubis#pull-request-labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants