Skip to content

ircevent: add MaxMsgByteLen#114

Open
lollipopman wants to merge 5 commits into
ergochat:masterfrom
lollipopman:preamble-length
Open

ircevent: add MaxMsgByteLen#114
lollipopman wants to merge 5 commits into
ergochat:masterfrom
lollipopman:preamble-length

Conversation

@lollipopman
Copy link
Copy Markdown

Add a function to calculate the maximum message size in bytes based on the nick, userhost, command, and target of the message to be sent.

This is useful when sending a long message that may need to be wrapped into multiple messages, in order to be sent.

The implementation was inspired by Catgirl's and Konversation's.

@slingamn
Copy link
Copy Markdown
Member

Thanks, this looks useful! I need to carefully consider potential edge cases here. Here's the corresponding code in Ergo:

https://github.com/ergochat/ergo/blob/5bb94efaf6b865f5e52404bf9b9dfc3d638369d7/irc/handlers.go#L2195

and here's a bug we had historically: ergochat/ergo#2170

Comment thread ircevent/irc.go Outdated
Comment on lines +512 to +517
select {
case <-irc.userHostChan:
return irc.userHost, nil
case <-timer.C:
return "", ServerTimedOut
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a soft deadlock if UserHost or MaxMsgByteLen is called from a callback, right? Since processing of the RPL_USERHOST response cannot proceed until the current callback finishes, so we'll wait until timer.C on the first invocation.

I'm wondering if instead it might make sense to expose only MaxMsgByteLen, and if the userhost is not (yet) available return a safe fallback value (350 is a common conservative default).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, I hadn't considered calling it from another callback. What I was trying to solve for is how to get the current max value without polling. Userhost could be private, but would MaxMsgByteLen still call that function?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The querying for USERHOST could also be done in performHandshake(), but I'm not sure if that is desirable.

@lollipopman
Copy link
Copy Markdown
Author

Thanks, this looks useful! I need to carefully consider potential edge cases here.

Thanks for looking at it, I'm sure there are some edge cases. I only have novice's understanding of the IRC protocol. The IRCv3 draft for multiline adds a 10 byte safety buffer, but I left that out as I thought we could iterate once people try it against other servers.

I also considered changes to the userhost, such as someone adding obtaining cloak. But, I wasn't sure there was a way to receive those updates, outside of polling, which is what Konversation does on every ping pong.

@slingamn
Copy link
Copy Markdown
Member

This is covered by CHGHOST but it's not that widely supported.

Add a function to calculate the maximum message size in bytes based on
the nick, userhost, command, and target of the message to be sent.

This is useful when sending a long message that may need to be wrapped
into multiple messages, in order to be sent.

The implementation was inspired by Catgirl's and Konversation's.
@lollipopman
Copy link
Copy Markdown
Author

@slingamn I pushed a new version based on your feedback. Only MaxMsgByteLen() is exposed and the function now returns a sensible value, even if we have not yet received a reply to our UESRHOST query. Would love your thoughts when you have the time.

@lollipopman
Copy link
Copy Markdown
Author

I also double checked that the accounting is the same as ergo's

@slingamn
Copy link
Copy Markdown
Member

Just checking in: I'm still looking at this issue. I did a refactoring pass on this implementation, but I'm also talking to the IRCv3 community about other possibilities for a fix. (The continued existence of this problem is kind of an embarrassment for IRC as a protocol --- it's really not great that there's no good way to tell how long of a message you can send.)

@lollipopman
Copy link
Copy Markdown
Author

Thanks for the update @slingamn and thanks for reviewing my latest rev. I'm happy to iterate, if there are parts you don't like. I agree that the need to calculate the max byte length is a bit of a wart on the IRC protocol and it would be great if it was solved in a more elegant manner as part of the IRCv3 work. However, I do think a version that works for existing servers in the wild has value as an interim step. Anecdotally, I started working on this after our incident bot choked on a long line, \o/.

@slingamn
Copy link
Copy Markdown
Member

I'm curious, what was the failure mode you saw with your bot? Did the line just get dropped?

One of the reasons I'm being cautious here is that computing the exact length limit is always going to be somewhat fallible (the server can unilaterally change the userhost at any time, and even with a mechanism like CHGHOST for getting updates, there is still a race window). So 350 bytes is generally a safe default for most client implementations, and the closer we get to the exact byte limit, the more chance there is of something going wrong.

@slingamn
Copy link
Copy Markdown
Member

I pushed a round of changes. Two things I'm thinking about:

  1. In terms of overall reliability, how does this compare to just advising people to choose a 350-byte or 400-byte limit (which will be wrong in rare cases, but will leave a comfortable safety margin in typical cases --- whereas with this exact computation strategy, any change from the server side will leave the bot in a broken state)
  2. Would it make sense to trigger the USERHOST query lazily the first time MaxMsgByteLen is called? (Before that, it would just return a conservative fallback value. With the current eager query behavior, there is still a window where the fallback value is returned, but it is short.)

@lollipopman
Copy link
Copy Markdown
Author

I'm curious, what was the failure mode you saw with your bot? Did the line just get dropped?

The line was dropped, because I didn't want truncated lines.

One of the reasons I'm being cautious here is that computing the exact length limit is always going to be somewhat fallible (the server can unilaterally change the userhost at any time, and even with a mechanism like CHGHOST for getting updates, there is still a race window). So 350 bytes is generally a safe default for most client implementations, and the closer we get to the exact byte limit, the more chance there is of something going wrong.

True, but I think we can adjust if people see failures. Given that Konversation and catgirl use a similar algorithm, I would assume it is pretty reliable?

@lollipopman
Copy link
Copy Markdown
Author

lollipopman commented Apr 20, 2026

I pushed a round of changes. Two things I'm thinking about:

Thanks, questions:

  1. Why did you drop the target from MaxMsgByteLen?
  2. Why did you drop the ISupport calls for defaults?
  3. Any reason you switched away from the fmt.Sprintf method for calculating the length?
  1. In terms of overall reliability, how does this compare to just advising people to choose a 350-byte or 400-byte limit (which will be wrong in rare cases, but will leave a comfortable safety margin in typical cases --- whereas with this exact computation strategy, any change from the server side will leave the bot in a broken state)

I'm not sure, but as I said, it seems to be a common practice, so I would assume it is pretty robust. Also, since we are only exposing MaxMsgByteLen we could switch to a fixed
limit if need be.

  1. Would it make sense to trigger the USERHOST query lazily the first time MaxMsgByteLen is called? (Before that, it would just return a conservative fallback value. With the current eager query behavior, there is still a window where the fallback value is returned, but it is short.)

That would guarantee that you never receive the max length on the first call to MaxMsgByteLen. I would rather keep the racy behavior, since then in practice you will be more likely to get the max value on the first call.

@slingamn
Copy link
Copy Markdown
Member

  1. I didn't drop target, I dropped the command (privmsg vs notice), which costs at most a byte (applying the PRIVMSG limit to NOTICE), and makes it somewhat less error-prone to use
  2. AFAIK these ISUPPORT parameters are not widely supported (Ergo, Unreal, and Solanum do not publish them, although Insp does)
  3. fmt.Sprintf allocates memory, it seems better to just do the math

re.

Given that Konversation and catgirl use a similar algorithm, I would assume it is pretty reliable?

I would not assume this, most IRC software exists in a state of delicate semi-brokenness :-)

@slingamn
Copy link
Copy Markdown
Member

Libera's help says:

If you use USERHOST on yourself, the hostname
is replaced with the IP you are connecting from.
This is needed to provide DCC support for spoofed
hostnames.

:-(

@slingamn
Copy link
Copy Markdown
Member

Running WHO on yourself wouldn't have this problem, however I was advised not to send an unconditional self-WHO on connect because servers will rate-limit or ration WHO for performance reasons.

@lollipopman
Copy link
Copy Markdown
Author

Running WHO on yourself wouldn't have this problem, however I was advised not to send an unconditional self-WHO on connect because servers will rate-limit or ration WHO for performance reasons.

ugh, Konversation made a similar decision, they issue a WHO request, rather than a USERHOST request.

@slingamn
Copy link
Copy Markdown
Member

I think a reasonable compromise would be:

  1. Use WHO instead of USERHOST
  2. Make it lazy by default, but add a boolean config to make it eager
  3. If the boolean config is enabled, add chghost to the requested caps and add a CHGHOST handler

@slingamn
Copy link
Copy Markdown
Member

(I can work on this later)

* Use WHO / RPL_WHOREPLY instead of USERHOST
* Make the WHO request only if needed
* Prepare for a future cap that will send the userhost without a request
* Handle CHGHOST
@slingamn
Copy link
Copy Markdown
Member

This version implements the proposal at ircv3/ircv3-specifications#603

@lollipopman
Copy link
Copy Markdown
Author

This version implements the proposal at ircv3/ircv3-specifications#603

Looks good, thanks for working on this.

Comment thread ircevent/irc_callback.go Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants