02:14:55.803009 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 300: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 286)
0.0.0.0.68 > 255.255.255.255.67: [no cksum] BOOTP/DHCP, Request from 3a:30:25:24:fe:7a, length 258, xid 0xcaabede, Flags [Broadcast] (0x8000)
02:14:55.805299 b4:fb:e4:20:ca:6b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0xc0, ttl 64, id 1029, offset 0, flags [none], proto UDP (17), length 328)
172.29.7.1.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0xcaabede, Flags [Broadcast] (0x8000)
02:14:55.817799 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 307: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 293)
0.0.0.0.68 > 255.255.255.255.67: [no cksum] BOOTP/DHCP, Request from 3a:30:25:24:fe:7a, length 265, xid 0xcaabede, Flags [Broadcast] (0x8000)
02:14:56.269416 b4:fb:e4:20:ca:6b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 367: (tos 0xc0, ttl 64, id 1034, offset 0, flags [none], proto UDP (17), length 353)
172.29.7.1.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 325, xid 0xcaabede, Flags [Broadcast] (0x8000)
FreeRTOS_AddEndPoint: MAC: fe-7a IPv4: c0a801f8ip
ip_thread_start
ip_thread_entry starting, thread ID is 0x1
prvIPTask started
vIPSetDHCP_RATimerEnableState: Off
prvCloseDHCPSocket[fe-7a]: closed, user count 0
vDHCPProcessEndPoint: enter 0
DHCP-socket[fe-7a]: DHCP Socket Create
prvCreateDHCPSocket[fe-7a]: open, user count 1
prvInitialiseDHCP: start after 25 ticks
vDHCP_RATimerReload: 25
vDHCPProcessEndPoint: exit 1
vDHCPProcessEndPoint: enter 1
vDHCPProcess: discover
vDHCPProcessEndPoint: exit 2
vDHCPProcessEndPoint: enter 2
vDHCPProcess: discover
vDHCPProcess: timeout 1000 ticks
vDHCPProcess: offer ac1d0761ip for MAC address fe-7a
vDHCPProcess: reply ac1d0761ip
vDHCPProcessEndPoint: exit 3
vDHCPProcessEndPoint: enter 3
vDHCPProcess: offer ac1d0761ip for MAC address fe-7a
vDHCPProcess: acked ac1d0761ip
prvCloseDHCPSocket[fe-7a]: closed, user count 0
vDHCP_RATimerReload: 4320000
vDHCPProcessEndPoint: exit 5
Describe the bug
FreeRTOS-Plus-TCP (tested with v4.2.0, but v4.3, which I'll cite herein, looks from inspection to behave identically here) fails to trigger an ARP request for its gateway when sending UDPv4 datagrams to off-segment addresses. As a result, the system is effectively unable to send such UDP packets until, AFAICT, either
Specifically, assuming I understand the packet traces and debug logs in this report and am reading the code correctly: the UDP/IPv4 egress path queries the ARP table, and that lookup has handling for gateways. But when a gateway is needed, the UDPv4 code, despite its comment naming the correct variable, looks up the same address that the gateway handling started from (the original, off-segment destination) rather than the gateway's address, and so takes the wrong path instead of the one that generates an ARP query.
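To make the suspected failure concrete, here is a small standalone C model of the decision. It is a sketch under my reading of the logs, not FreeRTOS-Plus-TCP code: the struct and every function name in it are hypothetical, and it assumes a single endpoint whose gateway is on-link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one IPv4 endpoint; NOT a FreeRTOS-Plus-TCP type.
 * All addresses are in host byte order. */
struct endpoint_model
{
    uint32_t ip;
    uint32_t netmask;
    uint32_t gateway;
};

/* True if addr is on the endpoint's own subnet. */
bool on_segment( const struct endpoint_model * ep, uint32_t addr )
{
    return ( addr & ep->netmask ) == ( ep->ip & ep->netmask );
}

/* The address whose MAC is needed to deliver a packet to dest:
 * dest itself when it is on-link, otherwise the gateway. */
uint32_t next_hop( const struct endpoint_model * ep, uint32_t dest )
{
    /* A usable gateway has to be reachable on-link. */
    assert( on_segment( ep, ep->gateway ) );
    return on_segment( ep, dest ) ? dest : ep->gateway;
}

/* Stand-in for "find an endpoint by netmask, then send an ARP request
 * from it". With a single endpoint, a request can only go out if the
 * looked-up address is on that endpoint's subnet. */
bool would_send_arp( const struct endpoint_model * ep, uint32_t lookup_addr )
{
    return on_segment( ep, lookup_addr );
}
```

Take the leased address from the log (ac1d0761, i.e. 172.29.7.97), the gateway 172.29.7.1, an assumed /24 netmask, and a hypothetical off-segment destination 192.0.2.123. Then `would_send_arp(&ep, dest)` is false, which matches what I observe, while `would_send_arp(&ep, next_hop(&ep, dest))` is true, which is what I would expect the egress path to do.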
Incidentally, there's code in the IP packet ingress handling path to refresh the ARP table or trigger an ARP request, but... it's disabled for UDP packets, which DHCP uses. Perhaps it should be disabled for {multi,broad}cast packets instead, and allow unicast (UDP and otherwise) packets to trigger it? This would have masked the above issue in my case, and in many common cases, because often the DHCP server and the default gateway are one and the same. I'm not sure if that's an argument for or against adopting this behavior!
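The alternative filter floated above could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, it is not the existing FreeRTOS-Plus-TCP ingress code, and it assumes the decision is keyed on the packet's IPv4 destination address (host byte order) rather than on its protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* 224.0.0.0/4 is the IPv4 multicast range. */
bool is_ipv4_multicast( uint32_t addr )
{
    return ( addr & 0xf0000000u ) == 0xe0000000u;
}

/* Limited broadcast (255.255.255.255) or the directed broadcast
 * address of the local subnet. */
bool is_ipv4_broadcast( uint32_t addr, uint32_t local_ip, uint32_t netmask )
{
    uint32_t directed = ( local_ip & netmask ) | ~netmask;
    return ( addr == 0xffffffffu ) || ( addr == directed );
}

/* Hypothetical policy: let any unicast packet (UDP or otherwise)
 * refresh the ARP cache, and skip multicast and broadcast ones. */
bool should_refresh_arp_on_ingress( uint32_t dest, uint32_t local_ip, uint32_t netmask )
{
    return !is_ipv4_multicast( dest ) &&
           !is_ipv4_broadcast( dest, local_ip, netmask );
}
```

Under this reading, a packet addressed to 255.255.255.255 is still skipped, while a unicast UDP packet addressed to the system would be allowed to refresh the cache.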
Target
The curious are welcome to see the CHERIoT network stack interfaces to FreeRTOS-Plus-TCP, but I do not believe that our interface code differs significantly for the purposes of this bug from any other application.
Host
To Reproduce
Run a FreeRTOS-Plus-TCP application that performs DHCP and attempts to send a UDP packet (perhaps specifically not DNS).
Expected behavior
I expect FreeRTOS-Plus-TCP to either
sendto.
Wireshark logs
Here's an example application running on Sonata and trying to do DHCP followed by SNTP. We can see that there's a while at startup where the system won't generate UDP packets destined for off-segment addresses (via the gateway).
The system initializes and performs DHCP successfully:
The system logs this startup and DHCP state machine traversal thus:
Immediately thereafter, the system does a gratuitous ARP announcement as a duplicate check:
No response is received, as no duplicate exists on the network.
The application now attempts to send a UDP packet (SNTP, specifically, using coreSNTP). No packets, UDP or ARP or otherwise, are emitted and the system logs the following:
Note in particular that the 2nd FindEndPointOnNetMask call is for the same, off-segment address as the first! (Some additional instrumentation shows that the sendto has returned the expected 48, indicating success, which is a bit rude.) coreSNTP eventually times out and reports failure to the application, which goes to sleep (before retrying).
While the application is asleep, the gateway sends an ARP request to refresh its cache entry for the system, and the system responds:
The system logs this and the insertion into its ARP cache:
This happens only because the gateway and the DHCP server are one and the same. Were the gateway a different node, it might never issue an ARP request for the system.
The application wakes up and retries SNTP. At this point, there is a hit in the ARP cache and a packet is sent:
The system logs: