Skip to content

Await until next second for block timestamp validation#5037

Open
Frozen wants to merge 9 commits into
devfrom
fix/await-time-to-produce-blocks
Open

Await until next second for block timestamp validation#5037
Frozen wants to merge 9 commits into
devfrom
fix/await-time-to-produce-blocks

Conversation

@Frozen
Copy link
Copy Markdown
Collaborator

@Frozen Frozen commented Apr 21, 2026

This pull request makes several improvements to the block proposing logic in the consensus mechanism, primarily focusing on more precise block timestamp handling and improved sleep timing. The changes ensure that block timestamps are strictly increasing and reflect accurate wall-clock time, especially when using NTP, and refactor the code to pass the current time explicitly.

Block timestamp handling and accuracy:

  • The logic for determining the block timestamp is moved from ProposeNewBlock to the proposer loop (WaitForConsensusReadyV2), ensuring that the block timestamp is always strictly greater than the parent block's timestamp. The code now explicitly sleeps until the next second if necessary.
  • The now time value is passed explicitly to ProposeNewBlock, rather than being determined inside the method, improving testability and clarity. [1] [2] [3]
  • fix ntp servers parsing
  • add random pick of ntp server to get time
  • lower future block tolerance to the 1s, we are not Ethereum with 12s block finality
  • exit if no ntp servers provided
  • update ntp list to the broader set of servers, remove example server from the beevik library
  • block proposal readiness is checked with consensus.registry.Now() guaranteeing that blocks will have only increasing timestamps

@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented Apr 22, 2026

✔️ tested on the devnet nodes, and we don't have this problem anymore, see the screenshot:

image

@mur-me mur-me requested review from GheisMohammadi and mur-me April 22, 2026 15:00
@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented Apr 24, 2026

Hey @Frozen, please fix

harmony/cmd/harmony/main.go

Lines 250 to 270 in ddc96ae

var ntpService ntptime.NTPTime = ntptime.LocalTime{}
// Check NTP and time accuracy
// It skips the time accuracy check on the localnet since all nodes are running on the same machine
switch hc.Network.NetworkType {
case nodeconfig.Localnet:
ntpService = ntptime.LocalTime{}
default:
impl, err := ntptime.TryNew(nodeConfig.NtpServer, 3)
if err != nil {
fmt.Fprintf(os.Stderr, "Error initializing NTP time provider: %v\n", err)
} else {
go impl.Run(context.Background(), time.Minute)
ntpService = impl
}
}
if hc.Network.NetworkType != nodeconfig.Localnet {
clockAccuracyResp, err := ntp.CheckLocalTimeAccurate(nodeConfig.NtpServer)
if !clockAccuracyResp.IsAccurate() {
if clockAccuracyResp.AllNtpServersTimedOut() {
fmt.Fprintf(os.Stderr, "Error: querying NTP servers timed out, Continuing.\n")
} else if clockAccuracyResp.NtpFailed() {
because it is interpreted as string without any slicing into separate dns names.

From the node log:

Error initializing NTP time provider: lookup 1.pool.ntp.org,0.beevik-ntp.pool.ntp.org: no such host

See the

func CheckLocalTimeAccurate(ntpServers string) (ClockStatus, error) {
options := beevik_ntp.QueryOptions{Timeout: 30 * time.Second}
servers := strings.Split(ntpServers, ",")
// store error messages for each failed NTP server
// It doesn't print out any error if querying one of the servers is successful
var errorMessages []string
allServersTimedOut := true
for _, ntpServer := range servers {
response, err := beevik_ntp.QueryWithOptions(strings.TrimSpace(ntpServer), options)
if err != nil {
as example in our codebase

Comment thread cmd/harmony/main.go
@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented Apr 27, 2026

Hey @Frozen after the last commit I've seeing time to time the following error message and a view change after it:

[finalCommit] channel not received after 6s for commitSigAndBitmap

I believe the reason is the following:

  1. NTP provider actually works with multi-server config
  2. registry.Now() uses real ClockOffset
  3. same-second timestamp guard triggers more often
  4. proposer delays before creating receiver
  5. finalCommit sometimes times out sending commitSigAndBitmap

How is look like on the Hooray dashboard, see the 1 on the few validators:
image

@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented May 4, 2026

Hey @Frozen, I've tried to debug the issue on the current leader and found the diff between linux ntp aka local vs golang ntp time aka registry and found that there is a "registryMinusLocal": -4.554831 difference between them.
So I was right that this 2 time sources are different.

{
  "level": "warn",
  "shardID": 1,
  "myBlock": 40411052,
  "myViewID": 40422315,
  "phase": "Commit",
  "mode": "Normal",
  "parentTimeUnix": 1777905324,
  "registryNow": "2026-05-04T09:35:24.910835186-05:00",
  "localNow": "2026-05-04T09:35:24.915390017-05:00",
  "registryMinusLocal": -4.554831,
  "targetTime": "2026-05-04T09:35:25-05:00",
  "sleepByRegistry": 89.164814,
  "sleepByLocalUntil": 84.609983,
  "registryTimestampUnix": 1777905324,
  "retryCount": 0,
  "proposalBlockNum": 40411053,
  "isLeader": true,
  "asyncProposal": true,
  "called": "/home/um/harmony/consensus/consensus_v2.go:628",
  "caller": "/home/um/harmony/consensus/proposer.go:67",
  "time": "2026-05-04T09:35:24.915410385-05:00",
  "message": "[timestamp-guard] waiting for next second before proposing block"
}

Will add more after this comment

Comment thread consensus/proposer.go Outdated
// Current time is within the same second as the parent block.
// Sleep until the next second so the child timestamp is strictly
// greater, satisfying the engine.go validation check.
time.Sleep(time.Until(time.Unix(parentTime+1, 0)))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are mixing 2 time sources together.

And my debug got the following:

registryNow = 09:35:24.999764573
localNow    = 09:35:25.004318402
targetTime  = 09:35:25.000000000

Meaning:

consensus clock says: 09:35:24.999764573
local clock says:     09:35:25.004318402
target is:            09:35:25.000000000

So at the exact same moment:

registry.Now() says: not yet, wait 0.235 ms
time.Now() says:     already passed, no need to wait

On the last retry, this happened:

  1. retryCount = 2
  2. registry clock: still too early
  3. local clock: already past target
  4. sleep: 0ms
  5. continue
  6. loop ends
  7. the code never reached the proposal path

mur-me and others added 2 commits May 4, 2026 20:09
Block proposal readiness is checked with consensus.registry.Now(), but the
sleep duration was previously calculated with time.Until(), which uses raw
local time.Now().

When local time is slightly ahead of the registry/NTP-adjusted clock, the
sleep can resolve to zero or negative near the next-second boundary while the
registry clock is still within the parent block timestamp second. This can burn
proposal retries and skip proposal setup.

Compute the sleep duration against consensus.registry.Now() instead, then
refresh the timestamp after sleeping before continuing.
@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented May 5, 2026

So after test screw in both ways - -2 minutes and +2 minutes node is still validating and producing blocks using golang NTP time ✔️

@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented May 5, 2026

@Frozen, please:

  1. remove c5d782a because this is not working + overhead

  2. Additionally, please add more ntp servers here: https://github.com/harmony-one/harmony/blob/dev/cmd/config/default.go#L146-L148
    Why?

  • remove 0.beevik-ntp.pool.ntp.org is just a demo for the library
  • add time.cloudflare.com:

Today, Cloudflare announces its free time service to anyone on the Internet. We intend to solve the limitations with the existing public time services, in particular by increasing availability, robustness and security. Source: https://blog.cloudflare.com/secure-time/#use-it

@Frozen
Copy link
Copy Markdown
Collaborator Author

Frozen commented May 7, 2026

removed 0.beevik-ntp.pool.ntp.org

…eevik-ntp.pool.ntp.org ntp - this one can be used just for library demo
@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented May 7, 2026

@Frozen, I've amend commit message and force pushed, please update your local 🙏

@mur-me mur-me force-pushed the fix/await-time-to-produce-blocks branch from fc6c0b5 to 3ea9944 Compare May 7, 2026 07:55
@mur-me mur-me self-requested a review May 7, 2026 10:46
@mur-me
Copy link
Copy Markdown
Collaborator

mur-me commented May 7, 2026

I've updated the PR description

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants