Skip to content

Flexible merge ranges #20878

@wmitsuda

Description

@wmitsuda

Consider a node with collated files up to step 2047 (it doesn't matter the step size). it should have the following file ranges:

[0, 1024), [1024, 1536), [1536, 1792), [1792, 1920), [1920, 1984), [1984, 2016), [2016, 2032), [2032, 2040), [2040, 2044), [2044, 2046), [2046, 2047)

then we apply a transformation of stepsize/4 (1 step becomes 4):

[0, 4096), [4096, 6144), [6144, 7168), [7168, 7680), [7680, 7936), [7936, 8064), [8064, 8128), [8128, 8160), [8160, 8176), [8176, 8184), [8184, 8188)

now apply a second transformation stepsize/1000 (1 step becomes 1000):

[0, 1024000), [1024000, 1536000), [1536000, 1792000), [1792000, 1920000), [1920000, 1984000), [1984000, 2016000), [2016000, 2032000), [2032000, 2040000), [2040000, 2044000), [2044000, 2046000), [2046000, 2047000)

💡 then add a couple of steps, it should evolve like this:

F0 = [0, 1024000), [1024000, 1536000), [1536000, 1792000), [1792000, 1920000),
       [1920000, 1984000), [1984000, 2016000), [2016000, 2032000), [2032000, 2040000),
       [2040000, 2044000), [2044000, 2046000), [2046000, 2047000)

  First check: does any existing file want to merge? For each, calculateMergeStartTxNum(endTxNum, 1, ∞) = endTxNum − rightmost_bit(endTxNum), e.g. 2047000 − 8 = 2046992 ≥ 2046000 →
  all skipped. So F0 is stable.

  Now adding [N-1, N) for N = 2_047_001 … 2_047_024, span = N & -N:

  N=2_047_001  span=1    F0, [2047000, 2047001)
  N=2_047_002  span=2    F0, [2047000, 2047002)
  N=2_047_003  span=1    F0, [2047000, 2047002), [2047002, 2047003)
  N=2_047_004  span=4    F0, [2047000, 2047004)
  N=2_047_005  span=1    F0, [2047000, 2047004), [2047004, 2047005)
  N=2_047_006  span=2    F0, [2047000, 2047004), [2047004, 2047006)
  N=2_047_007  span=1    F0, [2047000, 2047004), [2047004, 2047006), [2047006, 2047007)
  N=2_047_008  span=32*  F0, [2047000, 2047008)
  N=2_047_009  span=1    F0, [2047000, 2047008), [2047008, 2047009)
  N=2_047_010  span=2    F0, [2047000, 2047008), [2047008, 2047010)
  N=2_047_011  span=1    F0, [2047000, 2047008), [2047008, 2047010), [2047010, 2047011)
  N=2_047_012  span=4    F0, [2047000, 2047008), [2047008, 2047012)
  N=2_047_013  span=1    F0, [2047000, 2047008), [2047008, 2047012), [2047012, 2047013)
  N=2_047_014  span=2    F0, [2047000, 2047008), [2047008, 2047012), [2047012, 2047014)
  N=2_047_015  span=1    F0, [2047000, 2047008), [2047008, 2047012), [2047012, 2047014), [2047014, 2047015)
  N=2_047_016  span=8    F0, [2047000, 2047008), [2047008, 2047016)
  N=2_047_017  span=1    F0, [2047000, 2047008), [2047008, 2047016), [2047016, 2047017)
  N=2_047_018  span=2    F0, [2047000, 2047008), [2047008, 2047016), [2047016, 2047018)
  N=2_047_019  span=1    F0, [2047000, 2047008), [2047008, 2047016), [2047016, 2047018), [2047018, 2047019)
  N=2_047_020  span=4    F0, [2047000, 2047008), [2047008, 2047016), [2047016, 2047020)
  N=2_047_021  span=1    F0, [2047000, 2047008), [2047008, 2047016), [2047016, 2047020), [2047020, 2047021)
  N=2_047_022  span=2    F0, [2047000, 2047008), [2047008, 2047016), [2047016, 2047020), [2047020, 2047022)
  N=2_047_023  span=1    F0, [2047000, 2047008), [2047008, 2047016), [2047016, 2047020), [2047020, 2047022), [2047022, 2047023)
  N=2_047_024  span=16   F0, [2047000, 2047008), [2047008, 2047024)

when it reaches N 2048000:

F0   = [0, 1024000), [1024000, 1536000), [1536000, 1792000), [1792000, 1920000),
         [1920000, 1984000), [1984000, 2016000), [2016000, 2032000),
         [2032000, 2040000), [2040000, 2044000), [2044000, 2046000), [2046000, 2047000)

  Sub  = [2047000, 2047488), [2047488, 2047744), [2047744, 2047872), [2047872, 2047936),
         [2047936, 2047968), [2047968, 2047984), [2047984, 2047992), [2047992, 2047996)

  P9   = [0, 1024000), [1024000, 1536000), [1536000, 1792000), [1792000, 1920000),
         [1920000, 1984000), [1984000, 2016000), [2016000, 2032000)

  (Sub is what the sub-tree past 2047000 has collapsed to by N = 2_047_996. The size-512 chunk [2047000, 2047488) was created at N = 2_047_488 by the pathology-clipped merge;
  everything else past it is the natural binary expansion.)

  Three before, the big point, three after:

  N=2_047_997  span=1   F0, Sub, [2047996, 2047997)
  N=2_047_998  span=2   F0, Sub, [2047996, 2047998)
  N=2_047_999  span=1   F0, Sub, [2047996, 2047998), [2047998, 2047999)
  N=2_048_000  span=16384 (clipped to 16000 by [2016000, 2032000) straddle)
                         P9, [2032000, 2048000)
  N=2_048_001  span=1   P9, [2032000, 2048000), [2048000, 2048001)
  N=2_048_002  span=2   P9, [2032000, 2048000), [2048000, 2048002)
  N=2_048_003  span=1   P9, [2032000, 2048000), [2048000, 2048002), [2048002, 2048003)

  The N = 2_048_000 row shows the swallow: the last four F0 files ([2032000, 2040000), [2040000, 2044000), [2044000, 2046000), [2046000, 2047000)) plus all 9 ranges of Sub plus the
  trailing [2047996, 2047998), [2047998, 2047999), [2047999, 2048000) — 16 ranges total — all collapse into the single new [2032000, 2048000). State count goes 22 → 8.

the current implementation assumes every range boundary is tied to N^2, which is leading to incorrect file merges after applying stepsize/N, where N is NOT power of 2.

this issue proposes a flexible boundary, where merges swallows ranges back as long as its boundary fits into ( >= ) span. in the previous example, that is represented by the ranges >= 2047000 being merged only when it reached step 2048000, whose span of 16384 made it swallow the range back to [2032000, -), but not the [2016000, 2032000) because it is partially contained into the merge range.

that should lead to a little unbalanced merge ranges long immediately, but it should allow us to keep reducing the stepsize without reexecuting everything from genesis.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions