
[Registry] Switch {:duplicate, :key} key_ets to ordered_set with composite keys#15304

Open
studzien wants to merge 23 commits into elixir-lang:main from studzien:registry/duplicate-ordered-set-redo

Conversation

@studzien
Contributor

Motivation

With {:duplicate, :key} partitioning, all entries for a given key land in a single partition's duplicate_bag ETS table. At high churn rates (many processes joining/leaving the same key), unregister becomes a bottleneck — :ets.match_delete/2 on a duplicate_bag must scan all entries for that key to find the ones belonging to the leaving process. This causes the partition GenServer's message queue to grow unboundedly.

We hit this in production with Phoenix.PubSub (backed by Registry) for livestreams: a single key with ~150k subscribers on one node (~800k across a 6-node cluster) at ~1k joins and ~1k leaves per second per node (~6k each cluster-wide) was enough to cause the partition GenServer's message queue to run away. This change eliminates the issue.

Approach

Replace the duplicate_bag with an ordered_set using composite keys {key, pid, counter}. Since ordered_set sorts entries, all entries for a {key, pid} prefix are adjacent in the tree, so:

  • Unregister can seek directly to {key, pid, _} instead of scanning all entries for that key — O(log N) vs O(N).
  • Lookup uses :ets.select/2 with a bound key prefix, which leverages the tree ordering — O(log N + m) seek plus scan of the m matching entries.
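The layout can be sketched with plain ETS (this is an illustration of the idea, not the actual Registry internals; the table, keys, and values are invented):

```elixir
# Minimal sketch of the composite-key layout in plain ETS; the real
# Registry internals differ, and the names here are invented.
table = :ets.new(:sketch, [:ordered_set, :public])
me = self()
other = spawn(fn -> Process.sleep(:infinity) end)

# Each registration is stored as {{key, pid, counter}, value}.
:ets.insert(table, {{"room:1", me, 1}, :v1})
:ets.insert(table, {{"room:1", me, 2}, :v2})
:ets.insert(table, {{"room:1", other, 1}, :v3})

# Lookup: select with the key bound in the prefix. The tree ordering keeps
# all "room:1" entries adjacent, so this is a seek plus a short scan.
values = :ets.select(table, [{{{"room:1", :_, :_}, :"$1"}, [], [:"$1"]}])

# Unregister: delete only this pid's entries via the {key, pid, _} prefix,
# without touching the key's other subscribers.
:ets.match_delete(table, {{"room:1", me, :_}, :_})
```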

Benchmark summary

With multiple partitions, {:duplicate, :key} outperforms {:duplicate, :pid} across all operations in both the "many keys" and "hot key" scenarios. The most dramatic difference is unregister on a hot key (200k subscribers): 146–1172x faster because the ordered_set can seek directly to {key, pid, _} instead of scanning 200k entries.

With a single partition and many keys, {:duplicate, :pid} wins lookup/dispatch by ~4x due to the direct hash lookup (lookup_element) vs the ordered_set match spec — but this advantage disappears with multiple partitions because :pid must fan out across all of them.

For the hot-key scenario with a single partition, lookup/dispatch throughput is comparable between the two at the ETS level, but the duplicate_bag exhibits higher tail latency (~30% deviation and 2x worse p99 vs ~9% for the ordered_set).

Full benchmark results and the benchmark script are in the comments below.

Caveats

  • No parallel dispatch: parallel: true in dispatch/4 is silently ignored for {:duplicate, :key} — all entries for a given key live in one partition, so there is nothing to parallelize across. This is inherent to key-based partitioning, not a limitation of the ordered_set change. In practice, even without parallelization, :key performs better on dispatch for hot keys because :pid partitioning must fan out across all partitions and merge results, which adds more overhead than it saves through parallelism.

  • :"$_" behavior in select/2 and count_select/2: The internal ETS representation changes from {key, {pid, value}} to {{key, pid, counter}, value}. Match specs that reference :"$_" (the whole object) will see the new shape. This is documented in the start_link/1 docs with guidance to use named match variables (:"$1", :"$2", etc.) instead. This could be improved with a runtime error when :"$_" is used in select specs for {:duplicate, :key} registries, but that is left for a follow-up.

  • Reserved-atom keys (e.g., :_, :"$1", :"$2"): When one of these atoms is used as a registry key, it clashes with ETS match spec syntax (:_ is a wildcard, :"$N" are capture variables). The implementation falls back to using a wildcard in the match head and a guard-based equality check, which means ETS cannot use the partial key bound to seek in the tree — it must do a full table scan. This only affects users who register with these specific atoms as keys, which is uncommon in practice.
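Both select caveats can be illustrated at the plain-ETS level. This is a sketch over the {{key, pid, counter}, value} layout described above, not the Registry API; table and value names are invented:

```elixir
# Sketch of the select caveats on the composite-key layout (plain ETS,
# invented names, not the Registry public API).
table = :ets.new(:sketch, [:ordered_set, :public])
me = self()
:ets.insert(table, {{"topic", me, 7}, :meta})

# :"$_" returns the whole object, i.e. the internal composite-key shape.
[{{"topic", pid, 7}, :meta}] = :ets.select(table, [{:_, [], [:"$_"]}])

# Capture variables rebuild whatever shape you want ({{...}} in the spec
# body constructs a tuple), so they stay portable across layouts.
[{"topic", :meta}] =
  :ets.select(table, [{{{:"$1", :_, :_}, :"$2"}, [], [{{:"$1", :"$2"}}]}])

# A reserved atom such as :"$1" cannot appear literally as the key in a
# match head (it would be read as a capture variable). Bind a variable and
# equality-check it in the guard instead — which is why ETS can no longer
# seek on the key prefix and must scan the table.
:ets.insert(table, {{:"$1", me, 1}, :weird})

[:weird] =
  :ets.select(table, [
    {{{:"$9", :_, :_}, :"$8"}, [{:"=:=", :"$9", {:const, :"$1"}}], [:"$8"]}
  ])
```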

All 216 parameterized duplicate registry tests pass (3 key types × 2 partition counts × 36 tests).

studzien and others added 23 commits April 23, 2026 12:29
Change init_key_ets to create an ordered_set table for {:duplicate, :key}
and add ordered boolean to Partition state.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
The match spec body needs triple braces {{{A, B}}} in Elixir to produce
a tuple result in ETS. Also preserve parallel dispatch contract for
{:duplicate, :key} when parallel: true option is passed.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
All entries for a given key are in a single partition, so parallel
dispatch has no effect. Simplify by always dispatching in-process
and document this in the :parallel option.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Handle {:duplicate, :key} (ordered) first, then {:duplicate, :pid}
multi-partition, then collapse the remaining cases (:unique and
{:duplicate, :pid} single-partition) into a single branch that builds
the bag spec once.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Add an optional `ordered` parameter to __unregister__/4 so callers
don't need to branch between __unregister__ and ordered_unregister_key.
When ordered is true and pos is 1, the function builds the composite
key match pattern internally.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Remove the duplicated bag_unregister_match/ordered_unregister_match
9-argument functions. The only difference was the match specs, so
extract a single unregister_match_specs/5 that returns {total_spec,
delete_spec} for both ordered and bag formats.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
The rewrite is needed because existing code (including tests) uses
{:element, 1, :"$_"} in select body to extract the key. Without
rewriting, this returns the composite tuple {key, pid, counter}
instead of just the key in ordered_set tables.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Users of {:duplicate, :key} registries should use capture variables
(:"$1", :"$2") instead of :"$_" in select specs. The internal ETS
layout differs between registry kinds, and we don't paper over that.

Updated docs to note this explicitly and fixed the test that relied
on {:element, 1, :"$_"} to use capture variables instead.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Users switching from {:duplicate, :pid} to {:duplicate, :key} should
know that match specs using :"$_" in select/count_select may behave
differently due to the ordered_set internal layout.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Replace the two separate lookup helpers with a single lookup_second/3
that dispatches on kind. This also allows collapsing the single-partition
{:duplicate, :key} and {:duplicate, :pid} branches in dispatch, lookup,
and values.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Replace ordered_match_spec/3 and ordered_count_match_spec/3 with
match_spec/4 and count_match_spec/4 that dispatch on kind. The bag
spec construction moves from inline in match/4 and count_match/4 into
the helpers, simplifying both functions from 5 branches to 2.

Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
Assisted by AI

Co-Authored-By: Claude <noreply@anthropic.com>
@studzien
Contributor Author

Benchmark: {:duplicate, :key} vs {:duplicate, :pid}

Tested on 16 schedulers with 2s warmup and 10s measurement window per benchmark. Numbers below are averages from Benchee. Deviation is noted where it tells an important story.

Scenario A: 10,000 keys × 10 subscribers (100,000 total)

Operation    1 part / :key   1 part / :pid    16 part / :key   16 part / :pid
Lookup       1.09 μs         0.26 μs (4.3x)   1.13 μs          1.86 μs (1.6x)
Dispatch     1.13 μs         0.27 μs (4.2x)   1.16 μs          1.82 μs (1.6x)
Register     1.74 μs         1.53 μs (1.1x)   1.69 μs          2.35 μs (1.4x)
Unregister   2.65 μs         2.34 μs (1.1x)   2.53 μs          2.92 μs (1.2x)

With 1 partition, :pid wins lookup/dispatch by ~4x (direct hash lookup via lookup_element vs ordered_set match spec, no multi-partition overhead). With 16 partitions, :key wins across the board — the fan-out cost of querying all partitions erases :pid's raw ETS advantage.

Scenario B: 1 key × 200,000 subscribers

Operation    1 part / :key     1 part / :pid     16 part / :key   16 part / :pid
Lookup       10.79 ms (±17%)   15.29 ms (±34%)   10.01 ms (±9%)   24.51 ms (±6%)
Dispatch     10.40 ms (±9%)    14.40 ms (±34%)   10.31 ms (±9%)   28.32 ms (±6%)
Register     1.82 μs           3.38 μs (1.9x)    1.96 μs          3.43 μs (1.8x)
Unregister   2.71 μs           3.18 ms (1172x)   5.19 μs          759 μs (146x)

For lookup/dispatch with 1 partition, the averages suggest :key is ~1.4x faster. However, a raw ETS microbenchmark (stripping out Registry overhead) shows the median throughput of :ets.lookup_element/3 on the bag and :ets.select/2 on the ordered_set is comparable at this scale. The average difference is driven by tail latency: the duplicate_bag shows ~34% deviation and ~2x worse p99 compared to ~9-17% for the ordered_set.

With 16 partitions, the :pid fan-out overhead across all partitions adds a clear and consistent penalty on top of the variance.

The unregister gap (146–1172x) is the critical result and the primary motivation for this change — with 200k entries in a duplicate_bag, unregister must scan all entries for the key to find the matching pid. The ordered_set seeks directly to {key, pid, _}.
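The asymmetry can be reproduced with a rough plain-ETS sketch (invented table names; the absolute timings vary by machine, so only the relative gap is meaningful):

```elixir
# Rough sketch of the unregister asymmetry: deleting one pid's entry for a
# hot key from a duplicate_bag scans every entry under that key, while the
# composite-key ordered_set can seek to the {key, pid, _} prefix.
bag = :ets.new(:bag_sketch, [:duplicate_bag, :public])
ord = :ets.new(:ord_sketch, [:ordered_set, :public])
me = self()
victim = spawn(fn -> Process.sleep(:infinity) end)

for i <- 1..20_000 do
  :ets.insert(bag, {"hot", {me, i}})
  :ets.insert(ord, {{"hot", me, i}, :v})
end

:ets.insert(bag, {"hot", {victim, 0}})
:ets.insert(ord, {{"hot", victim, 0}, :v})

{bag_us, true} =
  :timer.tc(fn -> :ets.match_delete(bag, {"hot", {victim, :_}}) end)

{ord_us, true} =
  :timer.tc(fn -> :ets.match_delete(ord, {{"hot", victim, :_}, :_}) end)

IO.puts("bag: #{bag_us} μs, ordered_set: #{ord_us} μs")
```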

@studzien
Contributor Author

Benchmark script

Mix.install([{:benchee, "~> 1.0"}])

# Compare {:duplicate, :key} vs {:duplicate, :pid} registries.
#
# Scenario A: many keys, few subscribers each (10k × 10)
#   → :key wins on lookup/dispatch (single partition hit vs all partitions)
#
# Scenario B: one key, many subscribers (1 × 200k)
#   → :pid may win on lookup (bag lookup_element vs ordered_set select)
#
# Run: elixir bench/registry_key_vs_pid_bench.exs

defmodule Sub do
  @moduledoc """
  Minimal GenServer worker. Registry.register/3 must be called by the owning
  process, so each "subscriber" is its own GenServer that takes instructions
  via call.
  """
  use GenServer

  def start do
    {:ok, pid} = GenServer.start(__MODULE__, nil)
    pid
  end

  def register(pid, reg, key, val \\ :v) do
    GenServer.call(pid, {:register, reg, key, val})
  end

  def unregister(pid, reg, key) do
    GenServer.call(pid, {:unregister, reg, key})
  end

  def stop(pid), do: GenServer.stop(pid, :normal, :infinity)

  @impl true
  def init(nil), do: {:ok, nil}

  @impl true
  def handle_call({:register, reg, key, val}, _from, state) do
    {:ok, _} = Registry.register(reg, key, val)
    {:reply, :ok, state}
  end

  def handle_call({:unregister, reg, key}, _from, state) do
    Registry.unregister(reg, key)
    {:reply, :ok, state}
  end
end

defmodule Env do
  def start_registry(name, keys, partitions) do
    if pid = Process.whereis(name) do
      GenServer.stop(pid)
      Process.sleep(20)
    end

    {:ok, _} = Registry.start_link(keys: keys, name: name, partitions: partitions)
  end

  def populate(reg, n_keys, subs_per_key) do
    total = n_keys * subs_per_key
    agents = for _ <- 1..total, do: Sub.start()

    Enum.with_index(agents, fn pid, i ->
      key = "key_#{rem(i, n_keys)}"
      Sub.register(pid, reg, key)
    end)

    agents
  end

  def teardown(agents, name) do
    Enum.each(agents, &Sub.stop/1)
    if pid = Process.whereis(name), do: GenServer.stop(pid)
    Process.sleep(50)
  end
end

schedulers = System.schedulers_online()
IO.puts("Registry benchmark: {:duplicate, :key} vs {:duplicate, :pid}")
IO.puts("Schedulers: #{schedulers}\n")

bench_opts = [warmup: 2, time: 10, print: [configuration: false]]

run_scenario = fn title, n_keys, subs_per_key, partitions ->
  total = n_keys * subs_per_key
  key = "key_0"

  IO.puts(String.duplicate("=", 70))
  IO.puts("#{title}, #{partitions} partition(s)")
  IO.puts(String.duplicate("=", 70))

  Env.start_registry(:bench_key, {:duplicate, :key}, partitions)
  Env.start_registry(:bench_pid, {:duplicate, :pid}, partitions)

  IO.puts("Populating #{total} entries...")
  agents_key = Env.populate(:bench_key, n_keys, subs_per_key)
  agents_pid = Env.populate(:bench_pid, n_keys, subs_per_key)
  IO.puts("Done.\n")

  # -- Lookup --
  Benchee.run(
    %{
      "duplicate_key" => fn -> Registry.lookup(:bench_key, key) end,
      "duplicate_pid" => fn -> Registry.lookup(:bench_pid, key) end
    },
    [{:title, "Lookup"} | bench_opts]
  )

  # -- Dispatch --
  Benchee.run(
    %{
      "duplicate_key" => fn ->
        Registry.dispatch(:bench_key, key, fn entries -> length(entries) end)
      end,
      "duplicate_pid" => fn ->
        Registry.dispatch(:bench_pid, key, fn entries -> length(entries) end)
      end
    },
    [{:title, "Dispatch"} | bench_opts]
  )

  # -- Register --
  Benchee.run(
    %{
      "duplicate_key" =>
        {fn pid ->
           Sub.register(pid, :bench_key, key)
           pid
         end,
         before_each: fn _ -> Sub.start() end,
         after_each: fn pid ->
           Sub.unregister(pid, :bench_key, key)
           Sub.stop(pid)
         end},
      "duplicate_pid" =>
        {fn pid ->
           Sub.register(pid, :bench_pid, key)
           pid
         end,
         before_each: fn _ -> Sub.start() end,
         after_each: fn pid ->
           Sub.unregister(pid, :bench_pid, key)
           Sub.stop(pid)
         end}
    },
    [{:title, "Register"} | bench_opts]
  )

  # -- Unregister --
  Benchee.run(
    %{
      "duplicate_key" =>
        {fn pid ->
           Sub.unregister(pid, :bench_key, key)
           pid
         end,
         before_each: fn _ ->
           pid = Sub.start()
           Sub.register(pid, :bench_key, key)
           pid
         end,
         after_each: fn pid -> Sub.stop(pid) end},
      "duplicate_pid" =>
        {fn pid ->
           Sub.unregister(pid, :bench_pid, key)
           pid
         end,
         before_each: fn _ ->
           pid = Sub.start()
           Sub.register(pid, :bench_pid, key)
           pid
         end,
         after_each: fn pid -> Sub.stop(pid) end}
    },
    [{:title, "Unregister"} | bench_opts]
  )

  Env.teardown(agents_key, :bench_key)
  Env.teardown(agents_pid, :bench_pid)
  IO.puts("")
end

for partitions <- [1, schedulers] do
  # ── Scenario A: many keys, few subscribers ──────────────────────
  run_scenario.(
    "Scenario A: 10,000 keys x 10 subscribers (100,000 total)",
    10_000,
    10,
    partitions
  )

  # ── Scenario B: one key, many subscribers ───────────────────────
  run_scenario.("Scenario B: 1 key x 200,000 subscribers", 1, 200_000, partitions)
end

IO.puts("Done.")
