Skip to content

Commit cae40bc

Browse files
committed
Refine comments and moduledoc for deferred log_lines indexing
Trim hindsight/diff-narrating comments down to what's non-obvious, and rewrite the SearchVectorWorker moduledoc to read as documentation of the mechanism rather than a justification of the change.
1 parent 348bc47 commit cae40bc

5 files changed

Lines changed: 41 additions & 70 deletions

File tree

lib/lightning/log_lines/search_vector_worker.ex

Lines changed: 27 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -1,60 +1,43 @@
11
defmodule Lightning.LogLines.SearchVectorWorker do
22
@moduledoc """
3-
Asynchronously backfills `log_lines.search_vector` for rows that were left
4-
`NULL` at insert time.
5-
6-
## Why defer the tsvector?
7-
8-
Computing the full-text `search_vector` synchronously (via an insert trigger)
9-
put `to_tsvector` on the hot path of every log line write. Under heavy run
10-
load that work serialises behind the worker's log firehose and slows
11-
ingestion. A sibling migration removes the synchronous trigger, leaving the
12-
column `NULL` on insert, and adds:
13-
14-
* a `safe_to_tsvector(regconfig, text)` SQL function (tolerant of bad input);
15-
* a partial index `... WHERE search_vector IS NULL` so finding pending rows
16-
stays cheap.
17-
18-
This worker then fills `search_vector` out-of-band. The read side
19-
(`Lightning.Invocation`) queries with `to_tsquery('english_nostop', ...)`, so
20-
this worker MUST build vectors with the matching `english_nostop` config,
21-
otherwise searches would silently miss freshly-written log lines.
22-
23-
## Draining and snowballing
24-
25-
Each run drains pending rows in bounded batches (`@batch_size` rows, up to
26-
`@max_batches` per run). When a run consumes its full budget there is almost
27-
certainly more backlog, so it enqueues an immediate follow-up job (a
28-
"snowball") rather than waiting for the next 1-minute cron tick. This lets the
29-
worker keep pace with bursty load while the dedicated `search_indexing` queue
30-
(concurrency 1) plus job uniqueness keep the snowball self-limiting.
31-
32-
The cron entry enqueues with default args; the snowball uses
33-
`%{"trigger" => "snowball"}`. The differing `trigger` key produces a distinct
34-
uniqueness key, so a queued snowball is never swallowed by the cron job (and
35-
vice versa).
3+
Backfills the full-text `search_vector` on `log_lines` rows.
4+
5+
Log lines are inserted with `search_vector` left `NULL`; the vector is built
6+
here rather than on the insert path, keeping `to_tsvector` off the hot path of
7+
high-volume log ingestion. Search is eventually consistent as a result,
8+
typically catching up within a minute.
9+
10+
Two database objects support this: `safe_to_tsvector(regconfig, text)`, which
11+
builds the vector while tolerating NULL and oversized input, and a partial
12+
index over `search_vector IS NULL`, which keeps locating pending rows cheap as
13+
the table grows. Vectors use the `english_nostop` config to match the read
14+
side (`Lightning.Invocation`), which queries with
15+
`to_tsquery('english_nostop', ...)`.
16+
17+
Each run drains pending rows newest-first, in batches of `@batch_size` up to
18+
`@max_batches` per run. A run that exhausts its budget leaves backlog behind
19+
and enqueues an immediate follow-up ("snowball"); otherwise the minute-ly cron
20+
tick keeps pace. The worker runs on the dedicated `search_indexing` queue at
21+
concurrency 1, so only one job executes at a time, and the cron tick and the
22+
snowball carry distinct `trigger` args, so job uniqueness allows one of each to
23+
queue but never a duplicate.
3624
"""
3725

3826
use Oban.Worker,
3927
queue: :search_indexing,
4028
priority: 1,
4129
max_attempts: 10,
42-
# `states` is restricted to the queued states on purpose. Oban's default
43-
# unique states include `:executing` and `:completed`, which would make a
44-
# running snowball job match *itself* when it tries to enqueue its
45-
# successor, silently dedup the insert, and break the chain after a single
46-
# hop. Limiting uniqueness to `:available`/`:scheduled` still guarantees at
47-
# most one queued snowball (and one queued cron heartbeat, via the distinct
48-
# `:trigger` key) while letting the executing job enqueue the next link.
30+
# Restrict uniqueness to queued states. Oban's defaults also dedup against
31+
# :executing/:completed, so a running snowball would match itself and fail
32+
# to enqueue its successor — breaking the chain after one hop.
4933
unique: [period: 55, keys: [:trigger], states: [:available, :scheduled]]
5034

5135
alias Lightning.Repo
5236

5337
require Logger
5438

55-
# Rows to fill per batch.
5639
@batch_size 2_500
57-
# Maximum batches to drain in a single run (per-run budget).
40+
# Per-run budget.
5841
@max_batches 10
5942

6043
@drain_sql """
@@ -80,9 +63,8 @@ defmodule Lightning.LogLines.SearchVectorWorker do
8063
end)
8164

8265
if budget_exhausted? do
83-
# The run hit its per-run budget, so more backlog almost certainly
84-
# remains. Snowball an immediate follow-up with a distinct uniqueness key
85-
# so the cron job's uniqueness does not swallow it.
66+
# Budget exhausted, so backlog likely remains: enqueue an immediate
67+
# follow-up rather than waiting for the next cron tick.
8668
Oban.insert(Lightning.Oban, __MODULE__.new(%{"trigger" => "snowball"}))
8769
end
8870

priv/repo/migrations/20260530091125_add_safe_to_tsvector_function.exs

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,9 @@ defmodule Lightning.Repo.Migrations.AddSafeToTsvectorFunction do
22
use Ecto.Migration
33

44
def up do
5-
# Not STRICT: a STRICT function returns NULL (without running) when `doc` is
6-
# NULL, which would leave the row's search_vector NULL forever and stuck in
7-
# the pending index. COALESCE the doc instead so the function always yields
8-
# a non-NULL tsvector. CREATE OR REPLACE keeps the migration re-runnable.
5+
# Deliberately not STRICT: a STRICT function returns NULL for a NULL doc,
6+
# which would leave search_vector NULL forever and stuck in the pending
7+
# index. COALESCE instead so the result is always a non-NULL tsvector.
98
execute("""
109
CREATE OR REPLACE FUNCTION safe_to_tsvector(config regconfig, doc text) RETURNS tsvector
1110
LANGUAGE plpgsql IMMUTABLE AS $$

priv/repo/migrations/20260530091126_add_log_lines_pending_search_index.exs

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ defmodule Lightning.Repo.Migrations.AddLogLinesPendingSearchIndex do
77
@num_partitions 100
88

99
def up do
10-
# Create the partial index on the parent table (ONLY), unattached so far.
10+
# Partial index on the parent (ONLY); attached per-partition below.
1111
execute("""
1212
CREATE INDEX IF NOT EXISTS log_lines_pending_search_idx
1313
ON ONLY log_lines (timestamp)
@@ -31,11 +31,9 @@ defmodule Lightning.Repo.Migrations.AddLogLinesPendingSearchIndex do
3131
end
3232

3333
defp create_partition_index(_num_partitions, part_num) do
34-
# A failed CREATE INDEX CONCURRENTLY leaves an INVALID index behind. The
35-
# IF NOT EXISTS below would then skip rebuilding it, and the subsequent
36-
# ATTACH would leave the parent permanently INVALID. Drop any invalid
37-
# leftover first so a re-run rebuilds it cleanly. (A valid index is kept;
38-
# only invalid ones are dropped.)
34+
# A failed CREATE INDEX CONCURRENTLY leaves an INVALID index that IF NOT
35+
# EXISTS would skip, so the ATTACH below would mark the parent INVALID too.
36+
# Drop any invalid leftover first so a re-run rebuilds cleanly.
3937
execute("""
4038
DO $$
4139
BEGIN

test/lightning/log_lines/search_vector_worker_test.exs

Lines changed: 5 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -18,9 +18,8 @@ defmodule Lightning.LogLines.SearchVectorWorkerTest do
1818
%{run: run}
1919
end
2020

21-
# Inserts a log line via the public API. With the synchronous trigger removed
22-
# this leaves `search_vector` NULL, which is exactly the pending state the
23-
# worker drains.
21+
# Inserts via the public API, which leaves `search_vector` NULL — the pending
22+
# state the worker drains.
2423
defp append_log(run, message) do
2524
{:ok, log_line} =
2625
Runs.append_run_log(run, %{
@@ -74,14 +73,12 @@ defmodule Lightning.LogLines.SearchVectorWorkerTest do
7473
append_log(run, "logline number #{n} doing work").id
7574
end
7675

77-
# Freshly inserted lines start out unindexed (deferred computation).
7876
for id <- ids do
7977
assert %{null?: true, matches?: false} = search_vector_state(id)
8078
end
8179

8280
assert {:ok, 5} = perform_job(SearchVectorWorker, %{})
8381

84-
# After draining, every row has a populated, matching search_vector.
8582
for id <- ids do
8683
assert %{null?: false, matches?: true} = search_vector_state(id)
8784
end
@@ -145,11 +142,9 @@ defmodule Lightning.LogLines.SearchVectorWorkerTest do
145142
end
146143

147144
describe "snowball uniqueness" do
148-
# Regression: Oban's default unique states include :executing and :completed,
149-
# so a running snowball job (state :executing) matched *itself* when it tried
150-
# to enqueue its successor — the insert was silently deduped and the chain
151-
# died after one hop. The worker restricts uniqueness to the queued states so
152-
# an executing job can always enqueue the next link.
145+
# Guards the snowball chain: an executing job must be able to enqueue its
146+
# successor. Oban's default unique states include :executing, so a snowball
147+
# would otherwise match itself and the chain would die after one hop.
153148
test "an executing snowball does not block enqueuing its successor" do
154149
Oban.Testing.with_testing_mode(:manual, fn ->
155150
{:ok, running} =

test/lightning/runs_test.exs

Lines changed: 2 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -921,9 +921,8 @@ defmodule Lightning.RunsTest do
921921
timestamp: DateTime.utc_now() |> DateTime.to_unix(:millisecond)
922922
})
923923

924-
# The synchronous trigger is gone: search_vector is computed out-of-band
925-
# by Lightning.LogLines.SearchVectorWorker, so it starts NULL and is not
926-
# yet matched by a full-text query.
924+
# search_vector is computed out-of-band by SearchVectorWorker, so it
925+
# starts NULL and isn't matched by a full-text query yet.
927926
assert search_vector_null?(log_line.id)
928927
refute log_line_searchable?(log_line.id, "searchable")
929928
end
@@ -953,12 +952,10 @@ defmodule Lightning.RunsTest do
953952
assert length(log_lines) == 3
954953
assert Enum.map(log_lines, & &1.message) == Enum.map(entries, & &1.message)
955954

956-
# Each inserted line broadcasts a LogAppended event.
957955
for _ <- entries do
958956
assert_received %Runs.Events.LogAppended{}
959957
end
960958

961-
# All rows are persisted with a NULL search_vector (deferred indexing).
962959
for log_line <- log_lines do
963960
assert search_vector_null?(log_line.id)
964961
refute log_line_searchable?(log_line.id, "logline")

0 commit comments

Comments
 (0)