Round 3 (final challenge): DNA sequence matcher

Problem

Given a FASTA-like file (genome.fasta) containing DNA sequences using only A, C, G, T, find every record whose sequence contains a target pattern and return the positions of each occurrence inside that record.

A FASTA record starts with a > header line containing the record id, followed by one or more lines of sequence data. The file packs many records back-to-back:

>seq_000001
ACGTACGTACGT
ACGTACGTACGT
>seq_000002
TGCATGCATGCA

Input: data/genome.fasta (default ~512 MB; scale with --size-mb).
Target pattern: b"AGTCCGTA" (recorded in data/truth.json).
Output: list[tuple[record_id, list[int positions]]] in file order.

You are encouraged to combine techniques from rounds 1 and 2.

Files

| File | Purpose |
| --- | --- |
| `baseline.py` | Intentionally slow starting point. Don't edit: it is the reference for the comparison. |
| `solution.py` | Edit this. Starts out delegating to `baseline.py`; replace with your faster implementation. |
| `gen_data.py` | Generates the FASTA file and a `truth.json` with expected matches. |
| `test_dna.py` | Correctness tests and the pytest-codspeed benchmark. Every test is parametrized over both the baseline and your solution. |

Generate the data

uv run rounds/3_dna/gen_data.py             # default ~512 MB.
uv run rounds/3_dna/gen_data.py --size-mb 100

Alternatively, uv run scripts/setup.py generates every round's data in one go.

Verify correctness

uv run pytest rounds/3_dna/

Benchmark

Walltime, locally:

uv run pytest --codspeed rounds/3_dna/

Same benchmarks, run through the CodSpeed CLI for low-noise instrumented measurements:

codspeed run --mode walltime -- uv run pytest --codspeed rounds/3_dna/
