Commit 783b1d0

TimoLassmann and claude committed
update docs for AlignedSequences, positive gap penalties, kalign-py CLI
- align_from_file now returns AlignedSequences, not list[str]
- gap penalties are positive values (not negative)
- document kalign-py CLI entry point
- document kalign.compare() function
- add benchmark instructions to README.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 01254c8 commit 783b1d0

7 files changed

Lines changed: 121 additions & 41 deletions

README-python.md

Lines changed: 60 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -104,12 +104,12 @@ aligned = kalign.align(sequences)
104104
# Specify sequence type explicitly
105105
aligned = kalign.align(sequences, seq_type="protein")
106106

107-
# Use custom gap penalties
107+
# Use custom gap penalties (positive values = penalty magnitude)
108108
aligned = kalign.align(
109109
sequences,
110110
seq_type="dna",
111-
gap_open=-10.0,
112-
gap_extend=-1.0,
111+
gap_open=10.0,
112+
gap_extend=1.0,
113113
terminal_gap_extend=0.0
114114
)
115115

@@ -312,16 +312,21 @@ Align sequences directly from files:
312312
```python
313313
import kalign
314314

315-
# Read sequences from file and align
316-
aligned = kalign.align_from_file("sequences.fasta", seq_type="protein")
315+
# Read sequences from file and align — returns AlignedSequences(names, sequences)
316+
result = kalign.align_from_file("sequences.fasta", seq_type="protein")
317+
for name, seq in zip(result.names, result.sequences):
318+
print(f"{name}: {seq}")
317319

318320
# With custom parameters
319-
aligned = kalign.align_from_file(
321+
result = kalign.align_from_file(
320322
"sequences.fasta",
321323
seq_type="protein",
322-
gap_open=-10.0,
324+
gap_open=10.0,
323325
n_threads=4
324326
)
327+
328+
# Tuple unpacking also works
329+
names, sequences = kalign.align_from_file("sequences.fasta")
325330
```
326331

327332
Supported input formats:
@@ -330,6 +335,18 @@ Supported input formats:
330335
- Clustal
331336
- Aligned FASTA
332337

338+
### Comparing Alignments
339+
340+
Score a test alignment against a reference using the Sum-of-Pairs (SP) score:
341+
342+
```python
343+
import kalign
344+
345+
# Returns SP score from 0 (no match) to 100 (identical)
346+
score = kalign.compare("reference.msf", "test_alignment.fasta")
347+
print(f"SP score: {score:.1f}")
348+
```
349+
333350
### Advanced Usage
334351

335352
```python
@@ -344,8 +361,8 @@ protein_sequences = [
344361
aligned = kalign.align(
345362
protein_sequences,
346363
seq_type="protein",
347-
gap_open=-10.0,
348-
gap_extend=-1.0,
364+
gap_open=10.0,
365+
gap_extend=1.0,
349366
terminal_gap_extend=0.0,
350367
n_threads=2
351368
)
@@ -356,31 +373,53 @@ print(f"Alignment length: {len(aligned[0])}")
356373

357374
## API Reference
358375

359-
### `kalign.align(sequences, seq_type="auto", gap_open=None, gap_extend=None, terminal_gap_extend=None, n_threads=1)`
376+
### `kalign.align(sequences, seq_type="auto", gap_open=None, gap_extend=None, terminal_gap_extend=None, n_threads=None)`
360377

361378
Align a list of sequences.
362379

363380
**Parameters:**
364381
- `sequences` (list of str): Sequences to align
365382
- `seq_type` (str or int): Sequence type specification
366-
- `gap_open` (float, optional): Gap opening penalty
367-
- `gap_extend` (float, optional): Gap extension penalty
368-
- `terminal_gap_extend` (float, optional): Terminal gap extension penalty
369-
- `n_threads` (int): Number of threads to use
383+
- `gap_open` (float, optional): Gap opening penalty (positive value)
384+
- `gap_extend` (float, optional): Gap extension penalty (positive value)
385+
- `terminal_gap_extend` (float, optional): Terminal gap extension penalty (positive value)
386+
- `n_threads` (int, optional): Number of threads (default: `get_num_threads()`)
370387

371388
**Returns:**
372389
- `list of str`: Aligned sequences
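
Because this commit flips the documented sign convention, call sites that still pass negative penalties need updating. The sketch below shows one way to migrate stored settings; `to_positive_penalty` is a hypothetical helper written for this note, not part of kalign, and it assumes the old negative values were meant as magnitudes.

```python
def to_positive_penalty(value):
    """Convert a legacy (negative) gap penalty to a positive magnitude.

    Hypothetical helper, not part of kalign. None passes through so the
    library default still applies.
    """
    if value is None:
        return None
    return abs(float(value))


# Settings written against the old documentation (negative values).
legacy = {"gap_open": -10.0, "gap_extend": -1.0, "terminal_gap_extend": 0.0}

# Same settings expressed as positive penalty magnitudes.
migrated = {name: to_positive_penalty(v) for name, v in legacy.items()}
print(migrated)
```

The migrated dictionary could then be passed as keyword arguments, for example `kalign.align(sequences, **migrated)`.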
373390

374-
### `kalign.align_from_file(input_file, seq_type="auto", gap_open=None, gap_extend=None, terminal_gap_extend=None, n_threads=1)`
391+
### `kalign.align_from_file(input_file, seq_type="auto", gap_open=None, gap_extend=None, terminal_gap_extend=None, n_threads=None)`
375392

376-
Align sequences from a file.
393+
Align sequences from a file, preserving sequence names.
377394

378395
**Parameters:**
379-
- `input_file` (str): Path to input file
396+
- `input_file` (str): Path to input file (FASTA, MSF, Clustal)
380397
- Other parameters same as `align()`
381398

382399
**Returns:**
383-
- `list of str`: Aligned sequences
400+
- `AlignedSequences`: Named tuple with `.names` (list of str) and `.sequences` (list of str)
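
The return-type change can break callers that treated the result as a plain list of sequences. The sketch below uses a stand-in `AlignedSequences` class, assumed to have only the two fields named above, to show how a named tuple behaves; it does not call kalign, and the sample names and sequences are made up.

```python
from typing import List, NamedTuple


class AlignedSequences(NamedTuple):
    """Stand-in with the two documented fields (assumed shape)."""
    names: List[str]
    sequences: List[str]


result = AlignedSequences(names=["seq1", "seq2"], sequences=["AC-GT", "ACGGT"])

# Attribute access keeps names and sequences paired.
for name, seq in zip(result.names, result.sequences):
    print(f"{name}: {seq}")

# Tuple unpacking yields the two fields in declaration order.
names, sequences = result

# Code written for the old list-of-str return only needs the sequences field.
aligned = list(result.sequences)

# Pitfall: len() and indexing now address the two fields, not the sequences.
print(len(result))   # number of fields
print(len(aligned))  # number of aligned sequences
```

If the real type is a named tuple as documented, older code that calls `len(aligned)` or `aligned[0]` directly on the return value would silently read the field count or the names list, so switching to `.sequences` is the safer migration.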
401+
402+
### `kalign.compare(reference_file, test_file)`
403+
404+
Score a test alignment against a reference alignment using the Sum-of-Pairs score.
405+
406+
**Parameters:**
407+
- `reference_file` (str): Path to reference alignment
408+
- `test_file` (str): Path to test alignment
409+
410+
**Returns:**
411+
- `float`: SP score from 0.0 (no match) to 100.0 (identical)
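
For readers unfamiliar with the metric, here is a minimal sketch of one common Sum-of-Pairs definition: the percentage of residue pairs aligned in the reference that are also aligned in the test alignment. It is an illustration under that assumption, not kalign's implementation; `aligned_pairs` and `sp_score` are hypothetical names, and the function works on in-memory strings rather than files.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs placed in the same column.

    Each pair is (seq_a, residue_index_a, seq_b, residue_index_b),
    where residue indices count non-gap characters from zero.
    """
    counters = [0] * len(alignment)  # residues seen so far per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        for (a, i), (b, j) in combinations(present, 2):
            pairs.add((a, i, b, j))
    return pairs


def sp_score(reference, test):
    """Percentage of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return 100.0 * len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AC-GT", "ACGGT"]
test = ["ACG-T", "ACGGT"]
print(f"SP score: {sp_score(reference, test):.1f}")
```

In this toy case the test alignment moves one gap, so three of the four reference pairs are recovered.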
412+
413+
### Command-line interface
414+
415+
Installing the Python package provides a `kalign-py` command that uses the Python bindings directly (the C binary is installed separately as `kalign`):
416+
417+
```bash
418+
kalign-py -i sequences.fasta -o aligned.fasta --format fasta --type protein
419+
kalign-py -i sequences.fasta -o - --format clustal # stdout
420+
cat input.fa | kalign-py -i - -o aligned.fasta # stdin
421+
kalign-py --version
422+
```
384423

385424
## 🚀 Performance & Scalability
386425

@@ -410,9 +449,9 @@ python examples/performance_benchmarks.py
410449

411450
# Use recommended settings
412451
kalign.set_num_threads(8) # Based on benchmark results
413-
aligned = kalign.align(sequences,
414-
gap_open=-10.0, # Optimized gap penalties
415-
gap_extend=-1.0)
452+
aligned = kalign.align(sequences,
453+
gap_open=10.0, # Optimized gap penalties
454+
gap_extend=1.0)
416455
```
417456

418457
## Examples

README.md

Lines changed: 19 additions & 2 deletions
@@ -167,6 +167,12 @@ for seq in aligned:
167167
print(seq)
168168
```
169169

170+
The Python package also provides a `kalign-py` CLI that uses the Python bindings directly (does not require the C binary):
171+
172+
```bash
173+
kalign-py -i sequences.fasta -o aligned.fasta --format fasta
174+
```
175+
170176
For comprehensive Python documentation, see [README-python.md](README-python.md) and the [python-docs directory](python-docs/).
171177

172178
## Examples
@@ -285,9 +291,20 @@ To install from TestPyPI while resolving dependencies from PyPI:
285291
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple <package-name>
286292
```
287293

288-
## Performance
294+
## Benchmarks
295+
296+
The repository includes an automated benchmark suite that scores kalign against the BAliBASE reference alignments using both the C binary and the Python API. Results are tracked across commits via GitHub Actions.
297+
298+
```bash
299+
# Run benchmarks locally (requires: pip install -e . and a CMake build)
300+
make benchmark # full BAliBASE suite
301+
make benchmark BENCH_MAX_CASES=5 # quick smoke test
302+
303+
# Or via Python directly
304+
python -m benchmarks --dataset balibase --method python_api cli -v
305+
```
289306

290-
### Benchmark Results
307+
### Historical Results
291308

292309
Kalign performs well for both speed and accuracy:
293310

python-docs/python-api.md

Lines changed: 31 additions & 7 deletions
@@ -69,8 +69,8 @@ print(aligned)
6969
aligned = kalign.align(
7070
sequences,
7171
seq_type="dna",
72-
gap_open=-10.0,
73-
gap_extend=-1.0,
72+
gap_open=10.0,
73+
gap_extend=1.0,
7474
n_threads=4
7575
)
7676

@@ -91,7 +91,7 @@ print(type(aln_sk))
9191

9292
### `kalign.align_from_file()`
9393

94-
Align sequences directly from files.
94+
Align sequences from a file, preserving sequence names.
9595

9696
```python
9797
def align_from_file(
@@ -100,23 +100,47 @@ def align_from_file(
100100
gap_open: Optional[float] = None,
101101
gap_extend: Optional[float] = None,
102102
terminal_gap_extend: Optional[float] = None,
103-
n_threads: int = 1
104-
) -> List[str]
103+
n_threads: Optional[int] = None
104+
) -> AlignedSequences
105105
```
106106

107+
Returns an `AlignedSequences` named tuple with `.names` (list of str) and `.sequences` (list of str).
108+
107109
**Supported formats:** FASTA, MSF, Clustal, aligned FASTA
108110

109111
**Example:**
110112

111113
```python
112114
import kalign
113115

114-
# Align from FASTA file
115-
aligned = kalign.align_from_file(
116+
# Align from FASTA file — returns AlignedSequences(names, sequences)
117+
result = kalign.align_from_file(
116118
"sequences.fasta",
117119
seq_type="protein",
118120
n_threads=4
119121
)
122+
for name, seq in zip(result.names, result.sequences):
123+
print(f"{name}: {seq}")
124+
125+
# Tuple unpacking also works
126+
names, sequences = kalign.align_from_file("sequences.fasta")
127+
```
128+
129+
### `kalign.compare()`
130+
131+
Score a test alignment against a reference using the Sum-of-Pairs (SP) score.
132+
133+
```python
134+
def compare(reference_file: str, test_file: str) -> float
135+
```
136+
137+
**Returns:** SP score from 0.0 (no match) to 100.0 (identical).
138+
139+
**Example:**
140+
141+
```python
142+
score = kalign.compare("reference.msf", "test_alignment.fasta")
143+
print(f"SP score: {score:.1f}")
120144
```
121145

122146
### `kalign.write_alignment()`

python-docs/python-performance.md

Lines changed: 2 additions & 2 deletions
@@ -454,8 +454,8 @@ def optimize_gap_penalties(sequences, penalty_ranges=None):
454454

455455
if penalty_ranges is None:
456456
penalty_ranges = {
457-
'gap_open': np.arange(-20, -2, 2),
458-
'gap_extend': np.arange(-5, -0.5, 0.5)
457+
'gap_open': np.arange(2, 20, 2),
458+
'gap_extend': np.arange(0.5, 5, 0.5)
459459
}
460460

461461
print("🔧 Optimizing gap penalties...")

python-docs/python-quickstart.md

Lines changed: 6 additions & 6 deletions
@@ -266,22 +266,22 @@ import kalign
266266
# Conservative alignment (fewer gaps)
267267
aligned = kalign.align(
268268
sequences,
269-
gap_open=-15.0, # Higher penalty for opening gaps
270-
gap_extend=-2.0 # Higher penalty for extending gaps
269+
gap_open=15.0, # Higher penalty for opening gaps
270+
gap_extend=2.0 # Higher penalty for extending gaps
271271
)
272272

273273
# Aggressive alignment (more gaps)
274274
aligned = kalign.align(
275275
sequences,
276-
gap_open=-5.0, # Lower penalty for opening gaps
277-
gap_extend=-0.5 # Lower penalty for extending gaps
276+
gap_open=5.0, # Lower penalty for opening gaps
277+
gap_extend=0.5 # Lower penalty for extending gaps
278278
)
279279

280280
# Custom terminal gap handling
281281
aligned = kalign.align(
282282
sequences,
283-
gap_open=-10.0,
284-
gap_extend=-1.0,
283+
gap_open=10.0,
284+
gap_extend=1.0,
285285
terminal_gap_extend=0.0 # No penalty for terminal gaps
286286
)
287287
```

python-examples/README.md

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ sequences = [
185185
aligned = kalign.align(
186186
sequences,
187187
seq_type="protein", # Change sequence type
188-
gap_open=-12.0, # Adjust gap penalties
188+
gap_open=12.0, # Adjust gap penalties
189189
n_threads=8 # Set thread count
190190
)
191191

python-examples/basic_usage.py

Lines changed: 2 additions & 2 deletions
@@ -62,8 +62,8 @@ def example_2_protein_alignment():
6262
aligned = kalign.align(
6363
protein_sequences,
6464
seq_type="protein",
65-
gap_open=-10.0,
66-
gap_extend=-1.0
65+
gap_open=10.0,
66+
gap_extend=1.0
6767
)
6868

6969
print("\nAligned sequences:")
