|
| 1 | +# Changelog |
| 2 | + |
| 3 | +## [3.5.0] - Unreleased |
| 4 | + |
| 5 | +### Headline: Three named modes |
| 6 | + |
| 7 | +Kalign v3.5 introduces three named modes that package the best-performing |
| 8 | +configurations into simple flags: |
| 9 | + |
| 10 | +| Mode | CLI | Python | Description | |
| 11 | +|------|-----|--------|-------------| |
| 12 | +| **default** | `kalign` | `mode="default"` or omit | Consistency anchors + VSM. Best general-purpose. | |
| 13 | +| **fast** | `kalign --fast` | `mode="fast"` | VSM only. Fastest, equivalent to kalign v3.4. | |
| 14 | +| **precise** | `kalign --precise` | `mode="precise"` | Ensemble(3) + VSM + realign. Highest precision. | |
| 15 | + |
| 16 | +Explicit parameters always override mode defaults. |
| 17 | + |
| 18 | +**Breaking change:** The default behavior now includes consistency anchors |
| 19 | +(K=5) and VSM. To get the old v3.4 behavior, use `--fast` or `mode="fast"`. |
| 20 | + |
| 21 | +### New features |
| 22 | + |
| 23 | +- **Three named modes** (default, `--fast`, `--precise`): Simple entry points inspired |
| 24 | + by minimap2's `-x preset` pattern. Most users only need to pick a mode. |
| 25 | + Expert parameters remain available and override mode defaults. |
| 26 | +- **Ensemble alignment** (`--ensemble N`, or `--precise`): Runs N alignments |
| 27 | + with varied gap penalties and tree noise, combines results via POAR |
| 28 | + (Pairs of Aligned Residues) consensus. Improves F1 by ~0.05 (about 5 |
| 29 | + percentage points) on BAliBASE at ~10x time cost. |
| 30 | +- **POAR consensus save/load** (`--save-poar`, `--load-poar`): Save the |
| 31 | + consensus table after an ensemble run, then instantly re-threshold |
| 32 | + with different `--min-support` values without re-alignment. |
| 33 | +- **Per-column and per-residue confidence scores** from ensemble mode. |
| 34 | + Accessible in Python via `result.column_confidence` and |
| 35 | + `result.residue_confidence`. Writable as Stockholm PP annotations. |
| 36 | +- **Anchor consistency**: Computes K fast BPM anchor alignments and feeds |
| 37 | + consistency signals into the substitution matrix. Enabled by default |
| 38 | + (K=5) in the default mode. |
| 39 | +- **Variable Scoring Matrix (VSM)**: Adapts substitution scores based on |
| 40 | + estimated pairwise distance. Enabled by default for protein; disabled |
| 41 | + for DNA/RNA. |
| 42 | +- **Alignment-guided realignment**: Rebuilds the guide tree from pairwise |
| 43 | + identities in the aligned MSA, then re-aligns. Used by `--precise` mode. |
| 44 | +- **Sequence weight rebalancing**: Pseudocount-based profile rebalancing |
| 45 | + to reduce bias from redundant sequences. Auto-disabled in ensemble |
| 46 | + mode where POAR consensus already handles imbalance. |
| 47 | +- **Alignment refinement** (`--refine`): Post-alignment column-level |
| 48 | + refinement with modes "all", "confident", and "inline". |
| 49 | +- **PFASUM substitution matrices**: New `--type pfasum`, `pfasum43`, |
| 50 | + `pfasum60` options for PFASUM-family scoring. |
| 51 | +- **Python `align()` parity**: The in-memory `align()` function now |
| 52 | + supports `mode`, `vsm_amax`, `realign`, and `ensemble_seed` parameters |
| 53 | + (previously only available on file-based APIs). |
| 54 | +- **New Python constants**: `MODE_DEFAULT`, `MODE_FAST`, `MODE_PRECISE`, |
| 55 | + `REFINE_INLINE`, `PROTEIN_PFASUM43`, `PROTEIN_PFASUM60`, |
| 56 | + `PROTEIN_PFASUM_AUTO`. |
| 57 | +- **Container support**: Podman/Docker files for reproducible benchmarks. |
| 58 | +- **Downstream benchmark suite** (`benchmarks/downstream/`): Measures |
| 59 | + alignment quality on downstream tasks: HMMER profile search, |
| 60 | + phylogenetic tree accuracy (RF distance), and positive selection |
| 61 | + detection (HyPhy BUSTED). |
| 62 | + |
| 63 | +### Changed |
| 64 | + |
| 65 | +- **License changed from GPL-3.0-or-later to Apache-2.0.** All |
| 66 | + dependencies are permissively licensed, so there are no compatibility issues. |
| 67 | +- **Default mode now uses consistency anchors (K=5) and VSM.** The |
| 68 | + previous default (no consistency, no VSM) is now `--fast`. |
| 69 | +- Refine default is `none` (the Python docstring previously said |
| 70 | + `confident`, which was incorrect; now fixed). |
| 71 | +- `seq_weights` auto-disabled in ensemble mode. |
| 72 | +- CLI `--ensemble` without a value defaults to 5 runs. |
| 73 | +- C library API expanded with `kalign_run_seeded()`, |
| 74 | + `kalign_run_realign()`, `kalign_ensemble()`, |
| 75 | + `kalign_consensus_from_poar()`, `kalign_msa_compare_detailed()`, |
| 76 | + `kalign_msa_compare_with_mask()`. |
| 77 | + |
| 78 | +### Fixed |
| 79 | + |
| 80 | +- `kalign_arr_to_msa()` now initializes all MSA struct fields (biotype, |
| 81 | + alnlen, seq_distances, col_confidence, run_parallel, seq->confidence, |
| 82 | + seq->rank). The uninitialized fields previously caused `free(): invalid pointer` |
| 83 | + on Linux, where glibc does not zero malloc'd memory. |
| 84 | +- Python `align()` (in-memory path) no longer crashes on Linux due to |
| 85 | + the uninitialized fields above. |
| 86 | + |
| 87 | +### BAliBASE results (218 cases, protein) |
| 88 | + |
| 89 | +| Mode | Recall | Precision | F1 | TC | Time | |
| 90 | +|------|--------|-----------|-----|-----|------| |
| 91 | +| fast (v3.4 equiv) | 0.804 | 0.656 | 0.716 | 0.466 | 10s | |
| 92 | +| **default** | 0.810 | 0.665 | 0.724 | 0.472 | 29s | |
| 93 | +| **precise** | 0.796 | 0.752 | 0.768 | 0.467 | 193s | |
| 94 | +| ClustalO | 0.840 | 0.710 | 0.764 | 0.559 | -- | |
| 95 | +| MAFFT | 0.867 | 0.715 | 0.778 | 0.590 | -- | |
| 96 | +| MUSCLE5 | 0.870 | 0.721 | 0.783 | 0.581 | -- | |
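
To make the accuracy/runtime trade-off in the BAliBASE table easier to read, here is a minimal sketch that derives the F1 gain and slowdown of each mode relative to `fast`. The numbers are copied from the table above; the names `RESULTS` and `compare` are illustrative only and are not part of kalign.

```python
# F1 and wall-clock figures copied from the BAliBASE table above
# (218 protein cases). RESULTS and compare() are hypothetical helper
# names used only for this illustration.
RESULTS = {
    "fast": {"f1": 0.716, "time_s": 10},
    "default": {"f1": 0.724, "time_s": 29},
    "precise": {"f1": 0.768, "time_s": 193},
}


def compare(mode, baseline="fast"):
    """Return (F1 gain, slowdown factor) of `mode` relative to `baseline`."""
    m, b = RESULTS[mode], RESULTS[baseline]
    gain = round(m["f1"] - b["f1"], 3)
    slowdown = round(m["time_s"] / b["time_s"], 1)
    return gain, slowdown


if __name__ == "__main__":
    for mode in ("default", "precise"):
        gain, slowdown = compare(mode)
        print(f"{mode}: F1 {gain:+.3f}, {slowdown}x the runtime of fast")
```

Passing `baseline="default"` gives the cost of moving from the new default to `--precise` instead of from the v3.4-equivalent `fast` mode.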