Commit d937577

Fix erroneous INFO/END recalculation; New --no-realign option
- INFO/END was calculated incorrectly for <DUP> symbolic alleles
- Added `--no-realign[=NUM]` to skip realignment during normalization; when NUM is given, only events longer than NUM bp are skipped. This deprecates the option `--do-not-normalize` (equivalent to `--no-realign=0`).

Resolves #2500
1 parent 381e693 commit d937577
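
The threshold rule in the commit message can be sketched as follows. This is a minimal illustration in Python, not the bcftools implementation: the helper name `skips_realignment` is hypothetical, and measuring the event length by SVLEN is an assumption made for the example.

```python
from typing import Optional


def skips_realignment(event_len: int, no_realign: Optional[int]) -> bool:
    """Return True if an event of event_len bp would be left unrealigned.

    no_realign is None when --no-realign is absent, and NUM otherwise.
    A bare -N is treated as NUM=0, following the commit message
    ("--do-not-normalize" is equivalent to "--no-realign=0").
    Hypothetical helper; how bcftools measures event length is assumed.
    """
    if no_realign is None:
        return False  # option not given: everything is realigned
    return event_len > no_realign  # only events longer than NUM are skipped


# The <DUP> record in the new tests has SVLEN=5.
for num in (None, 10, 0, 3):
    print(num, skips_realignment(5, num))
```

Under this reading, a 5 bp event is realigned with no option or with `-N10`, and left in place with `-N`, `-N0` or `-N3`, which matches how the new tests in test/test.pl pair their arguments with the expected outputs.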

11 files changed

Lines changed: 333 additions & 261 deletions

NEWS

Lines changed: 6 additions & 0 deletions
@@ -50,6 +50,12 @@ Changes affecting specific commands:
 
 - Fix a bug in splitting Type=String FORMAT fields via -a and -m (#2476)
 
+- Fix erroneous INFO/END recalculation of symbolic alleles (#2500).
+
+- Added `--no-realign[=NUM]` to skip realignment during normalization; when NUM is given,
+only events longer than NUM bp are skipped. This deprecates the option `--do-not-normalize`
+(equivalent to `--no-realign=0`). See also #2500.
+
 * bcftools +prune
 
 - When pruning by window length in base pairs, one out-of-window record would

doc/bcftools.1

Lines changed: 244 additions & 237 deletions
Large diffs are not rendered by default.

doc/bcftools.html

Lines changed: 18 additions & 11 deletions
@@ -4,7 +4,7 @@
 <meta charset="UTF-8">
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
-<meta name="generator" content="Asciidoctor 2.0.16.dev">
+<meta name="generator" content="Asciidoctor 2.0.23">
 <title>bcftools(1)</title>
 <link rel="stylesheet" href="./docbook-xsl.css">
 </head>
@@ -50,7 +50,7 @@ <h2 id="_description">DESCRIPTION</h2>
 <div class="sect2">
 <h3 id="_version">VERSION</h3>
 <div class="paragraph">
-<p>This manual page was last updated <strong>2026-03-18</strong> and refers to bcftools git version <strong>1.23.1</strong>.</p>
+<p>This manual page was last updated <strong>2026-04-07 10:58 BST</strong> and refers to bcftools git version <strong>1.23.1-61-gb64ebe7b+</strong>.</p>
 </div>
 </div>
 <div class="sect2">
@@ -2822,7 +2822,8 @@ <h3 id="merge">bcftools merge [<em>OPTIONS</em>] <em>A.vcf.gz</em> <em>B.vcf.gz<
 </div>
 <div class="listingblock">
 <div class="content">
-<pre>-m none .. no new multiallelics, output multiple records instead
+<pre>-m exact .. require the exact same alleles
+-m none .. no new multiallelics, output multiple records instead
 -m snps .. allow multiallelic SNP records
 -m indels .. allow multiallelic indel records
 -m both .. both SNP and indel records can be multiallelic
@@ -3548,7 +3549,8 @@ <h3 id="norm">bcftools norm [<em>OPTIONS</em>] <em>file.vcf.gz</em></h3>
 cannot be stressed enough, that <em>s</em> will NOT fix strand issues in
 your VCF, do NOT use it for that purpose!!! (Instead see
 <a href="http://samtools.github.io/bcftools/howtos/plugin.af-dist.html" class="bare">http://samtools.github.io/bcftools/howtos/plugin.af-dist.html</a> and
-&lt;<a href="http://samtools.github.io/bcftools/howtos/plugin.fixref.html&gt;" class="bare">http://samtools.github.io/bcftools/howtos/plugin.fixref.html&gt;</a>.)</p>
+<a href="http://samtools.github.io/bcftools/howtos/plugin.fixref.html" class="bare">http://samtools.github.io/bcftools/howtos/plugin.fixref.html</a>.)
+See also the option <strong>-N, --no-realign</strong>.</p>
 </dd>
 <dt class="hdlist1"><strong>-d, --rm-dup</strong> <em>snps</em>|<em>indels</em>|<em>both</em>|<em>all</em>|<em>exact</em></dt>
 <dd>
@@ -3568,7 +3570,7 @@ <h3 id="norm">bcftools norm [<em>OPTIONS</em>] <em>file.vcf.gz</em></h3>
 <dt class="hdlist1"><strong>-f, --fasta-ref</strong> <em>FILE</em><a id="fasta_ref"></a></dt>
 <dd>
 <p>reference sequence. Supplying this option will turn on left-alignment
-and normalization, however, see also the <strong><a href="#do_not_normalize">--do-not-normalize</a></strong>
+and normalization, however, see also the <strong><a href="#do_not_normalize">--no-realign</a></strong>
 option below.</p>
 </dd>
 <dt class="hdlist1"><strong>--force</strong></dt>
@@ -3615,11 +3617,12 @@ <h3 id="norm">bcftools norm [<em>OPTIONS</em>] <em>file.vcf.gz</em></h3>
 <dd>
 <p>see <strong><a href="#common_options">Common Options</a></strong></p>
 </dd>
-<dt class="hdlist1"><strong>-N, --do-not-normalize</strong><a id="do_not_normalize"></a></dt>
+<dt class="hdlist1"><strong>-N, --no-realign</strong> [<em>NUM</em>]<a id="do_not_normalize"></a></dt>
 <dd>
-<p>the <em>-c s</em> option can be used to fix or set the REF allele from the
-reference <em>-f</em>. The <em>-N</em> option will not turn on indel normalisation
-as the <em>-f</em> option normally implies</p>
+<p>do not realign indels and symbolic alleles, which is normally performed
+when <strong>-f, --fasta-ref</strong> option is provided. If <em>NUM</em> is given, only
+events longer than <em>NUM</em> bp are not realigned.
+Previously named <strong>--do-not-normalize</strong>.</p>
 </dd>
 <dt class="hdlist1"><strong>--old-rec-tag</strong> <em>STR</em></dt>
 <dd>
@@ -4024,7 +4027,7 @@ <h4 id="_list_of_plugins_coming_with_the_distribution">List of plugins coming wi
 <p>Convert between similar tags, such as GL,PL,GP or QR,QA,QS or tags with localized alleles e.g. LPL,LAD.
 See <a href="http://samtools.github.io/bcftools/howtos/plugin.tag2tag.html" class="bare">http://samtools.github.io/bcftools/howtos/plugin.tag2tag.html</a> for more.</p>
 </dd>
-<dt class="hdlist1"><strong>trio-dnm2</strong></dt>
+<dt class="hdlist1"><strong>trio-dnm3</strong></dt>
 <dd>
 <p>screen variants for possible de-novo mutations in trios</p>
 </dd>
@@ -4049,6 +4052,10 @@ <h4 id="_list_of_plugins_coming_with_the_distribution">List of plugins coming wi
 <dd>
 <p>print the variants as a set of tables</p>
 </dd>
+<dt class="hdlist1"><strong>vrfs</strong></dt>
+<dd>
+<p>assess site noisiness (variant read frequency score) from a large number of reference samples</p>
+</dd>
 </dl>
 </div>
 </div>
@@ -5829,7 +5836,7 @@ <h2 id="_copying">COPYING</h2>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2026-03-18 14:50:59 UTC
+Last updated 2026-04-07 10:58:12 +0100
 </div>
 </div>
 </body>

doc/bcftools.txt

Lines changed: 8 additions & 5 deletions
@@ -2704,6 +2704,7 @@ the *<<fasta_ref,--fasta-ref>>* option is supplied.
 your VCF, do NOT use it for that purpose!!! (Instead see
 <http://samtools.github.io/bcftools/howtos/plugin.af-dist.html> and
 <http://samtools.github.io/bcftools/howtos/plugin.fixref.html>.)
+See also the option *-N, --no-realign*.
 
 *-d, --rm-dup* 'snps'|'indels'|'both'|'all'|'exact'::
 If a record is present multiple times, output only the first instance.
@@ -2719,7 +2720,7 @@ the *<<fasta_ref,--fasta-ref>>* option is supplied.
 
 *-f, --fasta-ref* 'FILE'[[fasta_ref]]::
 reference sequence. Supplying this option will turn on left-alignment
-and normalization, however, see also the *<<do_not_normalize,--do-not-normalize>>*
+and normalization, however, see also the *<<do_not_normalize,--no-realign>>*
 option below.
 
 *--force*::
@@ -2759,10 +2760,12 @@ the *<<fasta_ref,--fasta-ref>>* option is supplied.
 *--no-version*::
 see *<<common_options,Common Options>>*
 
-*-N, --do-not-normalize*[[do_not_normalize]]::
-the '-c s' option can be used to fix or set the REF allele from the
-reference '-f'. The '-N' option will not turn on indel normalisation
-as the '-f' option normally implies
+*-N, --no-realign* ['NUM'][[do_not_normalize]]::
+do not realign indels and symbolic alleles (normally performed
+when *-f, --fasta-ref* option is provided). If 'NUM' is given,
+skip realignment only for events longer than 'NUM' bp. Note
+that 'NUM' must follow with no space, e.g. *-N1000* or *--no-realign=1000*.
+Previously named *--do-not-normalize* and did not accept an argument.
 
 *--old-rec-tag* 'STR'::
 Add INFO/STR annotation with the original record. The format of the

test/norm.dup-end.1.out

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+##fileformat=VCFv4.2
+##FILTER=<ID=PASS,Description="All filters passed">
+##contig=<ID=chr>
+##INFO=<ID=END,Number=1,Type=Integer,Description="">
+##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SVLEN">
+#CHROM POS ID REF ALT QUAL FILTER INFO
+chr 56 . G <DUP> . . END=61;SVLEN=5

test/norm.dup-end.2.out

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+##fileformat=VCFv4.2
+##FILTER=<ID=PASS,Description="All filters passed">
+##contig=<ID=chr>
+##INFO=<ID=END,Number=1,Type=Integer,Description="">
+##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SVLEN">
+#CHROM POS ID REF ALT QUAL FILTER INFO
+chr 57 . G <DUP> . . END=62;SVLEN=5

test/norm.dup-end.3.out

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+##fileformat=VCFv4.2
+##FILTER=<ID=PASS,Description="All filters passed">
+##contig=<ID=chr>
+##INFO=<ID=END,Number=1,Type=Integer,Description="">
+##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SVLEN">
+#CHROM POS ID REF ALT QUAL FILTER INFO
+chr 57 . N <DUP> . . END=62;SVLEN=5

test/norm.dup-end.fa

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+>chr
+ACAAGTTTGTTCTCATCATCTAATCATGGTCCTCCCGCAAGGTGCTGCCTGATGAGGACTTGGATCATTC
+AGAGGCAGTCCCATTTTAGGCTCAGTCCTTT

test/norm.dup-end.vcf

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+##fileformat=VCFv4.2
+##contig=<ID=chr>
+##INFO=<ID=END,Number=1,Type=Integer,Description="">
+##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SVLEN">
+#CHROM POS ID REF ALT QUAL FILTER INFO
+chr 57 . N <DUP> . . END=62;SVLEN=5
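
The expected outputs above can be checked by hand with a small left-shift sketch. It is written in Python under stated assumptions: the function `left_shift_interval` is hypothetical and is not the bcftools code path, and the shift rule used (move the interval one base to the left while the base at POS equals the base at END) is the textbook rule for deletions and tandem duplications, assumed here to apply to the `<DUP>` record. POS and END move together, so END - POS stays equal to SVLEN.

```python
# Reference from test/norm.dup-end.fa (both sequence lines joined).
REF = (
    "ACAAGTTTGTTCTCATCATCTAATCATGGTCCTCCCGCAAGGTGCTGCCTGATGAGGACTTGGATCATTC"
    "AGAGGCAGTCCCATTTTAGGCTCAGTCCTTT"
)


def left_shift_interval(ref: str, pos: int, end: int) -> tuple:
    """Left-shift a symbolic event given by 1-based POS (padding base) and END.

    The event covers pos+1..end. It can move one base to the left when the
    padding base at pos equals the last base of the event at end.
    Hypothetical helper for illustration only.
    """
    while pos > 1 and ref[pos - 1] == ref[end - 1]:
        pos -= 1
        end -= 1
    return pos, end


pos, end = left_shift_interval(REF, 57, 62)
print(pos, end, REF[pos - 1], end - pos)
```

For the input record (POS=57, END=62) the sketch yields POS=56, END=61 with reference base G at the new POS, in line with test/norm.dup-end.1.out. When realignment is skipped, only REF is filled in from the reference (G at position 57), as in test/norm.dup-end.2.out.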

test/test.pl

Lines changed: 7 additions & 1 deletion
@@ -294,6 +294,12 @@
 run_test(\&test_vcf_query,$opts,in=>'query.header',out=>'query.98.2.out',args=>q[-HH -f'%CHROM %POS[ %SAMPLE][ %DP][ %GT]']);
 run_test(\&test_vcf_query,$opts,in=>'query.filter-or',out=>'query.filter-or.1.out',args=>q[-f'[%SAMPLE %DP\\n]' -i'DP=1 || DP=2']);
 run_test(\&test_vcf_query,$opts,in=>'query.filter-or',out=>'query.filter-or.2.out',args=>q[-f'[%SAMPLE %DP\\n]' -i'DP=1 | DP=2']);
+run_test(\&test_vcf_norm,$opts,in=>'norm.dup-end',fai=>'norm.dup-end',out=>'norm.dup-end.1.out',args=>qq[-c s]);
+run_test(\&test_vcf_norm,$opts,in=>'norm.dup-end',fai=>'norm.dup-end',out=>'norm.dup-end.1.out',args=>qq[-c s -N10]);
+run_test(\&test_vcf_norm,$opts,in=>'norm.dup-end',fai=>'norm.dup-end',out=>'norm.dup-end.2.out',args=>qq[-c s -N]);
+run_test(\&test_vcf_norm,$opts,in=>'norm.dup-end',fai=>'norm.dup-end',out=>'norm.dup-end.2.out',args=>qq[-c s -N0]);
+run_test(\&test_vcf_norm,$opts,in=>'norm.dup-end',fai=>'norm.dup-end',out=>'norm.dup-end.2.out',args=>qq[-c s -N3]);
+run_test(\&test_vcf_norm,$opts,in=>'norm.dup-end',fai=>'norm.dup-end',out=>'norm.dup-end.3.out',args=>qq[-c s -e 'type~"other"']);
 run_test(\&test_vcf_norm,$opts,in=>'norm.check-ref',fai=>'norm.check-ref',out=>'norm.check-ref.1.out',args=>qq[-c s]);
 run_test(\&test_vcf_norm,$opts,in=>'norm.filter',out=>'norm.filter.1.out',args=>qq[-m +both -i 'ID=\@{PATH}/norm.filter.txt']);
 run_test(\&test_vcf_norm,$opts,in=>'norm.filter',out=>'norm.filter.1.out',args=>qq[-m +both -i 'ALT!="C"']);
@@ -318,7 +324,7 @@
 run_test(\&test_vcf_norm,$opts,in=>'norm.merge.2',out=>'norm.merge.2.out',args=>'-m+');
 run_test(\&test_vcf_norm,$opts,in=>'norm.merge.3',out=>'norm.merge.3.out',args=>'-m+');
 run_test(\&test_vcf_norm,$opts,in=>'norm.merge',out=>'norm.merge.strict.out',args=>'-m+ -s');
-run_test(\&test_vcf_norm,$opts,in=>'norm.setref',out=>'norm.setref.out',args=>'-Nc s',fai=>'norm');
+run_test(\&test_vcf_norm,$opts,in=>'norm.setref',out=>'norm.setref.out',args=>'-N -c s',fai=>'norm');
 run_test(\&test_vcf_norm,$opts,in=>'norm.telomere',out=>'norm.telomere.out',fai=>'norm');
 run_test(\&test_vcf_norm,$opts,in=>'norm.rmdup',out=>'norm.rmdup.1.out',args=>'-d snps');
 run_test(\&test_vcf_norm,$opts,in=>'norm.rmdup',out=>'norm.rmdup.2.out',args=>'-d indels');
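
The last hunk rewrites `-Nc s` as `-N -c s`. A plausible reading, which the commit does not state explicitly, is that once `-N` accepts an optional attached value, any characters following `-N` in the same token are taken as NUM instead of as further bundled short options. The Python sketch below illustrates attached-only optional arguments as described in doc/bcftools.txt (`-N1000` or `--no-realign=1000`). The helper `parse_no_realign` is hypothetical and does not reproduce bcftools' actual option parsing or error handling.

```python
from typing import Optional


def parse_no_realign(token: str) -> Optional[int]:
    """Return NUM for a -N/--no-realign token, or None for any other token.

    A bare option maps to 0, following the commit message
    ("--do-not-normalize" is equivalent to "--no-realign=0").
    Hypothetical helper for illustration only.
    """
    if token in ("-N", "--no-realign"):
        return 0
    if token.startswith("--no-realign="):
        return int(token[len("--no-realign="):])
    if token.startswith("-N"):
        # Everything attached to -N is read as NUM; non-digits raise ValueError.
        return int(token[2:])
    return None


for tok in ("-N", "-N1000", "--no-realign=1000", "-c"):
    print(tok, parse_no_realign(tok))
```

In this sketch `parse_no_realign("-Nc")` raises `ValueError` because `c` is read as NUM, which is one way to see why the test now passes `-N` and `-c s` as separate tokens.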
