Commit f21eff2

jmarshall and kloetzl authored
Minor kb and gigabytes notation improvements for all VCF specs (#840)
* Write 21100 bases as ~21 kilobases rather than kibibases [minor]
* Write out GB as "gigabytes" in this prose [minor]

---------

Co-authored-by: Fabian Klötzl <fabian@kloetzl.info>
1 parent b5341fb commit f21eff2
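
The arithmetic behind the first bullet can be sketched as follows. This is only an illustration: the function names are hypothetical, and reading the old "Kb" as a 1024-based unit follows the commit message's own wording ("kibibases"), not anything stated in the VCF specification.

```python
def kilobases(bases: int) -> float:
    """Decimal SI prefix: 1 kb = 1000 bases."""
    return bases / 1000


def kibibases(bases: int) -> float:
    """Binary-style prefix as named in the commit message: 1024 bases."""
    return bases / 1024


length = 21100  # base count quoted in the commit message

print(f"{kilobases(length):.1f} kb")          # 21.1 kb
print(f"{kibibases(length):.1f} (x1024 bases)")  # 20.6 (x1024 bases)
```

Only the decimal reading agrees with "approximately 21", which is consistent with the switch to lowercase "kb".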

5 files changed

Lines changed: 10 additions & 10 deletions

VCFv4.1.tex

Lines changed: 3 additions & 3 deletions
@@ -517,7 +517,7 @@ \subsection{Encoding Structural Variants}
 \item An imprecise deletion of approximately 105 bp.
 \item An imprecise deletion of an ALU element relative to the reference.
 \item An imprecise insertion of an L1 element relative to the reference.
-\item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+\item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
 \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -957,8 +957,8 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used
 in the community. Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk. A normal batch
-of \~100~exomes is a few GB, but large-scale VCFs with thousands of exome
-samples quickly become hundreds of GBs. Because the file is text, it is
+of \~100~exomes is a few gigabytes, but large-scale VCFs with thousands of exome
+samples quickly become hundreds of gigabytes. Because the file is text, it is
 extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple. BCF2 is a binary, compressed

VCFv4.2.tex

Lines changed: 3 additions & 3 deletions
@@ -534,7 +534,7 @@ \subsection{Encoding Structural Variants}
 \item An imprecise deletion of approximately 205 bp.
 \item An imprecise deletion of an ALU element relative to the reference.
 \item An imprecise insertion of an L1 element relative to the reference.
-\item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+\item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
 \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -974,8 +974,8 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used
 in the community. Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk. A normal batch
-of \~100~exomes is a few GB, but large-scale VCFs with thousands of exome
-samples quickly become hundreds of GBs. Because the file is text, it is
+of \~100~exomes is a few gigabytes, but large-scale VCFs with thousands of exome
+samples quickly become hundreds of gigabytes. Because the file is text, it is
 extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple. BCF2 is a binary, compressed

VCFv4.3.tex

Lines changed: 2 additions & 2 deletions
@@ -874,7 +874,7 @@ \subsection{Encoding Structural Variants}
 \item An imprecise deletion of approximately 205 bp.
 \item An imprecise deletion of an ALU element relative to the reference.
 \item An imprecise insertion of an L1 element relative to the reference.
-\item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+\item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
 \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -1454,7 +1454,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.

VCFv4.4.tex

Lines changed: 1 addition & 1 deletion
@@ -1882,7 +1882,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.

VCFv4.5.tex

Lines changed: 1 addition & 1 deletion
@@ -2050,7 +2050,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.