Commit 807ea94

Typos workflow
1 parent 7ab198a, commit 807ea94

11 files changed

Lines changed: 28 additions & 22 deletions

CRAMv2.1.tex

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -694,7 +694,7 @@ \subsubsection*{Encoding tag values}
694694
keys composed of the two letter tag abbreviation followed by the tag type as defined
695695
in the SAM specification, for example `OQZ' for `OQ:Z'. The three bytes form a
696696
big endian integer and are written as ITF8. For example, 3-byte representation
697-
of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer 0x004F515A.
697+
of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer 0x004F515A.
698698
The integer is finally written as ITF8.
699699

700700
\begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
@@ -1640,7 +1640,7 @@ \subsubsection*{BYTE\_ARRAY\_LEN }
16401640

16411641
\subsubsection*{BYTE\_ARRAY\_STOP }
16421642

1643-
Byte arrays are captured as a sequence of bytes teminated by a special stop byteFor
1643+
Byte arrays are captured as a sequence of bytes terminated by a special stop byteFor
16441644
example this could be a golomb encoding. The parameter for BYTE\_ARRAY\_STOP are
16451645
listed below:
16461646
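
The BYTE\_ARRAY\_STOP context in the hunk above describes a byte array as a run of bytes ended by a special stop byte. A minimal Python sketch of reading one such array follows. The function name `read_byte_array_stop` and the choice to return the next read offset are assumptions made for illustration; they do not come from the CRAM text.

```python
from typing import Tuple


def read_byte_array_stop(data: bytes, stop: int, offset: int = 0) -> Tuple[bytes, int]:
    """Return (array, next_offset) for the array that starts at `offset`.

    Hypothetical helper: the array is every byte up to, but not
    including, the first occurrence of the stop byte.
    """
    end = data.find(bytes([stop]), offset)
    if end < 0:
        raise ValueError("stop byte not found")
    # Skip over the stop byte so the caller can continue reading.
    return data[offset:end], end + 1


# Two arrays stored back to back, each ended by a 0x00 stop byte.
buf = b"ACGT\x00TT\x00"
first, nxt = read_byte_array_stop(buf, 0x00)
second, nxt = read_byte_array_stop(buf, 0x00, nxt)
print(first, second, nxt)
```

Returning the offset alongside the array lets consecutive arrays be read from one buffer without copying it.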

CRAMv3.tex

Lines changed: 1 addition & 1 deletion
@@ -850,7 +850,7 @@ \subsubsection*{Tag values}
850850
The encodings used for different tags are stored in a map.
851851
The key is 3 bytes formed from the BAM tag id and type code, matching the TD dictionary described above.
852852
Unlike the Data Series Encoding Map, the key is stored in the map as an ITF8 encoded integer, constructed using $(char1<<16) + (char2<<8) + type$.
853-
For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
853+
For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
854854

855855
\begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
856856
\hline
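
The CRAMv3.tex hunk above builds the tag key as $(char1<<16) + (char2<<8) + type$ and gives \{0xE0, 0x4F, 0x51, 0x5A\} as the ITF8 byte stream for OQ:Z. The Python sketch below reproduces that example. The function names `tag_key` and `itf8_encode` are hypothetical, and the prefix-bit layout of the 1 to 4 byte ITF8 forms is an assumption taken from the CRAM specification's ITF8 description rather than from this diff. The 5-byte form is left out on purpose.

```python
def tag_key(tag: str, type_code: str) -> int:
    """Pack a two-letter tag and its type code into one integer key."""
    if len(tag) != 2 or len(type_code) != 1:
        raise ValueError("expected a 2-character tag and a 1-character type")
    c1, c2, t = (ord(ch) for ch in tag + type_code)
    return (c1 << 16) + (c2 << 8) + t


def itf8_encode(value: int) -> bytes:
    """Encode 0 <= value < 2**28 as ITF8 using 1 to 4 bytes.

    Sketch only: larger values need the 5-byte form, not covered here.
    """
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value < 0x80:            # 0xxxxxxx
        return bytes([value])
    if value < 0x4000:          # 10xxxxxx + 1 byte
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:        # 110xxxxx + 2 bytes
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    if value < 0x10000000:      # 1110xxxx + 3 bytes
        return bytes([0xE0 | (value >> 24), (value >> 16) & 0xFF,
                      (value >> 8) & 0xFF, value & 0xFF])
    raise ValueError("values >= 2**28 need the 5-byte form")


key = tag_key("OQ", "Z")
print(hex(key), itf8_encode(key).hex())  # 0x4f515a e04f515a
```

Because the key for any printable tag exceeds 0x200000, tag keys built this way fall in the 4-byte branch, which is why the example stream starts with 0xE0.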

SAMtags.tex

Lines changed: 1 addition & 1 deletion
@@ -494,7 +494,7 @@ \subsection{Base modifications}
494494

495495
Following the base modification codes is a recommended but optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools.
496496
When this flag is `{\tt ?}' there is no information about the modification status of the skipped bases provided.
497-
When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilites below a threshold to provide a more compact modification tag.}
497+
When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilities below a threshold to provide a more compact modification tag.}
498498

499499
This is then followed by a comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, starting from the uncomplemented 5' end of the {\sf SEQ} field.
500500
This number series is comparable to the numbers in an {\tt MD} tag,

VCFv4.4.draft.tex

Lines changed: 10 additions & 10 deletions
@@ -609,12 +609,12 @@ \subsubsection{Genotype fields}
609609
\end{tabular}
610610
611611
\item PSO (List of integers): List of phase set ordinals.
612-
For each phase-set name, defines the order in which variants are encountered when traversing a derivate chromosome.
612+
For each phase-set name, defines the order in which variants are encountered when traversing a derivative chromosome.
613613
The missing value '$.$' should be used when the corresponding PSO value is missing.
614614
For each phase-set name, PSO should be defined if any allele with that phase-set name on any record is symbolic structural variant or in breakpoint notation.
615615
Variants in breakpoint notation must have the same PSL and PSO on both records.
616616
617-
Without explicitly specifying the derivate chromosome traversal order, multiple derivate chromosome reconstructions are possible.
617+
Without explicitly specifying the derivative chromosome traversal order, multiple derivative chromosome reconstructions are possible.
618618
Take for example this tandem duplication in a triploid organism with SNVs (ID/QUAL/FILTER columns removed for clarity):
619619
620620
\vspace{0.5em}
@@ -831,7 +831,7 @@ \section{INFO keys used for structural variants}
831831
\item BFB - breakage fusion bridge
832832
\item DOUBLEMINUTE - Double minute
833833
\end{itemize}
834-
The sematics of other $EVENTTYPE$ values is implementation-defined.
834+
The semantics of other $EVENTTYPE$ values is implementation-defined.
835835
The use of $EVENT$ is not restricted to structural variation and can also be used to associate non-symbolic alleles.
836836
Such linking is useful for scenarios such as kataegis or when there is variant position ambiguity in segmentally duplicated regions.
837837
@@ -926,7 +926,7 @@ \section{INFO keys used for structural variants}
926926
Used by $<$CNV:TR$>$ tandem repeat alleles to encode information about the nature of the tandem repeats contained for ALT alleles.
927927
Conceptually, these fields each contain a list of values for each ALT allele.
928928
The length of these inner lists are determined by the RN field for that ALT allele and the length must match the sum of RN for the record.
929-
These fields contain the flattened and concatentated list contents in the same order as either corresponding ALT allele.
929+
These fields contain the flattened and concatenated list contents in the same order as either corresponding ALT allele.
930930
931931
Each $<$CNV:TR$>$ allele consistents of $RN$ repeat sequences each containing $RUC$ repeat units with sequence $RUS$.
932932
@@ -1770,9 +1770,9 @@ \subsection{Representing tandem repeats}
17701770
It is not the length of the $<$CNV:TR$>$ allele.
17711771
\item The SVLEN of the $<$CNV:TR$>$ allele of a novel (with respect to the reference) tandem repeat should be 1.
17721772
\item The POS of the $<$CNV:TR$>$ allele of a novel (with respect to the reference) tandem repeat should be the base immediately preceding the inserted tandem repeat sequence.
1773-
\item Both a $<$CNV:TR$>$ and one or more non-symoblic records encoding the tandem repeat can be present.
1774-
\item $<$CNV:TR$>$ and the non-symoblic records encoding the tandem repeat should be phased if possible.
1775-
\item When both $<$CNV:TR$>$ and the equivalent non-symoblic records are present, the $<$CNV:TR$>$ should approximately encode the sequence but is not required to encode the sequence exactly.
1773+
\item Both a $<$CNV:TR$>$ and one or more non-symbolic records encoding the tandem repeat can be present.
1774+
\item $<$CNV:TR$>$ and the non-symbolic records encoding the tandem repeat should be phased if possible.
1775+
\item When both $<$CNV:TR$>$ and the equivalent non-symbolic records are present, the $<$CNV:TR$>$ should approximately encode the sequence but is not required to encode the sequence exactly.
17761776
For example, SNVs and indels may be omitted in the $<$CNV:TR$>$ record.
17771777
\item Variant callers which do not report allele-specific tandem repeats should use a single $<$CNV:TR$>$ ALT allele and the missing genotype for the GT field (for example, $./.$ if diploid).
17781778
\item The INFO and FORMAT CN fields should be present for $<$CNV:TR$>$ records (as they are $<$CNV$>$ records) and, when present, must correspond to the sample allelic length divided by the reference allelic length.
@@ -1805,7 +1805,7 @@ \subsection{Representing tandem repeats}
18051805
\begin{itemize}
18061806
\item RN was omitted as it is only required if at least one $<$CNV:TR$>$ allele has RN greater than 1.
18071807
\item The confidence interval bounds are relative to the nominal value.
1808-
\item A missing upper bouns indicates the maximum length is not known.
1808+
\item A missing upper bounds indicates the maximum length is not known.
18091809
\end{itemize}
18101810
18111811
Exactly representing nested repeats results in the loss of some repeat information when representing with a $<$CNV:TR$>$ record.
@@ -1964,7 +1964,7 @@ \subsubsection{Site encoding}
19641964
CHROM & int32\_t & Given as an offset into the mandatory contig dictionary \\ \hline
19651965
POS & int32\_t & 0-based leftmost coordinate \\ \hline
19661966
rlen & int32\_t & Length of the record as projected onto the reference sequence.
1967-
Must be the maxmimum of the length of the REF allele and the length
1967+
Must be the maximum of the length of the REF allele and the length
19681968
inferred from the SVLEN/END of any symbolic alleles \\ \hline
19691969
QUAL & float & Variant quality; 0x7F800001 for a missing value \\ \hline
19701970
n\_info & uint16\_t & The number of INFO fields in this record \\ \hline
@@ -2506,7 +2506,7 @@ \subsection{Changes between VCFv4.4 and VCFv4.3}
25062506
25072507
\begin{itemize}
25082508
\item Added tandem repeat support ($<$CNV:TR$>$, RN, RUS, RUL, RB, CIRB, RUC, CIRUC, RUB)
2509-
\item Added support for phasing and derivate chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ)
2509+
\item Added support for phasing and derivative chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ)
25102510
\item Added SVCLAIM to disambiguate copy number based $<$DEL$>$ and $<$DUP$>$ variants from breakpoint based ones.
25112511
\item Conceptually separated variant detection and interpretation.
25122512
\item Added EVENTTYPE/EVENT to enable the multiple records encoding complex genomic rearrangements to be grouped together.

_typos.toml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
[default.extend-words]
2+
FO="FO"
3+
BA="BA"
4+
nd="nd"
5+
Hsi="Hsi"
6+
Apon="Apon"

crypt4gh.tex

Lines changed: 1 addition & 1 deletion
@@ -271,7 +271,7 @@ \subsection{File Structure}
271271
\draw (header packet.four split south) to (data encryption packet.north west);
272272
\draw (header packet.five split south) to (data encryption packet.north east);
273273
\node (data encryption packet notes) at (data encryption packet -| file notes) [notes] {
274-
\textbf{Data Encyption Packet (plain-text)} \\
274+
\textbf{Data Encryption Packet (plain-text)} \\
275275
Stores $K_{data}$
276276
};
277277

refget.md

Lines changed: 1 addition & 1 deletion
@@ -571,7 +571,7 @@ Key to generating reproducible checksums is the normalisation algorithm applied
571571
- VMC
572572
- VMC requires sequence to be a string of IUPAC codes for either nucelotide or protein sequence
573573

574-
Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorthim would require a new checksum identifier to be used.
574+
Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorithm would require a new checksum identifier to be used.
575575

576576
### Checksum Choice
577577
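
The refget.md hunk above restricts input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). A minimal Python sketch of that range check follows. The function name `is_valid_refget_input` is hypothetical, and the sketch covers only the range test stated in the hunk, not any other normalisation step.

```python
def is_valid_refget_input(sequence: str) -> bool:
    """Return True if every character lies in the inclusive range 65..90 (A-Z)."""
    return all(65 <= ord(ch) <= 90 for ch in sequence)


print(is_valid_refget_input("ACGTN"))   # True
print(is_valid_refget_input("acgt"))    # False: lower case is outside 65..90
print(is_valid_refget_input("AC-GT"))   # False: '-' is code point 45
```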

test/SAMtags/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Mm and Ml auxiliary tags
22
========================
33

44
The purpose of these test files is to test parsing of the Mm and Ml
5-
tags. These succint Mm and Ml tags are present in the .sam files,
5+
tags. These succinct Mm and Ml tags are present in the .sam files,
66
with a more human readable expanded form in the .txt files.
77
Developers should check whether their implementation is able to
88
convert between the two forms.

test/SAMtags/parse_mm.pl

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ sub rc {
7373

7474
my $i = 0; # I^{th} bosition in sequence
7575
foreach my $delta (split(",", $pos)) {
76-
# Skip $delta occurences of $base
76+
# Skip $delta occurrences of $base
7777
do {
7878
$delta-- if ($base eq "N" || $base eq $seq[$i]);
7979
$i++;
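
The loop in the parse_mm.pl hunk above skips `$delta` occurrences of `$base` before each modification, matching the SAMtags.tex text about a comma separated list of bases to skip. The Python sketch below expresses the same idea. The function name `modified_positions` is hypothetical, and the sketch handles only the forward strand and ignores modification codes and probabilities.

```python
from typing import List


def modified_positions(seq: str, base: str, deltas: List[int]) -> List[int]:
    """Return 0-based positions in `seq` selected by the skip counts.

    Each delta is the number of matching bases to skip before the next
    modified base. A base of "N" matches every position, as in the Perl.
    """
    positions = []
    i = 0
    for delta in deltas:
        while True:
            if i >= len(seq):
                raise ValueError("skip counts run past the end of the sequence")
            if base == "N" or seq[i] == base:
                if delta == 0:
                    break       # this matching base is the modified one
                delta -= 1      # skip this matching base
            i += 1
        positions.append(i)
        i += 1                  # continue searching after the modified base
    return positions


# C occurs at positions 1, 2, 4 and 6; skip one C, mark the next, twice.
print(modified_positions("ACCGCTC", "C", [1, 1]))  # [2, 6]
```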

test/sam/README.md

Lines changed: 2 additions & 2 deletions
@@ -234,7 +234,7 @@ CIGAR
234234
- Reads entirely consisting of insertions (no bases on ref)
235235
- At pos 1; every base is prior to start of ref
236236
- Neighbouring matching ops, eg 1D1D, 10M10M
237-
- (Cicular genomes? needs more work.)
237+
- (Circular genomes? needs more work.)
238238
- Very large CIGAR strings (BAM has a 64K limit so tools that parse
239239
SAM into in-memory BAM may fail).
240240

@@ -403,7 +403,7 @@ Aux
403403
- General syntax
404404
- Other types (including case change variants of above; I, z, etc)
405405
- Aux tag not 2 chars
406-
- Aux tag occuring multiple times
406+
- Aux tag occurring multiple times
407407

408408

409409
Todo
