CRAMv3.tex
1 addition & 1 deletion
@@ -850,7 +850,7 @@ \subsubsection*{Tag values}
The encodings used for different tags are stored in a map.
The key is 3 bytes formed from the BAM tag id and type code, matching the TD dictionary described above.
Unlike the Data Series Encoding Map, the key is stored in the map as an ITF8 encoded integer, constructed using $(char1<<16) + (char2<<8) + type$.
- For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
+ For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
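The OQ:Z example in this hunk can be checked with a short sketch. The function names `tag_key` and `itf8_encode` are hypothetical and not taken from any library. The encoder is deliberately limited to values below $2^{28}$ (the one- to four-byte ITF8 forms), which is enough for any 24-bit tag key; the five-byte form is left out as a simplification.

```python
def tag_key(tag_id: str, type_code: str) -> int:
    """Pack a two-character BAM tag id and a one-character type code
    into the integer key (char1<<16) + (char2<<8) + type."""
    if len(tag_id) != 2 or len(type_code) != 1:
        raise ValueError("expected a 2-character tag id and a 1-character type code")
    char1, char2 = tag_id.encode("ascii")      # two byte values
    (type_byte,) = type_code.encode("ascii")   # one byte value
    return (char1 << 16) + (char2 << 8) + type_byte


def itf8_encode(value: int) -> bytes:
    """Encode a non-negative integer below 2**28 as ITF8.

    Assumption: only the 1- to 4-byte forms are handled here; the
    5-byte form is omitted because tag keys never need it.
    """
    if value < 0:
        raise ValueError("negative values are not covered by this sketch")
    if value < 1 << 7:    # 0xxxxxxx
        return bytes([value])
    if value < 1 << 14:   # 10xxxxxx + 1 byte
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 1 << 21:   # 110xxxxx + 2 bytes
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    if value < 1 << 28:   # 1110xxxx + 3 bytes
        return bytes([
            0xE0 | (value >> 24),
            (value >> 16) & 0xFF,
            (value >> 8) & 0xFF,
            value & 0xFF,
        ])
    raise ValueError("the 5-byte ITF8 form is not covered by this sketch")


if __name__ == "__main__":
    key = tag_key("OQ", "Z")
    print(f"{key:#010x}")
    print(" ".join(f"{b:#04x}" for b in itf8_encode(key)))
```

Because the key always occupies more than 21 bits when the first tag character is a printable ASCII letter, such keys fall into the four-byte form, which is why the example stream begins with 0xE0.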
Following the base modification codes is a recommended but optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools.
When this flag is `{\tt ?}' there is no information about the modification status of the skipped bases provided.
- When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilites below a threshold to provide a more compact modification tag.}
+ When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilities below a threshold to provide a more compact modification tag.}

This is then followed by a comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, starting from the uncomplemented 5' end of the {\sf SEQ} field.
This number series is comparable to the numbers in an {\tt MD} tag,
\item PSO (List of integers): List of phase set ordinals.
- For each phase-set name, defines the order in which variants are encountered when traversing a derivate chromosome.
+ For each phase-set name, defines the order in which variants are encountered when traversing a derivative chromosome.
The missing value '$.$' should be used when the corresponding PSO value is missing.
For each phase-set name, PSO should be defined if any allele with that phase-set name on any record is symbolic structural variant or in breakpoint notation.
Variants in breakpoint notation must have the same PSL and PSO on both records.

- Without explicitly specifying the derivate chromosome traversal order, multiple derivate chromosome reconstructions are possible.
+ Without explicitly specifying the derivative chromosome traversal order, multiple derivative chromosome reconstructions are possible.
Take for example this tandem duplication in a triploid organism with SNVs (ID/QUAL/FILTER columns removed for clarity):

\vspace{0.5em}
@@ -831,7 +831,7 @@ \section{INFO keys used for structural variants}
\item BFB - breakage fusion bridge
\item DOUBLEMINUTE - Double minute
\end{itemize}
- The sematics of other $EVENTTYPE$ values is implementation-defined.
+ The semantics of other $EVENTTYPE$ values is implementation-defined.
The use of $EVENT$ is not restricted to structural variation and can also be used to associate non-symbolic alleles.
Such linking is useful for scenarios such as kataegis or when there is variant position ambiguity in segmentally duplicated regions.
@@ -926,7 +926,7 @@ \section{INFO keys used for structural variants}
Used by $<$CNV:TR$>$ tandem repeat alleles to encode information about the nature of the tandem repeats contained for ALT alleles.
Conceptually, these fields each contain a list of values for each ALT allele.
The length of these inner lists are determined by the RN field for that ALT allele and the length must match the sum of RN for the record.
- These fields contain the flattened and concatentated list contents in the same order as either corresponding ALT allele.
+ These fields contain the flattened and concatenated list contents in the same order as either corresponding ALT allele.

Each $<$CNV:TR$>$ allele consistents of $RN$ repeat sequences each containing $RUC$ repeat units with sequence $RUS$.
@@ -1770,9 +1770,9 @@ \subsection{Representing tandem repeats}
It is not the length of the $<$CNV:TR$>$ allele.
\item The SVLEN of the $<$CNV:TR$>$ allele of a novel (with respect to the reference) tandem repeat should be 1.
\item The POS of the $<$CNV:TR$>$ allele of a novel (with respect to the reference) tandem repeat should be the base immediately preceding the inserted tandem repeat sequence.
- \item Both a $<$CNV:TR$>$ and one or more non-symoblic records encoding the tandem repeat can be present.
- \item$<$CNV:TR$>$ and the non-symoblic records encoding the tandem repeat should be phased if possible.
- \item When both $<$CNV:TR$>$ and the equivalent non-symoblic records are present, the $<$CNV:TR$>$ should approximately encode the sequence but is not required to encode the sequence exactly.
+ \item Both a $<$CNV:TR$>$ and one or more non-symbolic records encoding the tandem repeat can be present.
+ \item$<$CNV:TR$>$ and the non-symbolic records encoding the tandem repeat should be phased if possible.
+ \item When both $<$CNV:TR$>$ and the equivalent non-symbolic records are present, the $<$CNV:TR$>$ should approximately encode the sequence but is not required to encode the sequence exactly.
For example, SNVs and indels may be omitted in the $<$CNV:TR$>$ record.
\item Variant callers which do not report allele-specific tandem repeats should use a single $<$CNV:TR$>$ ALT allele and the missing genotype for the GT field (for example, $./.$ if diploid).
\item The INFO and FORMAT CN fields should be present for $<$CNV:TR$>$ records (as they are $<$CNV$>$ records) and, when present, must correspond to the sample allelic length divided by the reference allelic length.
@@ -1805,7 +1805,7 @@ \subsection{Representing tandem repeats}
\begin{itemize}
\item RN was omitted as it is only required if at least one $<$CNV:TR$>$ allele has RN greater than 1.
\item The confidence interval bounds are relative to the nominal value.
- \item A missing upper bouns indicates the maximum length is not known.
+ \item A missing upper bounds indicates the maximum length is not known.
\end{itemize}

Exactly representing nested repeats results in the loss of some repeat information when representing with a $<$CNV:TR$>$ record.
refget.md
1 addition & 1 deletion
@@ -571,7 +571,7 @@ Key to generating reproducible checksums is the normalisation algorithm applied
- VMC
- VMC requires sequence to be a string of IUPAC codes for either nucelotide or protein sequence

- Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorthim would require a new checksum identifier to be used.
+ Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorithm would require a new checksum identifier to be used.
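The range restriction quoted in this hunk can be illustrated with a minimal sketch. The function name `is_in_normalised_range` is hypothetical, and the check covers only the `65` to `90` range stated above, not any other normalisation step the specification may define.

```python
def is_in_normalised_range(sequence: str) -> bool:
    """Return True when every character code lies in 65 ('A') to 90 ('Z'), inclusive."""
    return all(65 <= ord(ch) <= 90 for ch in sequence)


if __name__ == "__main__":
    for candidate in ("ACGTN", "acgt", "AC-GT"):
        print(candidate, is_in_normalised_range(candidate))
```

Lower-case letters, gap characters and whitespace all fall outside the range, so under this sketch they would have to be handled before a checksum is computed.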