Commit 346a94a

Add CIGAR =/X entry to version history appendix (PR #743)
These CIGAR operations were added in 07dc1c6 in July 2010, which was the initial addition of the TeX specification; they were not present in the previous Pages document. It's so long ago as to be barely relevant now, but it's worth mentioning them as requiring VN:1.3 rather than VN:1.0.

In the SAM regexp in §1.4, write the operations in the familiar canonical order (though it doesn't affect the meaning of the regexp).

Define \cigarops{...} to improve the formatting of lists of CIGAR operations like "M/I/D" by making the slashes non-\tt, and also use this in SAMtags.tex.
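A minimal sketch of how the new macro might be exercised on its own. The two definitions are copied from the diff; the `article` document class and the choice of example arguments are assumptions made only for illustration. The helper peels off one operator at a time, sets it in `\tt`, and emits a roman `/` only while operators remain.

```latex
\documentclass{article}

% Definitions as added by this commit.
\newcommand*{\cigarops}[1]{\cigaropsAux#1*}
\def\cigaropsAux#1#2*{{\tt #1}\if\relax\detokenize{#2}\relax\else/\cigaropsAux#2*\fi}

\begin{document}

% Each call should typeset its operators in \tt, separated by non-\tt slashes.
\cigarops{MID}    % M/I/D

\cigarops{MIS=X}  % M/I/S/=/X

\cigarops{M=XDN}  % M/=/X/D/N

\end{document}
```

Because `*` serves as the end-of-list delimiter for `\cigaropsAux`, a literal `*` cannot appear inside the argument; none of the CIGAR operator letters is affected by this.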
1 parent 7554e7c commit 346a94a

2 files changed

Lines changed: 16 additions & 7 deletions

SAMtags.tex

Lines changed: 5 additions & 1 deletion
@@ -9,6 +9,10 @@
 \newcommand{\tagregex}[1]{{\tt #1}}
 \newcommand{\metavar}[1]{{\rm\emph{#1}}}
 
+% Use as, e.g., \cigarops{MID} to produce M/I/D with the operators in \tt
+\newcommand*{\cigarops}[1]{\cigaropsAux#1*}
+\def\cigaropsAux#1#2*{{\tt #1}\if\relax\detokenize{#2}\relax\else/\cigaropsAux#2*\fi}
+
 \begin{document}
 
 \input{SAMtags.ver}
@@ -439,7 +443,7 @@ \subsection{Annotation and Padding}
 Each tag consists of \emph{start}, \emph{end}, \emph{strand},
 \emph{type} and zero or more \emph{key}{\tt =}\emph{value} pairs, each
 separated with semicolons. \emph{Start} and \emph{end} are 1-based
-positions between one and the sum of the {\tt M/I/D/P/S/=/X}
+positions between one and the sum of the \cigarops{MIDPS=X}
 {\sf CIGAR} operators, i.e., {\sf SEQ} length plus any pads. Note
 any editing of the CIGAR string may require updating the {\tt PT}
 tag coordinates, or even invalidate them.

SAMv1.tex

Lines changed: 11 additions & 6 deletions
@@ -38,6 +38,10 @@
 
 \newcommand*{\memlimited}{\textcolor{gray}{\footnotesize\it limited}}
 
+% Use as, e.g., \cigarops{MID} to produce M/I/D with the operators in \tt
+\newcommand*{\cigarops}[1]{\cigaropsAux#1*}
+\def\cigaropsAux#1#2*{{\tt #1}\if\relax\detokenize{#2}\relax\else/\cigaropsAux#2*\fi}
+
 \begin{document}
 
 \input{SAMv1.ver}
@@ -423,7 +427,7 @@ \subsection{The alignment section: mandatory fields}\label{sec:alnrecord}
 3 & {\sf RNAME} & String & {\tt \verb"\*"|\rnameRegexp} & Reference sequence NAME\footnotemark \\
 4 & {\sf POS} & Int & $[0,\,2^{31}-1]$ & 1-based leftmost mapping POSition \\
 5 & {\sf MAPQ} & Int & $[0,\,2^8-1]$ & MAPping Quality \\
-6 & {\sf CIGAR} & String & {\tt \char92*|([0-9]+[MIDNSHPX=])+} & CIGAR string \\
+6 & {\sf CIGAR} & String & {\tt \char92*|([0-9]+[MIDNSHP=X])+} & CIGAR string \\
 7 & {\sf RNEXT} & String & {\tt \verb"\*"|=|\rnameRegexp} & Reference name of the mate/next read \\
 8 & {\sf PNEXT} & Int & $[0,\,2^{31}-1]$ & Position of the mate/next read \\
 9 & {\sf TLEN} & Int & $[-2^{31}+1,\,2^{31}-1]$ & observed Template LENgth \\
@@ -554,7 +558,7 @@ \subsection{The alignment section: mandatory fields}\label{sec:alnrecord}
 \item For mRNA-to-genome alignment, an {\tt N} operation represents an
 intron. For other types of alignments, the interpretation of {\tt N}
 is not defined.
-\item Sum of lengths of the {\tt M/I/S/=/X} operations shall equal
+\item Sum of lengths of the \cigarops{MIS=X} operations shall equal
 the length of {\sf SEQ}.
 \end{itemize}
 \item {\sf RNEXT}: Reference sequence name of the primary alignment of the NEXT read in the
@@ -638,7 +642,7 @@ \subsection{The alignment section: mandatory fields}\label{sec:alnrecord}
 
 \item {\sf SEQ}: segment SEQuence. This field can be a `*' when the
 sequence is not stored. If not a `*', the length of the sequence must
-equal the sum of lengths of {\tt M/I/S/=/X} operations in {\sf CIGAR}.
+equal the sum of lengths of \cigarops{MIS=X} operations in {\sf CIGAR}.
 An `=' denotes the base is identical to the reference base. No
 assumptions can be made on the letter cases.
 \item {\sf QUAL}: ASCII of base QUALity plus 33 (same as the quality
@@ -725,7 +729,7 @@ \section{Recommended Practice for the SAM Format}
 identical to its mate.
 \item If all segments in a template are unmapped, their {\sf RNAME}
 should be set as `*' and {\sf POS} as 0.
-\item If {\sf POS} plus the sum of lengths of {\tt M/=/X/D/N}
+\item If {\sf POS} plus the sum of lengths of \cigarops{M=XDN}
 operations in {\sf CIGAR} exceeds the length specified in the {\tt
 LN} field of the {\tt @SQ} header line (if exists) with an SN
 equal to {\sf RNAME}, the alignment should be unmapped, unless the
@@ -757,7 +761,7 @@ \section{Recommended Practice for the SAM Format}
 Mappings that cross the coordinate `join' in circular reference sequences (i.e., those whose {\tt @SQ} headers specify {\tt TP:circular}) may be represented as follows:
 \begin{enumerate}[label=\arabic*]
 \item (Preferred)
-As usual {\sf POS} should be between 1 and the {\tt @SQ} header's {\tt LN} value, but {\sf POS} plus the sum of the lengths of {\tt M/=/X/D/N} {\sf CIGAR} operations may exceed {\tt LN}.
+As usual {\sf POS} should be between 1 and the {\tt @SQ} header's {\tt LN} value, but {\sf POS} plus the sum of the lengths of \cigarops{M=XDN} {\sf CIGAR} operations may exceed {\tt LN}.
 Coordinates greater than~{\tt LN} are interpreted by subtracting {\tt LN} so that bases at $\texttt{LN}+1, \texttt{LN}+2, \texttt{LN}+3, \ldots$ are considered to be mapped at positions $1,2,3,\ldots$; thus each (1-based) position $p$ is interpreted as $((p-1)\bmod\texttt{LN})+1$.%
 \footnote{The impact of this representation on indexing and random access is yet to be explored by implementations.}
 
@@ -1063,7 +1067,7 @@ \subsection{The BAM format}
 & \multicolumn{2}{l|}{\sf next\_pos} & 0-based leftmost pos of the next segment ($=\underline{\sf PNEXT}-1$) & {\tt int32\_t} & [-1] \\\cline{2-6}
 & \multicolumn{2}{l|}{\sf tlen} & Template length ($=\underline{\sf TLEN}$) & {\tt int32\_t} & [0] \\\cline{2-6}
 & \multicolumn{2}{l|}{\sf read\_name} & Read name, {\tt NUL}-terminated (\underline{\sf QNAME} with trailing `{\tt\verb"\0"}')\footnotemark & {\tt char[{\sf l\_read\_name}]} & \\\cline{2-6}
-& \multicolumn{2}{l|}{\sf cigar} & CIGAR: {\tt {\sf op\_len}\char60\char60 4\char124{\sf op}}. `{\tt MIDNSHP\char61X}'$\to$`012345678' & {\tt uint32\_t[{\sf n\_cigar\_op}]} & \\\cline{2-6}
+& \multicolumn{2}{l|}{\sf cigar} & CIGAR: {\tt {\sf op\_len}\char60\char60 4\char124{\sf op}}. `{\tt MIDNSHP=X}'$\to$`012345678' & {\tt uint32\_t[{\sf n\_cigar\_op}]} & \\\cline{2-6}
 & \multicolumn{2}{l|}{\sf seq} & 4-bit encoded read: `{\tt =ACMGRSVTWYHKDBN}'$\to[0,15]$. See Section~\ref{sec:seq} & {\tt uint8\_t[({\sf l\_seq}+1)/2]} & \\\cline{2-6}
 & \multicolumn{2}{l|}{\sf qual} & Phred-scaled base qualities. See Section~\ref{sec:seq} & {\tt char[{\sf l\_seq}]} & \\\cline{2-6}
 & \multicolumn{5}{c|}{\textcolor{gray}{\it List of auxiliary data (until the end of the alignment block)}} \\\cline{3-6}
@@ -1513,6 +1517,7 @@ \subsection*{1.3: July 2010 to April 2011}
 \begin{itemize}
 \item Add {\tt RG PG} header field. (Nov 2010)
 \item Add BAM description and index sections. (Nov 2010)
+\item \textbf{Add `{\tt =}' and `{\tt X}' CIGAR operations.} (July 2010)
 \item \textbf{Removal of FLAG letters.} (July 2010)
 \item The {\tt SM} header field, previously mandatory for {\tt @RG}, is now
 optional. (July 2010)

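The operator lists reformatted in this diff feed two length rules and the BAM packing formula, and a short worked example may help when checking them by hand. This is only a sketch: the CIGAR string `3S5M2I4=1X2D` is hypothetical and chosen for illustration, and the `amsmath` package is an assumption of this example rather than something the commit uses.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Hypothetical CIGAR used throughout: 3S5M2I4=1X2D
\begin{align*}
% SEQ length sums the M/I/S/=/X operations (D is excluded).
\text{SEQ length}     &= 3 + 5 + 2 + 4 + 1 = 15 && \text{(S, M, I, =, X)}\\
% The reference span sums the M/=/X/D/N operations (S and I are excluded).
\text{reference span} &= 5 + 4 + 1 + 2 = 12     && \text{(M, =, X, D)}\\
% BAM packs each operation as op_len<<4|op, with MIDNSHP=X mapped to 012345678.
\texttt{5M} &\to (5 \ll 4) \mid 0 = 80\\
\texttt{4=} &\to (4 \ll 4) \mid 7 = 71\\
\texttt{1X} &\to (1 \ll 4) \mid 8 = 24
\end{align*}

\end{document}
```

Reordering the character class in the regexp or the operator lists in the prose changes none of these values, since the numeric codes are fixed by the `MIDNSHP=X` to `012345678` mapping in the BAM table.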