Commit 3d3da3a

[DRAFT] Clarify SAM file encoding (ASCII, UTF-8 "subset")
1 parent 59a0d0c commit 3d3da3a

1 file changed

Lines changed: 6 additions & 2 deletions

SAMv1.tex

@@ -67,8 +67,12 @@ \section{The SAM Format Specification}
 BAM file may optionally specify the version being used via the
 {\tt @HD VN} tag. For full version history see Appendix~\ref{sec:history}.
 
-Unless explicitly specified elsewhere, all fields are encoded using 7-bit US-ASCII \footnote{Charset ANSI\_X3.4-1968 as defined in RFC1345.} in using the POSIX / C locale.
-Regular expressions listed use the POSIX / IEEE Std 1003.1 extended syntax.
+SAM files are encoded in UTF-8.
+They must not begin with a byte order mark, and non-ASCII characters are permitted only in certain field values as individually specified.%
+\footnote{Equivalently, SAM files primarily contain US-ASCII characters in the usual single-byte encoding; certain field values as specified may contain other Unicode characters and are encoded as UTF-8.}
+SAM file contents should be read and written using the POSIX / C locale.%
+\footnote{For example, floating-point values in SAM always use `{\tt .}' (\textsc{Full Stop}) for the decimal-point character.}
+The regular expressions in this specification have been written using the POSIX / IEEE Std 1003.1 extended syntax.
 
 \subsection{An example}\label{sec:example}
 Suppose we have the following alignment with bases in lowercase
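
The encoding rules added in this commit (UTF-8, no leading byte order mark, non-ASCII only in specific field values) could be checked along the following lines. This is a minimal sketch, not part of the specification or of any SAM library: the function names `check_sam_encoding` and `non_ascii_lines` are hypothetical. The diff does not say which field values may hold non-ASCII characters, so the second helper only reports the lines that contain them and leaves the per-field decision to the caller.

```python
import codecs
from typing import List


def check_sam_encoding(data: bytes) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: checks only the file-level rules stated in the diff
    (no byte order mark, valid UTF-8).
    """
    problems = []
    # Rule 1: the file must not begin with a byte order mark.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file begins with a UTF-8 byte order mark")
    # Rule 2: the whole file must decode as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"invalid UTF-8 at byte offset {exc.start}")
    return problems


def non_ascii_lines(data: bytes) -> List[int]:
    """Return 1-based numbers of lines that contain any non-ASCII byte.

    Whether non-ASCII is allowed on a given line depends on the field,
    which this sketch does not attempt to parse.
    """
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if not line.isascii()
    ]
```

For a plain ASCII header such as `b"@HD\tVN:1.6\n"`, `check_sam_encoding` returns an empty list and `non_ascii_lines` reports nothing. Working on raw bytes rather than decoded text keeps the checks independent of the process locale.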
