Commit e1acf3f

#642 recommended but not required meta-information structured header field order
1 parent 6100896 commit e1acf3f

1 file changed

Lines changed: 13 additions & 9 deletions

VCFv4.4.draft.tex

Original file line numberDiff line numberDiff line change
@@ -122,18 +122,21 @@ \subsection{Meta-information lines}
122122
\verb|##|\emph{key}\verb|=<|\emph{key}\verb|=|\emph{value}\verb|,|\emph{key}\verb|=|\emph{value}\verb|,|\emph{key}\verb|=|\emph{value}\verb|,|\ldots\verb|>|
123123
\end{quote}
124124
All structured lines require an ID which must be unique within their type, i.e., within all the meta-information lines with the same ``\verb|##|\emph{key}\verb|=|'' prefix.
125-
For all of the structured lines (\verb|##INFO|, \verb|##FORMAT|, \verb|##FILTER|, etc.) described in this specification, extra fields can be included after the default fields.
125+
For all of the structured lines (\verb|##INFO|, \verb|##FORMAT|, \verb|##FILTER|, etc.) described in this specification, optional fields can be included.
126126
For example:
127127
\begin{verbatim}
128128
##INFO=<ID=ALLELEID,Number=A,Type=String,Description="Allele ID",Source="ClinVar",Version="20220804">
129129
\end{verbatim}
130-
In the above example, the extra fields of ``Source'' and ``Version'' are provided.
130+
In the above example, the optional fields of ``Source'' and ``Version'' are provided.
131131
The values of optional fields must be written as quoted strings, even for numeric values.
132-
Other structured lines not defined by this specification may also be used; the only default field for such lines is the required \verb|ID| field.
132+
Other structured lines not defined by this specification may also be used; the only required field for such lines is \verb|ID|.
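
The quoting rule for optional fields can be illustrated with a minimal Python sketch. The helper names `quote` and `append_optional_fields` are hypothetical and are not part of the specification. Escaping backslashes and double quotes with a backslash is an assumption here, modeled on how quoted Description values are usually handled.

```python
def quote(value):
    """Render any value as a double-quoted string.

    Assumption: backslashes and double quotes are escaped with a backslash.
    """
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def append_optional_fields(line, optional):
    """Append optional key/value pairs to a structured line ending in '>'.

    Every optional value is written as a quoted string, including numbers.
    """
    if not line.endswith(">"):
        raise ValueError("not a structured meta-information line")
    extras = "".join(f",{key}={quote(value)}" for key, value in optional.items())
    return line[:-1] + extras + ">"


base = '##INFO=<ID=ALLELEID,Number=A,Type=String,Description="Allele ID">'
# The integer version is still emitted as a quoted string.
print(append_optional_fields(base, {"Source": "ClinVar", "Version": 20220804}))
```

With these inputs the sketch reproduces the example header line shown above, with the numeric version written as `"20220804"`.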
133133

134134
It is recommended in VCF and required in BCF that the header includes tags describing the reference and contigs backing the data contained in the file.
135135
These tags are based on the SQ field from the SAM spec; all tags are optional (see the VCF example above).
136136

137+
To aid human readability, the order of fields should be ID, Number, Type, Description, then any optional fields.
138+
Implementations must not rely on the order of the fields within structured lines and are not required to preserve field ordering.
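
One way to avoid depending on field order is to read each structured line into a key/value mapping. The following Python sketch does that. The function name `parse_structured_line` and the field-name pattern are illustrative assumptions rather than anything defined by the specification, and backslash escapes inside quoted values are assumed.

```python
import re

# key=value, where value is either a double-quoted string (backslash escapes
# allowed) or a run of characters up to the next comma.
_FIELD = re.compile(r'([A-Za-z_][0-9A-Za-z_.]*)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_structured_line(line):
    """Parse '##KEY=<k=v,...>' into (KEY, dict of fields), ignoring field order."""
    match = re.fullmatch(r'##([^=]+)=<(.*)>', line.strip())
    if match is None:
        raise ValueError("not a structured meta-information line")
    key, body = match.groups()
    fields = {}
    for name, raw in _FIELD.findall(body):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Strip the surrounding quotes and undo backslash escapes.
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        fields[name] = raw
    if "ID" not in fields:
        raise ValueError("structured lines require an ID field")
    return key, fields


recommended = parse_structured_line(
    '##INFO=<ID=ALLELEID,Number=A,Type=String,Description="Allele ID",Source="ClinVar">'
)
shuffled = parse_structured_line(
    '##INFO=<Source="ClinVar",Description="Allele ID",Type=String,ID=ALLELEID,Number=A>'
)
# Both orderings yield the same mapping.
assert recommended == shuffled
```

Because the result is a dictionary, a consumer looks fields up by name, and a writer is free to emit them in the recommended order regardless of how they were read.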
139+
137140
Meta-information lines are optional, but if they are present then they must be completely well-formed.
138141
Other than \verb|##fileformat|, they may appear in any order.
139142
Note that BCF, the binary counterpart of VCF, requires that all entries are present.
@@ -150,7 +153,7 @@ \subsubsection{File format}
150153

151154

152155
\subsubsection{Information field format}
153-
INFO fields are described as follows (first four keys are required, source and version are recommended):
156+
INFO meta-information lines are structured lines with required fields ID, Number, Type, and Description, and recommended optional fields Source and Version:
154157

155158
\begin{verbatim}
156159
##INFO=<ID=ID,Number=number,Type=type,Description="description",Source="source",Version="version">
@@ -177,29 +180,31 @@ \subsubsection{Information field format}
177180
Source and Version values likewise must be surrounded by double-quotes and specify the annotation source (case-insensitive, e.g.\ \verb|"dbsnp"|) and exact version (e.g.\ \verb|"138"|), respectively for computational use.
178181

179182
\subsubsection{Filter field format}
180-
FILTERs that have been applied to the data are described as follows:
183+
FILTER meta-information lines are structured lines with required fields ID and Description that define the possible content of the FILTER column in the VCF records:
181184

182185
\begin{verbatim}
183186
##FILTER=<ID=ID,Description="description">
184187
\end{verbatim}
185188

186189
\subsubsection{Individual format field format}
187-
Genotype fields specified in the FORMAT field are described as follows:
190+
FORMAT meta-information lines are structured lines with required fields ID, Number, Type, and Description that define the possible content of the per-sample/genotype columns in the VCF records:
188191

189192
\begin{verbatim}
190193
##FORMAT=<ID=ID,Number=number,Type=type,Description="description">
191194
\end{verbatim}
192195

193196
Possible Types for FORMAT fields are: Integer, Float, Character, and String (this field is otherwise defined precisely as the INFO field).
197+
The Number field is defined as per the INFO Number field.
194198

195199
\subsubsection{Alternative allele field format} \label{altfield}
196-
Symbolic alternate alleles are described as follows:
200+
ALT meta-information lines are structured lines with required fields ID and Description that describe the possible symbolic alternate alleles in the ALT column of the VCF records:
201+
197202
\begin{verbatim}
198203
##ALT=<ID=type,Description="description">
199204
\end{verbatim}
200205

201206
\noindent \textbf{Structural Variants} \newline
202-
In symbolic alternate alleles for imprecise structural variants, the ID field indicates the type of structural variant, and can be a colon-separated list of types and subtypes.
207+
In symbolic alternate alleles for structural variants, the ID field indicates the type of structural variant, and can be a colon-separated list of types and subtypes.
203208
ID values are case sensitive strings and must not contain whitespace, commas or angle brackets.
204209
The first level type must be one of the following:
205210
\begin{itemize}
@@ -232,7 +237,6 @@ \subsubsection{Alternative allele field format} \label{altfield}
232237
##ALT=<ID=M,Description="IUPAC code M = A/C">
233238
\end{verbatim}
234239

235-
236240
\subsubsection{Assembly field format}
237241
Breakpoint assemblies for structural variations may use an external file:
238242
\begin{verbatim}
