I believe this is related to samtools/hts-specs#593
I'm working on BCF 2.2 writing for htsjdk, and we are trying to verify that our codec is compatible with bcftools' codec.
Following our reading of the spec's wording, we output a vector of `EOV` values when a given genotype key is missing entirely for a sample, as opposed to present but containing missing values. When bcftools reads our BCF output and writes it as VCF, it writes the empty string for that sample, where we would expect either `.` or `.,.`, and the result does not appear to be valid VCF.
An example of what we're seeing is in the `CNL` attribute of the second sample, where we have `1|0::8`:
`GT:CNL:DP:GQ:HQ 0|0:10,20:1:48:25,30 1|0::8:48:49,51 ./.:1:5:43:.,.`
The original VCF from which the data was obtained had the following line. The order of the genotype keys is different because htsjdk sorts them. You can see that `CNL` for the second sample is just a missing value, `.`:
`GT:GQ:DP:HQ:CNL 0|0:48:1:25,30:10,20 1|0:48:8:49,51:. ./.:43:5:.,.:1`
This original VCF was converted to BCF using htsjdk's (work in progress) BCF 2.2 writer, which writes `[EOV, EOV]` for the missing `CNL` attribute, and then fed to bcftools, which produces the incorrect VCF.
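To make the expected decoding concrete, here is a minimal sketch of how a reader might render one sample's FORMAT vector back to VCF text. It assumes int8 as the value type for simplicity (the BCF2 reserved int8 values are `0x80` for missing and `0x81` for end-of-vector; floats use the analogous bit patterns `0x7F800001` and `0x7F800002`). The function name is hypothetical, and whether an all-`EOV` vector should become `.` or something else is exactly the question in hts-specs#593; this sketch simply picks `.` instead of the empty string.

```python
from typing import List

# BCF2 reserved int8 values (0x80 and 0x81 read as signed bytes).
INT8_MISSING = -128  # 0x80: value present but missing
INT8_EOV = -127      # 0x81: end-of-vector padding


def format_int8_sample(values: List[int]) -> str:
    """Hypothetical helper: render one sample's int8 FORMAT vector as VCF text.

    Assumption: EOV marks padding, so everything from the first EOV on is
    dropped; a vector with no values left (all EOV) becomes '.' rather than ''.
    """
    out = []
    for v in values:
        if v == INT8_EOV:
            break  # the rest of the vector is padding
        out.append("." if v == INT8_MISSING else str(v))
    return ",".join(out) if out else "."


if __name__ == "__main__":
    print(format_int8_sample([10, 20]))                      # 10,20
    print(format_int8_sample([INT8_EOV, INT8_EOV]))          # .
    print(format_int8_sample([INT8_MISSING, INT8_MISSING]))  # .,.
```

Under this reading, htsjdk's `[EOV, EOV]` for the second sample's `CNL` would come back as `.`, matching the original VCF, while `[missing, missing]` would come back as `.,.`.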