- implementing the spec change in #1238
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1238 +/- ##
===============================================
+ Coverage 69.323% 69.676% +0.353%
- Complexity 8105 8321 +216
===============================================
Files 543 547 +4
Lines 32480 33069 +589
Branches 5501 5685 +184
===============================================
+ Hits 22516 23041 +525
- Misses 7755 7803 +48
- Partials 2209 2225 +16
|
| * to mean "same reference as RNAME field." | ||
| */ | ||
| public static final String RESERVED_MRNM_SEQUENCE_NAME = "="; | ||
| public static final String RESERVED_RNEXT_SEQUENCE_NAME = "="; |
There was a problem hiding this comment.
deprecate this instead of just renaming it
|
|
||
| // Split on any whitespace | ||
| private static final Pattern SEQUENCE_NAME_SPLITTER = Pattern.compile("\\s"); | ||
| private static final Pattern SEQUENCE_NAME_SPLITTER = Pattern.compile("[\\s]"); |
| // These are the chars matched by \\s. | ||
| private static final char[] WHITESPACE_CHARS = {' ', '\t', '\n', '\013', '\f', '\r'}; // \013 is vertical tab | ||
|
|
||
| private static Pattern LEGAL_RNAME_PATTERN = Pattern.compile("[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"); |
| if (name != null) { | ||
| if (SEQUENCE_NAME_SPLITTER.matcher(name).find()) { | ||
| throw new SAMException("Sequence name contains invalid character: " + name); | ||
| if (!LEGAL_RNAME_PATTERN.matcher(name).useAnchoringBounds(true).matches()) { |
There was a problem hiding this comment.
move this into the validate method
There was a problem hiding this comment.
SEQUENCE_NAME_SPLITTER in both code and comments can be removed correct? I still see it elsewhere.
|
I notice that this enforces RNAME compliance with the regexp regardless of any declared @HD-VN. Did HTSJDK consider making this check dependent on the file's VN header and allowing such RNAMEs in existing files with HD-VN≤1.5 or so? There was discussion in samtools/hts-specs#333 (comment) that it seems important not to gratuitously reject existing data files (if any…) while allowing for diagnostics on newly-produced files. |
|
hmmm. this is a good point. will look into making is 1.5 and on only (not sure how....) |
|
@jmarshall, we have decided (in htsjdk) to implement a blanket rejection of non-compliant sam files. if this turns out to be an undue burden on our users, we will consider putting to work to make htsjdk more version-aware. right now we are working on the next version which should include more version-aware features and it should be easier to do such a thing there...I still think that we were right to keep the SPEC version specific and not declare all old SAM files as non-compliant. |
|
Fair enough. Please take a look at the text added in that new commit on samtools/hts-specs#333 too 😄 |
samtools/hts-specs#333