-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAM/VCF Strict draft #283
base: master
Are you sure you want to change the base?
SAM/VCF Strict draft #283
Conversation
@jkbonfield I'd rather not expand this particular PR to include changes to the SAM specs themselves. My PR does appear to be highlighting some issues with SAMv1.5 that have avoided scrutiny, but I think the best course of action is to get a SAM v1.5 strict document and validator implementation based on the v1.5 specifications before updating the specs. |
That's fine - I can accept that as an order in which things are done. Yes I think there are problems with the spec which haven't been spotted as no one has really gone over it with such a fine tooth-comb before. Thank you for doing this. |
Could you do me a favor and include the pdf in these commits? when we get to the end you can remove the pdf and compile it "correctly" for the sake of having the PDF versioned. |
I was wondering if we want the error "code" to not have spaces in it. This will help when we want to run a validation tool on the commandline and specify which codes to ignore. Spaces will mean that every code needs to be surrounded by quotes. COULD_WE_USE_THIS_FORMAT_FOR_CODES? (not shouting, just suggesting!) |
The TeX is invalid at the moment, meaning a PDF cannot currently be produced. I'm not a TeX expert, but it looks like you need eg |
The version in https://github.com/jkbonfield/hts-specs/tree/strict builds now, but needs a lot of formatting still. I'm not planning on doing this myself - this was just a test to see how to get it in a readable format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!! there seems to be a lo of work left at the bottom part of the document, but I like the rigor so far.
I put some comments in the discussion...
|
||
\section{Headers} | ||
|
||
{paragraph} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a \begin or a \section missing here? looks like copy-paste error
|
||
{paragraph} | ||
|
||
The first segment is is considered to be the "next" segment of the final segment in a template as per the SAM specifications. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems out of place
|
||
|
||
\section{Headers} | ||
\samstrictrule{Undefined reserved header present}{Upper-case header record type codes are not defined in the SAM specifications must not be used.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "that" between codes
and are
(or remove are
if that's the style you're going for)
\section{Headers} | ||
\samstrictrule{Undefined reserved header present}{Upper-case header record type codes are not defined in the SAM specifications must not be used.} | ||
\samstrictrule{Undefined header tag present}{Upper-case header tags not defined in the SAM specifications must not be used.} | ||
\samstrictrule{Tag present as both lowercase and uppercase}{A file should not contain the same tag in both upper-case and lowercase format. See the SAM specifications header tags best practice footnote.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tag is ambiguous, should be header tag
(both in code and text)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't interpret that foot-note the same way.....I interpret it as "lower case is a free for all, for upper case tags open an issue in hts-spec"
\samstrictrule{Undefined reserved header present}{Upper-case header record type codes are not defined in the SAM specifications must not be used.} | ||
\samstrictrule{Undefined header tag present}{Upper-case header tags not defined in the SAM specifications must not be used.} | ||
\samstrictrule{Tag present as both lowercase and uppercase}{A file should not contain the same tag in both upper-case and lowercase format. See the SAM specifications header tags best practice footnote.} | ||
\samstrictrule{Malformed header}{Header lines with conform to either the {\tt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"malformed header line"
s/with/must/
either
is generally used with two options...could you reword without either
?
\samstrictrule{SEQ QUAL length mismatch.}{The length of a non-* QUAL must match the length of SEQ.}{\samrule} | ||
\samstrictrule{Invalid QUAL}{The ASCII value of all QUAL bases must be at least 33.} | ||
\samstrictrule{QUAL of secondary alignments specified.}{QUAL of secondary alignments (0x100 FLAG set) should be set to * to reduce the file size.}{\v15bestpractice} | ||
\samstrictrule{TODO: QUAL edge case}{What should we do when a read is length 1 and the QUAL encodes to "*" ?}{\samrule} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ouch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say that spending any time on this edge case seems to be a waste...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've actually hit that one before! Hah.
\samstrictrule{Unknown reserved tag}{ | ||
No record can include any reserved tags not defined in the | ||
{\sl Sequence Alignment/Map Optional Fields Specification}. | ||
Non-standard tags must start X, Y, Z or a lowercase letter as per the SAM specifications. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...must start with...
} | ||
\samstrictrule{Incorrect tag type}{The \textit{type} of all \textit{standard tags} must match the type | ||
defined in the {\sl Sequence Alignment/Map Optional Fields Specification}.} | ||
\samstrictrule{Duplicate tag}{}{\samrule} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no text for "duplicate tag"?
|
||
|
||
\subsection{RG} | ||
3 When a RG tag appears anywhere in the alignment section, there should be a single corresponding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove 3 and 4 here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should->must
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, need error codes
3 When a RG tag appears anywhere in the alignment section, there should be a single corresponding | ||
@RG line with matching ID tag in the header.{\v15bestpractice} | ||
\subsection{RG} | ||
4 When a PG tag appears anywhere in the alignment section, there should be a single corresponding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this duplicated?
Any discussion or review comments on SAM and VCF strictness proposals are naturally separate. Would it not be more practical to open a VCF strict proposal as a new separate PR? |
I agree with a comment from @jmarshall above about splitting this into two. Since in its current form it concerns both VCF and SAM, I've added the appropriate labels for the time being |
Happy to progress these specs if someone has funding/budget/time to actually write a validator |
PR for discussing proposed SAM Strict rule changes. Work in progress continuing from #281 discussion.