Added basic helix parsing for mmCIF#140
Conversation
douweschulte
left a comment
There was a problem hiding this comment.
Sorry for the late reply, it is somewhat hard to find time to work on this when I am not working with the library in my phd. Looks like a great start. I like the concept of storing the secondary structures as start and end sequence ids. One small question though, would the insertion code also need to be tracked, because there might be multiple residues with the same serial number.
It looks like the mmCIF spec gives us label_asym_id, label_comp_id, and label_seq_id as guarenteed always present. I have to say it has been a while since looking at mmCIF files so I did forget a bit what all those different things actually mean and which are used in pdbtbx. And to be honest I think this has always been hazy for me thanks to the complexity of the standard.
For the duplication, it does not look too bad. If you find a better way go for it but it looks good enough for now.
Not supporting this in .PDB is fine. Make sure to document that somewhere and then if someone really needs in from .PDB files we can start looking into supporting it there. Having a start on the api on the library makes it a lot easier to decide how to store and parse the things.
Not storing redundant information is something I fully agree with, good job!
| fn parse_helices(input: &Loop, pdb: &mut PDB, options: &ReadOptions) -> Option<Vec<PDBError>> { | ||
| // todo: DRY with parse_atoms throughout this function. | ||
|
|
||
| define_columns!( |
There was a problem hiding this comment.
It might be nice to check the mmCIF docs for which items are required: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/struct_conf.html.
|
Thanks for all the work! I don't feel great about this implementation. (As you say, the mmCIF (And PDB) standards are both complex. It seems to work in the cases I've tested, but I have not integrated this into a workflow I use regularly. I initially intended it for "ribbon" rendering. |
Some notes:
authids line up with the labels ("serial number") used elsewhere, but those are chain-specific... hence my concern.