Skip to content

Added basic helix parsing for mmCIF#140

Open
David-OConnor wants to merge 4 commits into
douweschulte:mainfrom
David-OConnor:helices
Open

Added basic helix parsing for mmCIF#140
David-OConnor wants to merge 4 commits into
douweschulte:mainfrom
David-OConnor:helices

Conversation

@David-OConnor
Copy link
Copy Markdown
Contributor

@David-OConnor David-OConnor commented Feb 17, 2025

Some notes:

  • No sheets yet (There are multiple sheet tables, and I'm not sure how to handle at this time)
  • I'm not sure how to correctly assign the start and end IDs based on chain. Maybe something to do with "label" vs "auth" ids? Or maybe the intent is to infer from the text fields. (mmCIF format issue; not code issue). The auth ids line up with the labels ("serial number") used elsewhere, but those are chain-specific... hence my concern.
  • Lots of repetition with the atom table parsing due to the way the macros are set up inside fns. I wasn't sure how to properly exfiltrate them to re-use.
  • No .PDB support
  • I didn't include AA information or length, but IMO it's redundant with the indices. (An overconstraint)

Copy link
Copy Markdown
Owner

@douweschulte douweschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, it is somewhat hard to find time to work on this when I am not working with the library in my phd. Looks like a great start. I like the concept of storing the secondary structures as start and end sequence ids. One small question though, would the insertion code also need to be tracked, because there might be multiple residues with the same serial number.

It looks like the mmCIF spec gives us label_asym_id, label_comp_id, and label_seq_id as guarenteed always present. I have to say it has been a while since looking at mmCIF files so I did forget a bit what all those different things actually mean and which are used in pdbtbx. And to be honest I think this has always been hazy for me thanks to the complexity of the standard.

For the duplication, it does not look too bad. If you find a better way go for it but it looks good enough for now.

Not supporting this in .PDB is fine. Make sure to document that somewhere and then if someone really needs in from .PDB files we can start looking into supporting it there. Having a start on the api on the library makes it a lot easier to decide how to store and parse the things.

Not storing redundant information is something I fully agree with, good job!

Comment thread src/read/mmcif/parser.rs
fn parse_helices(input: &Loop, pdb: &mut PDB, options: &ReadOptions) -> Option<Vec<PDBError>> {
// todo: DRY with parse_atoms throughout this function.

define_columns!(
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to check the mmCIF docs for which items are required: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/struct_conf.html.

@David-OConnor
Copy link
Copy Markdown
Contributor Author

Thanks for all the work! I don't feel great about this implementation. (As you say, the mmCIF (And PDB) standards are both complex. It seems to work in the cases I've tested, but I have not integrated this into a workflow I use regularly. I initially intended it for "ribbon" rendering.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants