isolib: strip terminal polarity markers from formulas before isotopic parsing #758

Open
Copilot wants to merge 7 commits into master from copilot/isolib-handle-polarity-information

Contributor

Copilot AI commented Jun 2, 2026

isolib failed on formulas containing polarity/charge suffixes (+/- forms), even though polarity does not affect isotopic pattern generation. This change normalizes formula strings before enviPat parsing so charged input representations are handled consistently.

  • Formula preprocessing (R runtime)

    • Normalize compound_table$formula in parse_args() by removing only terminal sign-led charge/polarity suffixes.
    • Supported stripped suffixes include: +, -, +2, -2 (optionally preceded by whitespace, and anchored at the end of the string).
  • Regression coverage (Galaxy tool tests)

    • Added tools/isolib/test-data/markers_no_rt_polarity.tsv with formulas carrying polarity suffixes.
    • Added a new test case in tools/isolib/isolib.xml that runs this input and asserts equivalence with existing expected output (test1.msp), confirming polarity markers are ignored for isotopic pattern computation.
compound_table$formula <- stringr::str_remove(
    compound_table$formula,
    "\\s*[+-]\\d*$"
)
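
To illustrate what the pattern above matches, here is a minimal sketch. The helper name `strip_polarity` and the sample formulas are assumptions made for illustration and are not part of the PR; base R `sub()` stands in for `stringr::str_remove()` so the snippet runs without extra packages.

```r
# Hypothetical helper (not in the PR): applies the same pattern as the change
# above, using base R sub() in place of stringr::str_remove().
strip_polarity <- function(formula) {
    # Optional whitespace, one sign, optional digits, anchored at end of string.
    sub("\\s*[+-]\\d*$", "", formula)
}

# Illustrative inputs: one neutral formula and four charged spellings of it.
formulas <- c("C6H12O6", "C6H12O6+", "C6H12O6-", "C6H12O6+2", "C6H12O6 -2")
stripped <- strip_polarity(formulas)
print(stripped)
```

Under these assumptions every entry reduces to the same neutral formula, and the input without a sign is left untouched because the pattern requires a `+` or `-`.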

Copilot AI linked an issue Jun 2, 2026 that may be closed by this pull request
Copilot AI changed the title from "[WIP] Fix polarity handling in formulas for isotopic patterns" to "isolib: strip terminal polarity markers from formulas before isotopic parsing" Jun 2, 2026
Copilot finished work on behalf of hechth June 2, 2026 09:16
Copilot AI requested a review from hechth June 2, 2026 09:16
Member

hechth commented Jun 2, 2026

@copilot limit the strip to only +X or -X, because otherwise you can't be sure whether the preceding digit is the atom count of the last element. Normally there should only be + and - without any digits. The test case looks good and can stay like this.

Contributor Author

Copilot AI commented Jun 2, 2026

@copilot limit the strip to only +X or -X, because otherwise you might not be sure whether the previous digit refers to the number of atoms of the last type or so ... normally there should only be + a...

Updated in commit 2148512. The regex now strips only sign-led terminal suffixes (+, -, +N, -N) and no longer strips digit-led forms.
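
The ambiguity behind this restriction can be sketched as follows. The input string is a made-up example, and the pattern is the one shown in the PR description, which may differ from the exact regex in commit 2148512.

```r
# Sign-led pattern as shown in the PR description (assumed, not verified
# against the referenced commit).
sign_led <- "\\s*[+-]\\d*$"

# Made-up digit-led input: "O6" with a "2+" charge, or "O62" with a "+" charge?
digit_led_input <- "C6H12O62+"

# Only the trailing sign is removed; the digits before it are never touched.
result <- sub(sign_led, "", digit_led_input)
print(result)
```

Here only the trailing `+` is removed, so the `2` stays attached to the last element count. A formula string alone does not say whether that digit belongs to the charge, which is the reason for not stripping digit-led forms.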

Copilot finished work on behalf of hechth June 2, 2026 09:23
hechth marked this pull request as ready for review June 2, 2026 09:47
Development

Successfully merging this pull request may close these issues.

isolib: handle polarity information in formulas

2 participants