implement generic inchitomol() with stereo support#138

Merged
mojaie merged 5 commits into
mojaie:masterfrom
hhaensel:master
Oct 2, 2025

@hhaensel
Contributor

@hhaensel hhaensel commented Sep 24, 2025

This PR solves #137 and supports parsing of stereo information.

julia> mol1 = smilestomol("[D]O[D]")
{3, 2} simple molecular graph SMILESMolGraph

julia> i1 = inchi(mol1)
"InChI=1S/H2O/h1H2/i/hD2"

julia> inchitomol(i1)
{3, 2} simple molecular graph SDFMolGraph

julia> mol2 = smilestomol("C[C@H](O)C(=O)O")
{7, 6} simple molecular graph SMILESMolGraph

julia> i2 = inchi(mol2)
"InChI=1S/C3H6O3/c1-2(4)3(5)6/h2,4H,1H3,(H,5,6)/t2-/m0/s1"

julia> inchitomol(i2)
{7, 6} simple molecular graph SDFMolGraph

julia> inchitomol(i2, stereo = false)
{7, 6} simple molecular graph SDFMolGraph

@hhaensel
Contributor Author

Something still needs a fix: the demomol from the inchi test fails.

@hhaensel hhaensel marked this pull request as ready for review September 25, 2025 11:32
This was referenced Sep 25, 2025
@mojaie
Owner

mojaie commented Sep 25, 2025

I haven't checked it very well yet, but I think it's wonderful. Thank you very much.
I'm considering resolving the isotope issue by creating simple aliases like the following. Is that okay?

SDFAtom(; sym=:D, kwargs...) = SDFAtom(; sym=:H, isotope=2, kwargs...)

@hhaensel
Contributor Author

hhaensel commented Sep 26, 2025

If we went that way, we would need to adapt the drawing routines accordingly; see e.g.

drawsvg(smilestomol("[D]O[2H]"))
image

We might introduce a switch to choose whether '2H' or 'D' is displayed.
For my application I have left everything as is and implemented routines to exchange D with 2H:

function replace_atom(x::T, symbol::Symbol) where T
    values = [field == :symbol ? symbol : getfield(x, field) for field in fieldnames(T)]
    T(values...)
end

function replace_atom(x::T, symbol::Symbol, isotope::Union{Nothing,Int}) where T
    values = [field == :symbol ? symbol : field == :isotope ? isotope : getfield(x, field) for field in fieldnames(T)]
    T(values...)
end

# write all deuterium atoms as '2H'
function d_to_h!(mol)
    for (k, v) in mol.vprops
        if v.symbol == :D
            mol.vprops[k] = replace_atom(v, :H, 2)
        end
    end
    mol
end

# write all deuterium atoms as 'D'
function h_to_d!(mol)
    for (k, v) in mol.vprops
        if (v.symbol == :H && v.isotope == 2) || (v.symbol == :D && v.isotope != 2)
            mol.vprops[k] = replace_atom(v, :D, 2)
        end
    end
    mol
end

d_to_h(mol) = d_to_h!(copy(mol))
h_to_d(mol) = h_to_d!(copy(mol))
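
For illustration, the field-wise rebuild that replace_atom performs can be tried on a stand-in type. This is only a sketch: ToyAtom is hypothetical and not a MolecularGraph type; it merely mimics an immutable atom with symbol and isotope fields. The approach assumes the type's default positional constructor (all fields, in declaration order) is available.

```julia
# Hypothetical stand-in for an immutable atom type (not part of MolecularGraph)
struct ToyAtom
    symbol::Symbol
    isotope::Union{Nothing,Int}
    charge::Int
end

# Rebuild an immutable struct, overriding only the symbol and isotope fields
function replace_atom(x::T, symbol::Symbol, isotope::Union{Nothing,Int}) where T
    vals = [f == :symbol ? symbol : f == :isotope ? isotope : getfield(x, f)
            for f in fieldnames(T)]
    T(vals...)  # relies on the default positional constructor
end

d = ToyAtom(:D, nothing, 0)
h2 = replace_atom(d, :H, 2)   # same atom, written as 2H; charge is carried over
```

Because the struct is immutable, a new value is constructed instead of mutating the old one, which is why d_to_h! and h_to_d! reassign the entry in mol.vprops.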

Moreover, I have defined matching functions that work independently of the notation. But having a generalised approach that enforces a unique representation is certainly a good idea.

function distinguish_deuterated(mol1::MolGraph, mol2::MolGraph)
    sym1 = atom_symbol(mol1)
    sym2 = atom_symbol(mol2)
    geo1 = hybridization(mol1)
    geo2 = hybridization(mol2)
    isotope1 = Vector{Union{Nothing,Int}}(nothing, length(sym1))
    @inbounds for i in 1:length(sym1)
        if sym1[i] == :D
            sym1[i] = :H
            isotope1[i] = 2
        end
        sym1[i] == :H || continue
        m = MolecularGraph.get_prop(mol1, i, :isotope)
        m ∉ (1, nothing) && (isotope1[i] = m)
    end
    isotope2 = Vector{Union{Nothing,Int}}(nothing, length(sym2))
    @inbounds for i in 1:length(sym2)
        if sym2[i] == :D
            sym2[i] = :H
            isotope2[i] = 2
        end
        sym2[i] == :H || continue
        m = MolecularGraph.get_prop(mol2, i, :isotope)
        m ∉ (1, 0, nothing) && (isotope2[i] = m)
    end

    return (v1, v2) -> sym1[v1] == sym2[v2] && geo1[v1] === geo2[v2] && isotope1[v1] == isotope2[v2]
end

function match_deuterated(mol1::MolGraph, mol2::MolGraph)
    # treat D and H as the same element (isotope labels are ignored)
    sym1 = replace(atom_symbol(mol1), :D => :H)
    sym2 = replace(atom_symbol(mol2), :D => :H)
    geo1 = hybridization(mol1)
    geo2 = hybridization(mol2)
    return (v1, v2) -> sym1[v1] == sym2[v2] && geo1[v1] === geo2[v2]
end
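
The normalisation idea behind distinguish_deuterated can be sketched without the library by working on plain (symbol, isotope) tuples. The names normalize_hydrogen and toy_vmatch are hypothetical, the hybridization check is left out, and the isotope values 0, 1 and nothing are all assumed to mean "unlabelled hydrogen".

```julia
# Map :D and 2H to the same (symbol, isotope) pair; ignore isotopes on non-H atoms
function normalize_hydrogen(sym::Symbol, isotope)
    sym == :D && return (:H, 2)
    sym == :H && isotope ∉ (0, 1, nothing) && return (:H, isotope)
    return (sym, nothing)
end

# Build a vertex matcher over two atom lists, independent of the D/2H notation
function toy_vmatch(atoms1, atoms2)
    n1 = [normalize_hydrogen(a...) for a in atoms1]
    n2 = [normalize_hydrogen(a...) for a in atoms2]
    return (v1, v2) -> n1[v1] == n2[v2]
end

water_d  = [(:D, nothing), (:O, nothing), (:D, nothing)]  # written with D
water_2h = [(:H, 2), (:O, 0), (:H, 2)]                    # written with 2H
water_h  = [(:H, nothing), (:O, nothing), (:H, 1)]        # plain water

vmatch = toy_vmatch(water_d, water_2h)
```

With this matcher, the deuterium atoms of the two heavy-water notations match each other, while a deuterium never matches an unlabelled hydrogen.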

@hhaensel
Contributor Author

BTW, what do you think about inchitosdf()? Should we rewrite it to go via inchitomol()?

@mojaie
Owner

mojaie commented Sep 28, 2025

Yes, it would be better to replace inchitosdf().
And thank you for letting me know your use case. I'd like to keep calculated descriptors simple by using a D => 2H alias, but the experimentally introduced VirtualAtom (in ./src/virtualatom.jl) may fit this purpose. It provides custom interfaces for atoms and bonds. For example,

# just an example, not implemented in the library yet
struct Deuterium{T<:StandardAtom} <: AbstractAtom
    atom::T
    function Deuterium{T}() where {T<:StandardAtom}
        new{T}(T(; symbol=:H, isotope=2))
    end
end

atom_symbol(atom::Deuterium) = :D

This may behave as 2H for calculated descriptors (e.g. vmatchgen or standard_weight) without a :D => :H mapping, while providing :D to atom_symbol and to structure drawing functions that use atom_symbol.
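
A self-contained toy version of this wrapper idea might look as follows. None of these types exist in the library: ToyAbstractAtom, ToyStandardAtom, ToyDeuterium and element_symbol are hypothetical stand-ins, and the split between a display-facing and a descriptor-facing accessor is an assumption made for illustration.

```julia
abstract type ToyAbstractAtom end

# Hypothetical plain atom with a keyword constructor
Base.@kwdef struct ToyStandardAtom <: ToyAbstractAtom
    symbol::Symbol = :C
    isotope::Int = 0
end

# Wrapper that stores 2H internally but reports :D as its display symbol
struct ToyDeuterium{T<:ToyStandardAtom} <: ToyAbstractAtom
    atom::T
    ToyDeuterium{T}() where {T<:ToyStandardAtom} = new{T}(T(; symbol=:H, isotope=2))
end

# Display-facing accessor: drawing code would see :D
atom_symbol(a::ToyStandardAtom) = a.symbol
atom_symbol(::ToyDeuterium) = :D

# Descriptor-facing accessors: calculations would see the wrapped 2H atom
element_symbol(a::ToyStandardAtom) = a.symbol
element_symbol(a::ToyDeuterium) = a.atom.symbol
isotope(a::ToyStandardAtom) = a.isotope
isotope(a::ToyDeuterium) = a.atom.isotope

d = ToyDeuterium{ToyStandardAtom}()
```

Here atom_symbol(d) yields :D, while element_symbol(d) and isotope(d) expose the underlying hydrogen with mass number 2, so no symbol remapping is needed in descriptor code.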

@hhaensel
Contributor Author

How do we proceed, then? Will you simply merge and adapt inchitosdf yourself, or shall I do that?

@mojaie
Owner

mojaie commented Sep 28, 2025

Could you add a commit for inchitosdf? Either way, I will merge it.

@hhaensel
Contributor Author

Didn't see this, will do ...

@mojaie
Owner

mojaie commented Oct 2, 2025

Thank you very much!

@mojaie mojaie merged commit 17930ad into mojaie:master Oct 2, 2025
8 checks passed
mojaie referenced this pull request Oct 19, 2025