Skip to content

AniSOAP descriptors vary very weakly with configuration of molecules #87

@elindgren

Description

@elindgren

Hi! I'm trying to use AniSOAP descriptors for representing trimer configurations of molecules, see example of a configuration in Fig 1. I've followed the tutorial with benzene molecules for how to create a suitable descriptor for my system. One difference is that I don't want a global descriptor, but rather a molecule-centered descriptor. Thus, I skip the averaging when computing the power spectrum, power_spectrum_tensor = calculator.power_spectrum(frames, mean_over_samples=False). I then compute these descriptors for a large set of trimers, and standardize them as in the tutorial using a StandardFlexibleScaler. However, when I then try to test the descriptor on a test trimer where I simply plot the change in the descriptor as a function of displacement of one of the molecules, I barely get any change. Specifically, I use the following similarity metric $\Gamma$ to compute the change:

$$ \Gamma = \exp{ ( - (d_{ref} - d_i)^2) }, $$

so a Gaussian similarity where $d_{ref}$ is the descriptor for the unperturbed trimer and $d_i$ is the descriptor for the perturbed system where one of the molecules have been translated. This similarity metric is essentially 1; the MSE in the exponent only changes by on the order of 1e-6. Note that both descriptors have been standardized using the scaler; if I don't standardize, the change is ~1e-12.

I suspect I'm doing something wrong, and I suspect it might be in how I compute the descriptor, but I cannot find what it is. Any help is gladly appreciated! Please find attached my code below.

Fig 1: Trimer of molecules
Image

Fig 2: Similarity as a function of distance

Image
def get_orientation_and_anisotropy(structure):
    # Get gyration tensor, in the lab frame (x, y, and z directions)
    G = get_gyration_tensor(structure)

    # Then extract the principal directions, in the lab frame
    eigenvalues, eigenvectors = np.linalg.eigh(G)

    # Finally, sort the eigenvectors according to the eigenvalues.
    # This interprets the eigenvectors as the rotation matrix
    # that rotates a molecule oriented with the long axis along
    # the x-axis, the short axis along the y-axis, and the
    # thickness along the z-axis to the current molecular orientation.

    idx = np.argsort(eigenvalues)[::-1]
    sorted_eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # Check the determinant: if negative == improper reordering of axis, flip direction of last eigenvector
    det = np.linalg.det(eigenvectors)
    if det < 0:
        eigenvectors[:, -1] *= -1

    # However, Anisoap defines rotations to transform from the body frame to the labe frame
    # Hence, we need to take the inverse = transpose of our rotation matrix
    orientation = Rotation.from_matrix(eigenvectors.T)

    # We interpret the (sorted) eigenvalues as the MVGs in the AniSOAP formalism,
    # and thus skip tuning them. However, since the molecules are point clouds,
    # if they are flat then the anisotropy in the thickness direction will be
    # essentially zero. Set the minimum value of the anisotropy to 0.5 Å, which
    # was used for the thickness of the benzene monomer in the AniSOAP preprint.
    sorted_eigenvalues = np.maximum(sorted_eigenvalues, 0.5)

    return orientation, sorted_eigenvalues

def get_anisoap(structure, cutoff: float, width=1.5, lmax=5, nmax=3):
    """
    Computes the AniSOAP descriptor for a cluster of molecules.
    """
    # Identify molecules
    molecules = get_molecules_without_centering(structure)
    # get COM, orientation, and MVG for each molecule
    coms = []
    quaternions = []
    ellipsoid_diameters = []
    atoms_in_each_molecule = []
    for i, molecule in enumerate(molecules):
        # First get orientation
        coms.append(molecule.get_center_of_mass())
        # Center molecule to ensure that no bonds are crossing the PBC boundaries
        centered_molecule = get_centered_molecule(molecule)
        
        R, mvg = get_orientation_and_anisotropy(centered_molecule)
        Q = R.as_quat(canonical=True, scalar_first=True)
        quaternions.append(Q)
        ellipsoid_diameters.append(mvg)  # sigma_x, sigma_y, sigma_z

        original_atom_indices = molecule.get_array("original_index")
        for original_atom_index in original_atom_indices:
            atoms_in_each_molecule.extend([i, original_atom_index])
    coms = np.array(coms)
    quaternions = np.array(quaternions)
    ellipsoid_diameters = np.array(ellipsoid_diameters)
    frame = Atoms(
        symbols="".join(["X" for _ in molecules]),
        positions=coms,
        cell=structure.cell,
        pbc=structure.pbc,
    )

    # Add the information to the Atoms object
    frame.arrays["c_q"] = quaternions
    frame.arrays["c_diameter[1]"] = ellipsoid_diameters[:, 0]
    frame.arrays["c_diameter[2]"] = ellipsoid_diameters[:, 1]
    frame.arrays["c_diameter[3]"] = ellipsoid_diameters[:, 2]

    frames = [frame]  # only one frame at a time
    # Then use AniSOAP to compute the descriptor

    AniSOAP_HYPERS = {
        "max_angular": lmax,
        "max_radial": nmax,
        "radial_basis_name": "gto",
        "subtract_center_contribution": True,
        "rotation_type": "quaternion",
        "rotation_key": "c_q",
        "cutoff_radius": float(cutoff),
        "radial_gaussian_width": float(width),
        "basis_rcond": 1e-8,
        "basis_tol": 1e-3,
    }
    calculator = EllipsoidalDensityProjection(**AniSOAP_HYPERS)
    power_spectrum_tensor = calculator.power_spectrum(frames, mean_over_samples=False)
    # Extract the anisoap vector with shape (n_molecules, n_dim)
    # The anisoap tensor has a single block
    block = power_spectrum_tensor[0]
    descriptor = block.values[
        :, 0, :
    ]  # Has shape N_molecules, spherical_harmonices_m, N_dim

    molecular_structure = frame
    return molecular_structure, descriptor, np.array(atoms_in_each_molecule).T

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions