Hi! I'm trying to use AniSOAP descriptors for representing trimer configurations of molecules, see example of a configuration in Fig 1. I've followed the tutorial with benzene molecules for how to create a suitable descriptor for my system. One difference is that I don't want a global descriptor, but rather a molecule-centered descriptor. Thus, I skip the averaging when computing the power spectrum, power_spectrum_tensor = calculator.power_spectrum(frames, mean_over_samples=False). I then compute these descriptors for a large set of trimers, and standardize them as in the tutorial using a StandardFlexibleScaler. However, when I then try to test the descriptor on a test trimer where I simply plot the change in the descriptor as a function of displacement of one of the molecules, I barely get any change. Specifically, I use the following similarity metric $\Gamma$ to compute the change:
$$
\Gamma = \exp{ ( - (d_{ref} - d_i)^2) },
$$
so a Gaussian similarity where $d_{ref}$ is the descriptor for the unperturbed trimer and $d_i$ is the descriptor for the perturbed system where one of the molecules have been translated. This similarity metric is essentially 1; the MSE in the exponent only changes by on the order of 1e-6. Note that both descriptors have been standardized using the scaler; if I don't standardize, the change is ~1e-12.
I suspect I'm doing something wrong, and I suspect it might be in how I compute the descriptor, but I cannot find what it is. Any help is gladly appreciated! Please find attached my code below.
Fig 1: Trimer of molecules

Fig 2: Similarity as a function of distance
def get_orientation_and_anisotropy(structure):
# Get gyration tensor, in the lab frame (x, y, and z directions)
G = get_gyration_tensor(structure)
# Then extract the principal directions, in the lab frame
eigenvalues, eigenvectors = np.linalg.eigh(G)
# Finally, sort the eigenvectors according to the eigenvalues.
# This interprets the eigenvectors as the rotation matrix
# that rotates a molecule oriented with the long axis along
# the x-axis, the short axis along the y-axis, and the
# thickness along the z-axis to the current molecular orientation.
idx = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
# Check the determinant: if negative == improper reordering of axis, flip direction of last eigenvector
det = np.linalg.det(eigenvectors)
if det < 0:
eigenvectors[:, -1] *= -1
# However, Anisoap defines rotations to transform from the body frame to the labe frame
# Hence, we need to take the inverse = transpose of our rotation matrix
orientation = Rotation.from_matrix(eigenvectors.T)
# We interpret the (sorted) eigenvalues as the MVGs in the AniSOAP formalism,
# and thus skip tuning them. However, since the molecules are point clouds,
# if they are flat then the anisotropy in the thickness direction will be
# essentially zero. Set the minimum value of the anisotropy to 0.5 Å, which
# was used for the thickness of the benzene monomer in the AniSOAP preprint.
sorted_eigenvalues = np.maximum(sorted_eigenvalues, 0.5)
return orientation, sorted_eigenvalues
def get_anisoap(structure, cutoff: float, width=1.5, lmax=5, nmax=3):
"""
Computes the AniSOAP descriptor for a cluster of molecules.
"""
# Identify molecules
molecules = get_molecules_without_centering(structure)
# get COM, orientation, and MVG for each molecule
coms = []
quaternions = []
ellipsoid_diameters = []
atoms_in_each_molecule = []
for i, molecule in enumerate(molecules):
# First get orientation
coms.append(molecule.get_center_of_mass())
# Center molecule to ensure that no bonds are crossing the PBC boundaries
centered_molecule = get_centered_molecule(molecule)
R, mvg = get_orientation_and_anisotropy(centered_molecule)
Q = R.as_quat(canonical=True, scalar_first=True)
quaternions.append(Q)
ellipsoid_diameters.append(mvg) # sigma_x, sigma_y, sigma_z
original_atom_indices = molecule.get_array("original_index")
for original_atom_index in original_atom_indices:
atoms_in_each_molecule.extend([i, original_atom_index])
coms = np.array(coms)
quaternions = np.array(quaternions)
ellipsoid_diameters = np.array(ellipsoid_diameters)
frame = Atoms(
symbols="".join(["X" for _ in molecules]),
positions=coms,
cell=structure.cell,
pbc=structure.pbc,
)
# Add the information to the Atoms object
frame.arrays["c_q"] = quaternions
frame.arrays["c_diameter[1]"] = ellipsoid_diameters[:, 0]
frame.arrays["c_diameter[2]"] = ellipsoid_diameters[:, 1]
frame.arrays["c_diameter[3]"] = ellipsoid_diameters[:, 2]
frames = [frame] # only one frame at a time
# Then use AniSOAP to compute the descriptor
AniSOAP_HYPERS = {
"max_angular": lmax,
"max_radial": nmax,
"radial_basis_name": "gto",
"subtract_center_contribution": True,
"rotation_type": "quaternion",
"rotation_key": "c_q",
"cutoff_radius": float(cutoff),
"radial_gaussian_width": float(width),
"basis_rcond": 1e-8,
"basis_tol": 1e-3,
}
calculator = EllipsoidalDensityProjection(**AniSOAP_HYPERS)
power_spectrum_tensor = calculator.power_spectrum(frames, mean_over_samples=False)
# Extract the anisoap vector with shape (n_molecules, n_dim)
# The anisoap tensor has a single block
block = power_spectrum_tensor[0]
descriptor = block.values[
:, 0, :
] # Has shape N_molecules, spherical_harmonices_m, N_dim
molecular_structure = frame
return molecular_structure, descriptor, np.array(atoms_in_each_molecule).T
Hi! I'm trying to use AniSOAP descriptors for representing trimer configurations of molecules, see example of a configuration in Fig 1. I've followed the tutorial with benzene molecules for how to create a suitable descriptor for my system. One difference is that I don't want a global descriptor, but rather a molecule-centered descriptor. Thus, I skip the averaging when computing the power spectrum,$\Gamma$ to compute the change:
power_spectrum_tensor = calculator.power_spectrum(frames, mean_over_samples=False). I then compute these descriptors for a large set of trimers, and standardize them as in the tutorial using aStandardFlexibleScaler. However, when I then try to test the descriptor on a test trimer where I simply plot the change in the descriptor as a function of displacement of one of the molecules, I barely get any change. Specifically, I use the following similarity metricso a Gaussian similarity where$d_{ref}$ is the descriptor for the unperturbed trimer and $d_i$ is the descriptor for the perturbed system where one of the molecules have been translated. This similarity metric is essentially 1; the MSE in the exponent only changes by on the order of 1e-6. Note that both descriptors have been standardized using the scaler; if I don't standardize, the change is ~1e-12.
I suspect I'm doing something wrong, and I suspect it might be in how I compute the descriptor, but I cannot find what it is. Any help is gladly appreciated! Please find attached my code below.
Fig 1: Trimer of molecules

Fig 2: Similarity as a function of distance