Commit a012912

Single SOFT entity for structure resource (#196)

New single entity for an OPTIMADE structure resource.

Expand species and assemblies to singular values. They will be string-encoded and need to be decoded to get the "proper" lists back; this is due to dynamic list lengths.

Also, add all space_group* properties to the new entity, but comment them out for now.

Add support for the new entity to the DLite parser strategy.
Extend DLite parser tests for the new single-entity OPTIMADE structure.

Move entities out of the Python package to a top-level 'entities' folder.

Implement convenience functions for parsing species and assemblies from a single entity instance. These are available to import from `oteapi_optimade`. They are effectively helper/utility functions to "decode" the string-encoded values mentioned above.
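The string-encoding mentioned in the commit message can be pictured with a small round trip. This is only a sketch under assumptions: the names `encode_ragged` and `decode_ragged` are hypothetical and are not the helpers exported by `oteapi_optimade`; the comma separator follows the property descriptions in the new entity, and the sample values are made up.

```python
def encode_ragged(rows):
    """Join each inner list into one comma-separated string.

    Hypothetical helper: turns a list of variable-length lists into a flat
    list of strings, so it fits a fixed shape such as [nspecies].
    """
    return [",".join(str(item) for item in row) for row in rows]


def decode_ragged(encoded, cast=str):
    """Split each comma-separated string back into a list, casting every item."""
    return [
        [cast(item) for item in value.split(",")] if value else []
        for value in encoded
    ]


# Illustrative data: two species, the second one mixing two elements.
chemical_symbols = [["Si"], ["Si", "Ge"]]
concentration = [[1.0], [0.5, 0.5]]

encoded_symbols = encode_ragged(chemical_symbols)
encoded_concentration = encode_ragged(concentration)

# Decoding restores the nested lists; numeric fields need an explicit cast.
decoded_symbols = decode_ragged(encoded_symbols)
decoded_concentration = decode_ragged(encoded_concentration, cast=float)
```

The inner lists may have different lengths per species, which is why they cannot be given a fixed second dimension and are stored as one string per species instead.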
1 parent 55e6985 commit a012912

15 files changed

Lines changed: 716 additions & 143 deletions

.github/workflows/cd_upload_entities.yml

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ jobs:
   SHA_BEFORE="${{ github.event.pull_request.base.sha }}"
 fi

-git diff --name-only ${SHA_BEFORE} | grep -E '^oteapi_optimade/dlite/entities/.*\.ya?ml$' > entities.txt ||:
+git diff --name-only ${SHA_BEFORE} | grep -E '^entities/.*\.ya?ml$' > entities.txt ||:

 if [ -s entities.txt ]; then
   echo "relevant_entities=true" >> $GITHUB_OUTPUT
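
The effect of the changed `grep -E` pattern can be checked in isolation. The following is a minimal sketch, assuming Python's `re` module as a stand-in for the extended regex used in the workflow. The function name `relevant_entities` and the path `entities/notes.yml` are made up for illustration, and the check itself is not part of the workflow.

```python
import re

# Patterns copied from the hunk above: old and new location of the entity files.
OLD_PATTERN = re.compile(r"^oteapi_optimade/dlite/entities/.*\.ya?ml$")
NEW_PATTERN = re.compile(r"^entities/.*\.ya?ml$")


def relevant_entities(changed_paths, pattern=NEW_PATTERN):
    """Return the changed paths that the entity filter would keep."""
    return [path for path in changed_paths if pattern.search(path)]


# Illustrative list, in the form `git diff --name-only` prints paths.
changed = [
    "entities/OPTIMADEStructure.yaml",
    "oteapi_optimade/dlite/entities/OPTIMADEStructure.yaml",
    "docs/api_reference/_utils.md",
    "entities/notes.yml",  # hypothetical file, only here to exercise `.yml`
]

kept_by_new = relevant_entities(changed)
kept_by_old = relevant_entities(changed, OLD_PATTERN)
```

Because both patterns are anchored with `^`, the new one no longer matches files under `oteapi_optimade/dlite/entities/`, and the old one does not match the top-level `entities/` folder.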

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -88,4 +88,4 @@ repos:
 hooks:
   - id: validate-entities
     additional_dependencies: [".[cli]"]
-    files: ^oteapi_optimade/dlite/entities/.*\.ya?ml$
+    files: ^entities/.*\.ya?ml$

docs/api_reference/_utils.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# _utils
2+
3+
::: oteapi_optimade._utils

oteapi_optimade/dlite/entities/JSONAPIResourceLinks.yaml renamed to entities/JSONAPIResourceLinks.yaml

File renamed without changes.

oteapi_optimade/dlite/entities/OPTIMADERelationships.yaml renamed to entities/OPTIMADERelationships.yaml

File renamed without changes.

oteapi_optimade/dlite/entities/OPTIMADEStructure.yaml renamed to entities/OPTIMADEStructure.yaml

File renamed without changes.

oteapi_optimade/dlite/entities/OPTIMADEStructureAssembly.yaml renamed to entities/OPTIMADEStructureAssembly.yaml

File renamed without changes.

oteapi_optimade/dlite/entities/OPTIMADEStructureAttributes.yaml renamed to entities/OPTIMADEStructureAttributes.yaml

File renamed without changes.
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
+uri: http://onto-ns.com/meta/1.0.1/OPTIMADEStructureResource
+description: An OPTIMADE structure resource.
+dimensions:
+  nelements: Number of different elements in the structure as an integer.
+  dimensionality: Number of spatial dimensions. Must always be 3.
+  nsites: An integer specifying the length of the `cartesian_site_positions` property.
+  nstructure_features: Number of structure features.
+  ## Uncomment when space_group attributes are implemented in OPT.
+  # nspace_group_symmetry_operations: Number of space group symmetry operations.
+  nspecies: Number of species.
+  nassemblies: Number of assemblies.
+properties:
+  # Resource
+  id:
+    type: string
+    description: An entry's ID as defined in section Definition of Terms.
+  type:
+    type: string
+    description: The name of the type of an entry. Must always be 'structures'.
+
+  # Entry Resource
+  immutable_id:
+    type: string
+    description: The entry's immutable ID (e.g., a UUID). This is important for databases having preferred IDs that point to "the latest version" of a record, but still offer access to older variants. This ID maps to the version-specific record, in case it changes in the future.
+  last_modified:
+    type: string
+    description: Date and time representing when the entry was last modified.
+
+  # Structure Resource
+  elements:
+    type: string
+    shape: [nelements]
+    description: The chemical symbols of the different elements present in the structure.
+  elements_ratios:
+    type: float
+    shape: [nelements]
+    description: Relative proportions of different elements in the structure.
+  chemical_formula_descriptive:
+    type: string
+    description: The chemical formula for a structure as a string in a form chosen by the API implementation.
+  chemical_formula_reduced:
+    type: string
+    description: The reduced chemical formula for a structure as a string with element symbols and integer chemical proportion numbers.
+  chemical_formula_hill:
+    type: string
+    description: The chemical formula for a structure in [Hill form](https://dx.doi.org/10.1021/ja02046a005) with element symbols followed by integer chemical proportion numbers.
+  chemical_formula_anonymous:
+    type: string
+    description: The anonymous formula is the `chemical_formula_reduced`, but where the elements are instead first ordered by their chemical proportion number, and then, in order left to right, replaced by anonymous symbols A, B, C, ..., Z, Aa, Ba, ..., Za, Ab, Bb, ... and so on.
+  dimension_types:
+    type: int
+    shape: [dimensionality]
+    description: "List of three integers. For each of the three directions indicated by the three lattice vectors (see property `lattice_vectors`), this list indicates if the direction is periodic (value `1`) or non-periodic (value `0`). Note: the elements in this list each refer to the direction of the corresponding entry in `lattice_vectors` and *not* the Cartesian x, y, z directions."
+  nperiodic_dimensions:
+    type: int
+    description: An integer specifying the number of periodic dimensions in the structure, equivalent to the number of non-zero entries in `dimension_types`.
+  lattice_vectors:
+    type: float
+    shape: [dimensionality, dimensionality]
+    unit: Å
+    description: The three lattice vectors in Cartesian coordinates, in ångström (Å).
+  ## Uncomment when space_group attributes are implemented in OPT.
+  # space_group_symmetry_operations_xyz:
+  #   type: string
+  #   shape: [nspace_group_symmetry_operations]
+  #   description: List of symmetry operations given as general position x, y and z coordinates in algebraic form.
+  # space_group_symbol_hall:
+  #   type: string
+  #   description: A Hall space group symbol representing the symmetry of the structure as defined in (Hall, 1981, 1981a).
+  # space_group_symbol_hermann_mauguin:
+  #   type: string
+  #   description: A human- and machine-readable string containing the short Hermann-Mauguin (H-M) symbol which specifies the space group of the structure in the response.
+  # space_group_symbol_hermann_mauguin_extended:
+  #   type: string
+  #   description: A human- and machine-readable string containing the extended Hermann-Mauguin (H-M) symbol which specifies the space group of the structure in the response.
+  # space_group_it_number:
+  #   type: int
+  #   description: Space group number which specifies the space group of the structure as defined in the International Tables for Crystallography Vol. A. (IUCr, 2005).
+  cartesian_site_positions:
+    type: float
+    shape: [nsites, dimensionality]
+    description: Cartesian positions of each site in the structure. A site is usually used to describe positions of atoms; what atoms can be encountered at a given site is conveyed by the `species_at_sites` property, and the species themselves are described in the `species` property.
+  species_at_sites:
+    type: string
+    shape: [nsites]
+    description: Name of the species at each site (where values for sites are specified with the same order of the property `cartesian_site_positions`).
+  structure_features:
+    type: string
+    shape: [nstructure_features]
+    description: A list of strings that flag which special features are used by the structure.
+
+  ## Species
+  # A list describing the species of the sites of this structure. Species can represent pure chemical elements, virtual-crystal atoms representing a statistical occupation of a given site by multiple chemical elements, and/or a location to which there are attached atoms, i.e., atoms whose precise location are unknown beyond that they are attached to that position (frequently used to indicate hydrogen atoms attached to another element, e.g., a carbon with three attached hydrogens might represent a methyl group, -CH3).
+  species_name:
+    type: string
+    shape: [nspecies]
+    description: Name of the species.
+  # This is an option for optimizing the shapes for species_chemical_symbols, species_concentration, and species_mass.
+  # species_nchemical_symbols:
+  #   type: int
+  #   shape: [nspecies]
+  #   description: Length of each species' list of chemical symbols, concentration values, and mass.
+  species_chemical_symbols: # TODO: Potential here for improvement. The individual list of chemical symbols is dynamic!
+    type: string
+    shape: [nspecies]
+    description: Chemical symbol of the species as a string list of comma-separated entries.
+  species_concentration: # TODO: Potential here for improvement. The individual list of concentration is dynamic! Furthermore, it is originally float values!
+    type: string
+    shape: [nspecies]
+    description: Concentration of the species in the structure. The concentration is given as a string list of comma-separated entries. The concentration values should be cast to float.
+  species_mass:
+    type: string
+    shape: [nspecies]
+    description: A string list of comma-separated entries that list the mass of the species in atomic mass units (amu). The values should be cast to float.
+    unit: amu
+  species_original_name:
+    type: string
+    shape: [nspecies]
+    description: Can be any valid Unicode string, and should contain (if specified) the name of the species that is used internally in the source database.
+  species_attached: # TODO: Potential here for improvement. The individual list of chemical symbols is dynamic!
+    type: string
+    shape: [nspecies]
+    description: A string list of comma-separated entries that list chemical symbols for the elements attached to this site.
+  species_nattached:
+    type: string
+    shape: [nspecies]
+    description: A string list of comma-separated entries that list the number of attached atoms of the kind specified in the value of the `species_attached` key. The values should be cast to integers.
+
+  ## Assemblies
+  # A description of groups of sites that are statistically correlated.
+  # TODO: Handle dynamic list lengths.
+  assemblies_sites_in_groups:
+    type: string
+    shape: [nassemblies]
+    description: A string list of inner-most comma-separated entries that list the sites that are in each group. The groups are separated by a semicolon. The sites are listed by their index in the `cartesian_site_positions` list and should be cast to integers.
+  assemblies_group_probabilities:
+    type: string
+    shape: [nassemblies]
+    description: A string list of comma-separated entries that list the probability of each group. The probabilities should be cast to floats.
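
The two assemblies properties above are meant to be decoded together: groups are separated by semicolons, sites within a group by commas, and each group has one probability. A minimal sketch of such a decoder follows. The function names and the returned dictionary layout are assumptions for illustration and are not the convenience functions shipped in `oteapi_optimade`; the sample values are made up.

```python
def decode_sites_in_groups(value):
    """Decode one entry such as "0,1;2" into [[0, 1], [2]] (site indices as int)."""
    return [[int(site) for site in group.split(",")] for group in value.split(";")]


def decode_group_probabilities(value):
    """Decode one entry such as "0.3,0.7" into [0.3, 0.7]."""
    return [float(probability) for probability in value.split(",")]


def decode_assemblies(sites_in_groups, group_probabilities):
    """Pair up the two string-encoded lists, one element per assembly.

    Hypothetical helper: raises ValueError if an assembly has a different
    number of groups and probabilities.
    """
    assemblies = []
    for sites, probabilities in zip(sites_in_groups, group_probabilities):
        groups = decode_sites_in_groups(sites)
        probs = decode_group_probabilities(probabilities)
        if len(groups) != len(probs):
            raise ValueError(
                f"{len(groups)} groups but {len(probs)} probabilities in assembly"
            )
        assemblies.append({"sites_in_groups": groups, "group_probabilities": probs})
    return assemblies


# Illustrative values: one assembly with two groups of sites.
decoded = decode_assemblies(["0,1;2"], ["0.3,0.7"])
```

Checking that the number of groups matches the number of probabilities catches a malformed encoding early, before the decoded lists are used elsewhere.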

oteapi_optimade/dlite/entities/OPTIMADEStructureSpecies.yaml renamed to entities/OPTIMADEStructureSpecies.yaml

File renamed without changes.
