The NumPEx software catalog primarily consists in a main-list/projects.json file containing a set of properties (e.g. description, link to documentation, to guix package...) for every project included in the catalog.
The Automatic Import & Update Process makes it possible, for given projects, to fetch these metadata (or a subset of these metadata) from external JSON sources, such as Codemeta files. The two objectives are:
- to avoid duplicating the information for projects already having a JSON metadata file available
- to avoid need for manual updates of the NumPEx catalog to keep in sync with evolutions of the JSON metadata file
How does it work ?
- A specific configuration file called
main-list/mapping.jsondefines for each project the location of the external JSON source to be fetched, and specifies how to map the metadata contained in this external source to NumPEx catalog properties. - A Python script (
update_projects.py) is able to collect all the external metadata as defined inmapping.jsonand to update theprojects.jsonfile accordingly, either in-place or in a specified output file. - The CI system triggers this script at any change in either
mapping.jsonorprojects.jsonand also one time daily in order to capture potential changes in external JSON sources. Output is directed to theprojects-generated.jsonfile, which is the one used by HTML renderer to visualise the NumPEx Catalaog. - Users wanting to add a new project into the SW Catalog can either:
- edit the
mapping.jsonfile and run the script in-place to create an entry for the project intoprojects.json- this suits very well the situation where all the catalog properties can be fetched from an external JSON source - create manually an entry for the project in
projects.jsonand edit themapping.jsonto configure future updates - this corresponds to the case where only some catalog properties can be fetched from an external source, whereas others have to be manually provided.
- edit the
The overall workflow is shown in the diagram below:
flowchart LR
%% Step 1: Manual update
subgraph Step1a["Manual Edit & import"]
M["Manual Edition"]
A["projects.json"]
F["mapping.json"]
Script["Update Script"]
M --> A
M --> F
A --> Script
F --> Script
Script --> A_updated["Updated projects.json (same file)"]
end
%% Step 2: CI Generation
subgraph Step2["CI System (on changes + daily)"]
A_latest["projects.json"]
F2["mapping.json"]
Script2["Update Script"]
A_generated["projects-generated.json"]
A_latest --> Script2
F2 --> Script2
Script2 --> A_generated
end
%% Step 3: Web Visualization
subgraph Step3["Visualization"]
A_generated2["projects-generated.json"]
VizTool["HTML Renderer"]
Result["Displayed catalog"]
A_generated2 --> VizTool --> Result
end
%% Overall flow
A_updated --> A_latest
A_generated --> A_generated2
This workflow ensures a good separation between human & machine edition :
mapping.jsonis only human-editedprojects.jsonis either human-edited or updated by script, upon human request & under human supervisionprojects-generated.jsonis only machine-edited by CI system
The commit history of each file is kept clean and we have good separation between intentional changes and automatic updates.
This file must contains one projects property, which is an array of mapping objects. Each mapping object have the following properties:
-
source: mandatory - the URL or local path to the JSON file to be used a source for projects metadata. A local path should be expressed in the formlocal:path/relative/to/the/root/of/the/Git/repo/file_name.json- attempts to escape out of the Git repo are detected and flagged as errors. The source JSON file may contain either one JSON object or an array of JSON objects. In the latter case, the script loops over every JSON object sequentially, and for each:- applies the mapping for that object according to
mappingReforfields(see below) - updates the corresponding project in the NumPEx software catalog
- applies the mapping for that object according to
-
name: optional - the name of the project to be updated in in the NumPEx software catalog. If this property is omitted, the script searches for a property callednameacross the fields fetched from the JSON source during the mapping step, and uses its value instead. See note 3 below. -
mappingRef: optional - the URL of a standard mapping among the following ones:URL Comment https://numpex.github.io/sw-catalog/mappings/codemeta-v2-v3.jsonStandard mapping for Codemeta file v2 or v3 adhering to the NumPEx conventions -
fields: optional - the definition for a custom mapping, consisting in an object including multiple string properties:- on left side, name of the NumPex catalog property to be fetched from the JSON source - must be one of these:
name,documentation,description,discussion,guix_packageorspack_package. - on right side, value of the property is a Jq query to be applied on the JSON source in order to retrieve the corresponding NumPEx property. Note that Jq supports complex queries, possibly much more powerful than just selecting one specific field from the JSON source - see a tutorial here.
Missing NumPEx catalog properties will not be fetched from the JSON source. If they are defined in
projects.jsontheir value will be preserved. If not, they will stay undefined. - on left side, name of the NumPex catalog property to be fetched from the JSON source - must be one of these:
-
allow: optional - specifies whether this mapping object is allowed to create new projects in the NumPEx software catalog ("create"), to update existing ones ("update"), or both ("both"). Default value is"update". -
_comment: optional - human-readable comments, ignored by the script.
Note
-
Each mapping object must contain at least one of the two optional properties
mappingRefandfields. In case both properties are defined, then the custom mapping defined infieldstakes precedence. -
Multiple mapping objects may refer to the same project (i.e. have an identical
nameproperty). This can be useful if you want to combine multiple metadata sources for your project. The script does process the mapping objects in their order of appearance in themapping.jsonfile - if the same NumPEx Catalog property is mapped multiple times for a given project, the latter mapping overrides earlier ones. -
It is also possible to have a single
sourcefile aggregating metadata for multiple projects. You only need to omit thenameproperty in the mapping object, to structure thesourcefile as an array of objects, one per project, and then rely on the mapping process to correctly fetch thenamefor each project.
See README.
Let's assume for the sake of example that:
- your project has a Codemeta file including fields
description,buildInstructionsandissueTracker, - you would like to use content of these Codemeta fields to fill in the
description,documentationanddiscussionfields of the SW Catalog (respectively) - you would like that the other fields of the SW Catalog (
guix_packageandspack_package) are managed in the classic way, viaprojects.json.
Then in the step 4 of the submission workflow, when editing main-list/projects.json in your fork, you just need to fill in the guix_package and spack_package fields, you can leave the other fields empty (or put some default text, see note 1 below):
{
"name": "My Super Software",
"description": "",
"discussion": "",
"documentation": "",
"guix_package": "https://example.com/guix",
"spack_package": "https://example.com/spack"
}And before initiating the pull request, you must also update the main-list/mapping.json file in you fork, adding a new mapping object for your project in the projects array, like this:
{
"projects": [
{
"name": "My Super Software",
"source": "https://url.to.your.repo/codemeta.json",
"allow": "update",
"fields":
{
"description": ".description",
"discussion": ".issueTracker",
"documentation": ".buildInstructions"
}
}
]
}This gives the instructions to the CI system where to fetch the metadata for your project, and how to retrieve the fields needed for the SW Catalog. Especially:
- The fact that neither
guix_packagenorspack_packageproperties are defined indicates that these properties will not be updated automatically ; the values you manually edited inmain-list/mapping.jsonare preserved. - The
allowproperty tells the script that we expect an existing entry with that name to be already present in themain-list/projects.json, and that an error shall be triggered otherwise. That prevents an incorrect behaviour in case someone would latter delete the project entry in themain-list/projects.jsonbut forget to delete the correspond entry inmapping.json: without that protection the script would silently create a new entry with the specifiedfieldsmapping, leading to incomplete presentation of the project in the catalog.
After the submission is accepted, automatic updates of the SW Catalog with the latest versions of your source metadata file will be performed daily.
Note
- If something goes wrong during an update (e.g. network failure), the values you have edited in
main-list/projets.jsonare used as backup. It's therefore recommended to put some default text, rather than leaving an empty string.