227 changes: 174 additions & 53 deletions content/user-guide.md
Expand Up @@ -3,54 +3,80 @@ title: User Guide
layout: sidenav
---

## Generating a CodeMeta file
## What are CodeMeta files?

You can use the [codemeta-generator](https://codemeta.github.io/codemeta-generator/) directly at <https://codemeta.github.io/codemeta-generator/>
CodeMeta files, also called "CodeMeta instance files", are the `codemeta.json`
documents that are placed in the root of a software's code repository tree.
They define various aspects of the project in a JSON variant called JSON-LD,
which uses linking attributes to connect the data in this file with data from
other available sources.

## The CodeMeta Generator

The CodeMeta Generator is a tool for taking user input and either generating a
valid `codemeta.json` file, or testing an existing file to make sure that it
is valid.

### Generating a CodeMeta instance file

CodeMeta files can be generated using the
[CodeMeta Generator](https://codemeta.github.io/codemeta-generator/).
Instructions for [using the CodeMeta Generator](create) are available.

A _*beta*_ version of an automatic generator is also linked on that page.

### Testing a CodeMeta instance file

Your CodeMeta files can be validated using the
[codemeta-generator](https://codemeta.github.io/codemeta-generator/). Paste
the contents of a `codemeta.json` file into the bottom box, and click the
`Validate codemeta.json` button.

## Creating a CodeMeta instance file manually

A CodeMeta instance file describes the metadata associated with a software object using JSON's linked data (JSON-LD) notation. A CodeMeta file can contain any of the properties described on the [CodeMeta terms page](/terms/). Most CodeMeta files are called `codemeta.json` by convention.
A CodeMeta instance file describes the metadata associated with a software
object using JSON's linked data (JSON-LD) notation. A CodeMeta file can
contain any of the properties described on the [CodeMeta terms page](terms).

Here is an example of a basic `codemeta.json` that you can put at the root of a GitHub repo
([link to full example](https://github.com/gem-pasteur/macsyfinder/blob/master/codemeta.json)):
Any plaintext or code editor is sufficient for creating a CodeMeta instance
file. An editor that has syntax highlighting for `JSON` can assist by
making errors in the syntax stand out.

```json
{
"@context": "https://w3id.org/codemeta/3.1",
"type": "SoftwareSourceCode",
"applicationCategory": "Biology",
"codeRepository": "https://github.com/gem-pasteur/macsyfinder",
"description": "MacSyFinder is a program to model and detect macromolecular systems, genetic pathways… in prokaryotes protein datasets.",
"downloadUrl": "https://pypi.org/project/MacSyFinder/",
"license": "https://spdx.org/licenses/GPL-3.0+",
"name": "macsyfinder",
"version": "2.1.4",
"continuousIntegration": "https://github.com/gem-pasteur/macsyfinder/actions",
"developmentStatus": "active",
"issueTracker": "https://github.com/gem-pasteur/macsyfinder/issues",
"referencePublication": "https://doi.org/10.24072/pcjournal.250"
}
```
Most CodeMeta files are called `codemeta.json` by convention. While other
filenames are valid, they will be less recognisable and may be overlooked.
{.tip}

### Understanding JSON and JSON-LD

### Basics
CodeMeta files contain JSON *key-value pairs*, sometimes referred to as
*name-value pairs*, where the values can be *simple values*, *arrays*, or
*JSON objects*. Key-value pairs are known as *property-value pairs* in JSON-LD
linked data.
Comment thread
meldra marked this conversation as resolved.
Outdated

When creating a CodeMeta document, note that they contain JSON name ("property" in linked-data), value pairs where the values can be simple values, arrays or JSON objects. A simple value is a number, string, or one the literal values *false*, *null* *true*, for example:
#### Simple Values

A simple value is a number, a string, or one of the literal values *false*,
*null*, or *true*. For example:

```json
"name" : "R Interface to the DataONE REST API"
```

There must be a comma between of these key-value pairs, and no comma at the end before the closing bracket (`}`).
Key-value pairs must be separated by a comma. There must be no comma at the
end before the closing brace (`}`).
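
The comma rule can be checked mechanically with any strict JSON parser. The following is a minimal sketch, assuming Python's standard `json` module; the fragments and the `is_valid_json` helper are hypothetical and only illustrate the rule:

```python
import json

# Hypothetical fragments for illustration; not taken from a real project.
valid = '{"name": "example", "version": "1.0"}'
trailing_comma = '{"name": "example", "version": "1.0",}'
missing_comma = '{"name": "example" "version": "1.0"}'


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for label, text in [("valid", valid),
                    ("trailing comma", trailing_comma),
                    ("missing comma", missing_comma)]:
    print(label, is_valid_json(text))
```

Only the first fragment parses; the other two are rejected because of the comma placement.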

### Arrays
#### Arrays

A JSON array is surrounded by the characters `[` and `]`, and can contain multiple values separated by commas:
A JSON array is surrounded by square brackets, `[` and `]`. Arrays can contain
one or multiple values separated by commas:

```json
"keywords": [ "data sharing", "data repository", "DataONE" ]
```

As with any JSON documents, you can add line breaks between values for improved quality. For example, the former key-value pair is this is equivalent to:
Arrays should contain line breaks between values and indenting (spaces at the
start of a line). These make the data easier for humans to read. The above
example is equivalent to:

```json
"keywords": [
Expand All @@ -60,7 +86,9 @@ As with any JSON documents, you can add line breaks between values for improved
]
```
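
That line breaks and indentation do not change the data can be confirmed with a JSON parser. A minimal sketch, assuming Python's standard `json` module; each fragment is wrapped in `{` and `}` so that it forms a complete document:

```python
import json

# The same keywords written on one line and spread over several lines.
compact = '{"keywords": [ "data sharing", "data repository", "DataONE" ]}'
spread = """{
  "keywords": [
    "data sharing",
    "data repository",
    "DataONE"
  ]
}"""

# Whitespace between values is ignored by the parser.
same = json.loads(compact) == json.loads(spread)
print(same)
```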

All fields that accept a value of a given type accept an array of values of this type, and vice-versa. For example, a software with two licenses could have this attribute:
Fields that accept a value of a given type will accept an array of values of
that type. For example, a software with two licenses could have this
attribute:

```json
"license": [
Expand All @@ -69,9 +97,11 @@ All fields that accept a value of a given type accept an array of values of this
]
```
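
Because a property may hold either a single value or an array, a program that reads CodeMeta files may want to treat both forms uniformly. The following is a sketch under that assumption; the `as_list` helper and the license URLs are illustrative and not part of CodeMeta itself:

```python
import json


def as_list(value):
    """Return lists unchanged, wrap a single value in a list, map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Illustrative documents: one license as a plain value, two as an array.
single = json.loads('{"license": "https://spdx.org/licenses/MIT"}')
several = json.loads(
    '{"license": ["https://spdx.org/licenses/MIT",'
    ' "https://spdx.org/licenses/Apache-2.0"]}'
)

for doc in (single, several):
    for license_url in as_list(doc.get("license")):
        print(license_url)
```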

### Objects
#### Objects

Some properties, such as `author`, can refer to other JSON objects surrounded by curly braces and can contain other JSON values or objects, for example:
Some properties, such as `author`, can refer to other JSON objects. Objects
are surrounded by braces, `{` and `}`. These can contain other JSON values or
objects. For example:

```json
"author": {
Expand All @@ -83,18 +113,44 @@ Some properties, such as `author`, can refer to other JSON objects surrounded by
}
```

The JSON-LD "@type" keyword associates a JSON value or object with a well known type, for example, the
statement `"@type":"Person"` associates the `author` object with `http://schema.org/Person`.
It is good practice to always provide the `@type` for any property which specifies a node (JSON object).
The [terms page](/terms/) indicates these node types.
#### Keywords

JSON-LD has the concept of Keywords, which are properties prefaced with a `@`.

These references work similarly to includes in a programming language.
{.tip}

Keywords are instructions for the processor of the file to refer to
previously stored information. This means that the exact same information can
be included multiple times, and pulled from a consistent source.
Member

Not necessarily. @type and @list do not refer to other information for example.

Contributor

Agreed — and to add some clarification around the @ properties: most JSON-LD keywords do not pull in other data. Instead, they provide instructions on how the values in the file should be interpreted. In a CodeMeta file, properties that start with @ are JSON-LD keywords and help tools understand the structure and meaning of the data.

For example, the @type keyword indicates what kind of thing is being described. In a CodeMeta file, you might see:

"@type": "SoftwareSourceCode"

This allows tools reading the file to recognize that the information refers to a software project, rather than something like a person or an organization. Another keyword, @list, is used in cases where the order of items matters, such as when listing contributors in a specific sequence.

The @context keyword plays a different role. It links the property names used in the file to a shared vocabulary or ontology that tools can refer to. CodeMeta files typically use the standard CodeMeta context hosted on GitHub (https://github.com/codemeta), which gives well-defined meanings to terms like author, name, and license.


Comment thread
meldra marked this conversation as resolved.
The source can be an external resource, as depicted in the diagram below:

![Diagram of a JSON-LD reference pulling data in from an external data source](/img/jsonld-references-diagram.svg)

The "author" JSON object illustrates the use of the JSON-LD keyword "@id", which is used to associate an IRI with the JSON object. Any such node object can be assigned an `@id`, and we may use the `@id` to refer to this same object (the person, Peter), elsewhere in the document; e.g. we can indicate the same individual is also the `maintainer` by adding:
The JSON-LD "@type" keyword associates a JSON value or object with a well
known type. In the previous example, the statement `"@type":"Person"`
associates the `author` object with `http://schema.org/Person`. It is good
practice to always provide the `@type` for any property which specifies a node
(JSON object). The [terms page](/terms/) indicates these node types.
Member

Where does this diagram come from? it implies that any @ refers to external data but only @context does. (And we typically have @context at the beginning of the document.)

Contributor Author

The diagram is from my newbie understanding of JSON-LD. I'll revise this section.

Contributor @lindangulopez, Dec 17, 2025

Hi, the problems with the current diagram are that it

(i) emphasizes pairwise mappings instead of CodeMeta’s real role as a hub; we can also
(ii) use the colour to give semantic meaning; and untangle
(iii) the arrows which mix conversion, alignment, and information flow; also it is
(iv) hard to tell what is authoritative vs what is derived.

Contributor

I'm starting to think about this here: https://codepen.io/collection/rBjdmV

(i) https://codepen.io/lindangulopez/pen/LENKgEO is an attempt to show a Canonical CodeMeta 3.x vocabulary alignment diagram. The arrows denote semantic alignment between CodeMeta and external metadata vocabularies, with direction indicating whether the alignment is largely invertible or only partially covered.

Contributor

[Image]

Contributor Author

Hi Linda, thanks for the feedback and apologies for the delay in response.

The diagram you linked is not the one discussed here, this is, and after some time away from it I am definitely less happy with it.

At the same time I'm not sure your diagram is useful in the context either, or in this guide in particular given the target audience.

Since I'm also not really sure how to make a visualization that does this justice, I'm leaning towards removing the diagram as it is not really effective.

Some other parts of this section also need revising in my opinion.

Contributor @lindangulopez, Jan 13, 2026

Dear @meldra ,

Rather than iterating further on this in the PR, I believe it would be more effective to rework the diagram collaboratively with the wider CodeMeta group. This will allow us to align on what the diagram should communicate and to whom, ensuring we avoid a visual that is technically incorrect and potentially misleading.

It’s clear that, in its current form, the diagram is both technically flawed and causing more confusion than clarity. Given this, I agree that we should remove it for now and decide later how best to integrate a new diagram.

My proposal was intended to follow up and spark a discussion in this PR, not as a finalized solution. To facilitate further collaboration on this, I’ve opened a discussion in the CodeMeta forums (Discussion #467) to gather broader input. I will also bring this up in our next meeting, this coming Wednesday, January 14th.

Once we have clearer alignment, I’ll report back here with recommendations.

Kind regards,
Linda

Contributor Author

Just pushed a commit to retract the diagram pending workshopped replacement.

Contributor Author

Is the absence of a replacement diagram a blocker for merging the current rewrite? It can be added in a later revision.


Keywords also provide similar utility to a function in a programming
language; they prompt the processor to output the same data in another place
in the file.
{.tip}

The "author" JSON object illustrates the use of the JSON-LD keyword "@id",
which is used to associate an IRI with the JSON object. Any such node object
can be assigned an `@id`, and we may use the `@id` to refer to this same
object (the person, Peter), elsewhere in the document; e.g. we can indicate
the same individual is also the `maintainer` by adding:

```json
"maintainer": "http://orcid.org/0000-0003-0077-4738"
```

This should be added at the top level of the document, indicating that this individual is the `maintainer` of the software being described, like this:
This should be added at the top level of the document, indicating that this
individual is the `maintainer` of the software being described, like this:

```json
{
Expand All @@ -113,7 +169,12 @@ This should be added at the top level of the document, indicating that this indi
}
```

JSON-LD operations can later *expand* this reference and *embed* the full information at both locations. This means the example above is equivalent to:

JSON-LD operations can later *expand* this reference and *embed* the full
information at both locations.

This means the previous example is equivalent to:


```json
{
Expand All @@ -138,9 +199,10 @@ JSON-LD operations can later *expand* this reference and *embed* the full inform
}
```
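
The effect of reusing an `@id` can be imitated in a few lines. This is a simplified sketch, not a JSON-LD processor: it assumes that any top-level string equal to a known `@id` is a reference, whereas real processors decide this from the context. The `embed_references` helper and the `givenName` value are illustrative:

```python
import copy


def embed_references(doc: dict) -> dict:
    """Replace top-level string values that equal a known @id with that node."""
    nodes = {}

    def collect(value):
        # Walk the document and remember every object that carries an @id.
        if isinstance(value, dict):
            if "@id" in value:
                nodes[value["@id"]] = value
            for child in value.values():
                collect(child)
        elif isinstance(value, list):
            for item in value:
                collect(item)

    collect(doc)
    result = copy.deepcopy(doc)
    for key, value in doc.items():
        if isinstance(value, str) and not key.startswith("@") and value in nodes:
            result[key] = copy.deepcopy(nodes[value])
    return result


doc = {
    "@type": "SoftwareSourceCode",
    "author": {
        "@id": "http://orcid.org/0000-0003-0077-4738",
        "@type": "Person",
        "givenName": "Peter",
    },
    "maintainer": "http://orcid.org/0000-0003-0077-4738",
}

expanded = embed_references(doc)
print(expanded["maintainer"] == expanded["author"])
```

After the call, `maintainer` holds a copy of the full `author` object, while the original `doc` is left unchanged.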

### Nesting objects
#### Nesting objects

We saw before a simple (root) SoftwareSourceCode object:
The following SoftwareSourceCode object is an example of a simple root
object:

```json
{
Expand All @@ -150,7 +212,8 @@ We saw before a simple (root) SoftwareSourceCode object:
}
```

and this root object can refer to other objects, for example recommend a SoftwareApplication:
A root object can refer to other objects. For example, it may recommend a
SoftwareApplication:

```json
{
Expand All @@ -165,7 +228,8 @@ and this root object can refer to other objects, for example recommend a Softwar
}
```

And you may in turn want to add attributes to this application:
Nesting can go many layers deep. For example, attributes can in turn be added
to this application:

```json
{
Expand All @@ -185,9 +249,22 @@ And you may in turn want to add attributes to this application:
}
```

It is important to mind the order of curly brackets (an object begins with a `{` and ends with a matching `}`) and indentation (spaces at the beginning of a line) to reflect the hierarchy: "Central R Archive Network (CRAN)" is the name of the provider of "rmarkdown", which is a softwareSuggestions of CodemetaR.
Indentation and matching braces are important. These reflect the hierarchy of
the document.

Each object begins with a `{` and ends with a matching `}`. Each object should
also have a depth of indentation (the spaces at the beginning of a line) that
reflects its place in the hierarchy.

The above example defines "Central R Archive Network (CRAN)" as the name of
the provider of "rmarkdown", which is a softwareSuggestions of CodemetaR.

Putting key-value or property-value pairs in a different place in the document
hierarchy can change the meaning of the document.

For example, the above code is not equivalent to:
The code below has the `"url"` pair at a different level of the hierarchy. As
a result, it no longer belongs with the `"provider"` information, and the
meaning of the document has changed. It is *_not_* equivalent to the code above.

```json
{
Expand All @@ -207,23 +284,67 @@ For example, the above code is not equivalent to:
}
```

because in the latter, `"https://cran.r-project.org"` is the `"url"` of `rmarkdown`, instead of being the url of `Central R Archive Network (CRAN)`.
The change in hierarchy means that `"https://cran.r-project.org"` is
represented as the `"url"` of `rmarkdown`, instead of being the url of
`Central R Archive Network (CRAN)`.
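
The difference can also be seen by following the path to `url` in each version. The two dictionaries below are a reconstruction from the description above, reduced to the relevant properties, and `get_path` is a hypothetical helper:

```python
# "url" nested inside "provider": it describes CRAN.
nested = {
    "name": "CodemetaR",
    "softwareSuggestions": {
        "name": "rmarkdown",
        "provider": {
            "name": "Central R Archive Network (CRAN)",
            "url": "https://cran.r-project.org",
        },
    },
}

# "url" moved up one level: it now describes rmarkdown.
moved = {
    "name": "CodemetaR",
    "softwareSuggestions": {
        "name": "rmarkdown",
        "url": "https://cran.r-project.org",
        "provider": {
            "name": "Central R Archive Network (CRAN)",
        },
    },
}


def get_path(doc, *keys):
    """Follow a chain of keys through nested objects; return None if absent."""
    for key in keys:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


print(get_path(nested, "softwareSuggestions", "provider", "url"))
print(get_path(moved, "softwareSuggestions", "provider", "url"))
print(get_path(moved, "softwareSuggestions", "url"))
```

In `moved`, the provider no longer has a `url` at all, so the second lookup returns `None`.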

### Example of a CodeMeta file

The following is an example of a basic `codemeta.json` that can be put at the
root of a code repository
([link to full example](https://github.com/gem-pasteur/macsyfinder/blob/master/codemeta.json)):

```json
{
"@context": "https://w3id.org/codemeta/3.1",
"type": "SoftwareSourceCode",
"applicationCategory": "Biology",
"codeRepository": "https://github.com/gem-pasteur/macsyfinder",
"description": "MacSyFinder is a program to model and detect macromolecular systems, genetic pathways… in prokaryotes protein datasets.",
"downloadUrl": "https://pypi.org/project/MacSyFinder/",
"license": "https://spdx.org/licenses/GPL-3.0+",
"name": "macsyfinder",
"version": "2.1.4",
"continuousIntegration": "https://github.com/gem-pasteur/macsyfinder/actions",
"developmentStatus": "active",
"issueTracker": "https://github.com/gem-pasteur/macsyfinder/issues",
"referencePublication": "https://doi.org/10.24072/pcjournal.250"
}
```

## The context

Every CodeMeta document must refer to the context file *codemeta.jsonld*, for example via a URL. This indicates that all terms in the document should be interpreted in the "context" of CodeMeta. Most terms are chosen to match the equivalent terms in <http://schema.org>, but CodeMeta provides a few additional terms not found in <http://schema.org> which may be helpful for software projects. CodeMeta also restricts the context to use only those <https://schema.org> terms that are explicitly listed on the [terms](/terms/) page. Users wanting to include additional terms must extend the context (see [developer-guide](/developer-guide/)).
Every CodeMeta document must refer to the context file *codemeta.jsonld*, for
example via a URL. This indicates that all terms in the document should be
interpreted in the "context" of CodeMeta.

Most terms are chosen to match the equivalent terms in <http://schema.org>,
but CodeMeta provides a few additional terms not found in <http://schema.org>
which may be helpful for software projects.

CodeMeta also restricts the context to use only those <https://schema.org>
terms that are explicitly listed on the [terms](/terms/) page. Users wanting
to include additional terms must extend the context (see
[the developer guide](/developer-guide/)).
Member

they don't have to, they can use the schema: prefix to use schema.org terms.

Contributor Author

Did this change at some point? was it previously not permitted (hence the "must") for users to extend it in this manner?

Member

This is allowed since the first version I can find (0.1-alpha): https://github.com/codemeta/codemeta/blob/0.1-alpha/codemeta.jsonld#L4

Contributor Author

Is this exclusive to the developer context; as in development of codemeta, rather than a user context; a user making a codemeta.json file for their code repo? Or is it valid in both?

Member

it's valid in both


The context file may be modified and updated in the future, if new JSON
properties are added or existing ones modified.

The context file may be modified and updated in the future, if new JSON properties are added or existing ones modified.
The CodeMeta GitHub repository defines tags to allow specific versions of a file to be referenced, and assigns
*digital object identifiers*, or DOIs, to each release tag. Please use the [appropriate release](https://github.com/codemeta/codemeta/releases) of the CodeMeta schema in order to refer to the
appropriate context file, e.g.
The CodeMeta GitHub repository defines tags to allow specific versions of a
file to be referenced, and assigns *digital object identifiers*, or DOIs, to
each release tag. Please use the
[appropriate release](https://github.com/codemeta/codemeta/releases) of the
CodeMeta schema in order to refer to the appropriate context file, e.g.
Member

I think we stopped using DOIs and switched to only w3id instead.

Contributor Author

Where would I be able to confirm that? @moranegg perhaps?

Member

Contributor Author

I've gone with rewriting the section to be non-committal.


```json
"@context": "https://w3id.org/codemeta/3.1"
```
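
Whether a document declares a CodeMeta context can be checked before anything else is read. A rough sketch, assuming the `https://w3id.org/codemeta/` URL form used in the example; contexts published under other URL forms would need additional prefixes, and both helper names are hypothetical:

```python
# Assumption: the w3id.org form of the context URL, as in the example.
CODEMETA_PREFIX = "https://w3id.org/codemeta/"


def context_entries(doc: dict) -> list:
    """Return the @context as a list, whether it is a single value or an array."""
    context = doc.get("@context", [])
    return context if isinstance(context, list) else [context]


def uses_codemeta_context(doc: dict) -> bool:
    """True if any string entry of @context starts with the CodeMeta prefix."""
    return any(
        isinstance(entry, str) and entry.startswith(CODEMETA_PREFIX)
        for entry in context_entries(doc)
    )


print(uses_codemeta_context({"@context": "https://w3id.org/codemeta/3.1"}))
print(uses_codemeta_context({"name": "no context declared"}))
```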

Release candidate versions may be referred to consistently using their [git tag](https://github.com/codemeta/codemeta/tags) for the raw version, e.g. <https://raw.githubusercontent.com/codemeta/codemeta/2.0-rc/codemeta.jsonld>. *Please do not refer to the raw GitHub URL for the master branch*, as this is subject to change and will not guarantee a stable metadata file.
## Referencing CodeMeta

## Testing An Instance file
Release candidate versions may be referred to consistently using their
[git tag](https://github.com/codemeta/codemeta/tags) for the raw version, e.g.
<https://raw.githubusercontent.com/codemeta/codemeta/2.0-rc/codemeta.jsonld>.
*Please do not refer to the raw GitHub URL for the master branch*, as this is
subject to change and will not guarantee a stable metadata file.

Our [codemeta-generator](https://codemeta.github.io/codemeta-generator/) can also check a codemeta.json file you wrote is valid. To do that, copy-paste your code in the bottom box, and click "Validate codemeta.json".