
Harvesting related works

Brian Riley edited this page Mar 7, 2025 · 3 revisions

Overview

The DMP Tool, in conjunction with the Curtin Open Knowledge Initiative (COKI), is developing a workflow to find research outputs registered in various data sources and connect them to DMPs managed by the DMP Tool.

Potential research output discovery

We are currently working with the following aggregator datasets:

  • DataCite, Crossref, OpenAlex, Make Data Count Citation Corpus

We are also working with funder APIs and crawling award/grant pages to find DOIs.

We first convert the research project's title and abstract into vectors to facilitate matching against the titles and descriptions of research outputs within the aggregator datasets. We then combine the score from that match with matches against grant IDs, funding opportunity numbers, funder and affiliation RORs, contributor ORCIDs, emails, and names, and repository re3data IDs to arrive at an overall score.
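The general shape of this scoring can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual algorithm: the term-frequency vectors stand in for whatever vectorization the workflow really uses, the weights are invented, the field names (`grant_ids`, `funder_rors`, `affiliation_rors`, `orcids`) are hypothetical, and only a few of the identifier signals listed above are included.

```python
import math
import re
from collections import Counter

# Hypothetical weights (they sum to 1.0); the real weighting is not published.
WEIGHTS = {
    "text": 0.4,
    "grant_ids": 0.2,
    "funder_rors": 0.1,
    "affiliation_rors": 0.1,
    "orcids": 0.2,
}


def vectorize(text):
    """Build a simple term-frequency vector (a stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def overlap(a, b):
    """Return 1.0 if the two identifier collections share any value, else 0.0."""
    return 1.0 if set(a) & set(b) else 0.0


def score_match(dmp, work):
    """Combine text similarity and identifier matches into one overall score."""
    parts = {
        "text": cosine(
            vectorize(dmp["title"] + " " + dmp["abstract"]),
            vectorize(work["title"] + " " + work["description"]),
        )
    }
    for key in ("grant_ids", "funder_rors", "affiliation_rors", "orcids"):
        parts[key] = overlap(dmp.get(key, []), work.get(key, []))
    return sum(WEIGHTS[key] * parts[key] for key in WEIGHTS)
```

With these made-up weights, a research output whose text and identifiers all match scores close to 1.0, and one that shares nothing with the DMP scores 0.0.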

The project is still a work in progress. We will document the specifics of the workflow and the matching algorithm as we test and refine the process.

Once a potential match is found, it is recorded in a table along with its weight/score. We will then use the weight to determine which potential matches to present to users within the DMP Tool.
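One simple way to use the recorded weight is a cutoff, sketched below. The record shape (`PotentialMatch`), the threshold value, and the function name are all assumptions for illustration; the page does not describe the table schema or how the cutoff will actually be chosen.

```python
from dataclasses import dataclass


@dataclass
class PotentialMatch:
    """Hypothetical row in the potential-matches table."""
    dmp_id: str
    doi: str
    score: float


# Hypothetical cutoff; the real value would come from testing.
DISPLAY_THRESHOLD = 0.5


def matches_to_present(rows, dmp_id, threshold=DISPLAY_THRESHOLD):
    """Return this DMP's matches at or above the threshold, best first."""
    selected = [r for r in rows if r.dmp_id == dmp_id and r.score >= threshold]
    return sorted(selected, key=lambda r: r.score, reverse=True)
```

Sorting by score means the strongest candidates appear first in the list the user reviews.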

Potential research output curation

When a DMP owner (or institutional admin) loads their DMP within the DMP Tool, they can review the list of potential matches and either approve or reject them.

If a match is approved, the user can connect the DOI to a research output already defined within their DMP. If they do not connect it to a research output, the DOI is added to a list of related works attached to the DMP. This is a common case for outputs that the DMP would not have covered, such as a paper or article.
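The approve/reject flow above can be modeled roughly as follows. This is a minimal sketch, not the DMP Tool's data model: the `Dmp` and `ResearchOutput` classes, the `rejected` list, and the `curate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResearchOutput:
    """A research output defined within the DMP (hypothetical shape)."""
    title: str
    doi: Optional[str] = None


@dataclass
class Dmp:
    research_outputs: List[ResearchOutput] = field(default_factory=list)
    related_works: List[str] = field(default_factory=list)
    rejected: List[str] = field(default_factory=list)


def curate(dmp, doi, approved, output_index=None):
    """Apply a user's decision on one potential match and report the outcome."""
    if not approved:
        # Remember the rejection so the match is not offered again.
        dmp.rejected.append(doi)
        return "rejected"
    if output_index is not None:
        # Connect the DOI to a research output already defined in the DMP.
        dmp.research_outputs[output_index].doi = doi
        return "linked"
    # Approved but not tied to a planned output, e.g. a paper or article.
    dmp.related_works.append(doi)
    return "related"
```

Keeping rejected DOIs in their own list is a design assumption here; it lets later harvesting runs skip matches the user has already dismissed.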
