How do DataCite services support the EOSC PIDGraph?
Introduction to the PIDGraph
The PID Graph started as an output of the FREYA project (read more in the introductory blogpost), resulting in a GraphQL API and the first versions of DataCite Commons. During the FAIRCORE4EOSC project, the EOSC PIDGraph was created as a series of data enhancements, new services, and improvements to existing services, built on top of the original FREYA PID Graph.
The PIDGraph services are designed to facilitate the exposure and use of PID metadata and connections, centered on a trusted source. In this instance, the core data of the graph is the DOI metadata held by DataCite. The network of the graph is built from the relationships connecting one DOI to another, and outwards from DOIs to other established and trusted PIDs within the research community, including ORCID iDs for people and ROR identifiers for institutions, as well as domain-specific persistent identifiers such as zbMATH Article Identifiers.
DataCite provides different services that enable users to interact with the PIDGraph and the data it contains:
DataCite REST API
The DataCite REST API provides an interface for querying and retrieving the DOI metadata that makes up the core of the PIDGraph. Individual DOI records can be directly retrieved, and lists of DOI records can be returned by querying the API.
Full documentation is available from the Introduction to the DataCite REST API page.
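As a rough illustration of the two access patterns described above (retrieving one DOI record and querying for a list), the following Python sketch builds the request URLs and provides a small fetch helper. The API root, the `/dois` path, and the `query` and `page[size]` parameter names are assumptions based on common usage of the DataCite REST API; please confirm them against the full documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed public REST API root; confirm against the REST API documentation.
API_ROOT = "https://api.datacite.org"


def doi_url(doi: str) -> str:
    """Build the URL for retrieving a single DOI record."""
    # Keep the slash between DOI prefix and suffix; escape everything else.
    return f"{API_ROOT}/dois/{urllib.parse.quote(doi, safe='/')}"


def search_url(query: str, page_size: int = 25) -> str:
    """Build the URL for a list query (parameter names are assumptions)."""
    params = {"query": query, "page[size]": page_size}
    return f"{API_ROOT}/dois?{urllib.parse.urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Send a GET request and decode the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.api+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_json(doi_url("10.5438/0012"))` would request a single record, while `fetch_json(search_url("climate change", 10))` would request a page of matching records.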
DataCite GraphQL API and DataCite Commons
The DataCite GraphQL API can be used for making direct queries into the graph, providing the ability to retrieve subsets of the graph and enabling more complex queries than are possible via the REST API, such as "two-hop" connections between objects.
The GraphQL API can be accessed via the API endpoint at https://api.datacite.org/graphql, and more documentation on usage is available in the DataCite GraphQL API Guide.
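To make the idea of a "two-hop" query more concrete, the sketch below prepares a GraphQL request that starts at one work, follows its citations, and then follows the citations of those citing works. The endpoint is the one given above; the field names in the query (`work`, `citations`, `nodes`) and the `ID!` variable type are assumptions for illustration and should be checked against the schema in the DataCite GraphQL API Guide.

```python
import json
import urllib.request

GRAPHQL_ENDPOINT = "https://api.datacite.org/graphql"

# Hypothetical two-hop query: work -> citing works -> works citing those.
# Field names are assumptions; verify them against the published schema.
TWO_HOP_QUERY = """
query ($id: ID!) {
  work(id: $id) {
    id
    citations(first: 5) {
      nodes {
        id
        citations(first: 5) {
          nodes { id }
        }
      }
    }
  }
}
"""


def build_request(work_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the GraphQL query."""
    payload = json.dumps({"query": TWO_HOP_QUERY, "variables": {"id": work_id}})
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(work_id: str) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_request(work_id), timeout=30) as response:
        return json.load(response)
```

Separating `build_request` from `run_query` keeps the query construction easy to inspect without contacting the API.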
DataCite Commons is a web frontend built on top of the GraphQL API, providing an easy-to-use, interactive interface to the graph data. Works, people, organisations, and repositories are all searchable, and a variety of connections and statistics are presented, giving a visual overview of the graph.
For more information, please see Introduction to DataCite Commons.
DataCite Public Data File and PID Links Data File
DataCite publishes two datasets of PID Graph data: one containing the metadata for all DataCite DOI nodes within the graph, and one containing the edges of the graph along with metadata about the connections.
The DataCite Public Data File is released yearly and comprises metadata for all publicly available DataCite DOIs that were registered up to the end of the year, stored as JSONLines and using the response structure from the DataCite REST API.
The DataCite PID Links file is released regularly, and comprises a core triple of object-subject-relationship as well as additional metadata about the relationship, such as the source of the assertion and when it occurred. The data is stored as JSONLines and is a subset of the DataCite Event Data response structure.
Both Data Files are available via the DataCite Data Files Service.
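Because both files are stored as JSONLines, they can be processed one record at a time without loading the whole file into memory. The sketch below reads PID Links records and extracts the core triple plus the assertion source. The flat record layout and the key names (`subj-id`, `relation-type-id`, `obj-id`, `source-id`) are assumptions modelled on the Event Data response structure, and the sample DOIs are made up; check a real file for the exact shape.

```python
import io
import json
from typing import Iterator, NamedTuple, TextIO


class Link(NamedTuple):
    subject: str
    relation: str
    obj: str
    source: str


def read_links(lines: TextIO) -> Iterator[Link]:
    """Yield one Link per non-empty JSONLines record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Key names are assumptions based on Event Data conventions.
        yield Link(
            subject=record["subj-id"],
            relation=record["relation-type-id"],
            obj=record["obj-id"],
            source=record.get("source-id", ""),
        )


# Made-up sample records, for illustration only.
SAMPLE = (
    '{"subj-id": "https://doi.org/10.1234/example-a", "relation-type-id": "cites", '
    '"obj-id": "https://doi.org/10.1234/example-b", "source-id": "datacite-related"}\n'
    '{"subj-id": "https://doi.org/10.1234/example-b", "relation-type-id": "is-supplement-to", '
    '"obj-id": "https://doi.org/10.1234/example-c"}\n'
)

links = list(read_links(io.StringIO(SAMPLE)))
```

With a downloaded file, the same function could be given an open file handle, for example `read_links(open(path, encoding="utf-8"))`, where `path` is wherever the file was saved.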
DataCite OAI-PMH API
The DataCite OAI-PMH API can be used for harvesting the graph data using the Open Archives Initiative Protocol for Metadata Harvesting, a common framework enabling interoperable exchange of data.
OAI-PMH compliant clients can make use of standard patterns such as sets, different metadata formats, and date-limited queries to retrieve targeted subsets of the graph.
More documentation on usage is available in the DataCite OAI-PMH Guide.
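As a sketch of how sets, metadata formats, and date limits combine in a harvesting request, the helper below builds a `ListRecords` URL. The `verb`, `metadataPrefix`, `set`, `from`, and `until` arguments come from the OAI-PMH protocol itself; the endpoint URL and the example set name are assumptions, so please take the actual values from the DataCite OAI-PMH Guide.

```python
import urllib.parse
from typing import Optional

# Assumed endpoint; confirm against the DataCite OAI-PMH Guide.
OAI_ENDPOINT = "https://oai.datacite.org/oai"


def list_records_url(
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords URL, adding only the filters supplied."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date  # e.g. "2024-01-01"
    if until_date:
        params["until"] = until_date
    return f"{OAI_ENDPOINT}?{urllib.parse.urlencode(params)}"


# "EXAMPLE.SET" is a hypothetical set name used only for illustration.
url = list_records_url(set_spec="EXAMPLE.SET", from_date="2024-01-01")
```

In practice, a harvester would also follow the `resumptionToken` returned in each response to page through large result sets; most OAI-PMH client libraries handle this automatically.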
DataCite Event Data
DataCite Event Data is the service that stores the connections between the different objects that make up the PIDGraph, as well as usage statistics. The data is sourced from DataCite DOI metadata, other DOI registration agencies, trusted third-party sources such as the zbMATH Open portal, and usage reports submitted to DataCite.
DataCite Event Data can be accessed as part of the PID Links Data File, or queried directly using the DataCite REST API. For guidance on querying, please see DataCite Event Data.