DataCite Usage Reports API Guide

🚧

Usage for non-dataset materials

This guide outlines how to send usage reports for repository items that follow the COUNTER Code of Practice for Research Data. If you would like to submit usage reports for repository items that are NOT datasets, please contact the DataCite support team first.

Purpose

The DataCite Usage Reports API allows repositories to store data usage metrics. Most writing endpoints require authentication, while all reading endpoints are accessible without credentials. A live API reference is also available. To obtain credentials, your repository needs to be a DataCite Member. Interested parties should contact the DataCite support team.

If you're interested in how DataCite displays the information you send, see DataCite Commons.

Authentication

Writing to the Usage Reports API requires authentication. For this reason, only traffic via a secure connection (HTTPS) is supported. The DataCite Usage Reports API uses JWT authentication.

To start interacting with the DataCite Usage Reports API you must be a DataCite Member. DataCite will provide you with a JSON Web Token.

📘

JSON Web Tokens will expire one year after the date that they are created

We suggest you request and implement a new token before the previous token expires to maintain continuous access to the Usage Reports API.
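To plan a renewal, you can read the expiry date out of the token itself. The following is a minimal sketch, assuming the token carries the standard `exp` claim (seconds since the epoch) in its payload, which this guide does not confirm. The helper names `jwt_payload` and `days_until_expiry` are hypothetical, and the signature is not verified.

```ruby
require 'json'

# Decode the payload (second dot-separated segment) of a JWT.
# This does NOT verify the signature; it only reads the claims.
def jwt_payload(token)
  segment = token.split('.')[1]
  raise ArgumentError, 'token does not look like a JWT' if segment.nil?

  # JWTs use unpadded URL-safe Base64; convert to padded standard Base64 first.
  standard = segment.tr('-_', '+/')
  standard += '=' * ((4 - standard.length % 4) % 4)
  JSON.parse(standard.unpack1('m').force_encoding('UTF-8'))
end

# Whole days remaining before the (assumed) `exp` claim is reached.
def days_until_expiry(token, now: Time.now)
  exp = jwt_payload(token).fetch('exp')
  ((Time.at(exp) - now) / 86_400).floor
end
```

A scheduled job could call `days_until_expiry(ENV.fetch('TOKEN'))` and raise a reminder when the result drops below a threshold of your choosing.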

How to Use

The DataCite Usage Reports API can be used directly by making HTTP requests with tools such as cURL.

🚧

JWT needs to be put in quotes

If you are using the command line remember to put your JWT in quotes.

Fetching Usage Reports

Let's start by testing our setup. The examples in this guide use the Usage Reports API production endpoint (https://api.datacite.org) and the curl command-line tool.

Now, open up a command prompt and enter the following command:

# GET /reports
$ curl https://api.datacite.org/reports/

The response will be a list of all the reports. Only the headers of the reports are included in this list. In the next step we will look at how to see the whole content of each report.

Next, let's fetch a single usage report:

# GET /reports/report-id
$ curl https://api.datacite.org/reports/21fd2e8e-5481-4bbd-b2ef-742d8b270a66
{
	"report": {
		"id": "21fd2e8e-5481-4bbd-b2ef-742d8b270a66",
		"report-header": {
			"report-name": "dataset report",
			"report-id": "DSR",
			"release": "rd1",
			"created-by": "Dash",
			"created": "2018-04-30",
			"reporting-period": {
				"end-date": "2018-04-30",
				"begin-date": "2018-04-01"
			},
			"report-filters": [],
			"report-attributes": []
		},
		"report-datasets": [
			{
				"uri": "https://oneshare.cdlib.org/stash/dataset/doi:10.15146/R3J66V",
				"yop": "2017",
				"platform": "Dash",
				"data-type": "dataset",
				"publisher": "DataONE",
				"dataset-id": [
					{
						"type": "doi",
						"value": "10.15146/R3J66V"
					}
				],
				"performance": [
					{
						"period": {
							"end-date": "2018-04-30",
							"begin-date": "2018-04-01"
						},
						"instance": [
							{
								"count": 20,
								"metric-type": "total-dataset-requests",
								"access-method": "regular",
								"country-counts": {
									"au": 6,
									"jp": 1,
									"us": 13
								}
							},
							{
								"count": 5,
								"metric-type": "unique-dataset-requests",
								"access-method": "regular",
								"country-counts": {
									"au": 1,
									"jp": 1,
									"us": 3
								}
							},
							{
								"count": 4,
								"metric-type": "total-dataset-investigations",
								"access-method": "regular",
								"country-counts": {
									"de": 2,
									"dk": 1,
									"us": 1
								}
							},
							{
								"count": 4,
								"metric-type": "unique-dataset-investigations",
								"access-method": "regular",
								"country-counts": {
									"de": 2,
									"dk": 1,
									"us": 1
								}
							},
							{
								"count": 14,
								"metric-type": "total-dataset-requests",
								"access-method": "machine",
								"country-counts": {
									"kr": 14
								}
							},
							{
								"count": 14,
								"metric-type": "unique-dataset-requests",
								"access-method": "machine",
								"country-counts": {
									"kr": 14
								}
							}
						]
					}
				],
				"publisher-id": [
					{
						"type": "grid",
						"value": "tbd"
					}
				],
				"dataset-dates": [
					{
						"type": "pub-date",
						"value": "2017-12-31"
					}
				],
				"dataset-title": "Influence of human disturbance on marine invertebrate biodiversity in Acadia National Park’s rocky intertidal community",
				"dataset-contributors": [
					{
						"type": "name",
						"value": "Cassandra Lopez"
					}
				]
			},
			{
				"uri": "https://dash.ucmerced.edu/stash/dataset/doi:10.6071/M32Q0X",
				"yop": "2017",
				"platform": "Dash",
				"data-type": "dataset",
				"publisher": "UC Merced",
				"dataset-id": [
					{
						"type": "doi",
						"value": "10.6071/M32Q0X"
					}
				],
				"performance": [
					{
						"period": {
							"end-date": "2018-04-30",
							"begin-date": "2018-04-01"
						},
						"instance": [
							{
								"count": 32,
								"metric-type": "total-dataset-requests",
								"access-method": "regular",
								"country-counts": {
									"au": 10,
									"de": 7,
									"dk": 1,
									"us": 14
								}
							},
							{
								"count": 8,
								"metric-type": "unique-dataset-requests",
								"access-method": "regular",
								"country-counts": {
									"au": 3,
									"de": 1,
									"dk": 1,
									"us": 3
								}
							},
							{
								"count": 35,
								"metric-type": "total-dataset-investigations",
								"access-method": "regular",
								"country-counts": {
									"br": 2,
									"de": 4,
									"dk": 1,
									"us": 28
								}
							},
							{
								"count": 27,
								"metric-type": "unique-dataset-investigations",
								"access-method": "regular",
								"country-counts": {
									"br": 2,
									"de": 4,
									"dk": 1,
									"us": 20
								}
							}
						]
					}
				],
				"publisher-id": [
					{
						"type": "grid",
						"value": "266096.d"
					}
				],
				"dataset-dates": [
					{
						"type": "pub-date",
						"value": "2017-09-06"
					}
				],
				"dataset-title": "TWOSTATE, a resonance Raman excitation profile and absorption spectrum simulator",
				"dataset-contributors": [
					{
						"type": "name",
						"value": "Anne Kelley"
					}
				]
			}
		]
	}
}


The response is a usage report in JSON format. We have truncated the report here to save space. The API expects SUSHI-formatted reports for ingestion and returns collections of SUSHI reports for consumption. The API closely follows the RESEARCH_DATA_SUSHI specification.
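Once a report has been fetched, its nested structure (report-datasets, then performance, then instance) can be walked to aggregate counts. The sketch below sums `count` per metric type and access method. The helper name `totals_by_metric` is hypothetical, and the sample is an abbreviated report with the same shape as the response above.

```ruby
require 'json'

# Sum `count` per [metric-type, access-method] across every dataset in a report.
def totals_by_metric(report_json)
  report = JSON.parse(report_json).fetch('report')
  totals = Hash.new(0)
  report.fetch('report-datasets', []).each do |dataset|
    dataset.fetch('performance', []).each do |performance|
      performance.fetch('instance', []).each do |instance|
        key = [instance.fetch('metric-type'), instance.fetch('access-method')]
        totals[key] += instance.fetch('count')
      end
    end
  end
  totals
end

# Abbreviated sample with the same shape as a fetched report.
sample = <<~JSON
  {"report": {"report-datasets": [
    {"performance": [{"instance": [
      {"count": 20, "metric-type": "total-dataset-requests", "access-method": "regular"},
      {"count": 14, "metric-type": "total-dataset-requests", "access-method": "machine"}
    ]}]},
    {"performance": [{"instance": [
      {"count": 32, "metric-type": "total-dataset-requests", "access-method": "regular"}
    ]}]}
  ]}}
JSON

totals_by_metric(sample).each do |(metric, access), count|
  puts "#{metric} (#{access}): #{count}"
end
```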

Let's add the -i flag to include headers:

# GET /reports/report-id
$ curl -i https://api.datacite.org/reports/21fd2e8e-5481-4bbd-b2ef-742d8b270a66
HTTP/1.1 200 OK
Date: Wed, 22 Aug 2018 08:26:05 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Status: 200 OK
X-Anonymous-Consumer: true
Cache-Control: max-age=0, private, must-revalidate
Vary: Accept-Encoding, Origin
ETag: W/"cc8c86a0ad0181ac67dcbebb7127f675"
X-Runtime: 0.267312
X-Request-Id: 22ede321-a835-43c1-b954-23ac81313fe2
X-Powered-By: Phusion Passenger 5.3.4
Server: nginx/1.14.0 + Phusion Passenger 5.3.4

{
	"report": {
		"id": "21fd2e8e-5481-4bbd-b2ef-742d8b270a66",
		"report-header": {

Register a Usage Report

To register a usage report you will need to generate a data usage report following the Code of Practice for Research Data Usage Metrics.

There are also a few tools you can use to generate your usage report from your repository web logs. For example, the Counter Processor is a Python 3 script (written against 3.6.4) for processing dataset access statistics from logs.

Once you have your usage report formatted, there are two ways to register a report:

A) using an autogenerated report ID
B) providing a report ID

To register a usage report with an autogenerated ID use:

# POST /reports
$ curl -H "Content-Type: application/json" -H "Accept: application/json" -H "Authorization: Bearer {YOUR-JSON-WEB-TOKEN}" -X POST https://api.datacite.org/reports/ -d @usage-report-file.json

The successful response will generate a report with an autogenerated report ID that has the form of a UUID. If a report has already been submitted with a matching created-by and reporting-period.begin-date in the report-header, the existing report will be returned.

To register a usage report with a specific ID you will need to provide the report ID. Keep in mind that the report ID MUST be a UUID.

# PUT /reports/report-id
$ curl -H "Content-Type: application/json" -H "Accept: application/json" -H "Authorization: Bearer {YOUR-JSON-WEB-TOKEN}" -X PUT https://api.datacite.org/reports/{report-id-as-uuid} -d @usage-report-file.json
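Because the report ID must be a UUID, it can help to generate and validate it before building the PUT URL. This is a small sketch using Ruby's standard `SecureRandom.uuid`. The helper names `uuid?` and `report_url` are hypothetical, and the pattern only checks the canonical 8-4-4-4-12 hexadecimal layout, which may be looser or stricter than what the API enforces.

```ruby
require 'securerandom'

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

def uuid?(value)
  UUID_PATTERN.match?(value.to_s)
end

# Build the URL for PUT /reports/report-id, refusing IDs that are not UUIDs.
def report_url(report_id, base: 'https://api.datacite.org/reports')
  raise ArgumentError, "report ID must be a UUID: #{report_id.inspect}" unless uuid?(report_id)

  "#{base}/#{report_id}"
end

report_id = SecureRandom.uuid # random (version 4) UUID
puts report_url(report_id)
```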

A successful response to either call returns the registered usage report. The response status can be one of:

  • 201 Created: operation successful.
  • 401 Unauthorized: no JWT provided.
  • 403 Forbidden: JWT expired.
  • 415 Unsupported Media Type: the correct Content-Type was not included in the header.
  • 422 Unprocessable Entity: invalid JSON, among other causes.
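The same request can be made from Ruby with the standard `net/http` library. The sketch below sets each header separately and maps the status codes listed above to a short hint. The helper names (`build_report_request`, `explain_status`, `send_report`) are hypothetical, and the hints are paraphrases of this guide rather than messages returned by the API.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST request for a usage report.
def build_report_request(json_body, token, url: 'https://api.datacite.org/reports')
  request = Net::HTTP::Post.new(URI(url))
  request['Content-Type'] = 'application/json'
  request['Accept'] = 'application/json'
  request['Authorization'] = "Bearer #{token}"
  request.body = json_body
  request
end

# Translate the status codes listed in this guide into a short hint.
def explain_status(code)
  case code.to_i
  when 201 then 'report registered'
  when 401 then 'no JWT provided: add the Authorization header'
  when 403 then 'JWT expired: request a new token'
  when 415 then 'wrong Content-Type header'
  when 422 then 'report rejected: check that the JSON is valid'
  else "status #{code} is not covered by the list in this guide"
  end
end

# Send the request over HTTPS and return [status code, hint].
def send_report(json_body, token)
  request = build_report_request(json_body, token)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
  [response.code.to_i, explain_status(response.code)]
end
```

Calling `send_report(File.read('usage-report-file.json'), ENV.fetch('TOKEN'))` would perform the actual submission; nothing is sent until that call is made.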

Once you have registered the report, you can fetch it as described earlier in this guide. You can also update the report.

Update a Usage Report

To update a usage report, provide the report ID of the report you want to update:

# PUT /reports/report-id
$ curl -H "Content-Type: application/json" -H "Accept: application/json" -H "Authorization: Bearer {YOUR-JSON-WEB-TOKEN}" -X PUT https://api.datacite.org/reports/{report-id-as-uuid} -d @usage-report-file.json

Register a large Usage Report

Usage reports can get very large, so there are two ways to submit very large reports: compression and subsetting. Large reports need to be divided and compressed. There is an upper limit of 50,000 datasets per report.

In both cases, you need to add this exception to the report header:

"exceptions": [{
  "code": 69,
  "severity": "warning",
  "message": "Report is compressed using gzip",
  "help-url": "https://github.com/datacite/sashimi",
  "data": "usage data needs to be uncompressed"
}]
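If reports are assembled programmatically, the exception can be added in code before serializing. This is a sketch assuming the submitted report has `report-header` at its top level (the fetched response earlier in this guide nests it under `report`). The helper name `with_gzip_exception` is hypothetical.

```ruby
require 'json'

# The exception entry from this guide, as a Ruby hash.
GZIP_EXCEPTION = {
  'code' => 69,
  'severity' => 'warning',
  'message' => 'Report is compressed using gzip',
  'help-url' => 'https://github.com/datacite/sashimi',
  'data' => 'usage data needs to be uncompressed'
}.freeze

# Return a copy of the report whose header carries the exception exactly once.
def with_gzip_exception(report)
  copy = JSON.parse(JSON.generate(report)) # deep copy via a JSON round trip
  header = (copy['report-header'] ||= {})
  exceptions = (header['exceptions'] ||= [])
  exceptions << GZIP_EXCEPTION.dup unless exceptions.any? { |e| e['code'] == 69 }
  copy
end
```

Returning a copy rather than mutating the input keeps the original report usable if the submission has to be retried without compression.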

Sending compressed reports

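We suggest compressing any report that is larger than 10 MB. Here is a Ruby example of report compression:

require 'json'
require 'stringio'
require 'zlib'

# Read a JSON report from disk and return it as a gzip-compressed string.
def compress(file)
  report = File.read(file)
  gzip = Zlib::GzipWriter.new(StringIO.new)
  # Parsing and re-serializing validates the JSON and strips extra whitespace.
  gzip << JSON.parse(report).to_json
  # close finishes the gzip stream and returns the underlying StringIO.
  gzip.close.string
end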

When sending compressed reports, use application/gzip as the Content-Type and gzip as the Content-Encoding. For example:

require 'maremma'

# Named REPORTS_URL rather than URI to avoid clobbering Ruby's URI module.
REPORTS_URL = 'https://api.datacite.org/reports'

# Compress the report file and POST it using the token in ENV['TOKEN'].
def post_file(file)
  headers = {
    content_type: 'application/gzip',
    content_encoding: 'gzip',
    accept: 'application/json'
  }

  body = compress(file)

  Maremma.post(REPORTS_URL, data: body,
    bearer: ENV['TOKEN'],
    headers: headers,
    timeout: 100)
end

The equivalent curl call would be:

$ curl -H "Content-Type: application/gzip" -H "Content-Encoding: gzip" -H "Authorization: Bearer {YOUR-JSON-WEB-TOKEN}" -X POST https://api.datacite.org/reports/ --data-binary @usage-report-compressed
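The 10 MB suggestion and the headers above can be combined into one decision step. The sketch below uses Ruby's built-in `Zlib.gzip` and `Zlib.gunzip` rather than the `GzipWriter` approach. The helper name `prepare_body` and the reading of "10 MB" as 10 × 1024 × 1024 bytes are assumptions.

```ruby
require 'json'
require 'zlib'

COMPRESSION_THRESHOLD = 10 * 1024 * 1024 # assumed reading of "10 MB", in bytes

# Return [body, headers]: gzip the JSON only when it exceeds the threshold.
def prepare_body(json, threshold: COMPRESSION_THRESHOLD)
  if json.bytesize > threshold
    [Zlib.gzip(json), { 'Content-Type' => 'application/gzip', 'Content-Encoding' => 'gzip' }]
  else
    [json, { 'Content-Type' => 'application/json' }]
  end
end

# Round trip with a tiny threshold to confirm nothing is lost in compression.
json = JSON.generate({ 'report-header' => { 'created-by' => 'Dash' }, 'report-datasets' => [] })
body, headers = prepare_body(json, threshold: 10)
puts headers['Content-Type']
puts Zlib.gunzip(body).force_encoding('UTF-8') == json
```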

Send Usage Report in subsets

To create a report with more than 50,000 records, keep making POST requests with the same report-header. This will create subsets of the report. For example:

POST /reports
POST /reports
POST /reports
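Splitting a large report into subsets can be sketched as follows, assuming the submitted report has `report-header` and `report-datasets` at its top level. The helper name `subsets` is hypothetical, and each resulting body would still need the exception entry and compression described earlier in this guide.

```ruby
require 'json'

MAX_DATASETS = 50_000 # per-report limit stated in this guide

# Split a report into subsets that all share one report-header.
def subsets(report, limit: MAX_DATASETS)
  header = report.fetch('report-header')
  report.fetch('report-datasets').each_slice(limit).map do |datasets|
    { 'report-header' => header, 'report-datasets' => datasets }
  end
end

# Tiny demonstration: 5 datasets with at most 2 per subset gives 3 POST bodies.
report = {
  'report-header' => { 'created-by' => 'Dash' },
  'report-datasets' => (1..5).map { |i| { 'dataset-title' => "dataset #{i}" } }
}
subsets(report, limit: 2).each_with_index do |subset, index|
  puts "POST /reports (subset #{index + 1}): #{subset['report-datasets'].length} datasets"
end
```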

To update an existing compressed report, make a PUT request followed by as many POST requests with the same report-header as you need. For example:

PUT /reports/{report-id}
POST /reports
POST /reports

Next

Now you know the basics of the Usage Reports API!

  • Authentication
  • Fetching, registering, and updating usage reports

Keep learning with the Usage Reports API Reference!

📘

Would you like to know more?

If you have any questions, requests, or ideas, please contact us!