DataCite Support

Link Checker

The DataCite link checker service is a custom-built web crawler that periodically checks a random sampling of DOIs to verify that they still resolve to a valid URL and to gather other useful information about the metadata for DOIs registered with DataCite.

How do I use the link checker?

The link checker is a service available only to DataCite Members. The link checker service runs automatically in the background, so there is nothing Members need to do and nothing they need to enable in order to benefit from the service.

The results of the link checker are displayed in DOI Fabrica at the bottom of each DOI record that has been checked.

On the DOIs tab, you can filter the list of DOIs by HTTP status code to find the DOIs whose link checker results show a given status.

Link checker results are also available to DataCite Members via the REST API.

How does the link checker work?

The link checker checks one random DOI per Client per day. It attempts to follow the URL listed in the URL field of the DOI's metadata and returns results about whether it was successful and what it found at the other end.
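As a rough illustration of the sampling step described above, the sketch below picks one DOI at random for each Client. It is not PidCheck's actual code: the function name `pick_daily_sample`, the Client IDs, and the DOIs are all hypothetical.

```python
import random
from typing import Dict, List, Optional


def pick_daily_sample(
    dois_by_client: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Pick one random DOI for each Client that has at least one DOI."""
    rng = rng or random.Random()
    return {
        client: rng.choice(dois)
        for client, dois in dois_by_client.items()
        if dois  # Clients with no DOIs are skipped
    }


# Hypothetical Clients and DOIs, for illustration only.
registry = {
    "client.a": ["10.1234/a1", "10.1234/a2"],
    "client.b": ["10.1234/b1"],
}
print(pick_daily_sample(registry))
```

Running a step like this once a day would mean that, over time, each Client's DOIs are checked in a random order rather than all at once.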

The crawler that powers the link checker is open-source software developed by DataCite, called PidCheck. It is built on top of existing crawler technology, namely the Scrapy project, with customizations specific to DataCite.

What does the link checker look for?

The link checker checks that each URL functions properly and looks for the elements that make up a well-formed DOI landing page. (See Best Practices for DOI Landing Pages.)

HTTP status code

The link checker will attempt to follow the URL listed in the URL field of the DOI's metadata. If that URL resolves successfully, the link checker will return HTTP status code 200. Otherwise, one of several standard HTTP error codes will be returned, such as 404 (page not found).
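When reading these results, it can help to group status codes into broad classes. The following is a minimal sketch using Python's standard `http.HTTPStatus`; the function name `summarize_status` and the outcome labels are assumptions made for illustration and are not part of the link checker.

```python
from http import HTTPStatus


def summarize_status(code: int) -> str:
    """Turn a numeric HTTP status code into a short human-readable summary."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a status code known to the standard library
        phrase = "Unknown status"

    if 200 <= code < 300:
        outcome = "resolved"
    elif 300 <= code < 400:
        outcome = "redirected"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "unrecognized"
    return f"{code} {phrase} ({outcome})"


for code in (200, 404, 503):
    print(summarize_status(code))
```

Codes in the 5xx range are often temporary, while a 404 usually means the URL registered for the DOI needs to be updated.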

Number and URLs of any redirects

If the link checker is redirected while attempting to follow a URL, the results will display a list of all URLs the link checker was redirected through, ending in the final URL that was ultimately resolved.
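The redirect list described above can be pictured with a small sketch. This is not how PidCheck is implemented (it relies on Scrapy); it only shows the idea of recording every URL visited. The `fetch` callable, the `max_hops` limit, and the example URLs are hypothetical, and `fetch` is injected so the logic can be tried without network access.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetch signature: url -> (status code, Location header or None)
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(
    url: str, fetch: Fetch, max_hops: int = 10
) -> Tuple[int, List[str]]:
    """Return the final status code and the list of every URL visited."""
    chain = [url]
    status, location = fetch(url)
    while status in REDIRECT_CODES and location:
        if len(chain) > max_hops:
            raise RuntimeError("too many redirects")
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)
        chain.append(url)
        status, location = fetch(url)
    return status, chain


# Made-up responses standing in for real HTTP requests.
responses = {
    "https://example.org/a": (301, "/b"),
    "https://example.org/b": (302, "https://repo.example.org/item/1"),
    "https://repo.example.org/item/1": (200, None),
}
status, chain = follow_redirects("https://example.org/a", responses.__getitem__)
print(status, len(chain) - 1, chain)
```

The number of redirects is the length of the list minus one, and the last entry is the URL that was ultimately resolved.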

Landing page

The link checker will return the HTTP content type of the content found at the URL to which it ultimately resolves. Ideally, this will be the content type text/html, indicating that an HTML landing page was found on the other end. If another content type is found, the link checker will indicate which content type was found.
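A Content-Type header often carries extra parameters such as a character set, so a check for text/html has to ignore them. The sketch below shows one way to do that; the function names `media_type` and `is_html_landing_page` are hypothetical and this is not the link checker's own code.

```python
def media_type(content_type_header: str) -> str:
    """Drop parameters such as charset and normalize the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_html_landing_page(content_type_header: str) -> bool:
    """True when the header indicates an HTML page."""
    return media_type(content_type_header) == "text/html"


for header in ("text/html; charset=UTF-8", "application/pdf"):
    print(header, "->", media_type(header), is_html_landing_page(header))
```

A result such as application/pdf typically means the DOI resolves directly to a file rather than to a landing page.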

Machine-readable DOI

The link checker will indicate whether a machine-readable DOI was found on the landing page. A "machine-readable DOI" means a DOI that appears on the landing page and is appropriately tagged so that a machine can recognize it as a DOI. The link checker looks for a DC.identifier meta tag, a citation_doi meta tag, or appropriately tagged schema.org DOI metadata.
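To check your own landing page for the two meta tags named above, something like the following sketch could be used. It relies only on Python's standard `html.parser`; the class and function names are hypothetical, the DOI `10.1234/example` is made up, and the link checker's real detection logic may differ. Schema.org metadata is not covered by this sketch.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Meta tag names from the text above, compared case-insensitively
DOI_META_NAMES = {"dc.identifier", "citation_doi"}


class DoiMetaFinder(HTMLParser):
    """Collect (name, content) pairs from DOI-related meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in DOI_META_NAMES and content:
            self.found.append((name, content))


def find_doi_meta_tags(html: str) -> List[Tuple[str, str]]:
    finder = DoiMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.found


page = """<html><head>
<meta name="DC.identifier" content="doi:10.1234/example">
<meta name="citation_doi" content="10.1234/example" />
</head><body>Landing page</body></html>"""
print(find_doi_meta_tags(page))
```

An empty list means neither meta tag was found in the page source.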

Schema.org metadata

The link checker looks for schema.org metadata on the landing page, if a landing page is found. It specifically looks for embedded JSON-LD with an @context of https://schema.org.
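The sketch below shows one way to test a page for embedded JSON-LD of this kind, using only the Python standard library. The class and function names are hypothetical, the sample metadata is made up, and tolerating a trailing slash on the @context value is an assumption rather than documented link checker behavior.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON from <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.blocks: List[Any] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Ignore script blocks that are not valid JSON


def has_schema_org_context(block: Any) -> bool:
    context = block.get("@context") if isinstance(block, dict) else None
    contexts = context if isinstance(context, list) else [context]
    # Assumption: a trailing slash on the context URL is tolerated
    return any(
        isinstance(c, str) and c.rstrip("/") == "https://schema.org"
        for c in contexts
    )


def find_schema_org(html: str) -> List[Any]:
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return [b for b in extractor.blocks if has_schema_org_context(b)]


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Dataset", '
    '"@id": "https://doi.org/10.1234/example"}'
    "</script></head><body>Landing page</body></html>"
)
print(find_schema_org(page))
```

If the function returns an empty list for your landing page, the JSON-LD block is either missing, not valid JSON, or uses a different @context.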

What do I do if the link checker isn't getting the results I think it should get?

Certain HTTP errors can be temporary. If one of your DOIs shows an error code from its last check, try to resolve the URL yourself to see whether the problem persists. If the URL resolves normally, there is no need to contact us. The HTTP status code will be updated the next time the link checker checks that DOI.

If the URL resolved successfully (status code 200), but the results of the metadata checks don't align with what you were expecting, make sure that your DOI's landing page conforms to our recommendations for Best Practices for DOI Landing Pages. If your landing page does conform, but the link checker is not picking up the appropriate results, please contact us at support@datacite.org and we will investigate the issue.
