How do I expose my datasets to Google Dataset Search?

Google Dataset Search relies on crawlable structured data exposed via schema.org markup, using the Dataset class. DataCite exposes an index of such crawlable data through DataCite Search. Items in DataCite Search whose resourceTypeGeneral is Dataset will show up in Google Dataset Search with a DOI link, as well as a link to the source record in DataCite Search. The DOI link will resolve to your dataset’s regular landing page.


While we do our best to enable indexing of DOIs for datasets, DataCite has no control over how Google indexes "Dataset" items or how long it takes for them to appear in Google Dataset Search.

If you would like your datasets to also show up in Google Dataset Search with a direct link to your own repository as the source, you should expose crawlable structured data by implementing schema.org markup (using the Dataset class) on each landing page in your repository.
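As a rough illustration of what such markup can look like, the sketch below builds a minimal schema.org Dataset description as JSON-LD and wraps it in a script tag that a landing page template could emit. The function name, the choice of properties, and the example values are assumptions made for illustration only; Google's documentation on the Dataset content type lists the properties it actually requires and recommends.

```python
import json


def dataset_jsonld_tag(name, description, doi, landing_page_url):
    """Return a <script> tag holding schema.org Dataset markup as JSON-LD.

    Hypothetical helper: the property selection here is a minimal guess,
    not a complete or authoritative mapping of DataCite metadata.
    """
    markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",
        "url": landing_page_url,
    }
    # Escape "</" so text inside the JSON cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so parsers still read the same value.
    body = json.dumps(markup, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Placeholder DOI and URL, not real records.
    print(dataset_jsonld_tag(
        "Example dataset",
        "A short description of the example dataset.",
        "10.1234/example",
        "https://repository.example.org/datasets/1",
    ))
```

Rendering the tag on the server, rather than injecting it with a script at page load, keeps the markup visible to crawlers that do not execute JavaScript.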

One easy way to do this is to use our Content Negotiation Service in your landing pages. For example, you can include this JavaScript file, which dynamically retrieves the metadata as schema.org JSON-LD through our Content Negotiation Service. Just add a <script></script> tag that references the file to your landing page template. Whenever a landing page is requested, the script will append the appropriate schema.org markup to the page.
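To see what content negotiation returns for a DOI, a request along the following lines may help. This is a minimal sketch, not the script referred to above: it assumes that the DOI resolver honours the `application/vnd.schemaorg.ld+json` media type for DataCite DOIs, and the function names and the example DOI are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed media type for schema.org JSON-LD via content negotiation.
SCHEMA_ORG_JSONLD = "application/vnd.schemaorg.ld+json"


def content_negotiation_request(doi):
    """Build (but do not send) a request for a DOI's schema.org JSON-LD."""
    # Keep "/" unescaped because it separates the DOI prefix from the suffix.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": SCHEMA_ORG_JSONLD})


def fetch_schema_org_jsonld(doi, timeout=10):
    """Send the request and return the response body as text (needs network)."""
    request = content_negotiation_request(doi)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder DOI: replace it with one of your own before fetching.
    req = content_negotiation_request("10.1234/example")
    print(req.full_url, req.get_header("Accept"))
```

The returned JSON-LD could then be placed in a `<script type="application/ld+json">` element on the landing page.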


Your datasets should appear in Google Dataset Search if:

  1. They have the Findable state (which is what makes them indexable).
  2. They have Dataset as the resourceTypeGeneral in the metadata you have registered with DataCite. Text items, for example, won't appear in Google Dataset Search.
  3. You implement schema.org markup on your datasets' landing pages and use the Dataset class. With this in place, your datasets will appear in Google Dataset Search with a direct link to your repository as the source.
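The conditions above can be sketched as a small checklist function. The dictionary layout (`state`, `types.resourceTypeGeneral`) is an assumption about how a DOI's attributes might be represented, for example in a response from the DataCite REST API, and the function name is hypothetical.

```python
def google_dataset_search_checks(attributes, landing_page_has_markup):
    """Report which of the listed conditions a DOI record satisfies.

    `attributes` is assumed to be a dict with a "state" key and a nested
    "types" dict holding "resourceTypeGeneral".
    """
    state = attributes.get("state")
    resource_type = (attributes.get("types") or {}).get("resourceTypeGeneral")
    return {
        # Condition 1: the DOI is in the Findable state.
        "findable": state == "findable",
        # Condition 2: the record is registered as a Dataset.
        "is_dataset": resource_type == "Dataset",
        # Condition 3: the landing page carries its own Dataset markup.
        "direct_link_to_repository": bool(landing_page_has_markup),
    }


if __name__ == "__main__":
    record = {"state": "findable", "types": {"resourceTypeGeneral": "Text"}}
    print(google_dataset_search_checks(record, landing_page_has_markup=False))
```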

Google updates the data they show on a regular basis, but their schedule is out of DataCite’s control. As long as your DataCite-registered DOIs are Findable and are tagged as datasets, they will appear in Google Dataset Search once Google has re-indexed.


Sitemap best practices

Use a sitemap file to help Google find your URLs. Using sitemap files and sameAs markup helps document how dataset descriptions are published throughout your site.
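A sitemap is an XML file listing the URLs you want crawled. The sketch below generates one from a list of landing page URLs using the namespace defined by the sitemaps.org protocol. The function name and the example URLs are placeholders, and a real repository would probably generate the list from its own database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(landing_page_urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in landing_page_urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    # The declaration says UTF-8, so encode as UTF-8 when writing to disk.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(
        urlset, encoding="unicode"
    )


if __name__ == "__main__":
    # Placeholder URLs for illustration.
    print(build_sitemap([
        "https://repository.example.org/datasets/1",
        "https://repository.example.org/datasets/2",
    ]))
```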

For more information on exposing your datasets to Google Dataset Search, see Google's help page on the Dataset content type.

If you’re not sure whether your repository landing pages contain the appropriate structured data, you can test them using Google’s Structured Data Testing Tool.
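For a quick local check before using an external validator, something like the following sketch can scan a landing page's HTML for JSON-LD that declares the Dataset type. The class and function names are hypothetical. It only inspects static HTML, so markup that a script injects at page load will not be detected, and it ignores other formats such as Microdata, RDFa, and JSON-LD nested under `@graph`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def has_dataset_markup(html):
    """Return True if any top-level JSON-LD item in the page has @type Dataset."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # Skip blocks that are not valid JSON.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Dataset" in types:
                return True
    return False
```

Calling `has_dataset_markup(page_html)` on the HTML source of a landing page returns `True` only when such a block is present in the static source.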
