Google Dataset Search relies on crawlable structured data exposed via schema.org markup, using the schema.org Dataset class. DataCite exposes an index of such crawlable data through DataCite Search. Items in DataCite Search whose resourceTypeGeneral is Dataset will show up in Google Dataset Search with a DOI link, as well as a link to the source record in DataCite Search. The DOI link will resolve to your dataset's regular landing page.
If you would like your datasets to also show up in Google Dataset Search with a direct link to your own repository as the source, then you should expose the appropriately crawlable structured data by implementing schema.org markup (using the Dataset class) on each landing page in your repository.
One way to do this is to add a <script></script> tag that references the script file to your landing page template. Whenever that landing page is requested, the script will append the appropriate metadata as schema.org markup.
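As an illustration of what such markup can look like, here is a minimal sketch in Python that renders a schema.org Dataset description as a JSON-LD script element for a landing page template. The function name, the example DOI, and the example URL are hypothetical, and the set of properties is an assumption; check Google's dataset documentation for the properties it currently requires and recommends.

```python
import json


def dataset_jsonld_script(doi: str, name: str, description: str, landing_url: str) -> str:
    """Return a <script> element carrying schema.org Dataset metadata as JSON-LD.

    Hypothetical helper: the property selection below is an assumption,
    not DataCite's or Google's official template.
    """
    doi_url = f"https://doi.org/{doi}"
    metadata = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": doi_url,
        "identifier": doi_url,
        "name": name,
        "description": description,
        "url": landing_url,
    }
    # Escape "</" so metadata text cannot close the script element early.
    payload = json.dumps(metadata, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(dataset_jsonld_script(
        doi="10.1234/example-doi",
        name="Example dataset title",
        description="A short description of what the dataset contains.",
        landing_url="https://repository.example.org/datasets/1",
    ))
```

The returned string can be placed in the head of the landing page. Rendering it on the server rather than appending it in the browser is a design choice of this sketch, not a requirement stated here.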
To ensure your datasets will appear in Google Dataset Search:
- They must have the Findable state (which is what makes them indexable).
- They must have Dataset as the resourceTypeGeneral in the metadata you have registered with DataCite. Text items, for example, won't appear in Google Dataset Search.
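The two conditions above can be checked programmatically. The following is a minimal sketch, assuming a DOI record that has already been fetched and parsed into a dictionary, with the state under data.attributes.state and the resource type under data.attributes.types.resourceTypeGeneral. That layout and the function name are assumptions for illustration; verify them against the actual response you receive from DataCite.

```python
def is_eligible_for_dataset_search(record: dict) -> bool:
    """Return True if a parsed DOI record is Findable and typed as Dataset.

    Assumed record layout (verify against your real data):
      record["data"]["attributes"]["state"]
      record["data"]["attributes"]["types"]["resourceTypeGeneral"]
    """
    attributes = (record.get("data") or {}).get("attributes") or {}
    state = attributes.get("state") or ""
    resource_type = (attributes.get("types") or {}).get("resourceTypeGeneral") or ""
    # Compare the state case-insensitively; the resource type must be exactly "Dataset".
    return state.lower() == "findable" and resource_type == "Dataset"


if __name__ == "__main__":
    # Hand-written sample records, not real DataCite output.
    dataset_record = {
        "data": {"attributes": {"state": "findable",
                                "types": {"resourceTypeGeneral": "Dataset"}}}
    }
    text_record = {
        "data": {"attributes": {"state": "findable",
                                "types": {"resourceTypeGeneral": "Text"}}}
    }
    print(is_eligible_for_dataset_search(dataset_record))
    print(is_eligible_for_dataset_search(text_record))
```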
If you also want your datasets to appear in Google Dataset Search with a direct link to your repository as the source, you must additionally:
- Implement schema.org markup on your datasets' landing pages and use the Dataset class.
Google updates the data they show on a regular basis, but their schedule is out of DataCite’s control. As long as your DataCite-registered DOIs are Findable and are tagged as datasets, they will appear in Google Dataset Search once Google has re-indexed.
Sitemap best practices
Use a sitemap file to help Google find your URLs. Using sitemap files and sameAs markup helps document how dataset descriptions are published throughout your site. More info: https://developers.google.com/search/docs/data-types/dataset
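A sitemap file of this kind can be generated from a list of landing page URLs. The sketch below uses only the Python standard library and the sitemaps.org namespace; the function name and the example URLs are hypothetical, and it covers only the loc element, leaving out optional fields such as lastmod.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document listing the given landing page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as "&" on serialization.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    print(build_sitemap([
        "https://repository.example.org/datasets/1",
        "https://repository.example.org/datasets/2",
    ]))
```

Save the output as a file on your site, for example sitemap.xml, and make it discoverable to Google through your usual channel (robots.txt or Search Console).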