Processing Views and Downloads

Minimal log information

📘
COUNTER Code of Practice
This is a summary of subsections of the COUNTER Code of Practice for Research Data (COUNTER CoP for RD).

When examining your raw logs, keep a few points in mind. First, only successful requests (HTTP status codes 200 and 304) should be counted.
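Here is a minimal sketch of this status filter. It assumes log lines have already been parsed into (path, status) pairs. The names `COUNTABLE_STATUSES` and `is_countable` are hypothetical and do not come from the CoP or Counter Processor.

```python
# HTTP status codes that count as successful requests under the CoP.
COUNTABLE_STATUSES = frozenset({200, 304})


def is_countable(status_code: int) -> bool:
    """Return True if a log line with this HTTP status should be counted."""
    return status_code in COUNTABLE_STATUSES


# Hypothetical parsed log lines as (path, status) pairs.
lines = [
    ("/datasets/1", 200),
    ("/datasets/2", 404),
    ("/datasets/1", 304),
    ("/datasets/3", 500),
]
countable = [line for line in lines if is_countable(line[1])]
print(countable)  # [('/datasets/1', 200), ('/datasets/1', 304)]
```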

The CoP suggests the following items that can be tracked, although some items may be optional if the repository system implementation doesn’t support them:

Date and time
Request IP address
Session cookie ID: an ID kept in a session cookie which only lives as long as the browser/session is open
User cookie ID: an ID that identifies a user session that may persist past the closing of the browser window to future visits
Username or user ID: identifies a user of the system because they have logged in or identified themselves
The requested URL
The DOI name: the DOI which uniquely identifies this dataset
The size of the dataset: only needed for requests that download a dataset, not for investigations that display landing pages or metadata about a dataset
The user-agent string being sent by the client
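A log processor might hold one parsed log line in a record like the one below, sketched as a Python dataclass. The field names are illustrative and are not defined by the CoP. Fields that a repository may not be able to capture default to `None`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogEntry:
    """One parsed log line; optional fields may be unavailable in some systems."""

    timestamp: datetime
    ip_address: str
    url: str
    user_agent: str
    session_cookie_id: Optional[str] = None  # lives only while the browser is open
    user_cookie_id: Optional[str] = None     # may persist across visits
    user_id: Optional[str] = None            # set when the user has logged in
    doi: Optional[str] = None
    # Only needed for requests that download the dataset.
    size_bytes: Optional[int] = None


entry = LogEntry(
    timestamp=datetime(2024, 5, 1, 12, 0, 0),
    ip_address="192.0.2.10",
    url="/datasets/10.1234/example/download",
    user_agent="Mozilla/5.0",
    doi="10.1234/example",
    size_bytes=1_048_576,
)
```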

Log Processing and Enrichment

Your log processing implementation will need to further process logs and enrich them:

Classify logged URLs as either an investigation or a request

First, the log processing implementation will need to classify each log entry as either an investigation or a request. For example, the Counter Processor from CDL does this with regular expressions that check the URL path. (See COUNTER CoP for RD section 3.3.4: Metric Types.)
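The regular-expression approach can be sketched as follows. The URL layout is hypothetical: it assumes landing and metadata pages live under `/datasets/<doi>` and downloads under `/datasets/<doi>/download`. These patterns are not Counter Processor's. They also assume DOI suffixes contain no slash, which is a simplification.

```python
import re
from typing import Optional

# Hypothetical routes for an imaginary repository; adapt to your own URLs.
REQUEST_PATTERN = re.compile(r"^/datasets/(?P<doi>10\.\d{4,9}/[^/]+)/download$")
INVESTIGATION_PATTERN = re.compile(r"^/datasets/(?P<doi>10\.\d{4,9}/[^/]+)(/metadata)?$")


def classify_url(path: str) -> Optional[str]:
    """Return 'request', 'investigation', or None for non-dataset paths."""
    if REQUEST_PATTERN.match(path):
        return "request"
    if INVESTIGATION_PATTERN.match(path):
        return "investigation"
    return None


print(classify_url("/datasets/10.1234/example/download"))  # request
```

The download pattern is checked first, so a path is never classified as an investigation when it is actually a download.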

Classify the User as a Robot or Not by the User-Agent

Your implementation will need to classify each log entry as produced by either a robot or a human. You can do this by comparing the user-agent in each log entry against the official list of robots and machines from the MDC project. For example, the Counter Processor from CDL uses this list split into a robots list and a machine agents list (available in the Make-Data-Count GitHub repository). There is one text file for robots and one for machine agents, and each file contains regular expressions separated by newlines. The Counter Processor retrieves these lists and uses them to classify each log line by its user-agent.
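Here is a sketch of that classification. It assumes the two list files have already been downloaded as text. It also makes two further assumptions: blank lines are skipped, and matching is case-insensitive. The MDC files may differ on either point. The sample patterns are short stand-ins, not the real MDC lists.

```python
import re
from typing import List


def compile_patterns(list_text: str) -> List[re.Pattern]:
    """Compile one regular expression per non-blank line of a list file."""
    return [
        # Case-insensitive matching is an assumption, not taken from the MDC lists.
        re.compile(line.strip(), re.IGNORECASE)
        for line in list_text.splitlines()
        if line.strip()
    ]


def classify_agent(
    user_agent: str, robots: List[re.Pattern], machines: List[re.Pattern]
) -> str:
    """Return 'robot', 'machine', or 'regular' for a user-agent string."""
    if any(p.search(user_agent) for p in robots):
        return "robot"
    if any(p.search(user_agent) for p in machines):
        return "machine"
    return "regular"


# Illustrative stand-ins; in practice, load the MDC robot and machine list files.
robots = compile_patterns("bot\ncrawler\n\n")
machines = compile_patterns("python-requests\ncurl\n")
print(classify_agent("Googlebot/2.1", robots, machines))  # robot
```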

Obtain Country-Code for IP addresses

Although not mandatory, you can enrich your logs with country codes to enable more granular reporting. For example, the Counter Processor uses a service called freegeoip.net for IP-to-location lookups. The service is free, its code is available on GitHub, it is community supported, and its hosted API server allows up to 10,000 API calls per hour. It provides country, state, and often city or locality geolocation for IP addresses.
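Because hosted lookup services limit calls per hour, it helps to look up each distinct IP address only once. The sketch below wraps a lookup function in an in-memory cache. `fake_fetch` is a hypothetical placeholder for a real geolocation call, and its data is made up. It does not model any particular service's API.

```python
from functools import lru_cache
from typing import Callable, List, Optional


def make_country_lookup(
    fetch_country: Callable[[str], Optional[str]], max_ips: int = 100_000
) -> Callable[[str], Optional[str]]:
    """Wrap an IP-to-country function so each distinct IP is fetched only once."""

    @lru_cache(maxsize=max_ips)
    def lookup(ip_address: str) -> Optional[str]:
        return fetch_country(ip_address)

    return lookup


# Hypothetical stand-in for a real geolocation call; records each fetch.
calls: List[str] = []


def fake_fetch(ip_address: str) -> Optional[str]:
    calls.append(ip_address)
    return {"192.0.2.10": "US"}.get(ip_address)


country_of = make_country_lookup(fake_fetch)
for ip in ["192.0.2.10", "192.0.2.10", "198.51.100.7"]:
    country_of(ip)
print(len(calls))  # 2
```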

Generate a Session ID

The COUNTER CoP for RD has several rules for tracking user sessions that don't match traditional sessions in a web application or web application framework, although its session calculation may still draw on those concepts.
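A session key of this kind can be sketched as follows. The sketch makes several assumptions that you should verify against section 7.3 of the CoP before relying on it:

- The identity is chosen in this priority order: session cookie, then logged-in user ID, then user cookie, then IP address plus user-agent.
- The identity is combined with the date and hour of the access.
- The result is hashed so the key does not expose raw identifiers.

```python
import hashlib
from datetime import datetime
from typing import Optional


def session_id(
    timestamp: datetime,
    ip_address: str,
    user_agent: str,
    session_cookie_id: Optional[str] = None,
    user_id: Optional[str] = None,
    user_cookie_id: Optional[str] = None,
) -> str:
    """Build a session key from the best identity available plus the hour slot.

    The priority order below is an assumption; check section 7.3 of the CoP.
    """
    if session_cookie_id:
        identity = f"session:{session_cookie_id}"
    elif user_id:
        identity = f"user:{user_id}"
    elif user_cookie_id:
        identity = f"cookie:{user_cookie_id}"
    else:
        identity = f"ip-ua:{ip_address}|{user_agent}"
    # The same identity in a different date/hour slot yields a different session.
    hour_slot = timestamp.strftime("%Y-%m-%d %H")
    # Hash so the key does not expose the raw IP address or identifiers.
    return hashlib.sha256(f"{identity}|{hour_slot}".encode("utf-8")).hexdigest()
```

Prefixing each identity with its type keeps, for example, a user ID from colliding with an identical cookie value.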

COUNTER CoP for RD section 7.2: Double-click Filtering describes how to eliminate double-clicks, that is, repeated requests for the same URL by the same user session within 30 seconds. Similarly, the COUNTER CoP for RD seeks to identify unique dataset visits and unique dataset volume. The identifier used for this is described in COUNTER CoP for RD section 7.3: Counting Unique Datasets and works much like double-click identification.
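The sketch below shows double-click filtering followed by a unique-dataset count. When two clicks on the same URL in the same session fall within the window, only the later click is kept. The sketch also makes two assumptions: "within 30 seconds" means at most 30 seconds apart, and a unique dataset is a distinct (session, DOI) pair. The `Access` record and function names are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import List, NamedTuple


class Access(NamedTuple):
    session: str  # session key, e.g. built from cookie/user/IP and the hour
    url: str
    doi: str
    time: datetime


# Treating "within 30 seconds" as at most 30 seconds apart is an assumption.
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)


def filter_double_clicks(accesses: List[Access]) -> List[Access]:
    """Drop an access when the same session hits the same URL again within
    the window, so only the later click of each pair survives."""
    kept: List[Access] = []
    ordered = sorted(accesses, key=lambda a: (a.session, a.url, a.time))
    for _, group in groupby(ordered, key=lambda a: (a.session, a.url)):
        clicks = list(group)
        for current, following in zip(clicks, clicks[1:] + [None]):
            if following is None or following.time - current.time > DOUBLE_CLICK_WINDOW:
                kept.append(current)
    return kept


def count_unique_datasets(accesses: List[Access]) -> int:
    """Count each dataset (DOI) at most once per session."""
    return len({(a.session, a.doi) for a in accesses})


t0 = datetime(2024, 5, 1, 12, 0, 0)
log = [
    Access("s1", "/datasets/10.1234/a", "10.1234/a", t0),
    Access("s1", "/datasets/10.1234/a", "10.1234/a", t0 + timedelta(seconds=10)),
    Access("s1", "/datasets/10.1234/a", "10.1234/a", t0 + timedelta(seconds=120)),
    Access("s2", "/datasets/10.1234/a", "10.1234/a", t0 + timedelta(seconds=5)),
]
kept = filter_double_clicks(log)
print(len(kept), count_unique_datasets(kept))  # 3 2
```

In the example, the first click in session `s1` is dropped because a second click follows 10 seconds later. The click at 120 seconds is far enough apart to be kept.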

Enrich with DOI Metadata

Finally, you will need to enrich your logs with DOI metadata. The COUNTER CoP for RD requires submitting descriptive metadata along with statistics for datasets. The descriptive metadata may either be logged at the time a dataset is accessed or metadata enrichment may take place as part of the log processing. The list below contains the mandatory DOI metadata fields that should be included:

dataset title
publisher
publisher ID
creators
publication date
dataset version
URL
publication year

This metadata can be obtained by querying the DataCite REST API (see Retrieving a single DOI).
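Here is a sketch of fetching a single DOI from the DataCite REST API and mapping its attributes onto the fields above. The mapping rests on three assumptions:

- "Publication date" is taken from the date whose `dateType` is `Issued`.
- The publisher arrives as a plain string by default and as an object only when requested with `?publisher=true`.
- The object form carries a `publisherIdentifier` key.

Check these against the current DataCite API documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

DATACITE_API = "https://api.datacite.org/dois/"


def fetch_doi(doi: str) -> Dict[str, Any]:
    """Retrieve a single DOI record from the DataCite REST API."""
    url = DATACITE_API + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract_metadata(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a DataCite JSON record onto the descriptive fields listed above."""
    attrs = record["data"]["attributes"]
    publisher = attrs.get("publisher")
    publisher_id = None
    if isinstance(publisher, dict):
        # Object form (assumed to appear only with ?publisher=true).
        publisher_id = publisher.get("publisherIdentifier")
        publisher = publisher.get("name")
    # Assumption: the "Issued" date is used as the publication date.
    issued = next(
        (d.get("date") for d in attrs.get("dates") or [] if d.get("dateType") == "Issued"),
        None,
    )
    titles = attrs.get("titles") or [{}]
    return {
        "dataset_title": titles[0].get("title"),
        "publisher": publisher,
        "publisher_id": publisher_id,
        "creators": [c.get("name") for c in attrs.get("creators") or []],
        "publication_date": issued,
        "dataset_version": attrs.get("version"),
        "url": attrs.get("url"),
        "publication_year": attrs.get("publicationYear"),
    }
```

With the default string publisher, the publisher ID stays `None` and must come from another source.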

You can also validate your reports against the JSON schema version of the COUNTER Code of Practice.

📘
Counter Processor is external software, not maintained by DataCite, that can be used to process usage logs for contribution to the Usage Reports API. It was created by the California Digital Library and is now maintained by the Global Dataverse Community Consortium. If you are using Counter Processor v0.1.04 or earlier, we recommend updating to a later version released by the Global Dataverse Community Consortium to maintain compatibility with the Usage Reports API.
