This interactive whitepaper aims to establish a standardised method of requesting data from data processors and controllers in the European Union.
The whitepaper is substantiated by an API proposal definition, following the OpenAPI 3 specification. Documentation for the API specification is available here. Additionally, the whitepaper describes the background and rationale for a data request API, a substantive proposal, and considerations for future use.
The Open Data Rights API is a work in progress. Both the API specification and this document are tracked in a GitHub repository. Comments, feedback and discussions are welcomed in the GitHub issues section of this repository. Pull requests are welcomed as well. Any further questions or comments can be directed to Lei Nelissen.
This whitepaper is written and created by Lei Nelissen, as part of his graduation project at the Industrial Design department of the Eindhoven University of Technology.
Given the rough state of exercising data rights, many solutions could be put forward. Why propose an API for data requests? We argue that a standardised Data Rights API offers a more technically robust, secure, accessible and compliant data rights experience. At the same time, organisations have the opportunity to reduce compliance costs by following best practices, rather than inventing them.
If an API is standardised and accessible to third parties, the front-end and back-end of any data rights experience are no longer tightly coupled. This gives organisations the opportunity to use available, commodified solutions as the user interface through which citizens exercise their data rights.
Because the back-end is built on well-known standards, an Open Data Rights API provides crucial assurances to a user-facing application: we can identify you, we can identify what data we process, which of that processing applies to you, and what that data looks like. These standardised assurances make it much easier to build efficiently on top of this API.
The user-facing aspects of a data rights experience hold the strongest cards to delight or frustrate a citizen. Reinventing this wheel again and again leads not only to fragmentation, but also to frustration on the part of the end-user. With clear assurances about what the API provides, the playing field for the various solutions that consume it becomes much more level.
Moreover, this competition means reduced compliance costs for organisations. While they may still offer their own front-end for exercising data rights, such front-ends become interchangeable and commodified. Organisations no longer need to lock in to a particular compliance vendor if its product does not meet their needs. If the front-end is based on open-source tools, the organisation can choose to fork it and adjust it to its needs.
Finally, organisations may even make the case that if third party software is available to connect to the API, they should be exempt from offering anything but a data request API. This reduces engineering, design and management effort and thus costs.
Implementing systems and processes that allow access to huge amounts of personal information is not easy. Small gaps in implementation or processes can have dire consequences. Imagine an identification process gone wrong, and a user receiving someone else's data. Moreover, poor security practices can lead to massive fines.
In cryptography, there is an accepted axiom: "don't roll your own crypto". Rather than devising a likely flawed implementation ourselves, we use well-tested and well-designed libraries that do this kind of work for us. We've accepted that OpenSSL, OAuth and bcrypt provide both qualitative and provable security benefits to development, especially when compared with systems we could design and implement ourselves. Likewise, the Open Data Rights API provides a robust set of API design principles that ensure that data stays secure and private.
Currently, the European Union does little to clarify how the law translates into practical considerations for systems and user experience design. As a result, all best practices we know are built on interpretations that individual organisations have made themselves. We thus end up in a space where smaller organisations follow the wave of their larger precursors, without understanding the rationale behind their decisions.
With the Open Data Rights API, best practices concerning data rights are made glaringly obvious. Meanwhile, the rationale behind design decisions should be made clear, so that individual implementers understand why things are built the way they are. This gives both smaller and larger organisations insight into these considerations at little to no cost.
Lastly, the Open Data Rights API acts as a forum where discussions regarding data rights are held. This encourages continuous development of these best practices across the industry as a whole, rather than within individual organisations. It also increases the number of eyes watching over technical accessibility, feasibility and security.
The GDPR has introduced stricter and more substantive requirements for organisations that process data. Meanwhile, it affords citizens a number of data rights that organisations are obligated to facilitate. Failing to do so carries the potential for huge fines. In this chapter, the changes made by the GDPR are highlighted and contrasted with what the current practice of exercising data rights looks like.
Note: I am not a lawyer and this is not legal advice. Also, find a glossary of GDPR terms here.
The General Data Protection Regulation is a piece of legislation from the European Union that constitutes a shift in thinking regarding the regulation of personal data. Next to promoting standards and practices for data-protective measures, it establishes a broad framework under which the processing of personal data is considered legal. Additionally, the law affords a number of rights to individual citizens regarding personal data that is linked to them.
A major shift is that the application of the law does not depend on where the data is processed, but rather on whether the data subject is in the European Union. Thus, if a non-European organisation wants to cater to people in the EU, it must follow the GDPR.
But what is it that organisations must comply with? Beyond a host of specific details (i.e. consult your lawyer), for the purpose of this whitepaper we'll zoom in on two specific GDPR concepts: Lawfulness of Processing and Data Subject Rights.
Under the GDPR, each processor (or data-processing organisation) must be able to demonstrate a valid reason for processing personal information. These valid reasons are known as processing grounds, and the resulting list of data and reasons is known as a data processing register. Processing grounds are covered by Article 6 of the GDPR. The six grounds for lawful processing are as follows:
the data subject (or individual) gives consent for processing
processing is necessary for a contract
processing is necessary for legal compliance
processing is necessary to protect the vital interests of someone
processing is necessary for a task in the public interest
processing is in the legitimate interest of the processor, unless it violates the rights and freedoms of the data subject
The last one is somewhat vague, and the subject of ongoing discussion. Yet, we're not here to make judgements. After an organisation determines what data it processes, and on which legal ground, it must share this with its data subjects. It is then free to process this data, subject to the limits set by lawsuits, legal intervention, etc.
In return for allowing the processing of their personal information, individuals (or data subjects) gain a number of rights. These rights are supposed to give individuals a sense of transparency and control over data that is essentially theirs. These rights are covered by Chapter 3 of the GDPR. We summarise them as follows:
Per Articles 12, 13 and 14, data processors must inform individuals of which of their personal information they are processing (and which not). Article 12 also establishes that organisations have up to one month to process and complete requests, which may be extended by two further months where necessary.
Data processors must also communicate clearly about their data practices. Article 22 extends this right to automated decision-making, more commonly known as artificial intelligence, machine learning, or, more generally, algorithms.
Finally, organisations must make it easy and accessible for data subjects to exercise their data rights. This includes clear communication and notification, per Article 19. Moreover, exercising data rights is, in principle, free of charge.
Per Article 15 of the GDPR, data processors must allow individuals to access the personal information being processed about them. Processors must be able to hand over a copy of the data being processed that belongs to the data subject. Following Article 20, this data must be provided in a commonly used, machine-readable format.
Per Article 16, data subjects have the right to rectify incorrect or incomplete information. Moreover, per Article 17, data subjects have the right to have parts of their personal information erased. While there are restrictions to the application of this right, in most cases withdrawn consent provides the basis for it.
Based on the data they retrieve, data subjects may object to or even restrict certain data processing practices, per Article 18 and Article 21. This right extends to automated decision-making, such as profiling, per Article 22.
If organisations fail to comply with these or other obligations they have under the GDPR, a local Data Protection Authority may impose a fine for the breach. The maximum fine is set at either €20M or 4% of global annual turnover, whichever is higher. The potential for staggering fines is real: more than two years after the GDPR took effect, at least €259M in fines have been imposed to date.
While the General Data Protection Regulation is specific to the European Union, it has inspired other legislation around the world. At least California's CCPA and Brazil's LGPD contain provisions and rights similar to those of the GDPR. Several other US states, as well as Canada, India and Australia, are considering new personal information legislation, which is likely to draw inspiration from the GDPR (source). Thus, investing in GDPR compliance is a safe bet, even if the law currently applies only to a subset of customers.
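To make this timeline concrete, the Python sketch below computes the standard deadline and the latest possible deadline when the two-month extension is used. It assumes a simplified reading in which "one month" means the same day of the next calendar month, clamped to the last day of a shorter month; how such periods are counted exactly is a legal question, and the helper names add_months and response_deadlines are our own.

```python
import calendar
from datetime import date
from typing import Tuple


def add_months(start: date, months: int) -> date:
    """Move `months` calendar months forward, clamping to the last day of the month."""
    # Work with a zero-based month index so that year rollover is plain arithmetic.
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def response_deadlines(received: date) -> Tuple[date, date]:
    """Return (standard deadline, latest deadline if the two-month extension is used)."""
    return add_months(received, 1), add_months(received, 1 + 2)


standard, extended = response_deadlines(date(2020, 1, 31))
print(standard, extended)  # 2020-02-29 2020-04-30
```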
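The "whichever is higher" rule is simple arithmetic, sketched below in Python. This only computes the statutory upper bound described above; the function name maximum_fine is our own, and the fine actually imposed in a given case is set by the Data Protection Authority and may be far lower.

```python
from decimal import Decimal

FIXED_CAP_EUR = Decimal("20000000")  # EUR 20 million
TURNOVER_SHARE = Decimal("0.04")     # 4% of global annual turnover


def maximum_fine(global_annual_turnover_eur: Decimal) -> Decimal:
    """Upper bound of the fine: whichever is higher of the fixed cap and 4% of turnover."""
    return max(FIXED_CAP_EUR, TURNOVER_SHARE * global_annual_turnover_eur)


# EUR 100M turnover: 4% is EUR 4M, so the EUR 20M fixed cap is the maximum.
print(maximum_fine(Decimal("100000000")))
# EUR 1B turnover: 4% is EUR 40M, which exceeds the fixed cap.
print(maximum_fine(Decimal("1000000000")))
```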
The Open Data Rights API is aimed at improving the status quo of exercising data rights. In this proposal, we clarify the goals of the initiative. Further development, cooperation and implementation efforts are made according to these goals.
Organisations must provide a means of processing data rights requests. We strongly believe that electronic means are the best way of doing so. By providing common patterns and best practices for creating these electronic means, we aim to increase adoption and widespread use of data rights systems, not only in Europe or among companies covered by the GDPR, but universally.
Adoption by organisations should be as easy as possible. The specification must support the 80% use case, while facilitating the 20% use case. Documentation should be high-quality and plentiful, covering not only the specification, but also implementation from different perspectives. Moreover, the documentation should provide guides for going through the process of supporting data rights as a whole, not only the implementation of a data request API.
Exercising data rights should be feasible for any citizen. Finding out how to exercise one's rights, and actually doing so, should be as easy as humanly possible. These systems should therefore focus on the human aspects of data rights.
Personal information should be stored, processed and made available securely. By providing well-known and tested frameworks for facilitating data rights, we aim to make the process of exercising data rights more secure.
We believe that exercising data rights is an important aspect of digital citizenship. The law allows citizens to verify agreements that are made with an organisation. As this process is based on trust, we emphasise that a delightful process strengthens the bond of trust between a data subject and a data processor. Therefore, finding ways to facilitate delight in user interactions concerning data rights is a must-have.
Data rights provide organisations with an opportunity to demonstrate their commitment to their users' privacy. As citizens' awareness of the necessity of such rights rises, the necessity for organisations to address those rights becomes evident. By promoting accessible, easy-to-implement standards, we aim to facilitate the sorts of conversations within organisations that increase this commitment to user privacy.
Data privacy is increasingly becoming a right that many citizens are concerned about, yet feel powerless over. Data rights are handles that enable better reflection on what is actually happening, and give citizens the power to change their situation. By making exercising data rights easy, we aim to stimulate societal discussion on what personal information means to individuals. By making data rights accessible, we aim to increase citizens' awareness of the power they have over their personal information.
What exactly does a data request API look like? In this chapter, we cover its goals and the means put forward for achieving them.
Even the brief GDPR introduction above, which covers only a subset of the GDPR, shows that companies must adjust many of their processes to become compliant. Yet, despite four years having passed since the GDPR was adopted, many organisations still lack rigorous processes for dealing with personal information. This applies even more so to the data rights that subjects enjoy.
In a litmus test of data access requests conducted in 2020, 59 organisations received requests for data access. At the 30-day mark, just over half of the surveyed organisations had succeeded in responding to the request. Even after 90 days, 20% of data requests remained unresolved, despite repeated attempts at progress.
Looking at individual requests reveals a picture of organisations figuring out processes as they go. Some gathered information over insecure channels such as email, while others performed little validation of user identity. The poor security of data request practices is corroborated by a wide range of researchers, e.g. Martino et al. (2019), Pérez-Solà et al. (2019) and Boniface et al. (2019).
Meanwhile, large tech companies have the legal and engineering resources to construct infrastructure capable of servicing data requests at scale. Yet, all available solutions (e.g. by Google, Apple, Facebook, Twitter and Spotify) are custom-made and homebrew. This makes it hard for other organisations to follow suit, while citizens face multiple interfaces and paradigms for achieving the same basic task.
This page describes the structure of the public API that must be exposed by an Open Data Rights-compliant API. The API endpoints are described in the OpenAPI 3 standard in the GitHub repository. A detailed overview of responses, parameters and endpoints is available in the API Endpoint Documentation.
The API is REST-based and can be used in a language-agnostic way. All responses are in application/json, text/plain or, for specialised endpoints, application/zip. Any POST parameters must always be supplied in an application/json request body. Requests that require authentication must follow OAuth standards, using a Bearer token in the Authorization header.
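As an illustration of these conventions, the following Python sketch prepares an authenticated request using only the standard library. The base URL, the token value and the helper build_api_request are hypothetical; the authoritative list of endpoints and parameters is the OpenAPI specification.

```python
import json
import urllib.request
from typing import Any, Optional


def build_api_request(base_url: str, path: str, access_token: str,
                      body: Optional[Any] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request to an Open Data Rights API endpoint."""
    # Authenticated endpoints expect an OAuth 2 bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {access_token}"}
    data = None
    method = "GET"
    if body is not None:
        # POST parameters are always supplied as an application/json request body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "POST"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical base URL and token, for illustration only.
req = build_api_request("https://example.org/odr", "/data/me", "ACCESS_TOKEN")
# urllib.request.urlopen(req) would send it and return the JSON response.
```

Keeping request construction separate from sending makes it easy to inspect or test requests before any personal data is exchanged.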
Authentication for this API is OAuth-based, and is described in more detail on the Authentication page.
The data endpoint describes the data practices of the organisation, as well as how those practices impact the user. The inner workings of the data endpoint are documented on the Describing Data page.
The settings endpoint describes the operating parameters for a specific instance of the Open Data Rights API. The available parameters, along with implementation instructions, are available as part of the API Endpoint documentation.
The requests endpoint describes currently ongoing requests, how to request a new archive, the archive status, and the archive download once the request is complete. As part of a new request, a third party may supply a JSON array with all context-types that are requested from the organisation for the user. If no request body is supplied, all processed data types are requested for the authenticated user.
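A new archive request could look like the Python sketch below. We assume that context types are identified by schema.org IRIs such as https://schema.org/Person, and that the collection lives at /requests under the API's base URL; both are assumptions for illustration, as is the helper name new_archive_request.

```python
import json
import urllib.request
from typing import List, Optional


def new_archive_request(base_url: str, access_token: str,
                        context_types: Optional[List[str]] = None) -> urllib.request.Request:
    """Prepare a POST to /requests asking the organisation to compile a data archive."""
    headers = {"Authorization": f"Bearer {access_token}"}
    data = None
    if context_types is not None:
        # Narrow the request to specific context types. Omitting the body
        # requests every data type processed for the authenticated user.
        data = json.dumps(context_types).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url.rstrip("/") + "/requests",
                                  data=data, headers=headers, method="POST")


everything = new_archive_request("https://example.org/odr", "ACCESS_TOKEN")
only_people = new_archive_request("https://example.org/odr", "ACCESS_TOKEN",
                                  ["https://schema.org/Person"])
```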
TL;DR: OAuth 2
The first step towards exercising data rights is ascertaining the identity of the user. The Open Data Rights API builds upon the foundations in authentication that are laid by OAuth 2.0.
Typically in OAuth, an organisation has a process in which clients that wish to use the OAuth endpoint are vetted and granted access. However, in the Open Data Rights API, any third party is allowed to authenticate users for making data requests. Hence, rather than each implementer of the API requesting its own client, an organisation must create a single (hardcoded) client that can solely be used to authenticate against Open Data Rights API endpoints. In the authentication flow, organisations may refer to the Open Data Rights API by name, and use its logo, in connection to this public Client ID.
For authentication, organisations must use the Authorization Code flow with support for Proof Key for Code Exchange (PKCE), as described in RFC 7636. The OAuth endpoint URL and the Open Data Rights Client ID must be made public using the /settings endpoint. The OAuth scope that enables all Open Data Rights endpoints must be named open-data-request-api.
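The PKCE part of this flow can be sketched in a few lines of Python. The code challenge derivation follows RFC 7636 (S256 method); the authorization endpoint, client ID and redirect URI below are hypothetical placeholders, which in practice would come from the /settings endpoint and from the implementing application.

```python
import base64
import hashlib
import secrets
from typing import Tuple
from urllib.parse import urlencode

SCOPE = "open-data-request-api"


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), without padding (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair."""
    # 64 random bytes encode to 86 URL-safe characters, within RFC 7636's 43-128 range.
    verifier = secrets.token_urlsafe(64)
    return verifier, s256_challenge(verifier)


def authorization_url(authorization_endpoint: str, client_id: str,
                      redirect_uri: str, code_challenge: str, state: str) -> str:
    """Build the URL the user is sent to in order to log in and grant access."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": SCOPE,
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    })
    return f"{authorization_endpoint}?{query}"


# Hypothetical endpoint, client ID and redirect URI.
verifier, challenge = make_pkce_pair()
url = authorization_url("https://auth.example.org/authorization", "odr-public-client",
                        "http://127.0.0.1:8080/callback", challenge, secrets.token_urlsafe(16))
```

After the user logs in, the client exchanges the returned authorization code at the token endpoint and sends the original code_verifier along with it. Because only the party holding the verifier can complete that exchange, PKCE protects public clients, such as the shared Open Data Rights client, against intercepted authorization codes.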
Organisations are free to establish separate OAuth servers for Open Data Rights API purposes. Alternatively, they may re-use existing OAuth servers in connection with the Open Data Rights API.
If a separate OAuth server is used, organisations specify an oAuthUrl that refers to the base endpoint of this OAuth server. Implementing software then appends the /authorization and /oauth/token endpoints to the base OAuth URL.
If an existing OAuth server is re-used, organisations may instead specify an oAuthUrl that points to a metadata endpoint, which specifies the endpoints the Open Data Rights API should use. This URL must refer to an RFC 8414-compliant OAuth metadata document. Note that if you are extending an existing service, its OAuth server must support the Authorization Code (PKCE) flow for the open-data-request-api scope.
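Both configurations can be sketched in Python as follows. How a client tells the two cases apart is not specified above, so this sketch leaves that decision to the caller; the helper names are hypothetical. The metadata field names (authorization_endpoint, token_endpoint, code_challenge_methods_supported, scopes_supported) are the ones defined by RFC 8414.

```python
import json
import urllib.request
from typing import Dict, Optional


def fetch_metadata(metadata_url: str) -> dict:
    """Download an RFC 8414 authorization server metadata document."""
    with urllib.request.urlopen(metadata_url) as response:
        return json.load(response)


def resolve_oauth_endpoints(oauth_url: str, metadata: Optional[dict] = None) -> Dict[str, str]:
    """Work out the authorization and token endpoints for the Open Data Rights flow."""
    if metadata is None:
        # Dedicated OAuth server: the fixed paths are appended to the base URL.
        base = oauth_url.rstrip("/")
        return {"authorization_endpoint": base + "/authorization",
                "token_endpoint": base + "/oauth/token"}
    # Re-used OAuth server: take the endpoints from the metadata document,
    # after checking that it advertises what the Open Data Rights flow needs.
    if "S256" not in metadata.get("code_challenge_methods_supported", []):
        raise ValueError("server does not advertise PKCE with S256")
    scopes = metadata.get("scopes_supported")
    if scopes is not None and "open-data-request-api" not in scopes:
        raise ValueError("server does not advertise the open-data-request-api scope")
    return {"authorization_endpoint": metadata["authorization_endpoint"],
            "token_endpoint": metadata["token_endpoint"]}
```

A client would call resolve_oauth_endpoints(oauth_url) for a dedicated server, or resolve_oauth_endpoints(oauth_url, fetch_metadata(oauth_url)) when the oAuthUrl points to a metadata document. Failing early on missing PKCE support avoids sending users into a login flow that cannot complete.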
While many organisations have processes for data rights (some even electronic), these processes differ between organisations. Smaller organisations in particular lack the attention needed to make this process feasible and accessible for citizens. We consider this a pity, because handling these processes thoroughly showcases the care that should be expected from an organisation processing personal information. A great user experience reinforces the idea that organisations are accountable and diligent with the data they are provided. Doing the process right increases consumer trust, which is beneficial for revenue (e.g. Tsai et al., 2011; Tang et al., 2008; Udo, 2001).
We strongly encourage the development of a standard range of tools that can act as a front-end to the Open Data Rights API. Work is already being done here in the form of Aeon, a desktop application that automates and archives data requests. Companies can either adopt such tools, or build their own. Yet, the choice for a suitable front-end lies with the user. This promotes developing strong user experiences in the realm of data rights.
At this point in time, the API is built around information transparency and data access rights. Yet, the GDPR is broader and there are more data rights. Notwithstanding any changes in scope, or incorporation of other aspects (such as a front-end or UX), there are a number of areas which are ripe for consideration in the current structure of the Open Data Rights API.
Besides the right of access, citizens enjoy other data rights. Most importantly, once citizens have inspected their data, they have the right to have some (or all) of it erased. Additionally, when their information is incorrect or incomplete, they have the right to rectify it. While the current API design indicates whether these rights can be exercised for specific types of data, it offers no infrastructure for carrying out those changes.
One could imagine a set of endpoints that build upon /requests and incorporate requests concerning these rights as well. Building these on top of the current infrastructure that describes data semantics would be relatively straightforward.
Meanwhile, operating a one-way, organisation-wide infrastructure that gives users access to all their information is one thing; making that system able to exercise control over that data is another altogether. Organisational practices must be carefully considered when incorporating these rights into the Open Data Rights API. A wide range of implementations is viable for this right.
In addition to erasure, citizens may demand a restriction of the processing of some of their data, based on a number of grounds. While this is similarly easy to implement at face value, the particular grounds on which it rests are more intricate. This process should be carefully considered from a legal standpoint to ensure feasibility and operability for citizens and organisations alike.
The GDPR contains a full chapter addressing transfers of data to third parties or across international borders. Many provisions and conditions apply to this practice, which should be available and accountable to citizens as well. The Open Data Rights API should facilitate and standardise how this practice can be shared with end-users.
Moreover, the landscape of data processors and controllers can become quite intricate when organisations reach a large scale. Standardisation could help make practices between controllers and processors easier, while making data more accessible for citizens.
The current proposal for a Data Rights API is neither perfect, nor complete. This chapter describes practices, ideas and avenues that must be explored and implemented in future iterations of the API. Whereas we invite practical scrutiny of the Proposal chapter, we invite more theoretical consideration of the following:
A large part of both GDPR compliance and users' understanding of the data being processed comes down to the question of which data types are being processed. As this is so central, the Open Data Rights API needs to define semantics for describing data types as well. Fortunately, major strides have already been made in the area of data semantics in the form of schema.org and JSON-LD. The Open Data Rights API adopts both for the purposes of providing data semantics.
These semantics are split between how data processing is described and how the eventual data is described in a data archive.
Data processing is described as part of the /data and /data/me endpoints. The former describes the data processing practices of the organisation at large, while the latter describes the processing that applies to the authenticated user. Both endpoints are described in detail in the API documentation.
For the organisational practices, all possibly processed data types must be listed. For each data type, organisations must indicate the following:
What schema.org / custom context the type is based on
The processing ground for this type of information
Whether the data may be erased by the user
Whether the data may be rectified by the user
A human-readable description for the data type.
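A single entry in the /data response could be modelled as in the Python sketch below. The JSON field names (context, processingGround, erasable, rectifiable, description) and the identifiers chosen for the six Article 6 grounds are assumptions for illustration only; the OpenAPI specification is authoritative.

```python
import json
from dataclasses import dataclass
from enum import Enum


class ProcessingGround(Enum):
    """The six lawful grounds for processing from Article 6 of the GDPR (identifiers hypothetical)."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal-obligation"
    VITAL_INTERESTS = "vital-interests"
    PUBLIC_TASK = "public-task"
    LEGITIMATE_INTEREST = "legitimate-interest"


@dataclass(frozen=True)
class DataTypeDescription:
    context: str                        # schema.org or custom context the type is based on
    processing_ground: ProcessingGround
    erasable: bool                      # may the user have this data erased?
    rectifiable: bool                   # may the user have this data rectified?
    description: str                    # human-readable explanation

    def to_dict(self) -> dict:
        # Illustrative field names; check the OpenAPI specification for the real ones.
        return {
            "context": self.context,
            "processingGround": self.processing_ground.value,
            "erasable": self.erasable,
            "rectifiable": self.rectifiable,
            "description": self.description,
        }


account_holder = DataTypeDescription(
    context="https://schema.org/Person",
    processing_ground=ProcessingGround.CONTRACT,
    erasable=False,
    rectifiable=True,
    description="The name and contact details you provided when creating your account.",
)
print(json.dumps(account_holder.to_dict(), indent=2))
```

Modelling the processing ground as an enumeration keeps entries limited to the six grounds the GDPR actually recognises.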
Lastly, the /data/me endpoint returns an array of all context-based data types that are (possibly) being processed in connection with the authenticated user.
When data is eventually exported using the API, it must be made available in a particular format. In the case of the Open Data Rights API, this data is formatted using JSON-LD and schema.org definitions. Organisations may create many files containing data, as long as the data is always formatted using JSON-LD. Additional file types, such as images, videos and other unstructured media, must be included as well. Any included non-JSON-LD file must be referenced from at least one JSON-LD file. The archive must be a ZIP file.
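One consequence of these rules is that every non-JSON-LD file must be reachable from some JSON-LD file. The Python sketch below checks this for a ZIP archive. It makes two simplifying assumptions: files ending in .json or .jsonld are treated as JSON-LD, and a reference is any occurrence of a file's archive path in a JSON-LD file's text, rather than a proper walk over the JSON-LD graph. The file names in the example archive are hypothetical.

```python
import io
import json
import zipfile
from typing import Set

JSON_SUFFIXES = (".json", ".jsonld")


def unreferenced_files(archive: zipfile.ZipFile) -> Set[str]:
    """Return non-JSON-LD files that no JSON-LD file in the archive mentions."""
    files = [name for name in archive.namelist() if not name.endswith("/")]
    json_files = [name for name in files if name.endswith(JSON_SUFFIXES)]
    other_files = set(files) - set(json_files)
    referenced = set()
    for name in json_files:
        text = archive.read(name).decode("utf-8")
        # Simplification: plain substring match on the archive path, instead of
        # walking the JSON-LD graph and resolving relative IRIs.
        referenced.update(path for path in other_files if path in text)
    return other_files - referenced


def build_example_archive() -> io.BytesIO:
    """Create a small in-memory archive with one referenced and one orphaned image."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        profile = {
            "@context": "https://schema.org",
            "@type": "Person",
            "name": "Jane Doe",
            "image": "media/avatar.jpg",  # path relative to the archive root
        }
        zf.writestr("profile.json", json.dumps(profile))
        zf.writestr("media/avatar.jpg", b"placeholder image bytes")
        zf.writestr("media/orphan.png", b"placeholder image bytes")
    return buffer


with zipfile.ZipFile(build_example_archive()) as archive:
    print(unreferenced_files(archive))  # {'media/orphan.png'}
```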
The archive may follow a self-defined internal structure, except for a folder at the root level named open-data-rights-api. This folder contains another folder called schemas, in which each JSON-LD @context property that is used is archived as well. In this folder, each schema is contained in a separate JSON file, whose content exhaustively describes the schema. Also, two files called data.json and data-me.json describe the output of the respective endpoints at the time of archive creation.
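The reserved folder can be validated in a similar way. In the Python sketch below we assume that data.json and data-me.json sit inside open-data-rights-api/, since the text above does not pin down their location, and we only check that at least one schema file exists and that each one parses as JSON. Matching schema files to the @context values actually used is left out, because the naming of those files is not specified here; the schema file name in the example is hypothetical.

```python
import io
import json
import zipfile
from typing import List

RESERVED = "open-data-rights-api/"


def check_reserved_folder(archive: zipfile.ZipFile) -> List[str]:
    """List problems with the reserved open-data-rights-api folder (empty list = OK)."""
    names = set(archive.namelist())
    problems = []
    # Assumption: data.json and data-me.json live inside the reserved folder.
    for required in ("data.json", "data-me.json"):
        if RESERVED + required not in names:
            problems.append(f"missing {RESERVED}{required}")
    schemas = sorted(n for n in names
                     if n.startswith(RESERVED + "schemas/") and n.endswith(".json"))
    if not schemas:
        problems.append("no archived @context schemas")
    for schema in schemas:
        try:
            json.loads(archive.read(schema))
        except ValueError:  # covers both invalid JSON and invalid UTF-8
            problems.append(f"{schema} is not valid JSON")
    return problems


def build_archive(include_data_me: bool = True) -> io.BytesIO:
    """Create a minimal in-memory archive following the layout described above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(RESERVED + "schemas/schema.org.json", json.dumps({"@context": {}}))
        zf.writestr(RESERVED + "data.json", json.dumps([]))
        if include_data_me:
            zf.writestr(RESERVED + "data-me.json", json.dumps([]))
        zf.writestr("account/profile.json",
                    json.dumps({"@context": "https://schema.org", "@type": "Person"}))
    return buffer
```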
As the Open Data Rights API finds more widespread adoption, ensuring quality of implementation should be a high priority. To become more involved in this, the community could draw up a certification process.
Provided that proper governance and regulation are in place, such a certification process could be offered to organisations participating in the API. It should include strict and continuous review of API implementations, and of the process as a whole. Organisations that comply with these standards should receive certification, which might be operationalised in the form of a check-mark carrying the weight of the Open Data Rights community.
Through such cooperation with organisations, the Open Data Rights API can more closely coordinate and improve the status quo of data requests.
Given that the requests concern the entirety of a person's personal information within an organisation, the Open Data Rights API must be secure without question. To support continuous guarantees on security, practices need to be established in this area.
Firstly, the security of the current proposal must be proven. Correspondingly, we strongly encourage a security audit to be completed in the near future. We consider this essential before a definitive v1 release is made. After completion, learnings should be incorporated in the further development process. Further, continuous security audits should be a regular staple of this development process.
Secondly, a process needs to be established through which critical vulnerabilities can be (confidentially) reported and addressed within the shortest possible time frame. This goes beyond a GitHub issues checklist for serious security issues. Secure infrastructure for this communication must be set up and monitored. Additionally, manpower must be made available to verify and address these issues.
Thirdly, common implementations on both front- and back-end should be regularly tested and scrutinised for particular implementation or security faults. We encourage the Data Rights API to not only take responsibility for specification, but implementations as well. These practices should increase the security of the ecosystem at large.
Given the wide variety of available data sources, the process of implementing data requests for large organisations is positively massive. Any efforts that help make this process easier should be an accepted part of Open Data Rights API practice.
First of all, we encourage the development of open-source reference implementations of the Data Rights API. These implementations could also feature dashboards and experiences that facilitate the process of data requests on the organisational side. These reference implementations should support a wide gamut of languages.
Secondly, a number of plug-ins for common data sources could be made available that easily integrate with Open Data Rights API-based systems. These should face the same scrutiny as the Open Data Rights API. If this development ends up being popular, further standardisation is certainly desirable.
Thirdly, reference implementations for citizen-facing applications implementing the Open Data Rights API should be considered as well. If this development is significantly desirable, a particular solution for citizen-facing applications could be developed, standardised and maintained by the Open Data Rights community.
Article 12(7) of the GDPR allows information to be provided in combination with standardised icons, to facilitate citizens' understanding of processing. Where these icons are presented electronically, the law requires them to be machine-readable.
Given the Open Data Rights API's involvement in user experience, it could be desirable to create a set of reusable, standardised icons in connection with data processing. This set of icons should cover at least:
All data rights
All grounds for processing
A large majority of data types
This process could be undertaken together with another open-source community, or perhaps be based on an existing, comprehensive icon set. Regardless, the user experience, accessibility and usability of these icons must receive significant attention.
If and when the Open Data Rights API finds application, stakeholders will need to be more consistently and meaningfully involved in its governance. Currently, the Open Data Rights API has a benevolent dictator in the form of a single maintainer. Once organisations adopt it, the process of change needs to be structured.
First and foremost, organisations need to be part of the conversation on how they facilitate data rights. In the future, the Open Data Rights API (and derivative trademarks, etc.) should be part of an independent, not-for-profit organisation. Initially, this organisation could be stewarded by another organisation (e.g. the Linux Foundation or Internet Society) with the knowledge to support such a project. This organisation should allow membership from stakeholders, both commercial and non-commercial.
Given a basis for membership, the rights and duties associated with it must be defined. While most of these need no definition right now, at the very least a process for participating in the evolution of the standard must be defined. Preferably, some sort of Request for Comments process should be set up. In this process, members must be able to submit proposals for change, unsolicited. Additionally, members must be able to participate in decision-making by means of voting.
These functions need to be supported legally, by means of statutes, incorporation and codification of procedures. In the meantime, this project is governed through GitHub issues. We invite anyone to create an issue to discuss an aspect of the API, or to submit a Pull Request against the OpenAPI specification or documentation.
Citizens may not always know which organisations are processing their data. In this case, it could be desirable to have an authority that keeps track of which organisations process what data, and where their API is available. Regardless of whether this needs to become a registry, there will be a need to standardise the discovery of Open Data Rights APIs.
Multiple models for facilitating this idea can be thought of, both centralised and decentralised. However, we encourage the Open Data Rights community to strongly consider the user experience of data rights when operationalising these ideas.