Secondary Use Prevention in Large-Scale Data Lakes

Shizra Sultan*, Christian D. Jensen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review


Large-scale infrastructures acquire data from integrated data lakes and warehouses managed by diverse data owners and controllers, and this data is then offered to a large variety of users or data processors. The data may contain personal information about individuals which, if not used according to the data collection purposes, can lead to secondary use that may have legal ramifications. Data often becomes more valuable to users as transformations and aggregations reveal new linkages and correlations. However, with continuous transformation and the emerging data requirements of different users, it is often difficult for controllers to monitor resource usage closely, and collection purposes are overlooked. Hence, to limit secondary use in large-scale distributed environments, the collection purposes of resources need to be preserved through the different transformations that the data may undergo. We therefore propose to record the collection purposes as part of the resource metadata or provenance. This way, they can be preserved and maintained through different data changes and used as a deciding factor in limiting the exposure of personal information to different users or data processors. This paper offers insight into how collection purposes can be described as a provenance property, and how this property is used in an access control mechanism to limit secondary use.
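As an illustration of the idea in the abstract, the following is a minimal sketch of collection purposes carried as provenance metadata and consulted at access time. It is not the paper's mechanism: the `Resource` class, the `derive` and `may_access` helpers, the example purposes, and in particular the rule that a derived resource keeps only the purposes shared by all of its parents are assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical data resource whose provenance carries its collection purposes."""
    name: str
    purposes: frozenset       # purposes the data was originally collected for
    derived_from: tuple = ()  # names of parent resources (provenance link)


def derive(name, *parents):
    """Create a transformed or aggregated resource from one or more parents.

    Assumption (not from the paper): the derived resource may only be used
    for purposes that every parent allows, so the purposes are intersected.
    """
    if not parents:
        raise ValueError("a derived resource needs at least one parent")
    allowed = frozenset.intersection(*(p.purposes for p in parents))
    return Resource(name, allowed, tuple(p.name for p in parents))


def may_access(resource, access_purpose):
    """Grant access only if the stated purpose is one of the collection purposes."""
    return access_purpose in resource.purposes


# Two source resources with different (made-up) collection purposes.
visits = Resource("clinic_visits", frozenset({"treatment", "billing"}))
labs = Resource("lab_results", frozenset({"treatment", "research"}))

# The joined resource keeps a provenance link and the shared purposes only.
joined = derive("visits_with_labs", visits, labs)
```

Under these assumptions, `may_access(joined, "treatment")` is granted while `may_access(joined, "marketing")` is refused, because "marketing" was never a collection purpose of either source. A real deployment could choose a different propagation rule than intersection.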

Original language: English
Title of host publication: Intelligent Computing
Editors: Kohei Arai
Publication date: 2021
ISBN (Print): 9783030801281
Publication status: Published - 2021
Event: Computing Conference 2021 - Virtual event
Duration: 15 Jul 2021 to 16 Jul 2021


Conference: Computing Conference 2021
Location: Virtual event
Series: Lecture Notes in Networks and Systems

Bibliographical note

Funding Information:
The ‘legal basis’ is the foundation for the lawful processing of personal data required by data protection legislation. It means that whenever a data controller (DC) collects and processes personal data for any purpose, there should be specific legal grounds to support it. A legal basis is a set of laws or rules that grants data subjects (DS) rights over how their personal information is managed. The legal basis also binds the DS, DC, and data processor (DP) to their respective rights and obligations towards each other. Some examples of valid legal bases supported by the GDPR are Consent (explicit permission to use data), Contract (a formal contract to which the DS is a party), Legitimate Interest (often followed by consent or contract, in which the DC already has the data), Public Interest (processing data in an official capacity for the public interest), Legal Obligation (data processing that complies with local, federal, or global law), and Vital Interest (data processing in order to save someone’s life) [22]. Not all legal bases grant the same rights to the DS, and they differ by situation, as mentioned in Table 1.
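The requirement that every processing purpose be backed by a legal basis can be sketched as a simple check. The six enum members mirror the legal bases listed above; the `LegalBasis` and `unsupported_purposes` names, the mapping layout, and the example purposes are hypothetical and chosen here for illustration only. The per-basis rights from Table 1 are not modelled.

```python
from enum import Enum


class LegalBasis(Enum):
    """The six GDPR legal bases named in the text, with its short glosses."""
    CONSENT = "explicit permission to use data"
    CONTRACT = "formal contract to which the data subject is a party"
    LEGITIMATE_INTEREST = "often followed by consent or contract"
    PUBLIC_INTEREST = "processing in an official capacity for the public interest"
    LEGAL_OBLIGATION = "processing that complies with law"
    VITAL_INTEREST = "processing in order to save someone's life"


def unsupported_purposes(purpose_to_basis):
    """Return, in sorted order, the purposes that have no legal basis recorded."""
    return sorted(
        purpose
        for purpose, basis in purpose_to_basis.items()
        if not isinstance(basis, LegalBasis)
    )


# Hypothetical declaration by a controller: purpose -> legal basis (or None).
declared = {
    "billing": LegalBasis.CONTRACT,
    "newsletter": LegalBasis.CONSENT,
    "marketing": None,  # no legal grounds recorded
}

missing = unsupported_purposes(declared)
```

Here `missing` contains only "marketing", the one purpose declared without legal grounds, which a controller could flag before any processing takes place.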

Publisher Copyright:
© 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.


  • Access control
  • Data lakes
  • Privacy
  • Provenance
  • Purpose
  • Secondary use

