For GDPR compliance how to delete user data from data lake efficiently? #59

Open
kanistha opened this issue Aug 5, 2019 · 0 comments

kanistha commented Aug 5, 2019

I have a question about GDPR compliance: we need to delete a user's data from our data lake when the user requests deletion of their account. Currently we store user data for analytics in Azure Data Lake with the following configuration:

  • Type: Data Lake Storage Gen1
  • Data format in Data lake: Avro
  • Using default partitioning based on time

We use a de-identified data lake approach to address data privacy requirements: sensitive information is de-identified and protected before it enters the data lake, which minimizes the storage and use of personally identifiable information. Concretely, before storing data in the data lake we mask it with a random ID. Is it still required to delete this non-personally identifiable information from the data lake to be compliant with GDPR? If so, is there an efficient way to delete user-specific data from the data lake, given that Azure Data Lake Store is an append-only file system where data, once committed, cannot be erased or updated?
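
To make the setup concrete, here is a minimal sketch of the kind of de-identification step described above, together with what a "delete" would probably have to look like on an append-only store (rewriting a partition without one user's records). Everything in it is an assumption for illustration: the field names, the in-memory `id_map` standing in for a lookup kept outside the data lake, and the function names `de_identify` and `drop_user` are hypothetical, and plain dicts stand in for Avro records.

```python
import uuid

# Assumed names of the fields that identify a person (hypothetical).
PII_FIELDS = ("user_id", "email", "name")


def de_identify(record, id_map):
    """Return a copy of the record without PII fields, tagged with a random id.

    id_map is a hypothetical lookup (real user id -> random id) that is
    kept outside the data lake.
    """
    real_id = record["user_id"]
    if real_id not in id_map:
        id_map[real_id] = uuid.uuid4().hex
    clean = {k: v for k, v in record.items() if k not in PII_FIELDS}
    clean["random_id"] = id_map[real_id]
    return clean


def drop_user(records, random_id):
    """Rewrite-style delete: keep every record except those of one random id.

    On an append-only store the result would be written as a new file
    that replaces the old partition file.
    """
    return [r for r in records if r["random_id"] != random_id]


# Usage with made-up events.
id_map = {}
events = [
    {"user_id": "u1", "email": "a@example.com", "event": "login"},
    {"user_id": "u2", "email": "b@example.com", "event": "login"},
]
partition = [de_identify(e, id_map) for e in events]
remaining = drop_user(partition, id_map["u1"])
```

With time-based partitioning, a rewrite like `drop_user` would have to touch every partition that contains the user's random ID, which is the part that looks inefficient to me.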

Please let me know if you need any further information.

Thanks a lot for your help in advance.
