Feature Request/Idea: Retention period. #9375

Closed
PaulBoon opened this issue Feb 9, 2023 · 4 comments · Fixed by #10336
Labels: Feature: File Upload & Handling; Type: Suggestion; User Role: Superuser

Comments

@PaulBoon
Contributor

PaulBoon commented Feb 9, 2023

Retention period functionality overview:

It should be possible to specify a retention period, after which the data will automatically be made unavailable. The motivation is GDPR-related: privacy-sensitive data in files should not be retained beyond a defined period.

Original motivation or use-case:

In our Dataverse instance (DataverseNL), we have some datasets on medical topics from the Dutch university medical centres. Due to our national legislation, these medical datasets should be deleted after a defined period (10 or 15 years). It would therefore be useful to have a metadata field that stores the retention period, so that datasets reaching the end of their retention period can be found easily and appropriate measures can be taken at the file level.

Proposed functionality:

The following sections describe the proposed behaviour for each stage, in chronological order:

  • Retention specification
    Allow a retention period to be specified for a file. As with embargo, it should be possible to set the period on a file using the File options dropdown.
    The 'Retention' option could be placed just below 'Embargo' and above 'Delete'. It would open an 'Edit Retention' dialog (similar to the embargo dialog) where you select a date for the retention and, optionally, a reason.
    The data is made unavailable on (and after) the specified date.
    When a retention date is set in different versions, the earliest date 'wins' and overrides the others.

  • Pre removal information
    As with an embargo, information should be shown about a file that has a retention date set but has not been removed yet. On the File page and in search results, the status message will indicate the retention date. The dataset status will also indicate that a retention period is set.

  • File removal/deaccession
    Once the file is made unavailable, it cannot be downloaded or previewed. Deleting the data file from storage (which would make the removal irreversible) could happen later. A file whose retention period has expired is made unavailable in every version; consequently, if the retention is specified in an earlier version, the file will also be unavailable in later versions.
    Dataverse will not delete the files from storage automatically; instead, a separate process (manual or otherwise) should do the actual deletion if needed.
    It should be possible to configure an action that runs automatically (like the pre- and post-publish workflows) when a file's retention period expires. The action could be sending a simple email to a sysadmin or updating a database; what it does is up to the implementer.

  • Post removal information
    There should be a tombstone-like landing page for removed files. This is especially important if a Dataverse installation has PIDs for files configured, since those PIDs would otherwise have no decent landing page to resolve to. The file metadata should remain visible, along with the retention information (date and reason).
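The rules above ("earliest date wins", "unavailable on and after the date, in every version", "run a configurable action") can be sketched as follows. This is a minimal illustration under stated assumptions, not Dataverse code: the function names, the `RetentionHook` type and the file id string are all hypothetical.

```python
from datetime import date
from typing import Callable, Iterable, List, Optional

# Hypothetical hook type: any callable receiving the file id and its effective
# retention date, e.g. one that emails a sysadmin or updates a database.
RetentionHook = Callable[[str, date], None]


def effective_retention_date(version_dates: Iterable[Optional[date]]) -> Optional[date]:
    """Return the earliest retention date set in any version, or None if none is set."""
    set_dates = [d for d in version_dates if d is not None]
    return min(set_dates) if set_dates else None


def apply_retention(file_id: str, version_dates: Iterable[Optional[date]],
                    today: date, hooks: List[RetentionHook]) -> bool:
    """Return True if the file is unavailable today (on or after the earliest date).

    The hooks run whenever the file is found unavailable; a real job would also
    need to remember which files it already processed so each hook fires once.
    """
    retention = effective_retention_date(version_dates)
    if retention is not None and today >= retention:
        for hook in hooks:
            hook(file_id, retention)
        return True
    return False
```

For example, with retention dates 2035-01-01 in one version, none in another and 2033-06-30 in a third, the effective date is 2033-06-30, so the file becomes unavailable in all versions from that day on.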

@philippconzett
Contributor

This sounds like a useful feature. It reminds me of how DataverseNO currently manages embargoes (we're still on version 5.6, so we haven't implemented the embargo feature); see our Deposit Guidelines: the dates entered into the Distribution Date field are checked by a cron job every night, and a message is sent to our ticket system one month before the specified date. This way, the curator(s) in charge of the dataset are notified and can then contact the dataset owner to clarify whether the files can indeed be released.
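A nightly check like the one described could look roughly like this. It is a sketch only: "one month" is assumed to be 30 days, and the function name and dataset ids are hypothetical; DataverseNO's actual job may work differently.

```python
from datetime import date, timedelta
from typing import Dict, List

# Assumption: "one month before" is approximated as exactly 30 days.
LEAD_TIME = timedelta(days=30)


def due_for_notice(distribution_dates: Dict[str, date], today: date,
                   lead_time: timedelta = LEAD_TIME) -> List[str]:
    """Return ids of datasets whose Distribution Date is exactly `lead_time` after today.

    Meant to be run once per night (e.g. from cron): each dataset is then reported
    exactly once, provided no nightly run is skipped.
    """
    return sorted(pid for pid, d in distribution_dates.items() if d - today == lead_time)
```

Matching on an exact day keeps the ticket system from receiving a message every night for a month; the trade-off is that a missed run silently skips that day's notices.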

@PaulBoon
Contributor Author

PaulBoon commented Mar 7, 2023

An alternative interpretation of the retention period is that it is the period during which the data must be available, and the data does not have to be removed after that period. In this case the data should not be made unavailable automatically, but just marked as 'no need to keep' or 'could be made unavailable'.
This could be supported by the functionality that we suggested, if we make the 'automatic deaccession' optional.
Another requirement could then be that it should be possible to compile a list of files (and their datasets) for which the period has passed.
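Such a report could be compiled with something like the sketch below, which only lists candidates and deaccessions nothing. The `FileRecord` type and its field names are hypothetical, not Dataverse's schema; "passed" is taken to mean on or after the retention date, matching the rule in the original proposal.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FileRecord:
    dataset_pid: str                  # hypothetical field names, for illustration only
    file_id: str
    retention_date: Optional[date]    # None means no retention period set


def past_retention(files: List[FileRecord], today: date) -> List[FileRecord]:
    """List files whose retention period has passed, grouped by dataset.

    Nothing is made unavailable here; the result only marks candidates
    that 'could be made unavailable'.
    """
    return sorted(
        (f for f in files if f.retention_date is not None and f.retention_date <= today),
        key=lambda f: (f.dataset_pid, f.file_id),
    )
```

Sorting by dataset and then file keeps each dataset's expired files together, which suits a curator working through the list dataset by dataset.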

@PaulBoon
Contributor Author

Started implementing minimal functionality: #10336

@PaulBoon
Contributor Author

PaulBoon commented May 1, 2024

@sekmiller Thanks for the testing and feedback on that PR

@pdurbin added this to the 6.3 milestone May 1, 2024
3 participants