Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IQSS 9375 - Retention period #10336

Merged
merged 46 commits into from
May 1, 2024
Merged

IQSS 9375 - Retention period #10336

merged 46 commits into from
May 1, 2024

Conversation

PaulBoon
Copy link
Contributor

@PaulBoon PaulBoon commented Feb 22, 2024

What this PR does / why we need it:
The minimal implementation of the retention feature, which will make a file unavailable after the retention period (or end date) has expired. This is opposite to the embargo feature, because that makes it available after the embargo expires.

Which issue(s) this PR closes:

Closes #9375

Special notes for your reviewer:
After the retention expires any (optional) full-text indexing will remain, potentially exposing information via searching that should not be exposed anymore. This PR prevents this by not allowing full-text indexing for files with a retention, even if is has not expired. This is similar to how restricted files are handled. This is not optimal, but save and simple.

Suggestions on how to test this:

The Solr schema.xml has changed, so you need to update this in your existing dev machine.

  • Copy the schema.xml file; with the docker dev setup this wil be docker cp conf/solr/9.3.0/schema.xml 3bfcc68124cb:/var/solr/data/collection1/conf/schema.xml.
    When you have custom metadata blocks you then need to run the update script, which is in conf/solr/9.3.0/update-fields.sh.
    Let Solr reload that changed schema.xml
    curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"

  • Configure the system to allow retention:
    Set the minimum at 10 years for example:
    curl -X PUT -d 120 http://localhost:8080/api/admin/settings/:MinRetentionDurationInMonths

  • Add a retention to a file

    • Log in, create a dataset and upload at least one file, save this dataset so you then have a fresh draft.
    • From the 'File Options' dropdown select 'Retention', the 'Edit Retention' dialog should display.
    • Just pushing the 'Save' button will work.
    • Publish the dataset
  • Make the retention period expire, by changing the database record

    • Get into the dev system that runs postgres and start a 'psql' session.
      On the docker dev box, the command is psql -U dataverse -W and the password is 'secret'.
      Change the 'dateunavailable' to somtime in the past; for example:
      UPDATE retention SET dateunavailable='2015-02-28' WHERE id='1';
      The id is '1' in this example, but you could check the retention_id field in the datafile table for that specific file.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

There is a dialog to set the Retention:
Screenshot-dataset-file-editretentiondialog

When the retention has expired, the Dataset result shows that as a label, like with embargoed.:
Screenshot-dataset-result-withretentionexpired
Note that this test dataset also has an embargoed file.

When trying to download it will not provide it, but instead show that the retention has expired:
Screenshot-datafile-download-filewithretentionexpired

Is there a release notes update needed for this change?:

Yes, The Solr schema must be updated

Additional documentation:

NOTE the flyway script src/main/resources/db/migration/V6.1.0.99__xxxx-retention.sql should be renamed depending on the last number when merging.

@coveralls
Copy link

coveralls commented Feb 22, 2024

Coverage Status

coverage: 20.618% (-0.09%) from 20.707%
when pulling 8758557 on PaulBoon:RetentionPeriod
into 0b404c8 on IQSS:develop.

@pdurbin pdurbin added the Size: 10 A percentage of a sprint. 7 hours. label Feb 28, 2024
@PaulBoon PaulBoon marked this pull request as ready for review March 12, 2024 13:01
@PaulBoon PaulBoon requested a review from qqmyers March 12, 2024 13:14
@PaulBoon
Copy link
Contributor Author

@sekmiller hope it's OK now.

@sekmiller
Copy link
Contributor

Thanks @PaulBoon. I was away yesterday, but I will make time today to take a look.

@sekmiller
Copy link
Contributor

@PaulBoon One thing that I noticed - if I use the multiple file download button when one of the files I have selected is no longer available due to the retention expiration the warning message refers to "Restricted Files"
Screen Shot 2024-04-16 at 4 07 13 PM

@sekmiller
Copy link
Contributor

Also can you add a release note that briefly describes the feature and notes that the schema.xml must be updated in the upgrade process.

Thanks

@PaulBoon
Copy link
Contributor Author

@PaulBoon One thing that I noticed - if I use the multiple file download button when one of the files I have selected is no longer available due to the retention expiration the warning message refers to "Restricted Files" Screen Shot 2024-04-16 at 4 07 13 PM

@sekmiller The best solution I can think of is to change these messages in a way that they also cover these files with an expired retention. A multiple selection can contain both types and it would be difficult to adapt the message to that dynamically.
In both cases the file is not available, but for a different reason.

@qqmyers
Copy link
Member

qqmyers commented Apr 17, 2024

FWIW: There's a similar issue w.r.t. files that are only available for Globus transfer and I've made changes to these messages and display logic in #10451. I'd suggest adding retention period expiration into those changes messages rather than tweaking the existing ones. (That PR is also in QA - not sure if it can get merged and picked up here in a reasonable time or not.)
If there are better ideas for handling this general issue, just let me know. For example, I debated just saying Inaccessible Files Selected rather than trying to list all the reasons why they might not be accessible but ended up just adding Globus text. Perhaps with retention period as well, the list is long enough to want to simplify?

@qqmyers
Copy link
Member

qqmyers commented Apr 17, 2024

@PaulBoon - FYI: we had a discussion in the daily standup mtg and think the way to go is to get #10451 merged and have you update this PR to include it from develop, and you or I can then make changes. For the changes themselves, I think the idea was to go with the simpler title of "Inaccessible Files Selected" and then include retention period expired as one of the options in the descriptive text.

@PaulBoon
Copy link
Contributor Author

@qqmyers Just added the release notes, and as with most of this PR I reused yours. I will wait for that other PR to be merged while I am on a short vacation (just a week).

@scolapasta scolapasta added the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Apr 23, 2024
@scolapasta scolapasta removed the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Apr 29, 2024
@sekmiller
Copy link
Contributor

sekmiller commented Apr 30, 2024

@PaulBoon Hi Paul, Thanks for the updates.
I have a few observations/change requests

  1. For the api success message can we say "File(s) retention information updated" instead of "Files were retentioned"
  2. If I put in a date that isn't a date (February 30) in the api, it doesn't update but there's no error response. (response is just {})
  3. If I'm logged in as a superuser I'm not able to set or update the retention of a published file - which I am able to do via the api - did you consider allowing this via the UI?

@sekmiller
Copy link
Contributor

Screen Shot 2024-04-30 at 3 42 03 PM

On the dataset page file list - It looks like you should add a space or punctuation between the deposit date and "Draft:..."

@PaulBoon
Copy link
Contributor Author

PaulBoon commented May 1, 2024

Screen Shot 2024-04-30 at 3 42 03 PM

On the dataset page file list - It looks like you should add a space or punctuation between the deposit date and "Draft:..."

This is strange, because I don't see that happening. Maybe your CSS is not the same? If I inspect the page I see that this 'Draft: .." string is inside a span with class="retentionright", and that has a margin-left of 10px;.

@PaulBoon
Copy link
Contributor Author

PaulBoon commented May 1, 2024

@PaulBoon Hi Paul, Thanks for the updates. I have a few observations/change requests

1. For the api success message can we say "File(s) retention information updated" instead of "Files were retentioned"

2. If I put in a date that isn't a date (February 30) in the api, it doesn't update but there's no error response. (response is just {})

3. If I'm logged in as a superuser I'm not able to set or update the retention of a published file - which I am able to do via the api - did you consider allowing this via the UI?

@sekmiller About the observations:

  1. I agree that those messages need improving; will make it "File(s) retention period has been set or updated".
  2. Will have to look into that, I did see this kind of behavior with other API endpoints, so maybe it was difficult to handle properly.
  3. This behavior was copied from the embargo implementation, not only to be efficient with the limited development time, but also to keep similar features behave similar.
    If we discover that this happens a lot and we want to have it in the UI, we could handle this in an separate issue/PR.

@PaulBoon
Copy link
Contributor Author

PaulBoon commented May 1, 2024

@sekmiller Just commited additions to the input handling of that set-retention API end point.
It now gives informative responses when:

  1. The JSON is invalid
  2. The dateUnavailable is not specified
  3. The dateUnavailable is not a string
  4. The dateUnavailable has not a valid date format

@sekmiller
Copy link
Contributor

@PaulBoon Thanks for the updates they look good. Your error message for improper json led me to try leaving some data out of a well-formed json object. This is what I found:

If I use the api with a json object that doesn't include "fileIds", I get a success message and a record is written to the retention table, but of course there are no records in the datafile table with that retention_id.

If I leave out "reason" I get an empty response {} and nothing seems to happen to the retention table.

If I leave out "dateUnavailable" I get a proper error message - no change needed here.

@PaulBoon
Copy link
Contributor Author

PaulBoon commented May 1, 2024

At some point the Dataverse APIs should have some formal specification which we can hopefully use to do some automated 'schema' validation against. This sort of 'manually' handling of all possible issues with the input is cumbersome. But until then...

@sekmiller
Copy link
Contributor

Thanks for hanging in there and making the updates. I will merge it shortly.

@sekmiller sekmiller merged commit a329f29 into IQSS:develop May 1, 2024
16 checks passed
@pdurbin pdurbin added this to the 6.3 milestone May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request/Idea: Retention period.
6 participants