Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5028 Add dataset level external tools #6059

Merged
merged 30 commits into from
Sep 13, 2019
Merged

Conversation

pdurbin
Copy link
Member

@pdurbin pdurbin commented Jul 26, 2019

For #5028

@pdurbin pdurbin mentioned this pull request Jul 26, 2019
@coveralls
Copy link

coveralls commented Jul 26, 2019

Coverage Status

Coverage increased (+0.04%) to 19.513% when pulling 6dd0cfc on 5028-dataset-explore-btn into 9f41601 on develop.

@pdurbin pdurbin requested review from qqmyers and lubitchv July 26, 2019 16:38
@@ -2016,6 +2021,7 @@ private String init(boolean initFull) {

configureTools = externalToolService.findByType(ExternalTool.Type.CONFIGURE);
exploreTools = externalToolService.findByType(ExternalTool.Type.EXPLORE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findByType isn't filtering by scope yet - should these use findByTypeAndScope with file scope?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they probably should.

@@ -131,6 +147,7 @@ public static ExternalTool parseAddExternalToolManifest(String manifest) {
String displayName = getRequiredTopLevelField(jsonObject, DISPLAY_NAME);
String description = getRequiredTopLevelField(jsonObject, DESCRIPTION);
String typeUserInput = getRequiredTopLevelField(jsonObject, TYPE);
String scopeUserInput = getRequiredTopLevelField(jsonObject, SCOPE);
String contentType = getOptionalTopLevelField(jsonObject, CONTENT_TYPE);
//Legacy support - assume tool manifests without any mimetype are for tabular data
if(contentType==null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to set a content type on dataset level tools

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a bug. We don't want dataset tools to have a content type. Good catch. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers eventually we may use this - for example, show the dataset level explore only if the dataset has at least one file of content type X - but for now we're going with simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand @scolapasta's suggestion - we may have a use for the content type, with some dataset-level external tools; but it sounds like it still needs to be nullable in the table; and it still should be null by default.
In other words, if we adopt this scheme, then we definitely DON'T want to set the type to tab-delimited by default.

ReservedWord reservedWord = ReservedWord.fromString(value);
if (reservedWord.equals(ReservedWord.FILE_ID)) {
allRequiredReservedWordsFound = true;
if (scope.equals(Scope.FILE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else Scope.DATASET - just not written yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not sure. Should dataset tools have any reserved words that are required?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without an else statement, allRequiredReservedWordsFound is false and an exception happens so this method doesn't work for scope dataset as is.
w.r.t. reserved words - it would be odd/bad to configure a dataset tool to not get the dataset id (or pid) in the same way that a file tool needs a file id (or pid if that's implemented).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry - I see the exception is in the if statement - probably should not define the allRequiredWordsFound outside the if clause then (to avoid confusing people like me :-) )

- ``{apiToken}`` (optional) - The Dataverse API token of the user launching the external tool, if available.
- ``{datasetId}`` (optional) - The ID of the dataset containing the file.
- ``{datasetId}`` (optional) - The ID of the dataset.
- ``{datasetPid}`` (optional) - The Persistent ID (DOI or Handle) of the dataset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this and not file PID too? Since Dataverse is always generating this, I'm not sure it helps anyone to have both id and pid (a tool can get the pid), but it seems like file and dataset should be the same.
Also - right now we check for fileId existing (for file scope) as a required condition. For dataset, and file if it is made parallel, the condition will have to change to be that id or pid is required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The focus right now is on dataset level tools but sure, if a file PID exists it would hurt to expose it. Tool makers should just understand that they might not always get a file PID.

I'm not sure if I grok your other comment but thanks for making it. 😄 Someone else will be picking this up while I'm on vacation, I suspect.

@@ -0,0 +1,3 @@
ALTER TABLE externaltool ADD COLUMN IF NOT EXISTS scope VARCHAR(255);
UPDATE externaltool SET scope = 'FILE';
ALTER TABLE externaltool ALTER COLUMN scope SET NOT NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should content type now be nullable (in the db and class definition) since it isn't used for scope dataset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, due to a bug in the pull request, dataset tools get the default content type. Yes, I think this field should be made nullable (and the bug fixed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what is the status of this - is this being addressed? I.e., is the bug in ExternalToolServiceBean getting fixed in this PR, and is ContentType being made nullable?

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some comments re: minor changes. Overall, it looks OK but incomplete w.r.t. scope dataset. I didn't run it, but it looks like it should not break file scope tools, so it should be OK to merge.

@djbrooke
Copy link
Contributor

Thanks @sekmiller for picking this up. Question for you, @scolapasta and @pdurbin:

Was there any discussion about how/if Dataset explorations should be recorded in the guestbook table? At the file level, I believe the explore tool name is added. Happy to discuss and sorry if I missed some communication or the spot in the code where this was reflected.

@scolapasta
Copy link
Contributor

It's a good question - it has not been discussed. It's a challenge because guestbook responses are at the file level.

The other thing that we need to think about are restrictions? if one or more files are restricted, do we show this button?

@djbrooke
Copy link
Contributor

Thanks @scolapasta. When the Dataset explore is initiated, does the tool download all the files or does it download them as needed (or will this differ from tool to tool)? If it's the former, I'd be OK with guestbook records being created for all files in the dataset when a dataset explore is initiated and reflecting the dataset explore tool's name in the table. If the latter, we can create the records as the tool pulls them. Better ideas or corrections welcome.

I'm not super worried about the some-restricted-files use case right now as the tools being focused on at this time are geared towards replication of datasets from journals which are mostly (all?) public. We will definitely need to worry about it in the future for these replication tools, both for restricted files and for files in TRSAs, and also as more tools are added with additional use cases that are not replication focused.

Note that we do need to provide the explore option pre-publish, so we'll already be dealing with files that can only be accessed by certain users, not sure if there's something that can be leveraged there.

@qqmyers
Copy link
Member

qqmyers commented Jul 31, 2019

If the dataset level tools work as the file level ones, they'll retrieve files as needed and either be restricted to public files, or, if configured to be sent an api key, the ones the user launching them can see. So I think security is always maintained, but the question of whether the button should be shown is separate and depends somewhat on whether/how well Dataverse can determine if the tool can retrieve all files it will need and therefore be able to run. A binary yes if the tool can get all files /no if it can't wouldn't be hard to calculate, but if tools only depend on some files, one would need a way to decide if it can get the 'right' files...

@scolapasta
Copy link
Contributor

I think it will depend on the tool, but we should err on the side of a tool pulling what is needed. But as currently designed it does present a challenge.

The way file explore currently works (correct me if I'm wrong @sekmiller @qqmyers) is that wen you click the explore button you get the popup, you fill it out and a guestbookresponse / download is created. When the explore tool downloads the file, it sends an extra parameter to not write another guestbook entry (in order to not double count).

I think once we have the concept of guestbook entries existing independent of downloads, it should be easier, because then download record wouldn't be created until the tool actually fetches a file (and attaches to the existing download). But that doesn't seem in scope for this issue.

Re: restricted, my question is really about when to show the button. Draft shouldn't be an issue, because if you can see the draft you can download all the files. We could go with something simple like if all files are available to you, show the button, or if at least one file is available, show the button. I lean towards the former for now, as that would still work for all cases were the files are all public and seems "safer". (not from a security perspective, but from a "will the user be confused" perspective).

@scolapasta
Copy link
Contributor

@qqmyers yes, I think we're saying the same thing (our comments crossed paths).

@djbrooke
Copy link
Contributor

Thanks @qqmyers and @scolapasta. If I'm understanding correctly, the only truly scalable solution here seems to be to always show the button and rely on the external tool to provide the messaging about why it can't work with the files in the dataset, instead of trying to add logic to the button display.

@djbrooke djbrooke changed the title Add dataset level external tools 5028 Add dataset level external tools Aug 5, 2019
@pdurbin
Copy link
Member Author

pdurbin commented Sep 11, 2019

@landreev woof, thanks for the heads up. I guess I'll keep and eye on that other pull request. I assume it'll get merged first.

@scolapasta
Copy link
Contributor

For the end user "what tools can operate on this file or dataset?" endpoint, I still don't see the utility of it from the API. What is returned is NOT which tools can operate, but which tool will show up in the UI. You can always use the tools on the file (or dataset) if you are able to construct the URL, this is just constructing the URL so it can display the link on the UI.

In other words this is an API for a UI only (by design) feature.

The only argument I see for keeping this API is for testing (code coverage) which is valid. In that case it should moved, ideally, to a new API prefix ("test") We will likely be getting more of these test only APIs as we focus on Integration tests. I also don't think these therefore need to be documented in a user or admin facing guide as they would not be called by anyone but the integrations tests and could be changed / removed at any moment (if the underlying code changes).

@pdurbin
Copy link
Member Author

pdurbin commented Sep 11, 2019

@scolapasta instead of a brand new API namespace called "test", can we use "admin/test"? That way, we don't have to rewrite the "Securing Your Installation" part of the Installation Guide which only mentions blocking "admin" and "builtin-users":

Screen Shot 2019-09-11 at 5 06 53 PM

Also, I can imagine UIs other than JSF. Maybe it's React or Vue or Angular and calling into Dataverse APIs to construct the GUI, including dropdowns of "Explore" buttons. But this is a conversation for another day, maybe a future tech hours! 😄

@scolapasta
Copy link
Contributor

While I think it would be preferable to having a dedicated "test" namespace (and don't see why adding a few words in documentation would be challenging (and in release notes, of course). I'm ok with "admin/test". The idea that these APIs are (at least currently) only there for the automated testing is the main point.

@scolapasta scolapasta removed their assignment Sep 12, 2019
@pdurbin
Copy link
Member Author

pdurbin commented Sep 12, 2019

I'm ok with "admin/test".

Cool. I'll move my test method there. I'm with you for a future effort to address #4137 ("Move testing calls out of Admin API") but I'm happy to not have to worry about that effort right now.

@pdurbin
Copy link
Member Author

pdurbin commented Sep 12, 2019

@scolapasta ok, fixed in 06105e8

I'm done bashing on this branch (in time for the all day Open Source Software Health workshop tomorrow) so please send this off to QA when you get a chance.

@kcondon thanks for previewing the improved docs. That ended up being 4897de4

@pdurbin pdurbin removed their assignment Sep 12, 2019
@scolapasta scolapasta removed their assignment Sep 13, 2019
@pdurbin
Copy link
Member Author

pdurbin commented Sep 13, 2019

@kcondon unfortunately PDF versions of the guide are failing in this branch as well as "develop" so I just opened #6168.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants