5028 Add dataset level external tools #6059
Conversation
@@ -2016,6 +2021,7 @@ private String init(boolean initFull) {
configureTools = externalToolService.findByType(ExternalTool.Type.CONFIGURE);
exploreTools = externalToolService.findByType(ExternalTool.Type.EXPLORE);
findByType isn't filtering by scope yet - should these use findByTypeAndScope with file scope?
Yes, they probably should.
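(For illustration, a rough sketch of what the scope-aware lookup could look like - `findByTypeAndScope` and its signature are assumptions based on the suggestion above, not necessarily what the PR implements:)

```java
// Sketch only: scope-aware lookups, assuming ExternalToolServiceBean gains a
// findByTypeAndScope(Type, Scope) finder as suggested above.
configureTools = externalToolService.findByTypeAndScope(ExternalTool.Type.CONFIGURE, ExternalTool.Scope.FILE);
exploreTools = externalToolService.findByTypeAndScope(ExternalTool.Type.EXPLORE, ExternalTool.Scope.FILE);

// And a possible shape for the finder itself (JPQL; field names assumed):
public List<ExternalTool> findByTypeAndScope(ExternalTool.Type type, ExternalTool.Scope scope) {
    return em.createQuery(
            "SELECT t FROM ExternalTool t WHERE t.type = :type AND t.scope = :scope",
            ExternalTool.class)
            .setParameter("type", type)
            .setParameter("scope", scope)
            .getResultList();
}
```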
@@ -131,6 +147,7 @@ public static ExternalTool parseAddExternalToolManifest(String manifest) {
String displayName = getRequiredTopLevelField(jsonObject, DISPLAY_NAME);
String description = getRequiredTopLevelField(jsonObject, DESCRIPTION);
String typeUserInput = getRequiredTopLevelField(jsonObject, TYPE);
String scopeUserInput = getRequiredTopLevelField(jsonObject, SCOPE);
String contentType = getOptionalTopLevelField(jsonObject, CONTENT_TYPE);
//Legacy support - assume tool manifests without any mimetype are for tabular data
if(contentType==null) {
This is going to set a content type on dataset level tools
Yes, this is a bug. We don't want dataset tools to have a content type. Good catch. Thanks.
@qqmyers eventually we may use this - for example, show the dataset level explore only if the dataset has at least one file of content type X - but for now we're going with simple.
If I understand @scolapasta's suggestion - we may have a use for the content type, with some dataset-level external tools; but it sounds like it still needs to be nullable in the table; and it still should be null by default.
In other words, if we adopt this scheme, then we definitely DON'T want to set the type to tab-delimited by default.
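(To make the fix concrete, a hedged sketch of scoping the legacy default - the Scope check and the point at which scope has been parsed are assumptions, not the actual patch:)

```java
// Sketch only: apply the legacy tab-delimited default to file-scope tools only,
// so dataset-scope tools keep a null contentType.
// Assumes scopeUserInput has already been parsed into an ExternalTool.Scope.
if (contentType == null && ExternalTool.Scope.FILE.equals(scope)) {
    // legacy assumption: older file-tool manifests without a mimetype are for tabular data
    contentType = "text/tab-separated-values";
}
```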
ReservedWord reservedWord = ReservedWord.fromString(value);
if (reservedWord.equals(ReservedWord.FILE_ID)) {
allRequiredReservedWordsFound = true;
if (scope.equals(Scope.FILE)) {
else Scope.DATASET - just not written yet?
Well, I'm not sure. Should dataset tools have any reserved words that are required?
Without an else statement, allRequiredReservedWordsFound stays false and an exception happens, so this method doesn't work for scope dataset as is.
W.r.t. reserved words - it would be odd/bad to configure a dataset tool to not get the dataset id (or pid), in the same way that a file tool needs a file id (or pid, if that's implemented).
Ah sorry - I see the exception is in the if statement - probably should not define allRequiredReservedWordsFound outside the if clause then (to avoid confusing people like me :-) )
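(For discussion, a sketch of how the required-reserved-word check might branch on scope; DATASET_ID and DATASET_PID are assumed names based on the reserved words documented below, not necessarily what the enum actually defines:)

```java
// Sketch only: set the "required reserved word found" flag per scope, inside the
// branch it belongs to, so the dataset case is handled and the flag isn't set
// somewhere misleading.
ReservedWord reservedWord = ReservedWord.fromString(value);
if (scope.equals(Scope.FILE)) {
    if (reservedWord.equals(ReservedWord.FILE_ID)) {
        allRequiredReservedWordsFound = true;
    }
} else if (scope.equals(Scope.DATASET)) {
    // assumed: a dataset tool must receive the dataset id or pid
    if (reservedWord.equals(ReservedWord.DATASET_ID)
            || reservedWord.equals(ReservedWord.DATASET_PID)) {
        allRequiredReservedWordsFound = true;
    }
}
```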
- ``{apiToken}`` (optional) - The Dataverse API token of the user launching the external tool, if available.
- ``{datasetId}`` (optional) - The ID of the dataset containing the file.
- ``{datasetId}`` (optional) - The ID of the dataset.
- ``{datasetPid}`` (optional) - The Persistent ID (DOI or Handle) of the dataset.
Why this and not file PID too? Since Dataverse is always generating this, I'm not sure it helps anyone to have both id and pid (a tool can get the pid), but it seems like file and dataset should be the same.
Also - right now we check for fileId existing (for file scope) as a required condition. For dataset, and file if it is made parallel, the condition will have to change to be that id or pid is required.
Good point. The focus right now is on dataset level tools but sure, if a file PID exists it wouldn't hurt to expose it. Tool makers should just understand that they might not always get a file PID.
I'm not sure if I grok your other comment but thanks for making it. 😄 Someone else will be picking this up while I'm on vacation, I suspect.
@@ -0,0 +1,3 @@
ALTER TABLE externaltool ADD COLUMN IF NOT EXISTS scope VARCHAR(255);
UPDATE externaltool SET scope = 'FILE';
ALTER TABLE externaltool ALTER COLUMN scope SET NOT NULL;
should content type now be nullable (in the db and class definition) since it isn't used for scope dataset?
As mentioned above, due to a bug in the pull request, dataset tools get the default content type. Yes, I think this field should be made nullable (and the bug fixed).
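(If that change is made, it would presumably touch both the entity and the migration script; a hedged sketch of the entity side - the current annotation on the field is an assumption:)

```java
// Sketch only: relax the constraint on the ExternalTool entity so dataset-scope
// tools can be stored without a content type (any NOT NULL on the contenttype
// column in the migration would be dropped to match).
@Column(nullable = true)
private String contentType;
```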
So what is the status of this - is this being addressed? I.e., is the bug in ExternalToolServiceBean getting fixed in this PR, and is ContentType being made nullable?
Made some comments re: minor changes. Overall, it looks OK but incomplete w.r.t. scope dataset. I didn't run it, but it looks like it should not break file scope tools, so it should be OK to merge.
Thanks @sekmiller for picking this up. Question for you, @scolapasta and @pdurbin: Was there any discussion about how/if Dataset explorations should be recorded in the guestbook table? At the file level, I believe the explore tool name is added. Happy to discuss and sorry if I missed some communication or the spot in the code where this was reflected.
It's a good question - it has not been discussed. It's a challenge because guestbook responses are at the file level. The other thing we need to think about is restrictions: if one or more files are restricted, do we show this button?
Thanks @scolapasta. When the Dataset explore is initiated, does the tool download all the files or does it download them as needed (or will this differ from tool to tool)? If it's the former, I'd be OK with guestbook records being created for all files in the dataset when a dataset explore is initiated and reflecting the dataset explore tool's name in the table. If the latter, we can create the records as the tool pulls them. Better ideas or corrections welcome. I'm not super worried about the some-restricted-files use case right now, as the tools being focused on at this time are geared towards replication of datasets from journals, which are mostly (all?) public. We will definitely need to worry about it in the future for these replication tools, both for restricted files and for files in TRSAs, and also as more tools are added with additional use cases that are not replication focused. Note that we do need to provide the explore option pre-publish, so we'll already be dealing with files that can only be accessed by certain users; not sure if there's something that can be leveraged there.
If the dataset level tools work like the file level ones, they'll retrieve files as needed and either be restricted to public files or, if configured to be sent an API key, to the ones the user launching them can see. So I think security is always maintained, but the question of whether the button should be shown is separate and depends somewhat on whether/how well Dataverse can determine if the tool can retrieve all the files it will need and therefore be able to run. A binary yes (the tool can get all files) / no (it can't) wouldn't be hard to calculate, but if tools only depend on some files, one would need a way to decide whether the tool can get the 'right' files...
I think it will depend on the tool, but we should err on the side of a tool pulling what is needed. But as currently designed it does present a challenge. The way file explore currently works (correct me if I'm wrong @sekmiller @qqmyers) is that when you click the explore button you get the popup, you fill it out, and a guestbookresponse / download is created. When the explore tool downloads the file, it sends an extra parameter to not write another guestbook entry (in order to not double count). I think once we have the concept of guestbook entries existing independent of downloads, it should be easier, because then the download record wouldn't be created until the tool actually fetches a file (and attaches to the existing download). But that doesn't seem in scope for this issue. Re: restricted, my question is really about when to show the button. Draft shouldn't be an issue, because if you can see the draft you can download all the files. We could go with something simple like: if all files are available to you, show the button, or if at least one file is available, show the button. I lean towards the former for now, as that would still work for all cases where the files are all public and seems "safer" (not from a security perspective, but from a "will the user be confused" perspective).
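(To make the "show the button only if all files are available to you" option above concrete, a rough sketch - the permission-check call and method names are illustrative assumptions, not the actual Dataverse API:)

```java
// Sketch only: show the dataset-level explore button only when the requesting
// user can download every file in the version being viewed.
private boolean allFilesAccessible(DatasetVersion version, DataverseRequest request) {
    for (FileMetadata fmd : version.getFileMetadatas()) {
        // assumed permission check; the real call may differ
        if (!permissionService.requestOn(request, fmd.getDataFile()).has(Permission.DownloadFile)) {
            return false;
        }
    }
    return true;
}
```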
@qqmyers yes, I think we're saying the same thing (our comments crossed paths).
Thanks @qqmyers and @scolapasta. If I'm understanding correctly, the only truly scalable solution here seems to be to always show the button and rely on the external tool to provide the messaging about why it can't work with the files in the dataset, instead of trying to add logic to the button display.
@landreev woof, thanks for the heads up. I guess I'll keep an eye on that other pull request. I assume it'll get merged first.
For the end user "what tools can operate on this file or dataset?" endpoint, I still don't see the utility of it from the API. What is returned is NOT which tools can operate, but which tools will show up in the UI. You can always use the tools on the file (or dataset) if you are able to construct the URL; this is just constructing the URL so the UI can display the link. In other words, this is an API for a UI-only (by design) feature. The only argument I see for keeping this API is for testing (code coverage), which is valid. In that case it should be moved, ideally, to a new API prefix ("test"). We will likely be getting more of these test-only APIs as we focus on integration tests. I also don't think these therefore need to be documented in a user or admin facing guide, as they would not be called by anyone but the integration tests and could be changed / removed at any moment (if the underlying code changes).
@scolapasta instead of a brand new API namespace called "test", can we use "admin/test"? That way, we don't have to rewrite the "Securing Your Installation" part of the Installation Guide, which only mentions blocking "admin" and "builtin-users". Also, I can imagine UIs other than JSF. Maybe it's React or Vue or Angular calling into Dataverse APIs to construct the GUI, including dropdowns of "Explore" buttons. But this is a conversation for another day, maybe a future tech hours! 😄
While I think it would be preferable to have a dedicated "test" namespace (and I don't see why adding a few words to the documentation, and to the release notes of course, would be challenging), I'm OK with "admin/test". The idea that these APIs are (at least currently) only there for the automated testing is the main point.
Cool. I'll move my test method there. I'm with you for a future effort to address #4137 ("Move testing calls out of Admin API") but I'm happy to not have to worry about that effort right now.
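(For context, a minimal sketch of what a test-support endpoint kept under the existing admin path could look like - the path, method name, and body are illustrative only:)

```java
// Sketch only: a test-only endpoint under /api/admin so the existing
// "block the admin API" guidance in the Installation Guide still covers it.
@GET
@Path("test/externalTools")
public Response listExternalToolsForTesting() {
    List<ExternalTool> tools = externalToolService.findAll();
    return Response.ok("Found " + tools.size() + " external tool(s).").build();
}
```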
@scolapasta ok, fixed in 06105e8. I'm done bashing on this branch (in time for the all-day Open Source Software Health workshop tomorrow), so please send this off to QA when you get a chance. @kcondon thanks for previewing the improved docs. That ended up being 4897de4.
For #5028