5028 Add dataset level external tools #6059
Conversation
@@ -2016,6 +2021,7 @@ private String init(boolean initFull) {
configureTools = externalToolService.findByType(ExternalTool.Type.CONFIGURE);
exploreTools = externalToolService.findByType(ExternalTool.Type.EXPLORE);
findByType isn't filtering by scope yet - should these use findByTypeAndScope with file scope?
Yes, they probably should.
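(For illustration, a rough sketch of what the scope-aware lookup could look like - `findByTypeAndScope` and its signature are assumptions based on the suggestion above, not necessarily what the PR implements:)

```java
// Sketch only: scope-aware lookups, assuming ExternalToolServiceBean gains a
// findByTypeAndScope(Type, Scope) finder as suggested above.
configureTools = externalToolService.findByTypeAndScope(ExternalTool.Type.CONFIGURE, ExternalTool.Scope.FILE);
exploreTools = externalToolService.findByTypeAndScope(ExternalTool.Type.EXPLORE, ExternalTool.Scope.FILE);

// And a possible shape for the finder itself (JPQL; field names assumed):
public List<ExternalTool> findByTypeAndScope(ExternalTool.Type type, ExternalTool.Scope scope) {
    return em.createQuery(
            "SELECT t FROM ExternalTool t WHERE t.type = :type AND t.scope = :scope",
            ExternalTool.class)
            .setParameter("type", type)
            .setParameter("scope", scope)
            .getResultList();
}
```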
@@ -131,6 +147,7 @@ public static ExternalTool parseAddExternalToolManifest(String manifest) {
String displayName = getRequiredTopLevelField(jsonObject, DISPLAY_NAME);
String description = getRequiredTopLevelField(jsonObject, DESCRIPTION);
String typeUserInput = getRequiredTopLevelField(jsonObject, TYPE);
String scopeUserInput = getRequiredTopLevelField(jsonObject, SCOPE);
String contentType = getOptionalTopLevelField(jsonObject, CONTENT_TYPE);
//Legacy support - assume tool manifests without any mimetype are for tabular data
if(contentType==null) {
This is going to set a content type on dataset level tools
Yes, this is a bug. We don't want dataset tools to have a content type. Good catch. Thanks.
@qqmyers eventually we may use this - for example, show the dataset level explore only if the dataset has at least one file of content type X - but for now we're going with simple.
If I understand @scolapasta's suggestion - we may have a use for the content type, with some dataset-level external tools; but it sounds like it still needs to be nullable in the table; and it still should be null by default.
In other words, if we adopt this scheme, then we definitely DON'T want to set the type to tab-delimited by default.
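(To make the fix concrete, a hedged sketch of scoping the legacy default - the Scope check and the point at which scope has been parsed are assumptions, not the actual patch:)

```java
// Sketch only: apply the legacy tab-delimited default to file-scope tools only,
// so dataset-scope tools keep a null contentType.
// Assumes scopeUserInput has already been parsed into an ExternalTool.Scope.
if (contentType == null && ExternalTool.Scope.FILE.equals(scope)) {
    // legacy assumption: older file-tool manifests without a mimetype are for tabular data
    contentType = "text/tab-separated-values";
}
```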
ReservedWord reservedWord = ReservedWord.fromString(value);
if (reservedWord.equals(ReservedWord.FILE_ID)) {
allRequiredReservedWordsFound = true;
if (scope.equals(Scope.FILE)) {
else Scope.DATASET - just not written yet?
Well, I'm not sure. Should dataset tools have any reserved words that are required?
Without an else statement, allRequiredReservedWordsFound stays false and an exception happens, so this method doesn't work for scope dataset as is.
W.r.t. reserved words - it would be odd/bad to configure a dataset tool to not get the dataset id (or pid), in the same way that a file tool needs a file id (or pid, if that's implemented).
Ah sorry - I see the exception is in the if statement - probably should not define allRequiredReservedWordsFound outside the if clause then (to avoid confusing people like me :-) )
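(For discussion, a sketch of how the required-reserved-word check might branch on scope; DATASET_ID and DATASET_PID are assumed names based on the reserved words documented below, not necessarily what the enum actually defines:)

```java
// Sketch only: set the "required reserved word found" flag per scope, inside the
// branch it belongs to, so the dataset case is handled and the flag isn't set
// somewhere misleading.
ReservedWord reservedWord = ReservedWord.fromString(value);
if (scope.equals(Scope.FILE)) {
    if (reservedWord.equals(ReservedWord.FILE_ID)) {
        allRequiredReservedWordsFound = true;
    }
} else if (scope.equals(Scope.DATASET)) {
    // assumed: a dataset tool must receive the dataset id or pid
    if (reservedWord.equals(ReservedWord.DATASET_ID)
            || reservedWord.equals(ReservedWord.DATASET_PID)) {
        allRequiredReservedWordsFound = true;
    }
}
```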
- ``{apiToken}`` (optional) - The Dataverse API token of the user launching the external tool, if available.
- ``{datasetId}`` (optional) - The ID of the dataset containing the file.
- ``{datasetId}`` (optional) - The ID of the dataset.
- ``{datasetPid}`` (optional) - The Persistent ID (DOI or Handle) of the dataset.
Why this and not file PID too? Since Dataverse is always generating this, I'm not sure it helps anyone to have both id and pid (a tool can get the pid), but it seems like file and dataset should be the same.
Also - right now we check for fileId existing (for file scope) as a required condition. For dataset, and file if it is made parallel, the condition will have to change to be that id or pid is required.
Good point. The focus right now is on dataset level tools but sure, if a file PID exists it wouldn't hurt to expose it. Tool makers should just understand that they might not always get a file PID.
I'm not sure if I grok your other comment but thanks for making it. 😄 Someone else will be picking this up while I'm on vacation, I suspect.
@@ -0,0 +1,3 @@
ALTER TABLE externaltool ADD COLUMN IF NOT EXISTS scope VARCHAR(255);
UPDATE externaltool SET scope = 'FILE';
ALTER TABLE externaltool ALTER COLUMN scope SET NOT NULL;
should content type now be nullable (in the db and class definition) since it isn't used for scope dataset?
As mentioned above, due to a bug in the pull request, dataset tools get the default content type. Yes, I think this field should be made nullable (and the bug fixed).
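(If that change is made, it would presumably touch both the entity and the migration script; a hedged sketch of the entity side - the current annotation on the field is an assumption:)

```java
// Sketch only: relax the constraint on the ExternalTool entity so dataset-scope
// tools can be stored without a content type (any NOT NULL on the contenttype
// column in the migration would be dropped to match).
@Column(nullable = true)
private String contentType;
```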
So what is the status of this - is this being addressed? I.e., is the bug in ExternalToolServiceBean getting fixed in this PR, and is ContentType being made nullable?
Made some comments re: minor changes. Overall, it looks OK but incomplete w.r.t. scope dataset. I didn't run it, but it looks like it should not break file scope tools, so it should be OK to merge.
Thanks @sekmiller for picking this up. Question for you, @scolapasta and @pdurbin: Was there any discussion about how/if Dataset explorations should be recorded in the guestbook table? At the file level, I believe the explore tool name is added. Happy to discuss and sorry if I missed some communication or the spot in the code where this was reflected.
It's a good question - it has not been discussed. It's a challenge because guestbook responses are at the file level. The other thing we need to think about is restrictions: if one or more files are restricted, do we show this button?
Thanks @scolapasta. When the Dataset explore is initiated, does the tool download all the files or does it download them as needed (or will this differ from tool to tool)? If it's the former, I'd be OK with guestbook records being created for all files in the dataset when a dataset explore is initiated and reflecting the dataset explore tool's name in the table. If the latter, we can create the records as the tool pulls them. Better ideas or corrections welcome. I'm not super worried about the some-restricted-files use case right now, as the tools being focused on at this time are geared towards replication of datasets from journals, which are mostly (all?) public. We will definitely need to worry about it in the future for these replication tools, both for restricted files and for files in TRSAs, and also as more tools are added with additional use cases that are not replication focused. Note that we do need to provide the explore option pre-publish, so we'll already be dealing with files that can only be accessed by certain users; not sure if there's something that can be leveraged there.
If the dataset level tools work like the file level ones, they'll retrieve files as needed and either be restricted to public files or, if configured to be sent an API key, to the ones the user launching them can see. So I think security is always maintained, but the question of whether the button should be shown is separate and depends somewhat on whether/how well Dataverse can determine if the tool can retrieve all the files it will need and therefore be able to run. A binary yes (the tool can get all files) / no (it can't) wouldn't be hard to calculate, but if tools only depend on some files, one would need a way to decide whether the tool can get the 'right' files...
I think it will depend on the tool, but we should err on the side of a tool pulling what is needed. But as currently designed it does present a challenge. The way file explore currently works (correct me if I'm wrong @sekmiller @qqmyers) is that when you click the explore button you get the popup, you fill it out, and a guestbookresponse / download is created. When the explore tool downloads the file, it sends an extra parameter to not write another guestbook entry (in order to not double count). I think once we have the concept of guestbook entries existing independent of downloads, it should be easier, because then the download record wouldn't be created until the tool actually fetches a file (and attaches to the existing download). But that doesn't seem in scope for this issue. Re: restricted, my question is really about when to show the button. Draft shouldn't be an issue, because if you can see the draft you can download all the files. We could go with something simple like: if all files are available to you, show the button, or if at least one file is available, show the button. I lean towards the former for now, as that would still work for all cases where the files are all public and seems "safer" (not from a security perspective, but from a "will the user be confused" perspective).
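(To make the "show the button only if all files are available to you" option above concrete, a rough sketch - the permission-check call and method names are illustrative assumptions, not the actual Dataverse API:)

```java
// Sketch only: show the dataset-level explore button only when the requesting
// user can download every file in the version being viewed.
private boolean allFilesAccessible(DatasetVersion version, DataverseRequest request) {
    for (FileMetadata fmd : version.getFileMetadatas()) {
        // assumed permission check; the real call may differ
        if (!permissionService.requestOn(request, fmd.getDataFile()).has(Permission.DownloadFile)) {
            return false;
        }
    }
    return true;
}
```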
@qqmyers yes, I think we're saying the same thing (our comments crossed paths).
Thanks @qqmyers and @scolapasta. If I'm understanding correctly, the only truly scalable solution here seems to be to always show the button and rely on the external tool to provide the messaging about why it can't work with the files in the dataset, instead of trying to add logic to the button display.
@landreev woof, thanks for the heads up. I guess I'll keep an eye on that other pull request. I assume it'll get merged first.
For the end user "what tools can operate on this file or dataset?" endpoint, I still don't see the utility of it from the API. What is returned is NOT which tools can operate, but which tools will show up in the UI. You can always use the tools on the file (or dataset) if you are able to construct the URL; this is just constructing the URL so the UI can display the link. In other words, this is an API for a UI-only (by design) feature. The only argument I see for keeping this API is for testing (code coverage), which is valid. In that case it should be moved, ideally, to a new API prefix ("test"). We will likely be getting more of these test-only APIs as we focus on integration tests. I also don't think these therefore need to be documented in a user or admin facing guide, as they would not be called by anyone but the integration tests and could be changed / removed at any moment (if the underlying code changes).
@scolapasta instead of a brand new API namespace called "test", can we use "admin/test"? That way, we don't have to rewrite the "Securing Your Installation" part of the Installation Guide, which only mentions blocking "admin" and "builtin-users". Also, I can imagine UIs other than JSF. Maybe it's React or Vue or Angular calling into Dataverse APIs to construct the GUI, including dropdowns of "Explore" buttons. But this is a conversation for another day, maybe a future tech hours! 😄
While I think it would be preferable to have a dedicated "test" namespace (and I don't see why adding a few words to the documentation, and to the release notes of course, would be challenging), I'm OK with "admin/test". The idea that these APIs are (at least currently) only there for the automated testing is the main point.
Cool. I'll move my test method there. I'm with you for a future effort to address #4137 ("Move testing calls out of Admin API") but I'm happy to not have to worry about that effort right now.
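(For context, a minimal sketch of what a test-support endpoint kept under the existing admin path could look like - the path, method name, and body are illustrative only:)

```java
// Sketch only: a test-only endpoint under /api/admin so the existing
// "block the admin API" guidance in the Installation Guide still covers it.
@GET
@Path("test/externalTools")
public Response listExternalToolsForTesting() {
    List<ExternalTool> tools = externalToolService.findAll();
    return Response.ok("Found " + tools.size() + " external tool(s).").build();
}
```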
@scolapasta ok, fixed in 06105e8. I'm done bashing on this branch (in time for the all-day Open Source Software Health workshop tomorrow), so please send this off to QA when you get a chance. @kcondon thanks for previewing the improved docs. That ended up being 4897de4.
For #5028