-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore files in _scans.tsv that correspond to entries in .bidsignore (#1366) #1914
base: master
Are you sure you want to change the base?
Ignore files in _scans.tsv that correspond to entries in .bidsignore (#1366) #1914
Conversation
In bidscoin, I have added extra lines of code to remove those bidsignore files from the scans.tsv file because the bids-validator was complaining about them. Since the specs don't mention anything about this (at least not that I'm aware of), I simply took the validator output as the definition of the standard. So does this mean that I should not have added those extra lines of code to bidscoin and should just ignore the validator here? |
@@ -330,6 +330,8 @@ async function getBIDSIgnore(dir) { | |||
if (bidsIgnoreFileObj) { | |||
const content = await readFile(bidsIgnoreFileObj) | |||
ig.add(content) | |||
// Store the .bidsignore content in session storage | |||
sessionStorage.setItem('bidsignoreContent', JSON.stringify(content)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is causing an error.
Unhandled rejection (
reason: ReferenceError: sessionStorage is not defined
at getBIDSIgnore (/home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/utils/files/readDir.js:334:5)
at Object.readDir (/home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/utils/files/readDir.js:23:14)
at /home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/validators/bids/start.js:40:21
).
@marcelzwiers I would call this undefined behavior. The spec didn't say what to do, and bidsignore is not a spec-level concept, so the interaction turned out to be a problem. As a practical matter, I think following the validator when it exceeds the specification is a good idea unless your goal is to force the issue and aim your users to complain to the validator issue tracker. I wouldn't consider that a bug, but you could stop removing the lines, once this or a similar solution is released. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master bids-standard/legacy-validator#1914 +/- ##
==========================================
+ Coverage 83.57% 85.52% +1.94%
==========================================
Files 92 132 +40
Lines 3890 6285 +2395
Branches 1271 1549 +278
==========================================
+ Hits 3251 5375 +2124
- Misses 541 806 +265
- Partials 98 104 +6 ☔ View full report in Codecov by Sentry. |
I'm not sure if global storage is the right tool for this job. @rwblair @nellh Your insight would be appreciated here. I don't think I'm qualified to review this PR. |
I did have my doubts about using sessionStorage for solving this problem, and it was infact a fallback approach to get the web app working. I believe the JS web app and the Python package might be using partly the same codebase (?), and if so this will not suffice. My first approach was to re-open the Looking forward to some feedback (@rwblair @nellh ) and suggestions on what would be a useful way to tackle this. Happy to try out a different approach. |
Please ignore the Python package. It is only useful for packaging regular expressions for validating filenames. It will be replaced entirely in the future. |
Just trying to debug the remaining errors with the current approach while waiting for further feedback. Locally, I am testing the following (based on instructions here) before pushing the changes:
But the errors that have been reported above (through the workflows) are never caught/tested via the above. Are there additional instructions for running the entire testing pipeline locally? I have manually ran the following to ensure previous workflow errors do not arise again:
5 workflows await approval from maintainers to check if all is clear this time around. |
if (ig && ig.ignores(path.relative('/', scanRelativePath))) { | ||
continue | ||
} | ||
|
||
// check if scan matches full dataset path list | ||
if (!pathList.includes(scanFullPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I get what you're doing here. Thanks for this!
The issue here is that we want to error if a listed file doesn't exist, but we're currently erroring if the file doesn't exist or exists but is ignored. If you're going to follow the strategy you're using here, instead of storing the ignore patterns, you should be storing the ignored files. We can then say
if (!(pathList.includes(scanFullPath) || ignoreList.includes(scanFullPath))) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @effigies
That should also work, just that we would need to store much more data in the session storage (storing ignore patterns vs complete list of ignored files across all subjects and sessions). I'm also curious why the current approach of storing patterns would not suffice; maybe I didn't follow your comment entirely.
As per #1366 , the intention was to allow .bidsignore
files to be listed in _scans.tsv
.
The plan then was to:
doing a match of scans.tsv files against ignore entries
The PR, in its current form, was intending to just add that one extra check, i.e. for each file listed in _scans.tsv
, check if the file is to be ignored (based on .bidsignore
), if yes then ignore that file entry with no error. This is being done irrespective of whether the file actually exists in the directory or not.
I added two tests for checking the following cases:
- file is listed in _scans.tsv but isn't part of output of
readDir()
-> ERROR 129 - file is listed in _scans.tsv; isn't part of output of
readDir()
, but is to be ignored -> no error
and both these tests are passing.
Note that there is a difference between the file actually existing (on disk) vs being present in the output of readDir()
as the latter function already filters out the files to be ignored, before returning the list of required files (i.e. existing and non-ignored) to other steps in the workflow (e.g. TSV() in tsv.js)
In addition to the discussions above about the nature of the changes, I think the PR is passing most of the checks. 3 of them that it is failing, is found to have failed with similar status in a recent merged PR as well, so not sure if it's something specific to this PR that I should look into. I have made a small change to address the 4th failed test which dealt with code coverage; the current change should marginally improve that. |
Could we run the remaining 5 workflows to see if they pass?
Also, happy to receive any feedback regarding the changes here. |
As mentioned above, I believe it is now passing all the checks that a previous merged PR conformed to. So not sure if the 3 that failed, and 1 skipped, are something that I should look into. |
I was wondering if we had any feedback on this PR. |
I still think we want the following cases:
It seems like a regression to me to stop erroring when a user mentions a file that does not exist. The goal is to I spent a little time trying to figure out how to pass ignored files around. I got this far: ❯ git --no-pager diff
diff --git a/bids-validator/validators/bids/fullTest.js b/bids-validator/validators/bids/fullTest.js
index edf0b405..efcd6204 100644
--- a/bids-validator/validators/bids/fullTest.js
+++ b/bids-validator/validators/bids/fullTest.js
@@ -76,8 +76,10 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
}
// remove ignored files from list:
+ const ignoredFiles = {}
Object.keys(fileList).forEach(function (key) {
if (fileList[key].ignore) {
+ ignoredFiles[key] = fileList[key]
delete fileList[key]
}
})
@@ -123,6 +125,7 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
participants,
phenotypeParticipants,
stimuli,
+ ignoredFiles,
)
})
.then(({ tsvIssues, participantsTsvContent }) => {
diff --git a/bids-validator/validators/tsv/tsv.js b/bids-validator/validators/tsv/tsv.js
index 0b76e6cd..153b62b7 100644
--- a/bids-validator/validators/tsv/tsv.js
+++ b/bids-validator/validators/tsv/tsv.js
@@ -36,7 +36,7 @@ const filenameEvidence = (filename) => `Filename: ${filename}`
* specification.
*/
-const TSV = (file, contents, fileList, callback) => {
+const TSV = (file, contents, fileList, ignoredFiles, callback) => {
const issues = []
const stimPaths = []
if (contents.includes('\r') && !contents.includes('\n')) {
@@ -628,7 +628,7 @@ const TSV = (file, contents, fileList, callback) => {
const scanFullPath = scanDirPath + '/' + scanRelativePath
// check if file should be ignored based on .bidsignore content
- if (ig && ig.ignores(path.relative('/', scanRelativePath))) {
+ if (ignoredFiles.includes(scanFullPath)) {
continue
}
diff --git a/bids-validator/validators/tsv/validate.js b/bids-validator/validators/tsv/validate.js
index dba18e7a..dfcd725c 100644
--- a/bids-validator/validators/tsv/validate.js
+++ b/bids-validator/validators/tsv/validate.js
@@ -9,6 +9,7 @@ const validate = (
participants,
phenotypeParticipants,
stimuli,
+ ignoredFiles,
annexed,
dir,
) => {
@@ -34,6 +35,7 @@ const validate = (
file,
contents,
fileList,
+ ignoredFiles,
function (tsvIssues, participantList, stimFiles) {
if (participantList) {
if (file.name.endsWith('participants.tsv')) { Something's not working right, but I don't understand the validator enough to know what. Perhaps this will help you, though? |
PR for handling bids-standard/bids-validator#48
Makes use of session storage to save the contents of .bidsignore when it is read originally. This is retrieved later while evaluating _scans.tsv files for ignoring the matching files. All tests with
npm run test
are passing (except two which were failing even on the unchanged master branch), and two new tests have been added.For testing purposes, a mock session storage was required to be implemented as Jest is currently configured with
"testEnvironment": "node"
. Could possibly be avoided if set to"testEnvironment": "jsdom"
. Another option is to employjest-localstorage-mock
package. But the current mock implementation seemed the simplest, and so went with it.Two tests have been added:
should ignore files in scans.tsv that correspond to entries in .bidsignore
This checks that files to be ignored, based on .bidsignore, are being ignored when evaluating the file list in _scans.tsv
should not allow missing files listed in scans.tsv and not accounted for by .bidsignore
This catches files that are listed in _scans.tsv but are actually missing, and not accounted for by .bidsignore
First PR here; apologies if I have missed anything.
@effigies: Could you take a look?