New feature: Index folder from outside #459

Closed
oesteban opened this issue Jul 10, 2019 · 6 comments

Comments

@oesteban
Collaborator

oesteban commented Jul 10, 2019

Gauging whether it would be interesting to allow softlinks to top-level subfolders outside the current BIDS root, at BIDSLayout instantiation.

In other words: it would be very useful (use case: templateflow/python-client#12) for PyBIDS to reproduce internally the effect of symlinking a (consistent) sub-XXX folder from outside the dataset into the top-level BIDS root, especially on filesystems that do not support symlinks.

WDYT?

For the particular use case I linked, this feature would make it very easy to integrate new templates (tpl-XXXX), which are analogous to BIDS subjects, into the base installation.

@effigies
Collaborator

Don't we have a notion of adding datasets together? Would that be sufficient?

@oesteban
Collaborator Author

In that case you'd need the mandatory metadata to be present in both, right (e.g., dataset_description.json)?

Not sure how the participants.tsv could be updated, though.

@tyarkoni
Collaborator

I think this runs counter to the spirit of the spec, and possibly its letter... Do you just need the indexing process to respect symlinks found inside the root? If so, that seems reasonable, and shouldn't be hard. But if you need each sub dir to be treated like its own dataset, I'm not convinced...
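As a rough illustration of what "respecting symlinks found inside the root" could mean at the filesystem level, here is a minimal sketch using only the standard library. It is not how PyBIDS indexes files; `walk_files` is a hypothetical helper. It relies on the fact that `os.walk` does not descend into symlinked directories unless `followlinks=True` is passed.

```python
import os


def walk_files(root, followlinks=True):
    """Return sorted file paths under root, relative to root.

    Hypothetical helper for illustration only (not a PyBIDS function).
    With followlinks=True, os.walk descends into symlinked directories;
    with the default (False), it lists them but does not enter them.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=followlinks):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)
```

Note that following links can loop forever if a symlink points back to one of its own parent directories, so a real indexer would probably need a guard against cycles.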

@oesteban
Collaborator Author

oesteban commented Jul 10, 2019

I think I did not sufficiently describe the use case:

>>> layout = BIDSLayout('/path/to/bids', index_external=['/mnt/hd1/other-bids1/sub-*', '/mnt/hd1/other-bids2/sub-singleexternal'])
>>> layout.get_subjects()
['01', '02', '03', 'ext1a', 'ext1b', 'singleexternal']

where 01, 02, and 03 are /path/to/bids/sub-0{1,2,3}; ext1a and ext1b are /mnt/hd1/other-bids1/sub-ext1{a,b} (and no other subjects are found there); and singleexternal is /mnt/hd1/other-bids2/sub-singleexternal.
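To make the intended resolution concrete, the following is a minimal sketch of how the patterns in the proposed `index_external` argument might be expanded into subject labels. The argument does not exist in PyBIDS, and `resolve_subjects` is a hypothetical stand-in written with the standard library only; it assumes each pattern is a glob that matches `sub-*` directories.

```python
import glob
import os


def resolve_subjects(root, index_external=()):
    """Collect subject labels from root plus external glob patterns.

    Hypothetical illustration of the proposed behavior; not PyBIDS code.
    """
    patterns = [os.path.join(root, "sub-*")] + list(index_external)
    labels = set()
    for pattern in patterns:
        for path in glob.glob(pattern):
            name = os.path.basename(os.path.normpath(path))
            # Keep only directories that follow the sub-<label> convention.
            if os.path.isdir(path) and name.startswith("sub-"):
                labels.add(name[len("sub-"):])
    return sorted(labels)
```

One design question this sketch glosses over: if the same label appears both in the root and in an external location, the set silently merges them, whereas a real implementation would need to decide whether that is an error.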

I would never suggest allowing anything like this for the BIDS-Validator (where I believe the spirit of the spec should be important), but for PyBIDS this adds some nice features (particularly for systems not supporting symlinks). Actually, if you do this "by hand" (manually creating symlinks), the BIDS-Validator would pass provided the "external" subjects are consistent with the full dataset.

> Do you just need the indexing process to respect symlinks found inside the root?

Yes, except for the need to have actual symlinks - see the example above.

> But if you need each sub dir to be treated like its own dataset, I'm not convinced...

Nope, this is not the idea :). Not convinced either this would be something we want.

@tyarkoni
Collaborator

I guess I'm not really sold... I can see the utility for the use case you describe, but it introduces some complexities that would be a bit of a pain to handle. For one thing, I see no particular reason to only allow subject directories to be indexed this way... what about other partial structures? But doing that would then require us to internally parse everything in index_external to figure out where it fits... and that could be ambiguous.

As you say, the current indexing approach should in principle respect symlinks (we may want to check that it does—I don't think we have tests for it, and I wouldn't be surprised if we currently use realpath in ways that might break things). Assuming that's true, why not just have the user create those themselves?
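For reference, the "create the symlinks yourself" route could look roughly like the sketch below, which also shows why `realpath` can be a problem. Both helpers are hypothetical and use only the standard library; nothing here is PyBIDS code, and it only works on filesystems that support symlinks.

```python
import os
from pathlib import Path


def link_external_subjects(bids_root, external_dirs):
    """Symlink each external sub-* directory into bids_root.

    Hypothetical helper, not part of PyBIDS. Returns the links created.
    """
    root = Path(bids_root)
    created = []
    for ext in map(Path, external_dirs):
        if not (ext.is_dir() and ext.name.startswith("sub-")):
            raise ValueError(f"not a subject directory: {ext}")
        link = root / ext.name
        if link.exists() or link.is_symlink():
            raise FileExistsError(f"subject already present: {link}")
        link.symlink_to(ext.resolve(), target_is_directory=True)
        created.append(link)
    return created


def resolves_outside_root(path, bids_root):
    """True if the symlink-resolved path is not under the resolved root."""
    real_root = os.path.realpath(bids_root)
    real_path = os.path.realpath(path)
    return os.path.commonpath([real_root, real_path]) != real_root
```

A linked subject makes `resolves_outside_root` return True, so any code that calls `realpath` on a file and then expects the result to sit under the dataset root would treat such files as not belonging to the dataset.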

@tyarkoni
Collaborator

tyarkoni commented Sep 9, 2019

Any objection to closing this, @oesteban? Unless you have other thoughts, I don't think this is viable without a lot of work, and I think it presents problems from a spec standpoint too as it encourages users to have broken partial BIDS structures (e.g., a single subject directory with no parent dataset_description.json, etc.).
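The "broken partial structure" concern can be checked mechanically. As a minimal sketch, assuming the only criterion is the presence of `dataset_description.json` next to the subject directory, a hypothetical helper (not a PyBIDS or BIDS-Validator function) might look like this:

```python
from pathlib import Path


def has_dataset_description(subject_dir):
    """True if the subject's parent directory holds dataset_description.json.

    Hypothetical check for illustration. The parent is taken lexically,
    so a symlinked subject reports on the directory containing the link;
    pass Path(subject_dir).resolve() to check the link target's parent.
    """
    return (Path(subject_dir).parent / "dataset_description.json").is_file()
```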
