Classify "phenotype/" as a datatype directory with no subject/session parent #1828
Comments
Thanks @effigies for the ping!
Yes, we did discuss this option as the "segregated" representation, i.e. "put pheno data in the leaves of the tree" or, as you say, "treat pheno as any other data type". In the current version of BEP036 I am (we are?) leaning towards excluding the "segregated" option in favour of the "aggregated" option of a root level
I can see how treating Maybe I'm overcautious here, but in my experience phenotypic data can be the messiest part of a dataset and are often acquired and handled by non-technical people in a research team. So I'm a bit concerned about transforming data and storing them in a way that makes it easy for hard-to-detect inconsistencies to sneak in. I think @barbarastrasser, you also commented on BEP036 about this topic because of your use cases, so maybe you could add your thoughts too.
I am not presently trying to make phenotype valid within subject, just to determine if it is a datatype. That it allows us the possibility of enabling it at lower levels if the use cases are compelling seems like an argument that it is that kind of thing. Saying so would not obligate us to define files with this datatype that show up in subject/session directories. We cannot disable it at the root level in BIDS 1.x, in any case.
Ah OK, guess I misread your question.
So you are proposing to turn
I'm not very familiar with the BIDS schema or what the implications of such a change would be. From looking at your PR, my limited understanding is that the proposed change allows you to do some general checks for Maybe @ericearl would be better placed to give feedback here.
Correct.
Hi everyone,
Maybe first some high-level thoughts on the aggregated vs. segregated approach: I think it depends a bit on how you look at the data. Is the aim to describe a participant in depth (maybe also interesting when looking up imaging and pheno data across datasets) or is the aim to describe a dataset in depth? For the former, it might be easier if everything that is collected is structured in the same segregated way (especially when thinking about designing software for automatic querying etc.). For the latter, my impression is that a phenotype directory in the dataset root should be sufficient.
But I also understand the user perspective. I agree that the aggregated format is the way people acquire phenotype data most of the time, and that it might be easier for them to handle than storing single rows, which is error-prone. However, an issue I have witnessed with the current specification is that it is not flexible enough to satisfy the needs of researchers. I know that there are individual efforts going on to split the aggregated data row-wise and store this single line in the The specific problems we encountered were that the way validation is currently handled does not allow for
I think people will use the phenotype directory more, whether it is in the subject directory or in the root directory, as long as there is flexibility to deal with cases like the ones described above.
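To make the aggregated vs. segregated distinction concrete, here is a minimal Python sketch that splits an aggregated phenotype table into per-participant rows. It is only an illustration under assumptions: the function names and the `score` column are hypothetical, the table is assumed to be keyed by a `participant_id` column, and no claim is made about where segregated rows would be stored on disk.

```python
import csv
import io
from collections import defaultdict


def split_by_participant(tsv_text):
    """Group the rows of an aggregated TSV table by participant_id.

    Hypothetical helper: it only builds the "segregated" view in memory
    and does not decide where such rows would live in a dataset.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = defaultdict(list)
    for row in reader:
        rows[row["participant_id"]].append(row)
    return dict(rows)


def participants_with_multiple_rows(per_participant):
    """Return participant IDs that have more than one row in the table."""
    return sorted(pid for pid, rows in per_participant.items() if len(rows) > 1)


# Illustrative table; the "score" column is made up for this example.
aggregated = "participant_id\tscore\nsub-01\t12\nsub-02\t15\nsub-01\t13\n"
segregated = split_by_participant(aggregated)
repeated = participants_with_multiple_rows(segregated)
```

In the aggregated table, a repeated participant is visible in one place; once rows are split up, a check like `participants_with_multiple_rows` has to be run across all the pieces to notice the same thing.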
`phenotype/` is a bit of an outlier in BIDS terms. Other folders at the top level are either entities (`sub-<label>`) or their contents are opaque to BIDS. `phenotype/`, on the other hand, is a collection of `.tsv`/`.json` files that are to be validated on the same terms as `participants.tsv`/`participants.json`.
I would suggest classifying `phenotype` as a datatype, distinct from other datatypes only in that it spans multiple subjects, and so the subject and session entities do not apply. BEP036 seems to go some way in a similar direction, permitting a `pheno/` datatype within subjects/sessions.
In the (unmerged) PR #1672, I suggest using phenotype as a datatype for the purposes of filename validation, and then carve out some exceptions that allow us to use it that way without it being an official datatype. If we make it a datatype, then the exception can be removed. That it fits with very little modification to the schema and validation (https://github.com/bids-standard/bids-validator/pull/1957) seems to me to be an argument for this classification.
The alternative, as I see it, is to consider phenotype a completely unique category of thing, and all implementations will need to have special code for handling it.
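As a rough illustration of the proposed classification, the Python sketch below treats `phenotype` as a datatype whose only special property is that it has no subject/session parent. This is not the bids-validator implementation or the BIDS schema: the `classify` function, the two sets, and the abbreviated datatype list are all assumptions made for this example.

```python
import re
from pathlib import PurePosixPath

# Hypothetical, abbreviated lists; the real BIDS schema defines more datatypes.
SUBJECT_LEVEL_DATATYPES = {"anat", "func", "dwi", "fmap"}
ROOT_LEVEL_DATATYPES = {"phenotype"}

LABEL = r"[0-9a-zA-Z]+"


def classify(path):
    """Return (datatype, subject, session) for a dataset-relative path.

    Returns None when the file is not inside a recognised datatype
    directory at an allowed level.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 2:
        return None  # top-level file such as participants.tsv
    datatype = parts[-2]
    parents = parts[:-2]

    if datatype in ROOT_LEVEL_DATATYPES:
        # Same handling as other datatypes, except no sub-/ses- parents.
        return (datatype, None, None) if not parents else None

    if datatype in SUBJECT_LEVEL_DATATYPES:
        if len(parents) not in (1, 2):
            return None
        if not re.fullmatch(f"sub-{LABEL}", parents[0]):
            return None
        session = None
        if len(parents) == 2:
            if not re.fullmatch(f"ses-{LABEL}", parents[1]):
                return None
            session = parents[1]
        return (datatype, parents[0], session)

    return None
```

With this framing, enabling `phenotype` inside subject or session directories later would amount to moving one name between the two sets, rather than writing a separate code path.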