[Bug] Duplicate tasks in experiment description file #728

Open
k1o0 opened this issue Oct 16, 2024 · 1 comment

k1o0 commented Oct 16, 2024

There are potentially more than 100 sessions in which a single task has been erroneously extracted as multiple chained protocols. This appears to happen because the experiment description file contains duplicate task entries. For example, session 13565016-ac41-4b32-b71d-474fd60a5052 (KM_023/2024-08-21/001) has the following experiment description, in which the same task and collection are listed five times:

{'devices': {'cameras': {'left': {'collection': 'raw_video_data',
    'sync_label': 'audio'}},
  'microphone': {'microphone': {'collection': 'raw_task_data_00',
    'sync_label': 'audio'}}},
 'procedures': ['Behavior training/tasks'],
 'projects': ['u19_proj1_multiareacom'],
 'sync': {'bpod': {'acquisition_software': 'pybpod',
   'collection': 'raw_task_data_00',
   'extension': '.jsonable'}},
 'tasks': [{'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}}],
 'version': '1.0.0'}
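A minimal sketch of how such duplicates could be detected, assuming the description file has already been parsed into a dict like the one above. The helper name find_duplicate_tasks is hypothetical and is not part of ibllib; it flags only entries that are exactly identical.

```python
import json
from collections import Counter


def find_duplicate_tasks(description):
    """Return (task entry, count) pairs for task entries listed more than once.

    Hypothetical helper: entries are compared by their canonical JSON form,
    so key order inside each task dict does not matter.
    """
    tasks = description.get('tasks', [])
    # Serialise each task dict with sorted keys to get a hashable, order-independent key
    keys = [json.dumps(task, sort_keys=True) for task in tasks]
    counts = Counter(keys)
    return [(json.loads(key), n) for key, n in counts.items() if n > 1]


# Reduced version of the KM_023/2024-08-21/001 description above
description = {
    'tasks': [{'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}}] * 5,
    'version': '1.0.0',
}
print(find_duplicate_tasks(description))
# [({'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}}, 5)]
```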

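The following Django query finds 118 potential sessions with the same issue, i.e. sessions that have more _ibl_trials.table.pqt datasets than _iblrig_taskData.raw.jsonable datasets: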

from actions.models import Session
from django.db.models import Q, Count, F

# Number of distinct raw task data datasets in each session
n_raw_task_files = Count('data_dataset_session_related', distinct=True,
                         filter=Q(data_dataset_session_related__name='_iblrig_taskData.raw.jsonable'))
# Number of distinct extracted trials tables in each session
n_task_tables = Count('data_dataset_session_related', distinct=True,
                      filter=Q(data_dataset_session_related__name='_ibl_trials.table.pqt'))
# Keep sessions with more trials tables than raw task files
ses = (Session
       .objects
       .prefetch_related('data_dataset_session_related')
       .annotate(n_raw_task_files=n_raw_task_files)
       .annotate(n_task_tables=n_task_tables)
       .filter(n_raw_task_files__lt=F('n_task_tables')))
print(ses.count())  # number of candidate sessions
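The filter criterion in the query above can be illustrated without a database. This sketch assumes a plain mapping from session ID to a list of (collection, dataset name) tuples, a hypothetical stand-in for the Alyx dataset table; the session IDs and the alf/task_XX collection names below are illustrative only. A set plays the role of distinct=True.

```python
RAW_TASK_FILE = '_iblrig_taskData.raw.jsonable'
TRIALS_TABLE = '_ibl_trials.table.pqt'


def flag_sessions(datasets_by_session):
    """Return IDs of sessions with fewer raw task files than trials tables."""
    flagged = []
    for eid, datasets in datasets_by_session.items():
        # Count each (collection, name) dataset once, like distinct=True in the query
        unique = set(datasets)
        n_raw = sum(1 for _, name in unique if name == RAW_TASK_FILE)
        n_tables = sum(1 for _, name in unique if name == TRIALS_TABLE)
        if n_raw < n_tables:
            flagged.append(eid)
    return flagged


# Hypothetical sessions: one raw collection extracted five times vs. extracted once
example = {
    'affected': [('raw_task_data_00', RAW_TASK_FILE)]
                + [(f'alf/task_{i:02}', TRIALS_TABLE) for i in range(5)],
    'healthy': [('raw_task_data_00', RAW_TASK_FILE), ('alf/task_00', TRIALS_TABLE)],
}
print(flag_sessions(example))  # ['affected']
```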

This issue may be related to the aggregation of experiment description stubs, especially if behaviour data is somehow copied more than once.

  1. This copy and merge of a stub should only happen once per session (for a given acquisition PC).
  2. Given that the collection is always provided, the tasks key should never contain duplicates.
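Point 2 suggests the merge step itself could be made idempotent. A minimal sketch, assuming the stub and the session description are plain dicts and that a task entry is identified by its protocol name and collection; merge_tasks and task_key are hypothetical helpers, not the actual ibllib merging code.

```python
import copy


def task_key(task):
    """Identify a task entry by (protocol name, collection)."""
    # Each task entry is assumed to be a single-key dict: {protocol: params}
    (protocol, params), = task.items()
    return protocol, params.get('collection')


def merge_tasks(description, stub):
    """Merge the tasks of a stub into a description without duplicating entries.

    Returns a new dict; neither input is modified. Merging the same stub
    twice gives the same result as merging it once.
    """
    merged = copy.deepcopy(description)
    tasks = merged.setdefault('tasks', [])
    seen = {task_key(t) for t in tasks}
    for task in stub.get('tasks', []):
        key = task_key(task)
        if key not in seen:  # skip tasks already present for this collection
            tasks.append(copy.deepcopy(task))
            seen.add(key)
    return merged


stub = {'tasks': [{'_iblrig_tasks_trainingChoiceWorld':
                   {'collection': 'raw_task_data_00', 'sync_label': 'bpod'}}]}
once = merge_tasks({}, stub)
twice = merge_tasks(once, stub)
print(once == twice)  # True
```

Keying on the collection rather than on the whole entry also treats two entries that differ only in optional keys such as sync_label as duplicates.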
k1o0 added a commit to int-brain-lab/ibllib that referenced this issue Oct 17, 2024
oliche pushed a commit to int-brain-lab/ibllib that referenced this issue Oct 18, 2024

oliche commented Nov 15, 2024

At UCL in London (cortexlab), there are two sessions with duplicate tasks in the description file:

{'id': 'd0e1b460-9439-4aa5-8cf5-bd8f127a6154', 'subject': 'CQ001', 'start_time': '2024-10-07T15:24:19.975000', 'number': 1, 'lab': 'cortexlab', 'projects': ['ibl_fibrephotometry'], 'url': 'https://alyx.internationalbrainlab.org/sessions/d0e1b460-9439-4aa5-8cf5-bd8f127a6154', 'task_protocol': '_iblrig_tasks_trainingChoiceWorld8.19.6'}
(S3) /mnt/s1/spikesorting/raw_data/cortexlab/Subjects/CQ001/2024-10-07/001/_ibl_experiment.description.yaml: 100%|██████████| 581/581 [00:00<00:00, 1.83kB/s]
Multiple tasks: [{'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00', 'sync_label': 'bpod'}}, {'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00', 'sync_label': 'bpod'}}]
{'id': 'd3ebb4bb-2790-421d-bd71-1a062f2c4a6e', 'subject': 'CQ001', 'start_time': '2024-11-01T14:34:40.200000', 'number': 1, 'lab': 'cortexlab', 'projects': ['ibl_fibrephotometry'], 'url': 'https://alyx.internationalbrainlab.org/sessions/d3ebb4bb-2790-421d-bd71-1a062f2c4a6e', 'task_protocol': '_iblrig_tasks_trainingChoiceWorld8.24.7'}
(S3) /mnt/s1/spikesorting/raw_data/cortexlab/Subjects/CQ001/2024-11-01/001/_ibl_experiment.description.yaml: 100%|██████████| 537/537 [00:00<00:00, 1.60kB/s]
Multiple tasks: [{'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00'}}, {'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00'}}]

Many other similar experiments over a one-month period do not exhibit this behaviour, so it is not a systematic bug.
