[Bug] Duplicate tasks in experiment description file #728

k1o0 · 2024-10-16T11:13:39Z

There are potentially >100 sessions where the same task has been erroneously extracted as multiple chained protocols. This appears to happen because the experiment description file contains duplicate tasks. For example, the session 13565016-ac41-4b32-b71d-474fd60a5052 (KM_023/2024-08-21/001) contains the following experiment description:

{'devices': {'cameras': {'left': {'collection': 'raw_video_data',
    'sync_label': 'audio'}},
  'microphone': {'microphone': {'collection': 'raw_task_data_00',
    'sync_label': 'audio'}}},
 'procedures': ['Behavior training/tasks'],
 'projects': ['u19_proj1_multiareacom'],
 'sync': {'bpod': {'acquisition_software': 'pybpod',
   'collection': 'raw_task_data_00',
   'extension': '.jsonable'}},
 'tasks': [{'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
  {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}}],
 'version': '1.0.0'}
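
A duplicate like the one above can be detected directly from the loaded description. The following is only a minimal sketch: `find_duplicate_tasks` is a hypothetical helper (not part of ibllib or iblrig), and it assumes the description has already been loaded into a dict with the layout shown above.

```python
from collections import Counter


def find_duplicate_tasks(description):
    """Return {(protocol, collection): count} for task entries that occur more than once.

    Hypothetical helper for illustration; assumes `description` is the
    experiment description already loaded as a dict.
    """
    tasks = description.get('tasks') or []  # tolerate a missing or empty tasks key
    keys = [
        (protocol, (params or {}).get('collection'))
        for task in tasks
        for protocol, params in task.items()
    ]
    return {key: n for key, n in Counter(keys).items() if n > 1}


# The tasks list from the session above: five identical entries
description = {
    'tasks': [{'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}}] * 5
}
duplicates = find_duplicate_tasks(description)
# duplicates == {('_iblrig_tasks_biasedChoiceWorld', 'raw_task_data_00'): 5}
```

Keying on the (protocol, collection) pair means genuinely chained protocols, which write to different collections, are not flagged.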

The following Django query finds 118 sessions that potentially have the same issue:

from actions.models import Session
from django.db.models import Q, Count, F

# Number of raw task data files registered for each session
n_raw_task_files = Count('data_dataset_session_related', distinct=True, filter=Q(data_dataset_session_related__name='_iblrig_taskData.raw.jsonable'))
# Number of extracted trials tables registered for each session
n_task_tables = Count('data_dataset_session_related', distinct=True, filter=Q(data_dataset_session_related__name='_ibl_trials.table.pqt'))
# Sessions with more trials tables than raw task files were likely extracted as duplicated chained protocols
ses = (Session
       .objects
       .prefetch_related('data_dataset_session_related')
       .annotate(n_raw_task_files=n_raw_task_files)
       .annotate(n_task_tables=n_task_tables)
       .filter(n_raw_task_files__lt=F('n_task_tables')))

This issue may be related to the aggregation of experiment description stubs, especially if behaviour data is somehow copied more than once.

This copy and merge of a stub should only happen once per session (for a given acquisition PC).
Given that the collection is always provided, the tasks key should never contain duplicates.
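
One way to enforce this when stubs are merged is to drop repeated entries while keeping the original order. This is only a sketch of the idea, not the actual fix: `deduplicate_tasks` is a hypothetical helper, and it treats two entries as duplicates only when the protocol name and all of its parameters (including the collection) are identical.

```python
def deduplicate_tasks(tasks):
    """Return the task entries with exact repeats removed, preserving order.

    Hypothetical helper for illustration. Entries are single-key dicts of the
    form {protocol_name: {'collection': ..., ...}}, so plain dict equality is
    enough to spot an exact repeat.
    """
    unique = []
    for task in tasks or []:  # tolerate an empty or missing tasks key
        if task not in unique:
            unique.append(task)
    return unique


tasks = [
    {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
    {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_00'}},
    {'_iblrig_tasks_biasedChoiceWorld': {'collection': 'raw_task_data_01'}},
]
merged = deduplicate_tasks(tasks)
# merged keeps one entry for raw_task_data_00 and one for raw_task_data_01
```

Because the comparison includes the collection, a protocol that was genuinely run twice in a chain (different collections) is kept, while a stub merged twice is collapsed to a single entry.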


oliche · 2024-11-15T13:13:13Z

At UCL in London, we have 2 instances of duplicate tasks in the description file:

{'id': 'd0e1b460-9439-4aa5-8cf5-bd8f127a6154', 'subject': 'CQ001', 'start_time': '2024-10-07T15:24:19.975000', 'number': 1, 'lab': 'cortexlab', 'projects': ['ibl_fibrephotometry'], 'url': 'https://alyx.internationalbrainlab.org/sessions/d0e1b460-9439-4aa5-8cf5-bd8f127a6154', 'task_protocol': '_iblrig_tasks_trainingChoiceWorld8.19.6'}
(S3) /mnt/s1/spikesorting/raw_data/cortexlab/Subjects/CQ001/2024-10-07/001/_ibl_experiment.description.yaml: 100%|██████████| 581/581 [00:00<00:00, 1.83kB/s]
Multiple tasks: [{'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00', 'sync_label': 'bpod'}}, {'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00', 'sync_label': 'bpod'}}]

{'id': 'd3ebb4bb-2790-421d-bd71-1a062f2c4a6e', 'subject': 'CQ001', 'start_time': '2024-11-01T14:34:40.200000', 'number': 1, 'lab': 'cortexlab', 'projects': ['ibl_fibrephotometry'], 'url': 'https://alyx.internationalbrainlab.org/sessions/d3ebb4bb-2790-421d-bd71-1a062f2c4a6e', 'task_protocol': '_iblrig_tasks_trainingChoiceWorld8.24.7'}
(S3) /mnt/s1/spikesorting/raw_data/cortexlab/Subjects/CQ001/2024-11-01/001/_ibl_experiment.description.yaml: 100%|██████████| 537/537 [00:00<00:00, 1.60kB/s]
Multiple tasks: [{'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00'}}, {'_iblrig_tasks_trainingChoiceWorld': {'collection': 'raw_task_data_00'}}]

Many other similar experiments over one month do not exhibit this behaviour, so it is not a systematic bug.

k1o0 added a commit to int-brain-lab/ibllib that referenced this issue Oct 17, 2024

Fix for int-brain-lab/iblrig#728

063c713

oliche pushed a commit to int-brain-lab/ibllib that referenced this issue Oct 18, 2024

Duplicate experiment description tasks (#867)

8540443

* Fix for int-brain-lab/iblrig#728
* Handle empty tasks key

chris-langfield assigned chris-langfield, k1o0 and bimac Oct 21, 2024

mayofaulkner assigned mayofaulkner and unassigned chris-langfield Nov 18, 2024
