Number of unique track ids in dataset zoo #539

Open
youonlytrackonce opened this issue Apr 13, 2023 · 3 comments

Comments

@youonlytrackonce

Hello,

As the title states, I have been investigating the number of unique track ids in the data_all dataset. Since there is more than one dataset, and each may contain more than one sequence, how can the network learn reid embeddings for each unique id? Track ids from different datasets can collide.

@janthmueller

Within the subsets of a dataset, the track ids either already exist in this format or are created by the gen_label modules. Across multiple datasets, the ids are processed once more, for example in the JointDataset class: the maximum id of the preceding datasets is added to all ids of the following dataset.
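To make the offset idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: the dataset names, the maximum ids, and the helpers compute_start_offsets and to_global are all hypothetical. It also assumes ids are non-negative integers and reserves max id + 1 slots per dataset, so that 0-based ids would not collide either.

```python
def compute_start_offsets(max_ids):
    """Map each dataset to the offset added to its local track ids.

    Assumption of this sketch: each dataset reserves max_id + 1 ids,
    so both 0-based and 1-based local ids stay collision-free.
    """
    offsets = {}
    next_free_id = 0
    for name, max_id in max_ids.items():
        offsets[name] = next_free_id   # ids of this dataset start here
        next_free_id += max_id + 1     # reserve the range for this dataset
    return offsets, next_free_id


# Hypothetical maximum local track id per dataset (illustrative numbers only).
max_id_per_dataset = {"dataset_a": 3, "dataset_b": 5, "dataset_c": 2}
offsets, total_ids = compute_start_offsets(max_id_per_dataset)


def to_global(name, local_id):
    """Shift a dataset-local track id into the shared id space."""
    return local_id + offsets[name]
```

With these offsets, local id 1 in dataset_a and local id 1 in dataset_b map to different global ids, which is what lets a single reid classifier treat them as separate identities.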

@sompt22

sompt22 commented Aug 17, 2023

Could you please point to the file and line where the maximum id of the preceding datasets is added to all ids of the following dataset? Is this done on the fly, or is it applied to the labels on disk?

@janthmueller

It is applied on the fly. See the JointDataset class (in .\src\lib\datasets\dataset\jde.py), at the beginning of the __getitem__ method (lines 426-428):

        for i, _ in enumerate(labels):
            if labels[i, 1] > -1:
                labels[i, 1] += self.tid_start_index[ds]

The self.tid_start_index attribute is computed in __init__ (lines 524-527, with tid_num computed just before that).
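As a rough illustration of what "on the fly" means for the snippet above, the sketch below shifts the ids on a copy of the labels, so the stored labels stay as they were. It is a simplified stand-in, not the repository's implementation: offset_track_ids is a hypothetical helper, the labels are nested lists rather than a 2-D array, and only column 1 (the track id) is taken from the snippet. The other column and the reading of -1 as "no track id" are assumptions.

```python
import copy


def offset_track_ids(labels, start_index):
    """Return a copy of labels with valid track ids (column 1) shifted.

    Rows whose track id is -1 are left unchanged, mirroring the
    `labels[i, 1] > -1` check in the snippet above.
    """
    shifted = copy.deepcopy(labels)  # the stored labels are not modified
    for row in shifted:
        if row[1] > -1:
            row[1] += start_index
    return shifted


# Hypothetical labels for one image: [class_id, track_id] per box.
stored_labels = [[0, 0], [0, -1], [0, 2]]

# Hypothetical start index for the dataset this image belongs to.
shifted_labels = offset_track_ids(stored_labels, start_index=10)
```

Because the shift happens each time a sample is loaded, the label files only ever need dataset-local ids, and the same files can be reused in a different combination of datasets.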
