Bug fix in split_index method #5292

Merged 15 commits on Apr 18, 2024

@bm-synth bm-synth commented Mar 17, 2024

Bug description: on a dataset of 20 samples, with 4 workers and 8 threads per worker, split_dataset returns the following for worker id 1:

self.worker_splits
[[0, 5], [5, 10], [10, 15], [15, 20]]


self.thread_splits
[[5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 10], [11, 10], [12, 10]]

thread_splits is wrong and causes a crash in the DataAnalyzer: on the last 2 threads, the end sample id is lower than the start sample id.
This PR fixes that by correcting the behaviour of split_index.
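
For illustration, the failure can be reproduced with a minimal sketch. This is not the DeepSpeed source: `naive_split_index`, `balanced_split_index` and `is_valid` are hypothetical names. The naive version assumes fixed-size chunks of `ceil(n / k)` samples with only the end index clamped, which happens to yield exactly the splits shown above. The balanced version is one possible way to keep every split non-inverted, and is not necessarily what this PR implements.

```python
import math


def naive_split_index(start_idx, end_idx, num_partitions):
    """Hypothetical reconstruction of the bug: fixed-size chunks, end clamped only."""
    size = math.ceil((end_idx - start_idx) / num_partitions)
    # The start index keeps growing past end_idx once the samples run out.
    return [[start_idx + i * size, min(end_idx, start_idx + (i + 1) * size)]
            for i in range(num_partitions)]


def balanced_split_index(start_idx, end_idx, num_partitions):
    """Illustrative alternative: evenly spaced boundaries inside [start_idx, end_idx]."""
    total = end_idx - start_idx
    # Boundaries are non-decreasing, so start <= end holds for every split.
    bounds = [start_idx + (i * total) // num_partitions
              for i in range(num_partitions + 1)]
    return [[bounds[i], bounds[i + 1]] for i in range(num_partitions)]


def is_valid(splits, start_idx, end_idx):
    """Check that splits are non-inverted, contiguous, and cover [start_idx, end_idx)."""
    if splits[0][0] != start_idx or splits[-1][1] != end_idx:
        return False
    if any(start > end for start, end in splits):
        return False
    return all(splits[i][1] == splits[i + 1][0] for i in range(len(splits) - 1))


# Worker 1 owns samples [5, 10) and divides them among 8 threads.
print(naive_split_index(5, 10, 8))
# [[5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 10], [11, 10], [12, 10]]
print(balanced_split_index(5, 10, 8))
# [[5, 5], [5, 6], [6, 6], [6, 7], [7, 8], [8, 8], [8, 9], [9, 10]]
```

With more threads than samples, some splits in the balanced version are empty (start equals end), but none is inverted, so a consumer that iterates over `range(start, end)` simply does no work for those threads.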

@bm-synth bm-synth marked this pull request as ready for review March 17, 2024 15:37
@bm-synth bm-synth requested a review from conglongli as a code owner March 17, 2024 15:37
@bm-synth
Contributor Author

@loadams @conglongli what's holding this PR?

@loadams
Contributor

loadams commented Apr 16, 2024

> @loadams @conglongli what's holding this PR?

We need the correct folks to review

@loadams loadams enabled auto-merge April 18, 2024 17:39
@loadams loadams added this pull request to the merge queue Apr 18, 2024
Merged via the queue into microsoft:master with commit aaaf8bc Apr 18, 2024
12 checks passed
@bm-synth bm-synth deleted the bug_fix_split_index branch April 18, 2024 20:29
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
Bug description: on a dataset of 20 samples, with 4 workers and 8
threads per worker, `split_dataset` returns the following for worker
id `1`:

```
self.worker_splits
[[0, 5], [5, 10], [10, 15], [15, 20]]


self.thread_splits
[[5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 10], [11, 10], [12, 10]]
```

`thread_splits` is wrong and causes a crash in the `DataAnalyzer`: on
the last 2 threads, the end sample id is lower than the start sample id.
This PR fixes that by correcting the behaviour of `split_index`.

---------

Co-authored-by: Logan Adams <[email protected]>
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024