Cannot replicate results from object detection task guide #30557
Comments
Hi @adam-homeboost, thanks for reporting the problem! I am working on refining object detection examples.
Thank you @qubvel. I definitely appreciate the work you are doing to update the example! Based on your comments, I ran the original example's training out to 100 epochs instead of the documented 10 and got much better results that matched closely enough. So there is definitely a documentation issue there. I see that your new example properly shows the correct number of epochs. As a beginner in ML, can I offer a couple of suggestions for your new examples? These are questions I have:
If you are wondering about the DETR architecture, it should be the same as here, and you can find the paper on the documentation page as well.
My understanding is
@adam-homeboost great questions, it will help to improve examples! @g1y5x3 thanks for the answers, I will add more here:
We might consider two types of transformations:
I have a slightly different opinion from @g1y5x3. Both APIs support multi-GPU training. The ... Please let me know if this makes things a bit clearer and if you have any follow-up questions 🤗
Thank you for the clarification. It makes total sense. Quick question: after looking through your PR, I noticed it didn't touch the example. However, I remember that the example needs a bit of clarification, as training on that dataset for 10 epochs won't yield any good predictions. I tried running it with 100 epochs, which takes ~3 hours on an A6000. Does it need to be updated?
@g1y5x3 yes, you are right, it needs to be updated. Any help is appreciated. In case you want to contribute, you can open a PR that aligns the notebook example with the Python one and ping me for any help or review 🙂
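For readers following along, the epoch change discussed above might look like the sketch below. It only shows the one setting this thread identifies as the problem (10 epochs in the guide versus roughly 100 for usable results). The helper names and the output directory are hypothetical, and every other training argument from the guide is assumed to stay unchanged.

```python
def build_training_kwargs(num_train_epochs=100, output_dir="detr-finetuned-local"):
    """Return keyword arguments intended for transformers.TrainingArguments.

    Hypothetical helper: only num_train_epochs reflects this thread; the
    output directory name is a placeholder.
    """
    return {
        "output_dir": output_dir,              # placeholder local directory
        "num_train_epochs": num_train_epochs,  # the guide documented 10; ~100 was needed here
    }


def make_training_args(**overrides):
    """Build TrainingArguments, letting callers override any keyword."""
    # Imported lazily so build_training_kwargs works without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(**{**build_training_kwargs(), **overrides})
```

The remaining arguments from the guide would be passed through `overrides`, for example `make_training_args(learning_rate=1e-5)`; that value is illustrative only.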
That's what my PR #29967 addresses, I'd prefer to have it merged to unblock people.
Yes, let's have it merged. And the next PR can address other points in a notebook example from your issue (taking as a base #30422)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.40.1

I am following the examples given in https://huggingface.co/docs/transformers/en/tasks/object_detection
I am following it as closely as I possibly can. The only difference is that I am not pushing the training results up to hugging face and am instead saving (and reloading them) locally.
When I run the evaluation I get terrible results that look nothing like what the examples do. Instead of mAPs in the 0.3 - 0.7 range, I am getting results well under 0.1.
Instead of the expected:
This much difference makes for a "tuned" model that does not work at all.
If I use the fine-tuned model on Hugging Face as documented (uploaded by the author?) then I get the expected results. For some reason, following the same steps, I cannot get to a tuned model with anywhere near the performance the author did.
I have tried this on different hardware and gotten different results as well. The above numbers are from NVIDIA GPUs. I tried this on Mac M3 GPUs and got even worse numbers (all less than 0.001).
I am new to machine learning and this toolset. I would appreciate any suggestions or guidelines as to what I could be doing wrong here. I also do not understand why running this on different hardware and different CPU vs GPU mixes ends up with different scores.
Any help or suggestions appreciated!
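One possible contributor to run-to-run variation is unseeded randomness in code that runs outside the `Trainer`, such as data augmentation. The sketch below pins the common random number generators; `seed_everything` is a hypothetical helper name, and `transformers.set_seed` offers similar behaviour. Seeding helps make runs repeatable on one machine, but it does not guarantee identical scores across CUDA, MPS, and CPU backends, because those use different kernels and floating-point accumulation orders.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs where those libraries are installed."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
```

Calling `seed_everything(42)` once at the top of the script, before building the dataset and the model, is the usual placement.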
Who can help?
@amyeroberts
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Follow the task guide code exactly, except:
1. After trainer.train(), call trainer.save_model().
2. In the eval steps, instantiate AutoImageProcessor and AutoModelForObjectDetection with from_pretrained from the model saved in step 1.
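The two deviations above can be sketched roughly as follows. The helper names and the directory are hypothetical, and `trainer` and `image_processor` are assumed to be the objects built earlier in the task guide. Saving the image processor explicitly is an extra precaution, an assumption rather than something this report did, so that the preprocessing config ends up next to the weights.

```python
from pathlib import Path


def save_for_eval(trainer, image_processor, save_dir="detr-finetuned-local"):
    """Step 1: save the fine-tuned weights and the preprocessing config locally."""
    save_dir = Path(save_dir)
    trainer.save_model(str(save_dir))               # model weights and config
    image_processor.save_pretrained(str(save_dir))  # preprocessing config
    return save_dir


def load_for_eval(save_dir):
    """Step 2: reload both objects from the local directory for evaluation."""
    # Imported lazily so save_for_eval can be used without importing transformers here.
    from transformers import AutoImageProcessor, AutoModelForObjectDetection

    image_processor = AutoImageProcessor.from_pretrained(str(save_dir))
    model = AutoModelForObjectDetection.from_pretrained(str(save_dir))
    model.eval()  # switch off dropout for evaluation
    return image_processor, model
```

Usage would be `save_dir = save_for_eval(trainer, image_processor)` right after `trainer.train()`, then `image_processor, model = load_for_eval(save_dir)` before the evaluation loop.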
Expected behavior
Expect same or reasonably close scores in the eval step.