Make training of segnet, unet and classifiers easier by providing a single entry point to all training steps #54
Conversation
…ment them in code. There is likely an official definition somewhere, but I just couldn't find it. So I looked at examples and tried to reconstruct the mapping. "Unknown" basically means that I just couldn't see the symbol in the picture.
…an out of memory exception after it used up about 30GB of memory
Fixed TypeError at start of unet or segnet training (BreezeWhite#52)
Since I am just too far away from the initial development and model training, I may not answer all the questions above correctly. I assume the failing one which you are trying to train is the second UNet model, which I refer to as "model 2" in the Model Prediction section. I guess the most likely failure reason is that you fixed it to predict 3 channels, but it should be 4, where the 4th channel corresponds to a "nothing" channel. When using crossentropy loss for training, such an additional channel is always required to satisfy the formula. Maybe you could try adjusting it and training again; otherwise I cannot come up with any other possible reasons. As for the PR itself, I will review it later when I have time, since the modifications are relatively large.
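The extra "nothing" channel can be illustrated with a small sketch. This is a hypothetical NumPy illustration, not oemer's actual code: the symbol class count (3) and the convention that index 3 is the "nothing" class are assumptions made for the example. The point is that with the extra channel every pixel's probabilities sum to 1, which is what categorical crossentropy requires.

```python
import numpy as np

# Assumption for illustration: 3 symbol channels, as in the failing setup.
NUM_SYMBOL_CLASSES = 3

def to_one_hot(label_map: np.ndarray) -> np.ndarray:
    """Build one-hot targets where the last channel is the 'nothing' class.

    label_map holds 0..2 for symbols and 3 for 'nothing'.
    """
    num_classes = NUM_SYMBOL_CLASSES + 1  # +1 for the "nothing" channel
    return np.eye(num_classes, dtype=np.float32)[label_map]

labels = np.array([[0, 3], [2, 1]])  # a 2x2 patch; (0, 1) is "nothing"
target = to_one_hot(labels)
# target has 4 channels and every pixel's channels sum to 1,
# so categorical crossentropy is well defined over it.
```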
Thanks! By now, I'm also sure that I mixed up the models. That's something you can see if you compare the arch.json to the ones which are checked in. That alone didn't improve the results yet, but I'm optimistic that fixing that together with what you proposed gives better results. I'll update the PR and let you know about the results as soon as I find the time for this.
I spent some hours now diffing the arch.json files and consolidated them so that the arch.json which comes out of the training matches the ones already checked into git. That required changing the network definitions in the code a bit, which came as a surprise to me. But it can easily be verified by taking a look at the arch.json files. I will double-check the results, but in the first runs the models seem to be good now, as they give results similar to the pretrained models.
Glad to hear that! By the way, I don't see any modifications to the
No, I never committed the training results to git. From what I experienced, you should always keep the weights and arch.json in sync. And right now, I don't intend to update the weights, as the results with this PR are the same as the ones oemer already has. If I find the time, then I might contribute improved weights in the future. If you want to verify this PR, then I think the easiest way is to run the training for just 1, 2 or 3 epochs. That's not enough epochs to give you good weights, but if you take a look at the predictions you'll see that they converge towards the ones oemer uses right now. And you can then also diff the arch.json files against the current ones.
That's a bit more work than I expected, and perhaps more than I can afford now 😂 Maybe you could put it somewhere and show the training results and the performance? Like under your forked repo.
* Fixed typos in comments
* IndexError while scanning for a dot should not abort the whole process
* Bound check while getting the note label
* Added check if label is in the note_type_map
* Filter staffs instead of aborting with an exception
* Bound check during symbol extraction
* Marking notes as invalid instead of aborting with an exception
* Bound check
* Fixed type error
* Fixed TypeError at start of unet or segnet training (BreezeWhite#52)
* Fixed 'TypeError: Cannot convert 4.999899999999999e-07 to EagerTensor of dtype int64' in training, fixes BreezeWhite#39 (https://stackoverflow.com/questions/76511182/tensorflow-custom-learning-rate-scheduler-gives-unexpected-eagertensor-type-erro)
* --format was deprecated in ruff and replaced with --output-format
* HoughLinesP can return None if no lines are found
* Fixed error which happens if no rest bboxes were found
* Limited try/except block
* Fixed typo
* Use fixed versions for the linter dependencies so that results don't differ for the same source code across test runs due to dependency updates
* Fixed type errors which came up with the recent version of cv2
* Going back to the newest version of ruff and mypy, as the type errors were introduced by cv2
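The EagerTensor TypeError in the list above typically comes from mixing an integer training step with floating-point learning-rate math, as in the linked Stack Overflow question. A hedged sketch of that kind of fix, with illustrative names (initial_lr, decay_rate, decay_steps are not oemer's actual parameters): casting the step to float before any arithmetic keeps every intermediate value a float, so a value like 4.999...e-07 is never coerced into an int64 tensor.

```python
def scheduled_lr(step: int, initial_lr: float = 1e-4,
                 decay_rate: float = 0.5, decay_steps: int = 1000) -> float:
    """Exponential decay schedule with explicit float casting of the step."""
    step_f = float(step)  # cast first: int step mixed with float lr math
                          # is what triggers the dtype error
    return initial_lr * decay_rate ** (step_f / decay_steps)
```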
I've added weights and predictions from one example to this dummy release https://github.com/liebharc/oemer/releases/tag/v0.0.1
The results look awesome! I think it's good to merge this PR when you are ready.
Thank you! The PR is complete, so you can hit the merge button.
Great! By the way, I'd like to add you to this project with write permission, since you've contributed a lot to this project and have a deep understanding of
Thanks @BreezeWhite, that's a big compliment! However, I'm not sure if I'll continue working on
@liebharc I see. I still want to promote you as one of the maintainers of this project, even though you might not have enough time to really maintain it. It's my stubborn way of thanking you ^^
Hello @BreezeWhite, reusing this PR felt like the easiest way to contact you directly. I was able to train It also seems to be able to deal with two of the images provided in the issues section of
Hi @liebharc, glad to hear the good news! I would definitely be the first one to try it out~
Thanks!
Right, but it still is good to show some respect to fellow contributors, and so I think it's good to first ask you :). The staff detection of I didn't link the issues, as GitHub would then create a back reference from the issues to this PR, and the original PR has nothing to do with the issues.
Thanks for your kindness~ I haven't tried the sheets in 51 and 59, but I've tried Maybe it's too much for this CLOSED PR lol, how about changing the place for discussion? Maybe under
Thanks for the feedback!
Sure, perhaps just create an issue in homr and we can continue there.
The TrOMR weights I added to the repo might have issues, as I recognized yesterday. A new training run will take two days or so, and then I'll know more. But I can easily imagine that
Always :)
I see. Just can't wait to see the full power of fully trained weights ^^
Hi,
This PR should make it easier to retrain segnet, unet and classifiers. It adds a train.py script as entry point to train all models. The goal of this PR is to reproduce the current models without any modifications.
With the changes, I was able to get classifiers and unet models which work as well as the current models.
segnet fails to give meaningful results. It seems to converge to a local optimum where all resulting predictions are zero, i.e. black predictions. @BreezeWhite, do you recall if there is anything I missed which might improve the segnet training? I assume that adding regularization or changing parameters might improve this. But in order to reproduce the current models, it would be good not to experiment too much here.
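The collapse into all-black predictions can be caught early with a small diagnostic. This is a hypothetical sketch, not part of the PR: the function name and the convention that channel 0 is the background class are assumptions for illustration. If training falls into the degenerate optimum, the argmax map is background everywhere and the foreground ratio drops to zero, which flags the problem after a few epochs instead of after a full run.

```python
import numpy as np

def foreground_ratio(pred: np.ndarray) -> float:
    """Fraction of pixels predicted as non-background.

    pred: (H, W, C) per-pixel class probabilities; channel 0 is assumed
    to be the background class for this illustration.
    """
    labels = pred.argmax(axis=-1)
    return float((labels != 0).mean())

# A fully collapsed prediction: background wins at every pixel.
collapsed = np.zeros((4, 4, 3))
collapsed[..., 0] = 1.0
```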
Assumptions I made:
Fixes:
Changes which are not mandatory but which I found useful: