USPTO #71
Looks good! I think it is basically ready to merge!
Can you write a quick README.md on how to do everything? What to install (node, the uspto requirements.txt, etc.), what order to run the scripts in, etc.?
Thanks for the review @blester125! I think I have addressed all your points, but let me know if I missed anything! I added the README as well. The only thing left is the tables. They are mostly fine, but the whitespace formatting is quite off. I will bring it up in the meeting!
If you rebase on main and push again, the lint error should go away and we can get this merged!
Thanks! Feel free to merge when ready!
Whenever you're ready!
This PR adds the code to process the USPTO dataset, which was extracted from the Google Patents Public Dataset and uploaded to Hugging Face (HF). The dataset covers all US patent applications up to Oct 27, 2023, including historical ones.
closes #9
Edit: Switched over to pandoc, as it handles the LaTeX formatting well and also works well with the overall HTML.