How to specify languages for each PDF separately in the new version? #420

Open
aumungray opened this issue Dec 11, 2024 · 0 comments

Hello,

Thanks for the great library.

In the previous version there was a parameter called metadata-files that took a JSON file as input, containing a mapping of language(s) for each PDF file. In the new version this parameter is gone, and the config takes the languages as a comma-separated list. What I have observed is that OCR quality is poor for the PDFs where the right language is not specified.
Is there any way to specify the language for each PDF in bulk in the new version?
Also, does surya have a built-in mechanism that detects the language of a page?
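
To make the request concrete, here is a rough sketch of the kind of per-file mapping I mean. The JSON layout, the file names, and the `group_by_languages` helper are all hypothetical and for illustration only; this is not the old metadata file format or any library API. It only groups PDFs by their language set and builds the comma-separated string, so that each group could in principle be processed as its own batch.

```python
import json
from collections import defaultdict

# Hypothetical mapping of PDF file name -> list of languages.
# The layout and names are illustrative, not the library's format.
METADATA_JSON = """
{
  "report_en.pdf": ["English"],
  "invoice_hi.pdf": ["Hindi", "English"],
  "letter_en.pdf": ["English"]
}
"""


def group_by_languages(metadata: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group PDF names by their comma-separated language string."""
    groups: dict[str, list[str]] = defaultdict(list)
    for pdf_name, languages in metadata.items():
        # Build the comma-separated form that the new config expects.
        key = ",".join(languages)
        groups[key].append(pdf_name)
    return dict(groups)


if __name__ == "__main__":
    groups = group_by_languages(json.loads(METADATA_JSON))
    for langs, pdfs in groups.items():
        # Each group shares one language setting and could be run together.
        print(f"{langs}: {pdfs}")
```

How each group would then be handed to the converter is left out, since that is exactly the part I am unsure about in the new version.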

Thanks
