Hello,
Thanks for the great library.
The previous version had a parameter called metadata-files, which took a JSON file mapping each PDF file to its language(s). In the new version this parameter is gone, and the config takes the languages as a comma-separated list instead. I have observed that OCR quality is poor for the few PDFs where the correct language is not specified.
Is there any way to specify the language for each PDF in bulk in the new version?
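To make the request concrete, here is a minimal sketch of the kind of workaround I have in mind: regrouping a per-file mapping into batches that share one comma-separated language setting. The JSON layout (file name to list of languages) and the function name are assumptions for illustration only; the real metadata-files schema may have differed, and nothing here calls the library itself.

```python
import json
from collections import defaultdict


def group_by_languages(mapping):
    """Group PDF names by their language set.

    `mapping` is a hypothetical layout: {"a.pdf": ["English", "Hindi"], ...}.
    Returns {"English,Hindi": ["a.pdf", ...], ...}, keyed by the
    comma-separated (sorted) language string for each batch.
    """
    groups = defaultdict(list)
    for pdf_name, languages in mapping.items():
        # Sort so that ["Hindi", "English"] and ["English", "Hindi"] share a batch.
        key = ",".join(sorted(languages))
        groups[key].append(pdf_name)
    return dict(groups)


def load_mapping(metadata_path):
    """Read the hypothetical per-file language mapping from a JSON file."""
    with open(metadata_path, encoding="utf-8") as f:
        return json.load(f)
```

Each batch could then be converted in its own run, with the batch key used as the comma-separated language value in the config. That is clumsy compared to a single per-file mapping, which is why I am asking whether something built-in exists.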
Also, does surya have a built-in mechanism that detects the language of a page?
Thanks