How to specify languages for each PDF separately in the new version? #420

aumungray · 2024-12-11T18:31:41Z

Hello,

Thanks for the great library.

In the last version there was a parameter called metadata-files, which took a JSON file mapping each PDF file to its language(s). In the new version this parameter is gone, and the config takes the languages as a comma-separated list. I have observed that OCR quality is poor for a few PDFs where the right language is not specified.
Is there any way to specify the language for each PDF in bulk in the new version?
Also, does surya have a built-in mechanism that detects the language of a page?
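One possible bulk workaround, sketched under assumptions: reuse the old per-PDF metadata JSON to group files that share the same set of languages, then run the converter once per group with that group's comma-separated language list. The metadata shape (`{"file.pdf": {"languages": [...]}}`) is assumed from the old format, and `group_pdfs_by_languages` is a hypothetical helper, not part of marker or surya. The language values may need converting to whatever format the new config expects (names vs. codes).

```python
import json
from collections import defaultdict


def group_pdfs_by_languages(metadata: dict) -> dict:
    """Hypothetical helper: group PDF filenames by a comma-separated language string.

    Assumes the old-style metadata shape:
    {"doc1.pdf": {"languages": ["English", "French"]}, ...}
    Files with no "languages" entry end up under the empty-string key.
    """
    groups = defaultdict(list)
    for filename, info in metadata.items():
        # Sort so ["French", "English"] and ["English", "French"] share one batch
        key = ",".join(sorted(info.get("languages", [])))
        groups[key].append(filename)
    # Sort filenames within each group for a stable, reproducible order
    return {langs: sorted(files) for langs, files in groups.items()}


if __name__ == "__main__":
    example = json.loads("""{
        "report_fr.pdf": {"languages": ["French", "English"]},
        "invoice_de.pdf": {"languages": ["German"]},
        "memo_en_fr.pdf": {"languages": ["English", "French"]}
    }""")
    for langs, files in group_pdfs_by_languages(example).items():
        # Each group could then be converted in one run using this language list
        print(f"{langs}: {files}")
```

Grouping keeps the number of converter runs equal to the number of distinct language sets rather than the number of PDFs, which matters when model loading dominates the per-run cost.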

Thanks
