s3-ocr file command to process a single PDF #12

Open

simonw opened this issue Jun 30, 2022 · 1 comment

Owner

simonw commented Jun 30, 2022

Would still require a bucket since PDFs through Textract need to go through a bucket.

Maybe it could have an option to block and poll for completion?

The default operation can be to put the object in the bucket and then start an OCR run against it.

Can use the same filename, but return an error if a file of that name exists already.
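
A minimal sketch of that default flow, not the project's implementation. The function name `upload_and_start_ocr` is hypothetical, and `s3` and `textract` are assumed to be boto3 clients (for example `boto3.client("s3")` and `boto3.client("textract")`) passed in by the caller. It uses the file's base name as the object key, refuses to overwrite an existing object, uploads the PDF, and starts an asynchronous Textract job:

```python
import os


def upload_and_start_ocr(s3, textract, bucket, path):
    """Upload a local PDF to the bucket and start a Textract OCR job for it.

    Hypothetical helper: `s3` and `textract` are assumed to be boto3 clients.
    Returns the Textract job ID.
    """
    # Use the same filename as the object key.
    key = os.path.basename(path)

    # An exact key sorts before any longer key sharing its prefix,
    # so one result is enough to tell whether the key already exists.
    existing = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    if any(obj["Key"] == key for obj in existing.get("Contents", [])):
        raise FileExistsError(f"{key} already exists in bucket {bucket}")

    # Put the object in the bucket, then queue OCR against it.
    s3.upload_file(path, bucket, key)
    response = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return response["JobId"]
```

The existence check and the upload are two separate requests, so this does not protect against another process writing the same key in between.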

simonw added the enhancement label

Owner Author

simonw commented Jun 30, 2022 (edited)

s3-ocr file my-bucket document.pdf

Default mode outputs a message saying that the file has been uploaded and put in the OCR queue.

Option --wait waits for it to complete and then returns the text version of the OCR.

--wait --json blocks and then returns the output of fetch --combine to standard output.
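
One way the `--wait` behaviour could work, as a rough sketch rather than the tool's actual code. The function name `wait_for_text`, the `poll_interval` default, and the injectable `sleep` parameter are all assumptions for illustration. `textract` is assumed to be a boto3 Textract client. The function polls the job until it leaves `IN_PROGRESS`, then pages through the results and joins the text of the `LINE` blocks:

```python
import time


def wait_for_text(textract, job_id, poll_interval=5.0, sleep=time.sleep):
    """Block until a Textract text detection job finishes, then return its text.

    Hypothetical helper: `textract` is assumed to be a boto3 Textract client.
    """
    # Poll until the job is no longer running.
    while True:
        response = textract.get_document_text_detection(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_interval)

    # This sketch treats anything other than full success as an error.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated; follow NextToken until it runs out.
    lines = []
    while True:
        for block in response.get("Blocks", []):
            if block["BlockType"] == "LINE":
                lines.append(block["Text"])
        token = response.get("NextToken")
        if not token:
            break
        response = textract.get_document_text_detection(
            JobId=job_id, NextToken=token
        )
    return "\n".join(lines)
```

The `--json` variant is not covered here, since it depends on what `fetch --combine` produces.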