pd3f

PDF text extraction pipeline: self-hosted, local-first and Docker-based

Pinned

  1. pd3f Public

    🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

    HTML · 299 stars · 40 forks

  2. pd3f-core Public

    📑 Python Package to reconstruct the original continuous text from PDFs with language models

    Jupyter Notebook · 33 stars · 8 forks

  3. dehyphen Public

    📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

    Python · 38 stars · 4 forks

Repositories

Showing 7 of 7 repositories
  • pd3f Public

    🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

    HTML · 299 stars · AGPL-3.0 · 40 forks · 16 open issues · 3 open pull requests · Updated Oct 13, 2023
  • pd3f-core Public

    📑 Python Package to reconstruct the original continuous text from PDFs with language models

    Jupyter Notebook · 33 stars · AGPL-3.0 · 8 forks · 2 open issues · 23 open pull requests · Updated Sep 8, 2023
  • pd3f.com Public

    📝 Website to advertise & document pd3f

    JavaScript · 1 star · MIT · 2 forks · 0 open issues · 1 open pull request · Updated Jan 22, 2023
  • pd3f-dataset-bmjv Public

    Dataset of (mostly German) PDFs used to develop pd3f

    Python · 1 star · MIT · 1 fork · 0 open issues · 5 open pull requests · Updated Dec 8, 2022
  • dehyphen Public

    📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

    Python · 38 stars · GPL-3.0 · 4 forks · 6 open issues · 1 open pull request · Updated Mar 8, 2022
  • pd3-flair Public Forked from flairNLP/flair

    Flair's language models without unnecessary dependencies

    Python · 3 stars · 2,113 forks · 0 open issues · 0 open pull requests · Updated Sep 15, 2020
  • pd3f-results Public

    Results with pd3f on some PDF datasets

    Jupyter Notebook · 1 star · GPL-3.0 · 1 fork · 0 open issues · 0 open pull requests · Updated Aug 21, 2020