Commit

[BOT] post-merge updates
OCA-git-bot committed Oct 24, 2023
1 parent b985b6e commit 80bf234
Showing 4 changed files with 50 additions and 39 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -32,7 +32,7 @@ addon | version | maintainers | summary
[account_invoice_import](account_invoice_import/) | 14.0.3.3.0 | [![alexis-via](https://github.com/alexis-via.png?size=30px)](https://github.com/alexis-via) | Import supplier invoices/refunds as PDF or XML files
[account_invoice_import_facturx](account_invoice_import_facturx/) | 14.0.1.0.0 | [![alexis-via](https://github.com/alexis-via.png?size=30px)](https://github.com/alexis-via) | Import Factur-X/ZUGFeRD supplier invoices/refunds
[account_invoice_import_invoice2data](account_invoice_import_invoice2data/) | 14.0.2.1.1 | [![alexis-via](https://github.com/alexis-via.png?size=30px)](https://github.com/alexis-via) | Import supplier invoices using the invoice2data lib
-[account_invoice_import_simple_pdf](account_invoice_import_simple_pdf/) | 14.0.3.2.1 | [![alexis-via](https://github.com/alexis-via.png?size=30px)](https://github.com/alexis-via) | Import simple PDF vendor bills
+[account_invoice_import_simple_pdf](account_invoice_import_simple_pdf/) | 14.0.3.3.0 | [![alexis-via](https://github.com/alexis-via.png?size=30px)](https://github.com/alexis-via) | Import simple PDF vendor bills
[account_invoice_import_ubl](account_invoice_import_ubl/) | 14.0.1.0.1 | | Import UBL XML supplier invoices/refunds
[account_invoice_ubl](account_invoice_ubl/) | 14.0.1.0.1 | | Generate UBL XML file for customer invoices/refunds
[account_invoice_ubl_email_attachment](account_invoice_ubl_email_attachment/) | 14.0.1.0.0 | | Automatically adds the UBL file to the email.
31 changes: 18 additions & 13 deletions account_invoice_import_simple_pdf/README.rst
@@ -7,7 +7,7 @@ Account Invoice Import Simple PDF
!! This file is generated by oca-gen-addon-readme !!
!! changes will be overwritten. !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-!! source digest: sha256:af61863555148623cb6861214a0e0113c434295232ae77cb4659e4b40779b742
+!! source digest: sha256:6100c6a40bc7b960417d1a8d58f8c0de938fdd8650cd7c98e588ba8783bb6b93
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
.. |badge1| image:: https://img.shields.io/badge/maturity-Beta-yellow.png
@@ -74,34 +74,28 @@ Installation

The most important technical component of this module is the tool that converts the PDF to text. Converting PDF to text is not an easy job. As outlined in this `blog post <https://dida.do/blog/how-to-extract-text-from-pdf>`_, different tools can give quite different results. The best results are usually achieved with tools based on a PDF viewer, which exclude pure-python tools. But pure-python tools are easier to install than tools based on a PDF viewer. It is important to understand that, if you change the PDF to text tool, you will certainly have a slightly different text output, which may oblige you to update the field extraction rule, which can be time-consuming if you have already configured many vendors.

-The module supports 4 different extraction methods:
+The module supports 5 different extraction methods:

1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
#. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
#. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
#. `pdfplumber <https://pypi.org/project/pdfplumber/>`_, which is a python library built on top of the python library `pdfminer.six <https://pypi.org/project/pdfminer.six/>`_. pdfplumber is a pure-python solution, so it's very easy to install on all OSes.
+#. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF libraries for Python. pypdf is a pure-Python solution, so it's very easy to install on all OSes.

-PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber often gives lower-quality text output, but its advantage is that it's a pure-Python solution, so you will always be able to install it whatever your technical environment is.
+PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python libraries, so you will always be able to install them whatever your technical environment is.

You can choose one extraction method and only install the tools/libs for that method.

Install PyMuPDF
~~~~~~~~~~~~~~~

-To install **PyMuPDF**, if you use Debian (Bullseye aka v11 or higher) or Ubuntu (20.04 or higher), run the following command:
+Install it via pip:

.. code::
-sudo apt install python3-fitz
+sudo pip3 install --upgrade pymupdf
-You can also install it via pip:

-.. code::
-sudo pip3 install --upgrade PyMuPDF
-but beware that *PyMuPDF* is just a binding on MuPDF, so it will require MuPDF and all the development libs required to compile the binding. That's why *PyMuPDF* is much easier to install via the packages of your Linux distribution (package name **python3-fitz** on Debian/Ubuntu, but the package name may be different in other distributions) than with pip.
+Beware that *PyMuPDF* is not a pure-Python library: it uses MuPDF, which is written in C. If a Python wheel for your OS, CPU architecture and Python version is available on PyPI (check the `list of PyMuPDF wheels <https://pypi.org/project/PyMuPDF/#files>`_ on PyPI), it will install smoothly. Otherwise, the installation via pip will require MuPDF and all its development libs to compile the binding.

Install pdftotext python lib
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
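The five extraction methods listed in the `README.rst` changes above map onto the following Python calls. This is a minimal sketch, not the module's own code. It assumes each library is installed, and imports are deferred so the file loads without them. The function names, the `EXTRACTORS` list and the `-layout` flag passed to the command-line tool are illustrative choices.

```python
"""Illustrative PDF-to-text helpers, one per extraction method (hypothetical, not the module's code)."""
import subprocess


def extract_with_pymupdf(path):
    import fitz  # PyMuPDF; recent releases also expose it as "pymupdf"

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_pdftotext_lib(path):
    import pdftotext  # Python binding for poppler's pdftotext

    with open(path, "rb") as f:
        # pdftotext.PDF is iterable, yielding one string per page
        return "\n".join(pdftotext.PDF(f))


def extract_with_pdftotext_cmd(path):
    # "-" as output file sends the text to stdout; "-layout" keeps the
    # physical layout (an assumption here, not necessarily the module's choice).
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_with_pdfplumber(path):
    import pdfplumber  # pure Python, built on pdfminer.six

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without text
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pypdf(path):
    from pypdf import PdfReader  # pure Python

    reader = PdfReader(path)
    return "\n".join(page.extract_text() for page in reader.pages)


# Ordered as in the README list.
EXTRACTORS = [
    ("PyMuPDF", extract_with_pymupdf),
    ("pdftotext python lib", extract_with_pdftotext_lib),
    ("pdftotext command line", extract_with_pdftotext_cmd),
    ("pdfplumber", extract_with_pdfplumber),
    ("pypdf", extract_with_pypdf),
]
```

Running the same invoice through two of these functions and diffing the outputs shows why the README warns that switching tools may force you to update the field extraction rules.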
@@ -138,6 +132,16 @@ To install the **pdfplumber** python lib, run:
sudo pip3 install --upgrade pdfplumber
+Install pypdf
+~~~~~~~~~~~~~
+
+To install the **pypdf** python lib, run:
+
+.. code::
+sudo pip3 install --upgrade pypdf
Other requirements
~~~~~~~~~~~~~~~~~~

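The README lets you pick one method and install only its tools. This standard-library sketch reports which methods are usable in the current Python environment. It probes the import names these packages expose (`fitz`, `pdftotext`, `pdfplumber`, `pypdf`) and looks for the `pdftotext` binary on the PATH. It assumes you run it with the same interpreter as Odoo. `available_methods` is a hypothetical helper, not part of the module.

```python
import importlib.util
import shutil


def available_methods():
    """Return {method label: bool} for each PDF-to-text method."""
    modules = {
        "PyMuPDF": "fitz",
        "pdftotext python lib": "pdftotext",
        "pdfplumber": "pdfplumber",
        "pypdf": "pypdf",
    }
    # find_spec locates a top-level module without importing it
    status = {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }
    # The command-line tool only needs to be on the PATH.
    status["pdftotext command line"] = shutil.which("pdftotext") is not None
    return status


if __name__ == "__main__":
    for label, ok in available_methods().items():
        print(f"{label:24} {'OK' if ok else 'missing'}")
```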
@@ -179,6 +183,7 @@ If you want to force Odoo to use a specific text extraction method, go to the me
#. pdftotext.lib
#. pdftotext.cmd
#. pdfplumber
+#. pypdf

In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.

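The module's Configuration section describes two behaviours. By default it uses a fallback chain: PyMuPDF, then the pdftotext python lib, then the pdftotext command line, then pdfplumber. A System Parameter can instead force a single method, which raises an error if it fails. The sketch below illustrates both under stated assumptions. `pdf_to_text`, the `(name, function)` list and the use of `RuntimeError` are hypothetical and not the module's API. The text shown here does not say where pypdf sits in the default order.

```python
def pdf_to_text(path, extractors, forced=None):
    """Return the text of the PDF at ``path`` (hypothetical helper).

    ``extractors`` is an ordered list of (name, function) pairs; each
    function takes a file path and returns the extracted text.
    If ``forced`` names one of them, only that method is tried.
    """
    if forced:
        candidates = [(name, func) for name, func in extractors if name == forced]
        if not candidates:
            raise ValueError(f"Unknown extraction method: {forced}")
    else:
        candidates = extractors

    errors = []
    for name, func in candidates:
        try:
            return func(path)
        except Exception as exc:  # e.g. ImportError when the lib is missing
            errors.append(f"{name}: {exc}")
    # Every candidate failed: report all errors at once.
    raise RuntimeError("PDF to text conversion failed: " + "; ".join(errors))


# Usage with stub extractors standing in for the real libraries.
def missing_lib(path):
    raise ImportError("library not installed")


def working_lib(path):
    return "INVOICE 42"


methods = [("first", missing_lib), ("second", working_lib)]
print(pdf_to_text("bill.pdf", methods))  # INVOICE 42 (falls back to "second")
try:
    pdf_to_text("bill.pdf", methods, forced="first")
except RuntimeError as err:
    print(err)  # PDF to text conversion failed: first: library not installed
```

Collecting every error before raising mirrors the README's promise of a single error message when no method works.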
2 changes: 1 addition & 1 deletion account_invoice_import_simple_pdf/__manifest__.py
@@ -4,7 +4,7 @@

{
"name": "Account Invoice Import Simple PDF",
-"version": "14.0.3.2.1",
+"version": "14.0.3.3.0",
"category": "Accounting/Accounting",
"license": "AGPL-3",
"summary": "Import simple PDF vendor bills",
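The manifest change bumps the module from `14.0.3.2.1` to `14.0.3.3.0`. Under the OCA convention, the first two numbers are the Odoo series and the last three are the module's own major.minor.patch. Adding a feature such as the pypdf method therefore shows up as a minor bump. The sketch below classifies such a bump. `split_version` and `bump_kind` are hypothetical helpers, not part of any OCA tool.

```python
def split_version(version):
    """Split '14.0.3.3.0' into the Odoo series and the module version tuple."""
    parts = version.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated parts, got {version!r}")
    series = ".".join(parts[:2])
    module = tuple(int(p) for p in parts[2:])
    return series, module


def bump_kind(old, new):
    """Return 'major', 'minor' or 'patch' for a version increase."""
    old_series, old_mod = split_version(old)
    new_series, new_mod = split_version(new)
    if old_series != new_series:
        raise ValueError("different Odoo series")
    if new_mod <= old_mod:
        raise ValueError("new version is not greater than old version")
    # The first differing component determines the kind of bump.
    for kind, a, b in zip(("major", "minor", "patch"), old_mod, new_mod):
        if a != b:
            return kind


print(bump_kind("14.0.3.2.1", "14.0.3.3.0"))  # minor
```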
54 changes: 30 additions & 24 deletions account_invoice_import_simple_pdf/static/description/index.html
@@ -367,7 +367,7 @@ <h1 class="title">Account Invoice Import Simple PDF</h1>
!! This file is generated by oca-gen-addon-readme !!
!! changes will be overwritten. !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-!! source digest: sha256:af61863555148623cb6861214a0e0113c434295232ae77cb4659e4b40779b742
+!! source digest: sha256:6100c6a40bc7b960417d1a8d58f8c0de938fdd8650cd7c98e588ba8783bb6b93
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -->
<p><a class="reference external image-reference" href="https://odoo-community.org/page/development-status"><img alt="Beta" src="https://img.shields.io/badge/maturity-Beta-yellow.png" /></a> <a class="reference external image-reference" href="http://www.gnu.org/licenses/agpl-3.0-standalone.html"><img alt="License: AGPL-3" src="https://img.shields.io/badge/licence-AGPL--3-blue.png" /></a> <a class="reference external image-reference" href="https://github.com/OCA/edi/tree/14.0/account_invoice_import_simple_pdf"><img alt="OCA/edi" src="https://img.shields.io/badge/github-OCA%2Fedi-lightgray.png?logo=github" /></a> <a class="reference external image-reference" href="https://translation.odoo-community.org/projects/edi-14-0/edi-14-0-account_invoice_import_simple_pdf"><img alt="Translate me on Weblate" src="https://img.shields.io/badge/weblate-Translate%20me-F47D42.png" /></a> <a class="reference external image-reference" href="https://runboat.odoo-community.org/builds?repo=OCA/edi&amp;target_branch=14.0"><img alt="Try me on Runboat" src="https://img.shields.io/badge/runboat-Try%20me-875A7B.png" /></a></p>
<p>This module is an extension of the module <em>account_invoice_import</em>: it adds support for simple PDF invoices, i.e. PDF invoices that don’t have an embedded XML file. This module has been developed to solve the drawbacks of the OCA module <strong>account_invoice_import_invoice2data</strong>; its advantages are the following:</p>
@@ -411,42 +411,40 @@ <h1 class="title">Account Invoice Import Simple PDF</h1>
<li><a class="reference internal" href="#install-pdftotext-python-lib" id="toc-entry-3">Install pdftotext python lib</a></li>
<li><a class="reference internal" href="#install-pdftotext-command-line" id="toc-entry-4">Install pdftotext command line</a></li>
<li><a class="reference internal" href="#install-pdfplumber" id="toc-entry-5">Install pdfplumber</a></li>
-<li><a class="reference internal" href="#other-requirements" id="toc-entry-6">Other requirements</a></li>
+<li><a class="reference internal" href="#install-pypdf" id="toc-entry-6">Install pypdf</a></li>
+<li><a class="reference internal" href="#other-requirements" id="toc-entry-7">Other requirements</a></li>
</ul>
</li>
-<li><a class="reference internal" href="#configuration" id="toc-entry-7">Configuration</a></li>
-<li><a class="reference internal" href="#bug-tracker" id="toc-entry-8">Bug Tracker</a></li>
-<li><a class="reference internal" href="#credits" id="toc-entry-9">Credits</a><ul>
-<li><a class="reference internal" href="#authors" id="toc-entry-10">Authors</a></li>
-<li><a class="reference internal" href="#contributors" id="toc-entry-11">Contributors</a></li>
-<li><a class="reference internal" href="#maintainers" id="toc-entry-12">Maintainers</a></li>
+<li><a class="reference internal" href="#configuration" id="toc-entry-8">Configuration</a></li>
+<li><a class="reference internal" href="#bug-tracker" id="toc-entry-9">Bug Tracker</a></li>
+<li><a class="reference internal" href="#credits" id="toc-entry-10">Credits</a><ul>
+<li><a class="reference internal" href="#authors" id="toc-entry-11">Authors</a></li>
+<li><a class="reference internal" href="#contributors" id="toc-entry-12">Contributors</a></li>
+<li><a class="reference internal" href="#maintainers" id="toc-entry-13">Maintainers</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="installation">
<h1><a class="toc-backref" href="#toc-entry-1">Installation</a></h1>
<p>The most important technical component of this module is the tool that converts the PDF to text. Converting PDF to text is not an easy job. As outlined in this <a class="reference external" href="https://dida.do/blog/how-to-extract-text-from-pdf">blog post</a>, different tools can give quite different results. The best results are usually achieved with tools based on a PDF viewer, which exclude pure-python tools. But pure-python tools are easier to install than tools based on a PDF viewer. It is important to understand that, if you change the PDF to text tool, you will certainly have a slightly different text output, which may oblige you to update the field extraction rule, which can be time-consuming if you have already configured many vendors.</p>
-<p>The module supports 4 different extraction methods:</p>
+<p>The module supports 5 different extraction methods:</p>
<ol class="arabic simple">
<li><a class="reference external" href="https://github.com/pymupdf/PyMuPDF">PyMuPDF</a> which is a Python binding for <a class="reference external" href="https://mupdf.com/">MuPDF</a>, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company <a class="reference external" href="https://artifex.com/">Artifex Software</a>.</li>
<li><a class="reference external" href="https://pypi.org/project/pdftotext/">pdftotext python library</a>, which is a python binding for the pdftotext tool.</li>
<li><a class="reference external" href="https://en.wikipedia.org/wiki/Pdftotext">pdftotext command line tool</a>, which is based on <a class="reference external" href="https://poppler.freedesktop.org/">poppler</a>, a PDF rendering library used by <a class="reference external" href="https://www.xpdfreader.com/">xpdf</a> and <a class="reference external" href="https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions">Evince</a> (the PDF reader of <a class="reference external" href="https://www.gnome.org/">Gnome</a>).</li>
<li><a class="reference external" href="https://pypi.org/project/pdfplumber/">pdfplumber</a>, which is a python library built on top of the python library <a class="reference external" href="https://pypi.org/project/pdfminer.six/">pdfminer.six</a>. pdfplumber is a pure-python solution, so it’s very easy to install on all OSes.</li>
+<li><a class="reference external" href="https://github.com/py-pdf/pypdf/">pypdf</a>, which is one of the most common PDF libraries for Python. pypdf is a pure-Python solution, so it’s very easy to install on all OSes.</li>
</ol>
-<p>PyMuPDF and pdftotext both give a very good text output. So far, I can’t say which one is best. pdfplumber often gives lower-quality text output, but its advantage is that it’s a pure-Python solution, so you will always be able to install it whatever your technical environment is.</p>
+<p>PyMuPDF and pdftotext both give a very good text output. So far, I can’t say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python libraries, so you will always be able to install them whatever your technical environment is.</p>
<p>You can choose one extraction method and only install the tools/libs for that method.</p>
<div class="section" id="install-pymupdf">
<h2><a class="toc-backref" href="#toc-entry-2">Install PyMuPDF</a></h2>
-<p>To install <strong>PyMuPDF</strong>, if you use Debian (Bullseye aka v11 or higher) or Ubuntu (20.04 or higher), run the following command:</p>
+<p>Install it via pip:</p>
<pre class="code literal-block">
-sudo apt install python3-fitz
+sudo pip3 install --upgrade pymupdf
</pre>
-<p>You can also install it via pip:</p>
-<pre class="code literal-block">
-sudo pip3 install --upgrade PyMuPDF
-</pre>
-<p>but beware that <em>PyMuPDF</em> is just a binding on MuPDF, so it will require MuPDF and all the development libs required to compile the binding. That’s why <em>PyMuPDF</em> is much easier to install via the packages of your Linux distribution (package name <strong>python3-fitz</strong> on Debian/Ubuntu, but the package name may be different in other distributions) than with pip.</p>
+<p>Beware that <em>PyMuPDF</em> is not a pure-Python library: it uses MuPDF, which is written in C. If a Python wheel for your OS, CPU architecture and Python version is available on PyPI (check the <a class="reference external" href="https://pypi.org/project/PyMuPDF/#files">list of PyMuPDF wheels</a> on PyPI), it will install smoothly. Otherwise, the installation via pip will require MuPDF and all its development libs to compile the binding.</p>
</div>
<div class="section" id="install-pdftotext-python-lib">
<h2><a class="toc-backref" href="#toc-entry-3">Install pdftotext python lib</a></h2>
@@ -474,8 +472,15 @@ <h2><a class="toc-backref" href="#toc-entry-5">Install pdfplumber</a></h2>
sudo pip3 install --upgrade pdfplumber
</pre>
</div>
+<div class="section" id="install-pypdf">
+<h2><a class="toc-backref" href="#toc-entry-6">Install pypdf</a></h2>
+<p>To install the <strong>pypdf</strong> python lib, run:</p>
+<pre class="code literal-block">
+sudo pip3 install --upgrade pypdf
+</pre>
+</div>
<div class="section" id="other-requirements">
-<h2><a class="toc-backref" href="#toc-entry-6">Other requirements</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-7">Other requirements</a></h2>
<p>This module also requires the following Python libraries:</p>
<ul class="simple">
<li><a class="reference external" href="https://pypi.org/project/regex/">regex</a> which is backward-compatible with the <em>re</em> module of the Python standard library, but has additional functionalities.</li>
@@ -496,7 +501,7 @@ <h2><a class="toc-backref" href="#toc-entry-6">Other requirements</a></h2>
</div>
</div>
<div class="section" id="configuration">
-<h1><a class="toc-backref" href="#toc-entry-7">Configuration</a></h1>
+<h1><a class="toc-backref" href="#toc-entry-8">Configuration</a></h1>
<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the INSTALL section: it will first try to use <strong>PyMuPDF</strong>; if it fails (for example because the lib is not properly installed), then it will try to use the <strong>pdftotext python lib</strong>; if that one also fails, it will try to use <strong>pdftotext command line</strong> and, if it also fails, it will finally try <strong>pdfplumber</strong>. If none of the 4 methods work, Odoo will display an error message.</p>
<p>If you want to force Odoo to use a specific text extraction method, go to the menu <em>Configuration &gt; Technical &gt; Parameters &gt; System Parameters</em> and create a new System Parameter:</p>
<ul class="simple">
@@ -506,36 +511,37 @@ <h1><a class="toc-backref" href="#toc-entry-7">Configuration</a></h1>
<li>pdftotext.lib</li>
<li>pdftotext.cmd</li>
<li>pdfplumber</li>
+<li>pypdf</li>
</ol>
</li>
</ul>
<p>In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.</p>
<p>You will find a full demonstration about how to configure each Vendor and import the PDF invoices in this <a class="reference external" href="https://www.youtube.com/watch?v=edsEuXVyEYE">screencast</a>.</p>
</div>
<div class="section" id="bug-tracker">
-<h1><a class="toc-backref" href="#toc-entry-8">Bug Tracker</a></h1>
+<h1><a class="toc-backref" href="#toc-entry-9">Bug Tracker</a></h1>
<p>Bugs are tracked on <a class="reference external" href="https://github.com/OCA/edi/issues">GitHub Issues</a>.
In case of trouble, please check there if your issue has already been reported.
If you spotted it first, help us to smash it by providing a detailed and welcomed
<a class="reference external" href="https://github.com/OCA/edi/issues/new?body=module:%20account_invoice_import_simple_pdf%0Aversion:%2014.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**">feedback</a>.</p>
<p>Do not contact contributors directly about support or help with technical issues.</p>
</div>
<div class="section" id="credits">
-<h1><a class="toc-backref" href="#toc-entry-9">Credits</a></h1>
+<h1><a class="toc-backref" href="#toc-entry-10">Credits</a></h1>
<div class="section" id="authors">
-<h2><a class="toc-backref" href="#toc-entry-10">Authors</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-11">Authors</a></h2>
<ul class="simple">
<li>Akretion</li>
</ul>
</div>
<div class="section" id="contributors">
-<h2><a class="toc-backref" href="#toc-entry-11">Contributors</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-12">Contributors</a></h2>
<ul class="simple">
<li>Alexis de Lattre &lt;<a class="reference external" href="mailto:alexis.delattre&#64;akretion.com">alexis.delattre&#64;akretion.com</a>&gt;</li>
</ul>
</div>
<div class="section" id="maintainers">
-<h2><a class="toc-backref" href="#toc-entry-12">Maintainers</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-13">Maintainers</a></h2>
<p>This module is maintained by the OCA.</p>
<a class="reference external image-reference" href="https://odoo-community.org"><img alt="Odoo Community Association" src="https://odoo-community.org/logo.png" /></a>
<p>OCA, or the Odoo Community Association, is a nonprofit organization whose