Contributors
Re: Module to read and extract information from PDF's
Reference
Module to read and extract information from PDF's
Hello everyone.
We have a client using Odoo 16 that needs to extract information from a PDF file and update a res.partner record with this info. The PDF contains data like name, address, ZIP Code, VAT number, etc. Does anyone know of any module/python library that could help us with this?
Thank you!
--
SAMUEL MACIAS OROPEZA
TECH LEAD
smacias@opensourceintegrators.com
P.O. BOX 940, HIGLEY, AZ 85236

by Samuel Macias Oropeza - 10:41 - 6 Sep 2023
Re: Module to read and extract information from PDF's
Out of the box, Odoo can extract the text content from an ir.attachment.
You just need to make sure the pdfminer.six Python library is installed.
When that is the case, the attachment's document text is extracted and written to an ir.attachment text field.
You can then do content searches or even implement business logic based on it.
Reference:
https://github.com/odoo/odoo/blob/55423cbdeeb1ce35fb257624ea0d04d4be99a943/addons/attachment_indexation/__manifest__.py#L13
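As a rough illustration of the "business logic" step, here is a minimal sketch that turns already-extracted text into a dictionary of res.partner values. The label layout ("Name:", "Address:", "ZIP Code:", "VAT number:") and the helper name parse_partner_values are assumptions made up for this example; the real patterns depend entirely on how the client's PDF is laid out.

```python
import re
from typing import Dict

# Hypothetical label layout: adjust these patterns to the client's actual PDF.
# The keys are res.partner field names (name, street, zip, vat).
FIELD_PATTERNS = {
    "name": re.compile(r"^Name:[ \t]*(.+)$", re.MULTILINE),
    "street": re.compile(r"^Address:[ \t]*(.+)$", re.MULTILINE),
    "zip": re.compile(r"^ZIP Code:[ \t]*(\S+)", re.MULTILINE),
    "vat": re.compile(r"^VAT(?: number)?:[ \t]*(\S+)", re.MULTILINE | re.IGNORECASE),
}


def parse_partner_values(text: str) -> Dict[str, str]:
    """Return the partner values found in the extracted text.

    Fields whose label is not present are simply left out, so the result
    can be written to a record without blanking existing data.
    """
    values = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            values[field] = match.group(1).strip()
    return values
```

Inside Odoo, the resulting dictionary could be passed to partner.write(...). The text itself would come from the attachment's indexed content (to my knowledge the index_content field on ir.attachment, but please check this against your Odoo version).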
Thanks
Daniel
_______________________________________________
Mailing-List: https://odoo-community.org/groups/contributors-15
Post to: mailto:contributors@odoo-community.org
Unsubscribe: https://odoo-community.org/groups?unsubscribe
--
DANIEL REIS
MANAGING PARTNER
M: +351 919 991 307
E: dreis@OpenSourceIntegrators.com
A: Avenida da República 3000, Estoril Office B, 3º Escr.34, 2649-517 Cascais
by Daniel Reis - 09:26 - 7 Sep 2023
Re: Module to read and extract information from PDF's
invoice2data is becoming a bit more unstable, we are finding, with the new maintainers. For years it was fairly static and unchanging and fairly dedicated to Odoo; now it is more generalised. Also, for this purpose it would need a bit of customization, and it really only suits cases where you know the document beforehand. We still use it, but wouldn't for a requirement like this.

For our recent requirements to integrate with DMS and also the Enterprise Documents module, to auto-receive records and attach them to the correct record, we have gone with what is listed below, with a simple custom frontend model to define patterns. This was for a backscanning project of some 1m pages: multipage detection, multiple document types, that kind of thing. Basically, you scan 150 pages on a scanner, the batch comes in, gets parsed, page breaks are made and separate files are created with a copy of the extracted text, and these are then auto-attached to the correct record.

pdftotext works as advertised. tesseract has some dependencies and quirks, which is fine; it just needs some handling for errors and ambiguous bits. To do really well, you would also want opencv etc. to do things like contrast adjustment and deskewing of images from scanned files, but we found that, for the overhead, it didn't really add any value for the documents we were doing. We offered to clean up this work and put it to OCA, but were refused on the basis that no one does OCR anymore.

Alternatively, you can just push to something like GVision for images. That was our first implementation. It is maybe 1/3 of the code, but harder to test in isolated dev, and the results, while much more comprehensive, weren't really value for money for our use case.

import pdftotext
import pytesseract
from pdf2image import convert_from_bytes
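For readers who want to see how those three libraries might fit together, here is a minimal sketch, not the implementation described in this message. It assumes a simple policy: keep a page's embedded text when pdftotext finds enough of it, otherwise render that page with pdf2image and OCR it with pytesseract. The function names and the min_chars threshold are invented for the example, and the error and ambiguity handling mentioned above is left out.

```python
import io
from typing import Callable, List


def text_or_ocr(
    text_pages: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,  # assumed threshold for "this page has a usable text layer"
) -> List[str]:
    """Keep each page's embedded text if it looks usable, else OCR that page."""
    result = []
    for index, text in enumerate(text_pages):
        if len(text.strip()) >= min_chars:
            result.append(text)
        else:
            # ocr_page receives the zero-based page index
            result.append(ocr_page(index))
    return result


def extract_pdf_text(pdf_bytes: bytes, dpi: int = 300) -> List[str]:
    """Return one string per page, using OCR only where the text layer is missing."""
    # Imported lazily so the pure-Python helper above can be used and tested
    # without poppler or tesseract being installed.
    import pdftotext
    import pytesseract
    from pdf2image import convert_from_bytes

    # pdftotext.PDF yields one string per page
    text_pages = list(pdftotext.PDF(io.BytesIO(pdf_bytes)))

    def ocr_page(index: int) -> str:
        # pdf2image page numbers are 1-based; render only the page we need
        images = convert_from_bytes(
            pdf_bytes, dpi=dpi, first_page=index + 1, last_page=index + 1
        )
        return pytesseract.image_to_string(images[0])

    return text_or_ocr(text_pages, ocr_page)
```

Keeping the page-by-page decision in text_or_ocr, separate from the library calls, makes that logic testable in an isolated dev environment with a fake ocr_page callable.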
by Graeme Gellatly - 11:26 - 6 Sep 2023
Re: Module to read and extract information from PDF's
You can try the invoice2data extractor. It can extract data from PDFs (not only invoice information).
--
Enric Tobella Alomar
CEO & Founder
by Enric Tobella Alomar - 10:51 - 6 Sep 2023