Contributors
Re: Large Data Files
Queue jobs and batching can work; they are commonplace. But if the bottleneck is CPU/memory then honestly, after optimising what you can within the framework (e.g. as per Holger), a lot of the time you can just get away with running a separate worker on a separate port for long-running jobs and setting its limits/timeouts high. That is how a lot of people deploy cron workers these days, and in older Odoo we used to have to do it to run financial reports, and seemingly again now. 30,000 simple records is not that much.
There may also be some database tuning you can do around WAL files, checkpoints, etc. if they get in the way.
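A rough illustration of that separate-worker idea (not configuration from this thread): a second Odoo instance pointing at the same database on its own port, with much higher limits. Every name and value below is a placeholder assumption:

    [options]
    # hypothetical database name, shared with the main instance
    db_name = mydb
    # separate port, reserved for long-running imports and reports
    http_port = 8070
    workers = 2
    # leave cron to the main instance (or do the opposite)
    max_cron_threads = 0
    # limits far above the usual defaults (seconds, then bytes)
    limit_time_cpu = 3600
    limit_time_real = 7200
    limit_memory_soft = 4294967296
    limit_memory_hard = 5368709120

A reverse proxy or the integration itself would then send only the heavy import/report calls to port 8070, so the raised limits never apply to interactive users.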
On Wed, Aug 21, 2024 at 9:57 AM Jerôme Dewandre <notifications@odoo-community.org> wrote:
Hello,
Thank you very much for your quick responses :)
- Tom Blauwendraat: I am running on v16.
- Holger Brunn: adapting the script with .with_context(tracking_disable=True) to disable email notifications divides the running time by at least 4.
- Goran Sunjka: it is indeed an interesting idea; I was wondering if I could store a hash of the row in Postgres to check whether an existing record was updated, to separate the "create" and "update" actions (a rough sketch follows this list).
- Daniel Reis: this is indeed the problem I encountered.
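A minimal sketch of that hash idea, assuming two hypothetical custom fields on event.event (x_sync_key for the external key, x_sync_hash for the last imported digest); this is an illustration, not code from the thread:

    import hashlib
    import json


    def row_hash(row_dict):
        """Stable digest of the relevant source columns."""
        payload = json.dumps(row_dict, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()


    def sync_events(env, rows_by_key):
        """rows_by_key: {external_key: field_values} built from the legacy export."""
        Event = env['event.event'].with_context(tracking_disable=True)
        existing = {
            rec.x_sync_key: rec
            for rec in Event.search([('x_sync_key', 'in', list(rows_by_key))])
        }
        to_create = []
        for key, vals in rows_by_key.items():
            digest = row_hash(vals)
            record = existing.get(key)
            if record is None:
                to_create.append(dict(vals, x_sync_key=key, x_sync_hash=digest))
            elif record.x_sync_hash != digest:
                record.write(dict(vals, x_sync_hash=digest))  # row changed: update
            # unchanged rows are skipped entirely
        if to_create:
            Event.create(to_create)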
Thank you all for your replies, it helps a lot :)
Jérôme

On Tue, Aug 20, 2024 at 7:47 PM Daniel Reis <notifications@odoo-community.org> wrote:
I would expect this code to just abort for a non-trivial quantity of records.
The reason is that this is a single worker doing a single database transaction.
So the worker process will probably hit the time and CPU limits and be killed, and no records will be saved because of the transaction rollback.
And if you increase those limits a lot, you will probably cause long table locks on the database, and hurt other users and processes.
Going directly to the database can work if the data is pretty simple.
It can work, but it can also be a can of worms.
One option is to take an incremental approach to the data loading.
In the past I have used external ETL tools or scripts to do this.
Keeping it inside Odoo, one of the tools that can help is the Job Queue, possibly along with something like base_import_async:
https://github.com/OCA/queue/tree/16.0/base_import_async
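A hedged sketch of that direction, assuming the OCA queue_job addon is installed; the chunk size and method names are illustrative and this is not the base_import_async implementation:

    from odoo import api, models


    class EventEvent(models.Model):
        _inherit = 'event.event'

        @api.model
        def enqueue_event_batches(self, events_data, chunk_size=500):
            """Split the payload into chunks and enqueue one job per chunk,
            so each chunk gets its own transaction, retries and timeout."""
            for start in range(0, len(events_data), chunk_size):
                chunk = events_data[start:start + chunk_size]
                # with_delay() comes from queue_job; the job is executed later
                # by a jobrunner worker and committed independently.
                self.with_delay(
                    description="Import events %d-%d" % (start, start + len(chunk)),
                ).create_event_batch(chunk)

        @api.model
        def create_event_batch(self, events_data):
            # tracking_disable avoids mail/notification overhead, as noted above.
            self.with_context(tracking_disable=True).create(events_data)

Because each chunk is its own job and transaction, a failed chunk can be retried from the job queue without redoing the whole import.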
Thanks
--
DANIEL REIS
MANAGING PARTNER
M: +351 919 991 307
E: dreis@OpenSourceIntegrators.com
A: Avenida da República 3000, Estoril Office Center, 2649-517 Cascais
On 20/08/2024 16:32, Jerôme Dewandre wrote:
Hello,
I am currently working on a sync with a legacy system (adesoft) that contains a large amount of data (such as meetings) which must be synchronized on a daily basis.
It seems everything starts getting slow when I import 30,000 records with the conventional "create()" method.
I suppose the ORM might be an issue here. Potential workarounds:
1. Bypass the ORM and create the records with self.env.cr.execute (but if I want to delete them I will also need a custom query; see the sketch after the code below)
2. Bypass the ORM with stored procedures (https://www.postgresql.org/docs/current/sql-createprocedure.html)
3. Increase the CPU/RAM/worker nodes
4. Some better ideas?
What would be the best way to go?
A piece of my current test (df is a pandas dataframe containing the new events):
    @api.model
    def create_events_from_df(self, df):
        Event = self.env['event.event']
        events_data = []
        for _, row in df.iterrows():
            event_data = {
                'location': row['location'],
                'name': row['name'],
                'date_begin': row['date_begin'],
                'date_end': row['date_end'],
            }
            events_data.append(event_data)
        # Create all events in a single batch
        Event.create(events_data)
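For comparison, a hedged sketch of option 1 above (raw SQL through self.env.cr). It targets a hypothetical staging table (legacy_event_staging) rather than event_event, because raw inserts skip ORM defaults, computed and translated fields, mail tracking and access rules:

    def load_staging_rows(self, df):
        """Replace the previous snapshot with today's rows, bypassing the ORM.
        Table and column names (including 'external_id') are hypothetical."""
        rows = [
            (row['external_id'], row['name'], row['date_begin'], row['date_end'])
            for _, row in df.iterrows()
        ]
        cr = self.env.cr
        cr.execute("DELETE FROM legacy_event_staging")  # the matching custom delete
        if rows:
            placeholders = ", ".join(["(%s, %s, %s, %s)"] * len(rows))
            params = [value for row in rows for value in row]
            cr.execute(
                "INSERT INTO legacy_event_staging "
                "(external_id, name, date_begin, date_end) VALUES " + placeholders,
                params,
            )

Rows that must become real event.event records are still better created through the ORM (or via queued jobs), as the replies above suggest.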
Thanks in advance if you read this, and thanks again if you replied :)
Jérôme
_______________________________________________
Mailing-List: https://odoo-community.org/groups/contributors-15
Post to: mailto:contributors@odoo-community.org
Unsubscribe: https://odoo-community.org/groups?unsubscribe
by David BEAL - 09:16 - 21 Aug 2024
Reference
- Large Data Files
by "Jerôme Dewandre" <jerome.dewandre.mail@gmail.com> - 05:31 - 20 Aug 2024