Contributors
Re: Large Data Files
Queue jobs and batching can work; they are commonplace. But if the bottleneck is CPU/memory then honestly, after optimising what you can within the framework (e.g. as per Holger), a lot of the time you can just get away with running a separate worker on a separate port for long-running jobs and setting its limits/timeouts high. That is how a lot of people deploy cron workers these days, and in older Odoo we used to have to do it to run financial reports, and seemingly again now. 30,000 simple records is not that much.
There may also be some database tuning you can do around WAL files, checkpoints, etc. if they get in the way.
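A rough illustration of that separate-worker idea (not configuration from this thread): a second Odoo instance pointing at the same database on its own port, with much higher limits. Every name and value below is a placeholder assumption:

    [options]
    # hypothetical database name, shared with the main instance
    db_name = mydb
    # separate port, reserved for long-running imports and reports
    http_port = 8070
    workers = 2
    # leave cron to the main instance (or do the opposite)
    max_cron_threads = 0
    # limits far above the usual defaults (seconds, then bytes)
    limit_time_cpu = 3600
    limit_time_real = 7200
    limit_memory_soft = 4294967296
    limit_memory_hard = 5368709120

A reverse proxy or the integration itself would then send only the heavy import/report calls to port 8070, so the raised limits never apply to interactive users.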
On Wed, Aug 21, 2024 at 9:57 AM Jerôme Dewandre <notifications@odoo-community.org> wrote:
Hello,
Thank you very much for your quick responses :)
- Tom Blauwendraat: I am running on v16.
- Holger Brunn: adapting the script with .with_context(tracking_disable=True) to disable email notifications divides the running time by at least 4.
- Goran Sunjka: it is indeed an interesting idea; I was wondering if I could store a hash of the row in Postgres to check whether an existing record was updated, to separate the "create" and "update" actions (a rough sketch follows this list).
- Daniel Reis: this is indeed the problem I encountered.
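A minimal sketch of that hash idea, assuming two hypothetical custom fields on event.event (x_sync_key for the external key, x_sync_hash for the last imported digest); this is an illustration, not code from the thread:

    import hashlib
    import json


    def row_hash(row_dict):
        """Stable digest of the relevant source columns."""
        payload = json.dumps(row_dict, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()


    def sync_events(env, rows_by_key):
        """rows_by_key: {external_key: field_values} built from the legacy export."""
        Event = env['event.event'].with_context(tracking_disable=True)
        existing = {
            rec.x_sync_key: rec
            for rec in Event.search([('x_sync_key', 'in', list(rows_by_key))])
        }
        to_create = []
        for key, vals in rows_by_key.items():
            digest = row_hash(vals)
            record = existing.get(key)
            if record is None:
                to_create.append(dict(vals, x_sync_key=key, x_sync_hash=digest))
            elif record.x_sync_hash != digest:
                record.write(dict(vals, x_sync_hash=digest))  # row changed: update
            # unchanged rows are skipped entirely
        if to_create:
            Event.create(to_create)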
Thank you all for your replies, it helps a lot :)
Jérôme

On Tue, Aug 20, 2024 at 7:47 PM Daniel Reis <notifications@odoo-community.org> wrote:
I would expect this code to just abort for a non-trivial quantity of records.
The reason is that this is a single worker doing a single database transaction.
So the worker process will probably hit the time and CPU limits and be killed, and no records will be saved because of the transaction rollback.
And if you increase those limits a lot, you will probably cause long table locks on the database, and hurt other users and processes.
Going directly to the database can work if the data is pretty simple.
It can work, but it can also be a can of worms.
One option is to take an incremental approach to the data loading.
In the past I have used external ETL tools or scripts to do this.
Keeping it inside Odoo, one of the tools that can help is the Job Queue, possibly along with something like base_import_async:
https://github.com/OCA/queue/tree/16.0/base_import_async
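A hedged sketch of that direction, assuming the OCA queue_job addon is installed; the chunk size and method names are illustrative and this is not the base_import_async implementation:

    from odoo import api, models


    class EventEvent(models.Model):
        _inherit = 'event.event'

        @api.model
        def enqueue_event_batches(self, events_data, chunk_size=500):
            """Split the payload into chunks and enqueue one job per chunk,
            so each chunk gets its own transaction, retries and timeout."""
            for start in range(0, len(events_data), chunk_size):
                chunk = events_data[start:start + chunk_size]
                # with_delay() comes from queue_job; the job is executed later
                # by a jobrunner worker and committed independently.
                self.with_delay(
                    description="Import events %d-%d" % (start, start + len(chunk)),
                ).create_event_batch(chunk)

        @api.model
        def create_event_batch(self, events_data):
            # tracking_disable avoids mail/notification overhead, as noted above.
            self.with_context(tracking_disable=True).create(events_data)

Because each chunk is its own job and transaction, a failed chunk can be retried from the job queue without redoing the whole import.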
Thanks
--
DANIEL REIS
MANAGING PARTNER
M: +351 919 991 307
E: dreis@OpenSourceIntegrators.com
A: Avenida da República 3000, Estoril Office Center, 2649-517 Cascais
On 20/08/2024 16:32, Jerôme Dewandre wrote:
Hello,
I am currently working on a sync with a legacy system (adesoft) that contains a large amount of data (such as meetings) which must be synchronized on a daily basis.
It seems everything starts getting slow when I import 30,000 records with the conventional "create()" method.
I suppose the ORM might be an issue here. Potential workarounds:
1. Bypass the ORM and create the records with self.env.cr.execute (but if I want to delete them I will also need a custom query; see the sketch after the code below)
2. Bypass the ORM with stored procedures (https://www.postgresql.org/docs/current/sql-createprocedure.html)
3. Increase the CPU/RAM/worker nodes
4. Some better ideas?
What would be the best way to go?
A piece of my current test (df is a pandas dataframe containing the new events):
    @api.model
    def create_events_from_df(self, df):
        Event = self.env['event.event']
        events_data = []
        for _, row in df.iterrows():
            event_data = {
                'location': row['location'],
                'name': row['name'],
                'date_begin': row['date_begin'],
                'date_end': row['date_end'],
            }
            events_data.append(event_data)
        # Create all events in a single batch
        Event.create(events_data)
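For comparison, a hedged sketch of option 1 above (raw SQL through self.env.cr). It targets a hypothetical staging table (legacy_event_staging) rather than event_event, because raw inserts skip ORM defaults, computed and translated fields, mail tracking and access rules:

    def load_staging_rows(self, df):
        """Replace the previous snapshot with today's rows, bypassing the ORM.
        Table and column names (including 'external_id') are hypothetical."""
        rows = [
            (row['external_id'], row['name'], row['date_begin'], row['date_end'])
            for _, row in df.iterrows()
        ]
        cr = self.env.cr
        cr.execute("DELETE FROM legacy_event_staging")  # the matching custom delete
        if rows:
            placeholders = ", ".join(["(%s, %s, %s, %s)"] * len(rows))
            params = [value for row in rows for value in row]
            cr.execute(
                "INSERT INTO legacy_event_staging "
                "(external_id, name, date_begin, date_end) VALUES " + placeholders,
                params,
            )

Rows that must become real event.event records are still better created through the ORM (or via queued jobs), as the replies above suggest.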
Thanks in advance if you read this, and thanks again if you replied :)
Jérôme
_______________________________________________
Mailing-List: https://odoo-community.org/groups/contributors-15
Post to: mailto:contributors@odoo-community.org
Unsubscribe: https://odoo-community.org/groups?unsubscribe
by David BEAL - 09:16 - 21 Aug 2024
Reference
- Large Data Files
by "Jerôme Dewandre" <jerome.dewandre.mail@gmail.com> - 05:31 - 20 Aug 2024