Replies: 7 comments
-
Please provide the actual style used. |
Beta Was this translation helpful? Give feedback.
-
I attached the style as ZIP. However, I don't think it is actually that relevant. This is more of a generic question: is it efficient to delete large numbers of records in batches using IN statements, or is it potentially faster to process using the suggested SQL statement? I have never tested this in PostgreSQL, and don't know if there are potential differences, although the PostgreSQL Help suggests there may be differences depending on the used SQL (in relation to sub-select versus USING clause there though). |
Beta Was this translation helpful? Give feedback.
-
Ah, so you are not reporting an issue but want to open a discussion on use of SQL. Let me move the issue into the discussion section then. |
Beta Was this translation helpful? Give feedback.
-
Have you looked into what EXPLAIN says about the two forms of the query? That would be the first step in the investigation. You can just take one of the DELETEs you see and run them against the final database. |
Beta Was this translation helpful? Give feedback.
-
After downgrading to PostgreSQL 13.5 due to a bug in PG 14.2, I can't seem to reproduce this issue. I see it finish in about 15 minutes now + another 15 minutes for some indexing (Planet). This seems reasonable. Either there was an issue with PG 14.2, or simply some other (maintenance?) process of PostgreSQL causing the slow processing, although I can't remember seeing any activity like that displayed in pgAdmin when I examined the processing delay. Will keep a look out though for issues if I attempt to upgrade to PG 14 another time once 14.3 is released. |
Beta Was this translation helpful? Give feedback.
-
After upgrading a second time from PostgreSQL 13 to 14, this time to the just released PostgreSQL 14.3 that contains a fix for a crucial blocking bug that I experienced with 14.2, I again experience a significant slow down in the processing of the DELETE statement as shown in the original post #1642 (comment). As mentioned there, at that stage of the osm2pgsql processing, the only activity going on and initiated by osm2pgsql is these repititive DELETE statements with large IN clauses. As I have now experienced this issue a second time after upgrade from 13 to 14, I am strongly suspecting this is a performance regression issue in PostgreSQL 14. Does one of you have experienced similar issues yourself when upgrading from 13 to 14 and specifically at that stage of "There are X ways to reprocess..."? Or have you possibly heard of other people experiencing such issues after upgrade to PostgreSQL 14? I may try to put a question out on the PostGIS list to see if there any other people who experienced a performance regression on DELETE with large IN clauses after upgrade. By the way: Am I right these DELETE statements aren't PREPAREd? Could that possibly make a difference here? |
Beta Was this translation helpful? Give feedback.
-
Another interesting observation that I now see is that, despite pgAdmin clearly showing the indexing statement for the "osm_id" column used in the DELETE, that the actual index is missing on the "planet_osm_point/line/polygon" tables: there are no indexes visible in pgAdmin and DBeaver. The lack of indexes, could definitely explain the poor performance. But why does this fail in PostgreSQL 14, and not 13??: |
Beta Was this translation helpful? Give feedback.
-
What version of osm2pgsql are you using?
osm2pgsql version 1.6.0 (1.3.0-580-g061d4013), compiled from source.
What operating system and PostgreSQL/PostGIS version are you using?
Ubuntu 20.04.1 LTS, PostgreSQL 14.2, PostGIS 3.2.0.
Tell us something about your system
Bare metal Windows Hyper-V VM with 190 GB RAM assigned to VM, 2x Xeon E5-2680 v4, 14C/28T each, 4x2TB NVMe PCIe 3.0 Windows Storage Space on HP Z840 workstation.
What did you do exactly?
Imported Planet using a slightly modified version of the in-development flex version of 'openstreetmap-carto' Lua style file as available here:
https://github.com/gravitystorm/openstreetmap-carto/blob/e7af9bd90799103955f1d3996201ce0904be1665/openstreetmap-carto.lua
What did you expect to happen?
The style mentioned above has special handling for administrative boundaries, and creates a custom 'planet-osm-admin' line geometry table with de-duplicated lines representing administrative boundaries of the highest level using stage 2 flex processing.
The amount of data to be processed here is significant, as being administrative boundaries, but the total number of records to re-process limited, osm2pgsql reports:
"There are 2335433 ways to reprocess"
I would expect this to execute relatively fast considering the limited number of records of just over 2M.
What did happen instead?
The re-processing "appears" slow. Note that this is a completely arbitrary statement, as I don't know exactly what is going on in this stage (except for what it needs to create as output). I just notice it takes many hours of processing for a fairly limited amount of records (albeit being administrative boundaries, which can be huge).
What did you do to try analyzing the problem?
One thing I noticed, and this is actually why I opened this issue, is that at this stage the only thing I see happening on the database - at least from the limited perspective of what the pgAdmin interface shows - is that pgAdmin shows a cycle of DELETE statements, where the WHERE clause appears to show batches of records being deleted:
This makes me wonder if the actual issue is not so much the processing for stage 2 flex processing, or that the slowness might be caused by this DELETE operation going in batches over the records to delete using large WHERE clauses with lots of IN record references.
Wouldn't it possibly be much more efficient and faster to create a secondary table with the osm_id of the records to delete, and then use a
DELETE FROM <table1> USING <table2> WHERE <table1>.osm_id = <table2>.osm_id
type DELETE record statement with USING as also shown and documented on the PostgreSQL Help pages:
https://www.postgresql.org/docs/14/sql-delete.html
I know from my own experience in another application I wrote, that this seems quite efficient at deleting large numbers of records from an existing table. The PostgreSQL Help also seems to recommend it, although in reference to a comparison with a sub-select there:
Beta Was this translation helpful? Give feedback.
All reactions