Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistency between layer providers in geometry checker #58132

Closed
wants to merge 4 commits into from

Conversation

Djedouas
Copy link
Member

Geometry checker does not work properly with memory layers, so we change how the cache in the QgsFeaturePool object is managed, to refresh it more often.

Fixes #58113

@github-actions github-actions bot added this to the 3.40.0 milestone Jul 16, 2024
Copy link

github-actions bot commented Jul 16, 2024

🪟 Windows builds ready!

Windows builds of this PR are available for testing here. Debug symbols for this build are available here.

(Built from commit 542c10a)

@Djedouas Djedouas marked this pull request as draft July 17, 2024 09:48
@Djedouas Djedouas force-pushed the featurepool-fix-inconsistency branch from 5680f9a to e2f52c1 Compare July 17, 2024 16:57
@Djedouas
Copy link
Member Author

Hi @nyalldawson,

Thanks for your review. I also discussed with @troopa81 about this issue #58132 .

In fact, QgsFeaturePool exists for the geometry checker to have a feature cache, but memory layers don't need a feature cache as they are already in memory.

I force-pushed and propose another solution which is to bypass the cache management in QgsFeaturePool when the underlying layer is a memory layer.

I could have inherited QgsFeaturePool for a QgsMemoryLayerFeaturePool of some sort, but I would have needed to check for the layer type every time before creating a feature pool to choose the right class... and make a lot of methods virtuals.

Open to the discussion here, thanks a lot.

@Djedouas Djedouas force-pushed the featurepool-fix-inconsistency branch from e2f52c1 to d068fdd Compare July 17, 2024 17:03
@Djedouas Djedouas marked this pull request as ready for review July 17, 2024 17:03
@Djedouas Djedouas force-pushed the featurepool-fix-inconsistency branch 2 times, most recently from c75787d to b5de5b0 Compare July 18, 2024 12:35
@Djedouas
Copy link
Member Author

@lbartoletti

@nyalldawson
Copy link
Collaborator

I have to look further into this one, but I believe there's thread safety issues in this change. Please don't merge yet.

src/analysis/vector/geometry_checker/qgsfeaturepool.cpp Outdated Show resolved Hide resolved
// a memory layer is already a cache itself. retrieve feature from the feature source.
if ( mIsMemoryLayer )
{
if ( !mLayer->getFeatures( QgsFeatureRequest( id ) ).nextFeature( feature ) )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't thread safe -- eg. in the geometry checker plugin the feature pool is used from a background thread after constructing the source (which IS a thread safe approach, the feature source was created on the main thread), but we can't just swap this out to access the layer here because accessing the layer from a different thread is not thread safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see the difference between "being in another thread and accessing the feature source created on the main thread" and "being in another thread and accessing the layer created on the main thread".

It is complicated, today the geometry checker only works because updating the feature source does indeed update the underlying data (we discussed this in #58113)... which is not the case with a memory layer...

In fact, the question is "how do I get a feature that is up-to-date for a memory layer that lies in another thread?"

What are your thoughts about:

  • option 1: encapsulate this not-thread-safe statement in a lambda with QgsThreadingUtils::runOnMainThread()
  • option 2: have a check that the layer is in the correct thread and return false or handle an exception?
  • option 3: make getFeatures() for a memory layer thread-safe, with mutex etc. ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nyalldawson this issue is very important, because the PR for exposing all geometry checker algorithms to the processing toolbox depends on it (the code is ready, I could open the PR, but it relies on this commit at the moment).

Copy link
Collaborator

@nyalldawson nyalldawson Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Djedouas

Honestly, I don't particularly like any of those 3 options. Especially not 3. Don't do 3. Run from that one 🤣

I think trying to patch around this in the QgsFeaturePool class like this might be the wrong approach. My understanding of the issue is that the geometry checker is changing features in the layer, and then the feature cached in the pool is out of date with that. If so, then a couple of things jump to mind:

  1. Since the geometry checker has knowledge of the feature pool already, what about adding an explicit "set updated feature" method to the pool? There could be a QHash< QgsFeatureId, QgsFeature > to handle this, so when setUpdatedFeature is called then the feature is added to this hash. Then getFeature would first check if the feature is the updated feature hash, and if so, it just returns that feature instead of querying the feature source. Admittedly it's not the most elegant solution, but it would be a pragmatic fix for the problem which doesn't have any risk of deadlocks (as runOnMainThread would have), and doesn't involve messing with any of the fundamental, low-level threading design around the feature sources/vector layers (as adding the mutex to the memory provider would do).
  2. I think there's potentially a deeper issue at play here too 😱 . Are you updating features in the vector layer from a background thread? (eg in a processing algorithm's processAlgorithm function). If so, that's not thread safe, and should NOT be done. You instead need to queue up the changes for the layer, and then apply them ONLY in the postProcessAlgorithm step, which runs in the main thread after processAlgorithm finishes.

Copy link
Member Author

@Djedouas Djedouas Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nyalldawson thanks, see my thoughts below

About point 1.

Your point made me think of another way to fix the specific memory layer issue. You're right, we have a cache that is out of date with the real geometry, and the geometry checker gets the feature from the cache when fixing geometry checks. But I thought that, other than creating yet another sub-cache with up-do-date features, I could take a look at the method refreshing the cache.

So here is a new commit / new way to fix the problem: in refreshCache (which was using getFeature, and falling in the no-cache branch there, and getting the out-of-date geometry from the feature source) we can directly set the up-to-date feature and its geometry.

Everything ends working as before for non-memory layers.

For memory layers, the feature pool cache contains more up-to-date features than the underlying feature source.

About point 2.

In all incoming processing algorithms that will use the geometry checker and feature pool logic, a copy of the input layer is created within the processAlgorithm function, I don't use the multi-threading code of the current geometry checker plugin, and everything checking/modifying the layer is done in the processing thread, where the layer is from. We can discuss this further on in the dedicated PR I will submit.

@nyalldawson
Copy link
Collaborator

@Djedouas see some notes here -- #58132

@Djedouas Djedouas force-pushed the featurepool-fix-inconsistency branch 2 times, most recently from e398577 to 25f1c9c Compare August 1, 2024 12:36
Geometry checker cache does not work properly with memory layers.
refreshCache now directly adds the geometry in the feature pool cache.
@@ -174,7 +174,7 @@ class ANALYSIS_EXPORT QgsFeaturePool : public QgsFeatureSink SIP_ABSTRACT
* \note This method can safely be called from a different thread vs the object's creation thread or
* the original layer's thread.
*/
void refreshCache( const QgsFeature &feature );
void refreshCache( QgsFeature feature );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QgsSpatialIndex::addFeature doesn't have const qualifier on its input feature, so I made this change to pass by copy.

{
QgsReadWriteLocker locker( mCacheLock, QgsReadWriteLocker::Write );
mFeatureCache.remove( feature.id() );
mFeatureCache.insert( feature.id(), new QgsFeature( feature ) );
mIndex.deleteFeature( feature );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, I'm actually unsure how this ever worked without crashes 🙃

QgsSpatialIndex requires that the ORIGINAL feature added to the index is passed. It's not operating on the feature ID alone, but rather the combination of the feature's bounding box + feature ID. So here we ask the spatial index to remove a feature which has a different bounding box from the spatial index, which will corrupt the index! 😱

::removeFeature works OK, because it's retrieving the original feature from the feature ID and then removing that from the index. We need to do similar here -- first retrieve the original feature, remove from the index, and THEN update the cache/spatial index.

But that's still fraught with potential traps -- eg the subclasses are updating the original feature first before calling refreshCache, so if the feature has been evicted from mFeatureCache already then we'll end up retrieving the UPDATED feature from the call to getFeature and the spatial index will still be corrupted.

Maybe we'll need to add a QHash< QgsFeatureId, QgsRectangle > mOriginalFeatureBounds here and use that with QgsSpatialIndex::deleteFeature( QgsFeatureId id, const QgsRectangle &bounds ) instead...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right... I think we are getting closer to the solution... 😆

We need to do similar here -- first retrieve the original feature, remove from the index, and THEN update the cache/spatial index.

I see 😄

Here is a new commit, I think we can do without an additional class member, as we can manage to get the original feature geometry each time we need to refresh the index and give it to ::refreshCache

Also adapted QgsVectorLayerFeaturePool::onGeometryChanged and its test, because the intent of was clearly to keep the cache up to date when an external geometry change occurs.

@@ -108,6 +108,9 @@ bool QgsVectorLayerFeaturePool::addFeatures( QgsFeatureList &features, QgsFeatur

void QgsVectorLayerFeaturePool::updateFeature( QgsFeature &feature )
{
QgsFeature origFeature;
getFeature( feature.id(), origFeature );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be getting confused here, but I think there's still a risk of index corruption in this scenario:

  1. Feature A is retrieved (version "1"), it is cached in the pool. Bounds are used in the spatial index.
  2. More features are fetched, cache fills up, feature A is evicted from cache
  3. Some external caller changes the geometry on feature A in the vector layer or data provider, now there's version 2 of feature A.
  4. A call to updateFeature for feature A is made. getFeature won't find it in the pool's cache (it was evicted), so it's retrieved from the layer. We get feature A version 2.
  5. Version 2 is passed to refreshCache, and the index is corrupted because we try to remove different bounds (version 2, whereas the cache is using v1)

I don't see any alternative but keeping a local map/hash of the feature bounds which were used in the index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I did not expect that the spatial index could evict a feature... It is an index, not strictly a cache, isn't it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't get evicted from the spatial index, but will from mFeatureCache.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course 🙈 obviously.

Okay, what about "synchronizing" the cache and the index?

In fact, in the end of step 4. inside the index, we have both feature A version 2 and feature A version 1.

We could fix this: when getFeature doesn't find a cached feature, we can proactively delete any pre-existing corresponding feature in the index (see new commit).

If the feature was not present in the index, it does nearly nothing, otherwise it removes the possibly out-of-date feature from the index, to re-add the possibly same or up-to-date feature.

@@ -66,6 +66,9 @@ bool QgsFeaturePool::getFeature( QgsFeatureId id, QgsFeature &feature )
}
locker.changeMode( QgsReadWriteLocker::Write );
mFeatureCache.insert( id, new QgsFeature( feature ) );

//cleanup the index, in case the feature was evicted from the cache but is still in the index
mIndex.deleteFeature( id );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do what it appears to do 😱

  • there's no QgsSpatialIndex::deleteFeature( QgsFeatureId id ) method
  • The feature ID is triggering an implicit construction of QgsFeature via the QgsFeature( QgsFeatureID id ) constructor (which should possibly be marked explicit..., but don't touch that here!). So then QgsSpatialIndex::deleteFeature is being called with a QgsFeature which has NO geometry
  • Follow the logic in QgsSpatialIndex::deleteFeature -- it requires the passed feature geometry to exist and match the geometry of the feature currently present in the index.

I think you better add some tests triggering the cache eviction logic bug first before implementing a fix 😝

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG 😱 -- good catch!

Okay 🤔 , thinking back about your proposal about

Maybe we'll need to add a QHash< QgsFeatureId, QgsRectangle > mOriginalFeatureBounds here and use that with QgsSpatialIndex::deleteFeature( QgsFeatureId id, const QgsRectangle &bounds ) instead...

Could we use QgsGeometry QgsSpatialIndex::geometry( QgsFeatureId ) as our original feature bounds hash?

See new commit.


If this new commit causes other problems, I think that fixing the index corruption from happening is kind of another topic than the original goal of my PR... We have a consistent behavior between memory layers and other providers now, and this index issue was already present.

As instructive and interesting as it is, this prevents me from continuing my work on making geometry checker checks available into processing algorithms (incoming PRs).

Also, QgsFeaturePool classes being used only for and by the geometry checker, it would not be critical to address this issue later on. Having the geometry externally modified during the run of the geometry checker is highly improbable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still -1 to introducing further known issues into this class, when we know upfront that their around and will cause someone to lose a lot of time re-discovering this and fixing in future.

Why not just go with my original proposal of adding the bounding box hash to the feature pool? That's a couple of lines of code to add, and will fix the issue properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just go with my original proposal of adding the bounding box hash to the feature pool? That's a couple of lines of code to add, and will fix the issue properly.

Yes I was hoping we could use the spatial index ::geometry() method like such a hash, it seems to behave exactly like this (with FlagStoreFeatureGeometries set 😆 )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh but it would store whole geometry data rather than only small bounding boxes...

Ok let's go with the hash, I have the impression to triple the data with feature cache, index, bounding boxes, but I agree that it is necessary in the end (and I have learned a lot 👍🏼 thanks)

@Djedouas Djedouas force-pushed the featurepool-fix-inconsistency branch from 2912ca6 to 542c10a Compare August 29, 2024 08:57
@@ -66,7 +66,21 @@ bool QgsFeaturePool::getFeature( QgsFeatureId id, QgsFeature &feature )
}
locker.changeMode( QgsReadWriteLocker::Write );
mFeatureCache.insert( id, new QgsFeature( feature ) );
mIndex.addFeature( feature );

QgsGeometry indexGeom = mIndex.geometry( id );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index isn't created with the FlagStoreFeatureGeometries flag, so this will always be null.

Can you please add some tests before trying more solutions? Right now we're just fumbling in the dark 🤣

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index isn't created with the FlagStoreFeatureGeometries flag, so this will always be null.

The index creation here https://github.com/qgis/QGIS/blob/master/src/analysis/vector/geometry_checker/qgsfeaturepool.cpp#L82 has default flags which seems ON for FlagStoreFeatureGeometries https://github.com/qgis/QGIS/blob/master/src/core/qgsspatialindex.h#L77

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how qflags work 😂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...yes I was expecting that 😆 writing a test at the moment... ^^

@agiudiceandrea agiudiceandrea added the Bug Either a bug report, or a bug fix. Let's hope for the latter! label Sep 8, 2024
Copy link

The QGIS project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 14 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

  • that all unit tests are passing

  • that all comments by reviewers have been addressed

  • that there is enough information for reviewers, in particular

    • link to any issues which this pull request fixes

    • add a description of workflows which this pull request fixes

    • add screenshots if applicable

  • that you have written unit tests where possible
    In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request.
    If there is no further activity on this pull request, it will be closed in a week.

@github-actions github-actions bot added the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Sep 23, 2024
@Djedouas Djedouas removed the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Sep 23, 2024
Copy link

github-actions bot commented Oct 8, 2024

The QGIS project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 14 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

  • that all unit tests are passing

  • that all comments by reviewers have been addressed

  • that there is enough information for reviewers, in particular

    • link to any issues which this pull request fixes

    • add a description of workflows which this pull request fixes

    • add screenshots if applicable

  • that you have written unit tests where possible
    In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request.
    If there is no further activity on this pull request, it will be closed in a week.

@github-actions github-actions bot added the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Oct 8, 2024
@Djedouas Djedouas removed the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Oct 8, 2024
Copy link

The QGIS project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 14 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

  • that all unit tests are passing

  • that all comments by reviewers have been addressed

  • that there is enough information for reviewers, in particular

    • link to any issues which this pull request fixes

    • add a description of workflows which this pull request fixes

    • add screenshots if applicable

  • that you have written unit tests where possible
    In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request.
    If there is no further activity on this pull request, it will be closed in a week.

@github-actions github-actions bot added the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Oct 23, 2024
@Djedouas Djedouas removed the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Oct 23, 2024
@Djedouas
Copy link
Member Author

Djedouas commented Nov 4, 2024

I prefered to create a new PR to facilitate discussion, as I followed @nyalldawson advice to use a QHash<QgsFeatureId, QgsFeature> mUpdatedFeatures hash in the class.

@Djedouas Djedouas closed this Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QgsFeatureSource not synchronised with QgsVectorLayer for memory provider
3 participants