Unload content of external tilesets to save memory usage #876
base: main
Conversation
Maybe it helps for debugging:
The tileset is here, for convenience: externalsWithTransform 2024-05-11.zip It looks like this: It consists of three external tilesets, each containing a unit cube.
The core of the
This means that it will...
The output will be
Showing that the

Now, one can probably more easily call some
I took a look and have a few comments below. I noticed that in this branch, tile content never seems to be getting unloaded at all. The immediate cause is that in `Tileset::_unloadCachedTiles`, the head of the `_loadedTiles` linked list is `nullptr`. The tail isn't `nullptr`, however, which is an invalid and nonsensical state. So the linked list is getting corrupted, possibly by the attempt to clear the `_externalTilesPendingClear` list by assignment (see below).
```cpp
// Clear list of pending external tiles
this->_externalTilesPendingClear = Tile::LoadedLinkedList();
```
This won't do what you want. In fact, we should probably disable the assignment operator on this class. Take a look at how this `LoadedLinkedList` works. It's an "intrusive" linked list, where the links between nodes are actually stored in the data (the `Tile`s) themselves. This means that a tile can only be in one list at a time. And clearing the tiles from a list requires traversing the entire list and setting the previous and next pointers in each tile to `nullptr`.
```cpp
// Make sure we unload all children
gsl::span<Cesium3DTilesSelection::Tile> children = tile.getChildren();
for (Tile& child : children) {
  this->unloadTileContent(child);
}
```
This raises some performance red flags, though I don't know exactly how to tell you to fix it. When we decide to unload a tile, we'll traverse all of its descendants in `isReadyToUnload` to see if it can be unloaded. And if it can, we'll then unload its children (here), which will involve calling `isReadyToUnload` again on each child, which will again traverse the entire subtree. A tile subtree can have thousands of tiles, and this is an O(n²) operation in the number of tiles, so 😱.
```cpp
// then they might get reloaded before we've had a chance to clear their
// children and cause an error. They'll get their children cleared and their
// state set to Unloaded before next clean up
tile.setState(TileLoadState::Done);
```
I don't think this is safe. Setting it to `Done` will give the selection algorithm free rein to render this tile, which will lead to holes. Perhaps `Unloading`?
```cpp
// TilesetContentManager (excerpt)
void updateTileContent(Tile& tile, const TilesetOptions& tilesetOptions);

bool unloadTileContent(Tile& tile);
bool handleUpsampledTileChildren(Tile& tile);
```
Please use more descriptive names. "Handle" how?
```cpp
// TilesetContentManager members (excerpt)
int32_t _tileLoadsInProgress;
int32_t _loadedTilesCount;
int64_t _tilesDataUsed;
Tile::LoadedLinkedList& _loadedTiles;
```
This is dangerous. The `TilesetContentManager` can and sometimes does outlive the `Tileset`. So by having this reference here, we need to be careful that we never try to use it after the `Tileset` has been destroyed.
After consulting with the rest of the team, we're going to put this PR on hold for now as it's taking an unexpectedly large amount of effort to get working. In its current state the code works fine with @javagl's example above, and does load and unload Google P3DT when tested with CesiumGS/cesium-unreal#1415, but currently only loads a limited number of tiles closest to the camera. In theory, any tile that's being visited as part of building the list of tiles to render should never be unloaded, but for some reason this only applies to the closest tiles and not to ones farther away. This likely has something to do with
I haven't been following the details and evolution of
```cpp
// Don't clear the children of this tile next frame
this->_externalTilesPendingClear.remove(&tile);
if (tile.getState() == TileLoadState::Unloaded &&
    !tile.getChildren().empty()) {
  // We were going to clear this tile's children, but it's still in use, so we
  // should restore it to Done instead.
  tile.setState(TileLoadState::Done);
  tile.setContentShouldContinueUpdating(false);
}
```
This is the cause of the missing tiles. `tile.getState() == TileLoadState::Unloaded && !tile.getChildren().empty()` will be true for a very large number of tiles before the first time they're loaded. Setting them to `Done` will ensure they never get loaded. Do you need to check that it's also in `_externalTilesPendingClear`, maybe?
Let me try to describe the problem and possible solution, as I see it. Most of this probably isn't new to you at this point, but hopefully it helps to have it in one place.

Ok, so first things first, what are we actually trying to do? We're concerned with tiles which are the root of an external tileset. In other words, tiles for which

But what does it mean to unload the content for such a tile? Usually, when we talk about unloading tile content, we're talking about clearing its `_content` field. For renderable tiles, this contains the tile's whole glTF, including geometry and textures, so it's pretty big. But for a tile with external content, it's actually pretty tiny. That

However, when we load an external tileset, the parsed tileset.json is turned into a tree of

So to get started, we can simply allow

(Note that we should never clear the

The problem with this simple approach is that clearing the

So then the question becomes: where might we hold

Here are the places I know about. I'm not sure if it's exhaustive, but hopefully it's close:
Checking all of these things can be expensive, especially because we need to check every tile in an external tileset in order to determine if we can unload it. That can easily be thousands of tiles. So one idea is to keep track of unload safety all along rather than computing it when we need it. Here's how I think I'd approach that (warning: this may not be a fully baked idea): Add

Making sure these reference counts are perfectly balanced can be tricky and error prone, so a debug mode that tracks not just the count but also the source of the count (perhaps as a descriptive string) can be helpful.

I haven't thought of the flaw with this plan yet (I'll update if I do), so hopefully it'll work well, and I don't think it'll be too difficult to implement, either. There are probably other reasonable solutions as well.
What I've called
I'm watching this, from a huge distance (from an island in Indonesia, so to speak), and see quite some "reference counting" going on in various forms. And I'm wondering how much code (and how many bugs) could be avoided by just saying: Let's handle each

(I'm just wondering. I know, it won't happen. So 🤞 that you can sort this out in another way. But don't forget to add comments accordingly, e.g. in
Even putting aside costs, using a `std::shared_ptr` wouldn't work here. We can't destroy individual tiles, we only want to destroy entire subtrees loaded from external tilesets. So it is reference counting, but even though `Tile` stores the count, it's not really the individual `Tile` instances we're controlling the lifetime of. With `std::shared_ptr`, the thing counted and the thing whose lifetime is controlled are always one and the same.
I don't understand external tilesets well, but if the problem is that the lifetime of the external tileset structure is different from the individual tiles in it, then it seems that a `shared_ptr` would work. You don't really care about the precise reference count, just whether or not any references to a tile are held. As an old Lisp guy, I get a bit jaded about performance and `shared_ptr`, putting complaints in the same category as complaints about garbage collection performance. If you need to track object lifetimes, then you need to pay for it somewhere. Was `shared_ptr` deemed to be too expensive at some point in Cesium Native's history? If so, I'd look at nearby solutions like `std::make_shared` and intrusive pointers before tossing the whole idea.
Some of this may be too unrelated to this PR (and too large a topic in general) to be discussed in this PR. I just threw it in here because of the difficulties of the whole "lifetime and ownership" question that seems to be at the core here; the attempt to solve it with manual reference counting is bound to involve intricate handling of the reference count, and will require deep knowledge about what is happening where, on which thread.

(An aside: I also doubt that replacing manual reference counting with a

A few abstraction levels higher: I think that the "lifetime and ownership" question that is hidden in the point about 'individual tiles vs. subtrees' is crucial. But I assume that the state space, state transitions, lifetimes, and ownerships will raise enough questions to be documented incrementally. One overly specific example is that I assume that

(The question about "tile-vs-subtree" may, at the bottom, involve seemingly trivial aspects like the
There is no external tileset structure. Well, there's `TileExternalContent`, but there are no `Tile`s in it (just metadata). It would be interesting to consider moving to a model where there's an explicit representation of an external tileset, and it owns all of the `Tile` instances within it. Perhaps

But tackling that as part of this PR definitely risks turning a relatively focused (if slightly tricky and possibly error-prone) task into a major refactoring effort.
No, we use `shared_ptr`. But not for tiles. Not so much because of the cost of `shared_ptr`, but because of the cost of heap allocating every `Tile` individually. I'm aware of

We also have
cesium-native's rule is that `Tile` instances can only ever be accessed from the main thread (or the thread that created them if they haven't been passed to the main thread yet, i.e., during loading). So there's no tricky threading issue here, at least. A non-owning smart pointer class might be a nice, elegant way to solve the "manual" part of the reference counting, though. On construction with a `Tile*`, it increments the tile's reference count; on destruction, it decrements it.
Yeah and I agree with you on this point anyway. As I mentioned above, it's not the reference counting part that's costly, it's the memory allocation. I don't think I need to do a benchmark to be convinced that heap-allocating every Tile would be a mistake, though. In fact, I bet we'd see a small but measurable performance improvement if we could avoid heap-allocating (via a std::vector) each block of child Tiles, as we're currently doing. Avoiding unnecessary heap allocations is just good practice, particularly in code where performance matters. It's even in the C++ Core Guidelines. I'm not saying we have to obsess over it of course, sometimes dynamic allocation is the best solution.
`ReferenceCounted` is an optional convenience base class that can be inherited from in order to easily enable a particular type for use with

I could try to explain the tradeoffs of `IntrusivePointer` versus `shared_ptr`, but I'll just refer you to Boost's documentation on their intrusive pointer to start (not exactly the same, but same idea and same tradeoffs):
Yes, no question, we need to be clear about the lifetime rules. We've gotten away with this with `Tile`s until now because `Tile`s, once created, were never destroyed, other than when the entire `Tileset` was destroyed.
That seems to be focussed on the usage side (i.e. what is supposed to be visible for clients). One of my concerns about manual reference counting referred to the guts of the implementation. I think that it could easily happen that someone is doing something on some thread and this affects the reference counter. This does not refer to something explicit and obvious like
that could easily be spotted during a review. It rather referred to the "implicit" changes, in scenarios like
With very hazy memories about some details, but roughly looking at your list above, there may be something like
Which thread is calling 'upsample'? You probably know that. You could claim that "this sort of operation is always done by thread X". But it's probably not documented (and even less enforced) explicitly. Now, there has to be a matching

(All this could probably be summarized with: "See

The Boost documentation does not go into details about reasons for using

But as a baseline: I doubt any performance claim that is made without a benchmark. And when there is a benchmark, then I doubt that benchmark 😈. Quickly looking at the first, naive, shallow search results like https://stackoverflow.com/questions/14837829/intrusive-ptr-shared-ptr-performance-tests or https://groups.google.com/g/benchmark-discuss/c/uENrnU4J3AM :
(Whether or not

One reason why I didn't look into all the details: Imagine there is a

Proving or disproving this with a benchmark could be difficult: One cannot just introduce some
I'm talking about implementation, too. What you describe can't happen as long as we follow our rule of never passing a
Upsampling is done in a worker, but it does not require access to a

The glTF, however, is made available to the upsample process, which is ugly and dangerous but necessary.
It lists three reasons:
shared_ptr is quite complicated, by necessity. Primarily because of two design decisions: the count being stored separately from the object being controlled, and the availability of weak_ptr. An intrusive pointer is simpler. That makes it smaller and faster.
You missed a critical detail.

As a completely unsurprising (at least to me) point of reference, at one point I tried to switch the `Tile` loading code from its current approach, which reads the tile hierarchy from a tileset.json into a RapidJSON `Document` and then creates `Cesium3DTilesSelection::Tile` instances for each tile represented in the `Document`, to a slightly different approach that reads a `Cesium3DTiles::Tileset` and then creates `Cesium3DTilesSelection::Tile` instances from those. It took somewhere around twice as long to load the tile hierarchy, adding as much as a full second to the initial load time of a Tile-rich tileset like Melbourne Photogrammetry. Why? The JSON reader for

Believe it or not, I'm not some performance zealot trying to make your life hard by imposing questionable rules without evidence. 😀
I assume that you did tests and comparisons (beyond what you already know), and the point about "many tiles being created at once (and sometimes only few of them being filled later)" is certainly a valid one: It sounds like the two factors here are "multiplicative": "Heap-vs-Stack" (with the first one being slower) and "Individual-vs-Bulk" (with the first one being slower). I'm lacking some context, but
I don't see a

And an aside: I assume that this experiment was in the context of (or at least related to) the consideration to replace the whole third-party JSON parsing with the auto-generated

In my naive, convenient Java world, there is no real "stack", and nothing that is not a "shared pointer", so to speak. And that may be a reason for me to (always keep performance in mind, but) lean towards the "safe and simple" approach. Often, time is wasted in unexpected places. For example, for that JSON parsing layer: I've heard rumors about ~20 MB
Yep, that's the vector of tiles I'm referring to.
Yes, that would help significantly. Switching instead to a

In theory it's possible to use a custom allocator to get
Yes that's right.
It's not that "manual JSON twiddling" is faster. It's that RapidJSON Document is obsessively focused on avoiding heap allocations. Basically, it's fast in all the ways the naive implementation is not. And that makes it win even though it requires some manual JSON twiddling.
Interestingly, one of the advantages of having a garbage collector - especially a compacting, generational one - is that heap allocations are much faster.
The other "advantage" is that as a developer, you don't have to give a ... ... care about why this is the case 🙂 It's surprisingly hard to find concise, reliable information about the performance differences of heap-vs-stack in C++. (By "hard" I mean that there doesn't seem to be much in the first 10 search results. By "concise" I mean that there are things like stack overflow answers that talk about this, and why one is faster than the other, but not how much. And by "reliable" I mean that https://publicwork.wordpress.com/2019/06/27/stack-allocation-vs-heap-allocation-performance-benchmark/ is easy to find and concise, but too narrowly focused on a too artificial microbenchmark). One approach for avoiding the heap costs could be some sort of "pool" - that's probably what you referred to with:
And the trickiness obviously is: What can you do manually that is faster than what the |
Yep. An allocator for a single type of object is a bit simpler than a general one, so that helps. It's also possible to do the sort of tricks that RapidJSON does: https://rapidjson.org/md_doc_internals.html#MemoryPoolAllocator

Actually, now that I'm looking at the stackoverflow link you posted above, it kind of says what I just wrote above better than I did.

By the way, you don't have to convince me that good garbage collectors are awesome. I completely agree.
From a naive, high-level point of view, this now sounds like something that could actually be ... "less non-trivial" ... than many other forms of pooling: Storing the
As described in #739, external tilesets are currently never unloaded by cesium-native. This results in memory accumulating as tiles are loaded, eventually causing the device to run out of memory. This change allows tiles loaded from external sources to be unloaded when they're no longer used, as long as all of their child tiles are also ready to unload.