
Performance issues with large amounts of dynamic data #1

Open
sotex opened this issue Apr 7, 2020 · 1 comment

sotex commented Apr 7, 2020

This is a really great project; it is very convenient to use and performs very well.
But I'm running into some problems with it and hope to get help here.

Question 1

My program holds a large amount of GeoJSON object data. In order to run jmespath expressions over it, I have to combine it into one large array every time, similar to the following code:

    // The GeoJSON objects live in a large map that is dynamically added to and erased from
    // std::map<std::string, jp::Json> mydata;

    // Every time I need to run a jmespath expression, I copy everything into one array
    std::vector<jp::Json> vec;
    vec.reserve(mydata.size());
    for (auto& kvpair : mydata) {
        vec.push_back(kvpair.second);
    }
    jp::Json data = {
        {"data", std::move(vec)}
    };
    jp::Expression expr = "avg(data[?properties.area < `100`].properties.area)";  // simple example; the actual expression varies
    auto result = jp::search(expr, data);

I could change mydata so that it stores the objects in a jp::Json array directly and avoid the conversion every time.
However, I wonder if there is a better way?
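For example, one possible workaround (a minimal sketch only, not taken from the project; DataCache, insert, erase, and query are hypothetical names) is to keep the combined jp::Json document next to the map and rebuild it only when the map has actually changed, so repeated queries between modifications skip the conversion:

    #include <map>
    #include <string>
    #include <vector>
    #include <jmespath/jmespath.h>

    namespace jp = jmespath;

    // Sketch: cache the combined document and rebuild it only when the map changes.
    struct DataCache {
        std::map<std::string, jp::Json> mydata;  // the dynamically updated GeoJSON objects
        jp::Json document;                       // cached {"data": [...]} built from mydata
        bool dirty = true;                       // set whenever mydata is modified

        void insert(const std::string& key, jp::Json value) {
            mydata[key] = std::move(value);
            dirty = true;
        }

        void erase(const std::string& key) {
            mydata.erase(key);
            dirty = true;
        }

        jp::Json query(const jp::Expression& expr) {
            if (dirty) {  // rebuild at most once per batch of modifications
                std::vector<jp::Json> vec;
                vec.reserve(mydata.size());
                for (auto& kv : mydata) {
                    vec.push_back(kv.second);
                }
                document = jp::Json{{"data", vec}};
                dirty = false;
            }
            return jp::search(expr, document);
        }
    };

This only helps when several queries run between modifications; if the map changes before every query, it is equivalent to the original code.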

Question 2

Because I have a large amount of data, I tested the filtering operation on 10,000 objects; it takes about 0.34 s. But I have more than 200,000 objects.
Test environment:

OS: Linux x-mini 5.3.0-45-generic
CPU: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz x4
MEM: 8 GB DDR3 1333
Compiler and options: g++ 10.0 with -O2

I can use multiple threads to filter in parallel, but that produces several partial results, which then require secondary processing.
I want to know if there is a good way to do this without the secondary processing. Thanks.
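One approach (a rough sketch only; it assumes each chunk already has the {"data": [...]} shape and that concurrent jp::search calls on separate documents are safe, which I have not verified) is to run only the filter part of the expression per chunk in parallel, concatenate the intermediate arrays, and apply the aggregation once at the end, so the secondary processing shrinks to a single small search:

    #include <future>
    #include <vector>
    #include <jmespath/jmespath.h>

    namespace jp = jmespath;

    // Sketch: filter each chunk in parallel, merge the per-chunk results,
    // then aggregate once over the merged array.
    jp::Json parallel_average(const std::vector<jp::Json>& chunks)
    {
        jp::Expression filter = "data[?properties.area < `100`].properties.area";

        std::vector<std::future<jp::Json>> futures;
        futures.reserve(chunks.size());
        for (const auto& chunk : chunks) {
            futures.push_back(std::async(std::launch::async, [&filter, &chunk] {
                return jp::search(filter, chunk);  // an array of area values per chunk
            }));
        }

        // Merge the per-chunk arrays into one flat array.
        jp::Json merged = jp::Json::array();
        for (auto& f : futures) {
            for (auto& value : f.get()) {
                merged.push_back(std::move(value));
            }
        }

        // The "secondary processing" is now a single aggregation step.
        jp::Expression aggregate = "avg(@)";
        return jp::search(aggregate, merged);
    }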


sotex commented Apr 15, 2020

I have solved the problem I raised above by introducing a shared_hash_map to define JSONType.

Details of the changes are here: Add shared map/vector implementation for performance optimization.

It's not an ideal solution; aspects such as thread safety have not been considered yet, but it works for now.
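To illustrate the idea (a minimal sketch only; the actual change referenced above may differ, and shared_vector is a made-up name), wrapping the underlying container in a std::shared_ptr makes copying a JSON value cheap, because copies share the same storage instead of deep-copying it:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Sketch of the shared-storage idea: copies of shared_vector share one underlying
    // std::vector, so copying a large JSON array becomes a cheap pointer copy.
    // Copy-on-write and thread safety are deliberately ignored, as noted above.
    template <typename T>
    class shared_vector
    {
    public:
        shared_vector() : storage_(std::make_shared<std::vector<T>>()) {}

        void push_back(T value) { storage_->push_back(std::move(value)); }
        std::size_t size() const { return storage_->size(); }

        typename std::vector<T>::iterator begin() { return storage_->begin(); }
        typename std::vector<T>::iterator end()   { return storage_->end(); }

    private:
        std::shared_ptr<std::vector<T>> storage_;  // shared between all copies
    };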

Thanks to @robertmrk for a great contribution; it's an excellent project worth learning from.
I see that robertmrk's last activity was on May 15, 2019, with nothing since, so I'm not sure whether he has moved his open-source work to other platforms.
