
Performance issues with large amounts of dynamic data #1

Open
sotex opened this issue Apr 7, 2020 · 1 comment

sotex commented Apr 7, 2020

This is a really great project; it is very convenient to use and performs very well.
But I'm running into some problems with it and hope to get help here.

Question 1

My program holds a large amount of GeoJSON object data. In order to run jmespath expressions over it, I have to combine it into one large array every time, similar to the following code:

    // The GeoJSON objects live in a large map that is dynamically added to and erased from
    // std::map<std::string, jp::Json> mydata;

    // Every time I need to run a jmespath expression, I copy everything into one array
    std::vector<jp::Json> vec;
    vec.reserve(mydata.size());
    for (auto& kvpair : mydata) {
        vec.push_back(kvpair.second);
    }
    jp::Json data = {
        {"data", std::move(vec)}
    };
    jp::Expression expr = "avg(data[?properties.area < `100`].properties.area)";  // simple example; the actual expression varies
    auto result = jp::search(expr, data);

I could change mydata so that it stores the objects in a jp::Json array directly and avoid the conversion every time.
However, I wonder if there is a better way?
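For example, one possible workaround (a minimal sketch only, not taken from the project; DataCache, insert, erase, and query are hypothetical names) is to keep the combined jp::Json document next to the map and rebuild it only when the map has actually changed, so repeated queries between modifications skip the conversion:

    #include <map>
    #include <string>
    #include <vector>
    #include <jmespath/jmespath.h>

    namespace jp = jmespath;

    // Sketch: cache the combined document and rebuild it only when the map changes.
    struct DataCache {
        std::map<std::string, jp::Json> mydata;  // the dynamically updated GeoJSON objects
        jp::Json document;                       // cached {"data": [...]} built from mydata
        bool dirty = true;                       // set whenever mydata is modified

        void insert(const std::string& key, jp::Json value) {
            mydata[key] = std::move(value);
            dirty = true;
        }

        void erase(const std::string& key) {
            mydata.erase(key);
            dirty = true;
        }

        jp::Json query(const jp::Expression& expr) {
            if (dirty) {  // rebuild at most once per batch of modifications
                std::vector<jp::Json> vec;
                vec.reserve(mydata.size());
                for (auto& kv : mydata) {
                    vec.push_back(kv.second);
                }
                document = jp::Json{{"data", vec}};
                dirty = false;
            }
            return jp::search(expr, document);
        }
    };

This only helps when several queries run between modifications; if the map changes before every query, it is equivalent to the original code.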

Question 2

Because I have a large amount of data, I tested the filtering operation on 10,000 objects; it takes about 0.34 s. But I have more than 200,000 objects.
Test environment:

OS: Linux x-mini 5.3.0-45-generic
CPU: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz x4
MEM: 8 GB DDR3 1333
Compiler and options: g++ 10.0 with -O2

I can use multiple threads to filter in parallel, but that produces several partial results, which then require secondary processing.
I want to know if there is a good way to do this without the secondary processing. Thanks.
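One approach (a rough sketch only; it assumes each chunk already has the {"data": [...]} shape and that concurrent jp::search calls on separate documents are safe, which I have not verified) is to run only the filter part of the expression per chunk in parallel, concatenate the intermediate arrays, and apply the aggregation once at the end, so the secondary processing shrinks to a single small search:

    #include <future>
    #include <vector>
    #include <jmespath/jmespath.h>

    namespace jp = jmespath;

    // Sketch: filter each chunk in parallel, merge the per-chunk results,
    // then aggregate once over the merged array.
    jp::Json parallel_average(const std::vector<jp::Json>& chunks)
    {
        jp::Expression filter = "data[?properties.area < `100`].properties.area";

        std::vector<std::future<jp::Json>> futures;
        futures.reserve(chunks.size());
        for (const auto& chunk : chunks) {
            futures.push_back(std::async(std::launch::async, [&filter, &chunk] {
                return jp::search(filter, chunk);  // an array of area values per chunk
            }));
        }

        // Merge the per-chunk arrays into one flat array.
        jp::Json merged = jp::Json::array();
        for (auto& f : futures) {
            for (auto& value : f.get()) {
                merged.push_back(std::move(value));
            }
        }

        // The "secondary processing" is now a single aggregation step.
        jp::Expression aggregate = "avg(@)";
        return jp::search(aggregate, merged);
    }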


sotex commented Apr 15, 2020

I have solved the problem I raised above by introducing a shared_hash_map to define JSONType.

Details of the changes are here: Add shared map/vector implementation for performance optimization.

It's not an ideal solution; aspects such as thread safety have not been considered yet, but it works for now.
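To illustrate the idea (a minimal sketch only; the actual change referenced above may differ, and shared_vector is a made-up name), wrapping the underlying container in a std::shared_ptr makes copying a JSON value cheap, because copies share the same storage instead of deep-copying it:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Sketch of the shared-storage idea: copies of shared_vector share one underlying
    // std::vector, so copying a large JSON array becomes a cheap pointer copy.
    // Copy-on-write and thread safety are deliberately ignored, as noted above.
    template <typename T>
    class shared_vector
    {
    public:
        shared_vector() : storage_(std::make_shared<std::vector<T>>()) {}

        void push_back(T value) { storage_->push_back(std::move(value)); }
        std::size_t size() const { return storage_->size(); }

        typename std::vector<T>::iterator begin() { return storage_->begin(); }
        typename std::vector<T>::iterator end()   { return storage_->end(); }

    private:
        std::shared_ptr<std::vector<T>> storage_;  // shared between all copies
    };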

Thanks to @robertmrk for a great contribution; it's an excellent project worth learning from.
I see that robertmrk's last activity was on May 15, 2019, with nothing since, so I'm not sure whether he has moved his open-source work to other platforms.
