Merge pull request #516 from GIScience/fix-streaming-endpoint
Fix performance degradation in ignite streaming endpoint
tyrasd authored Sep 29, 2023
2 parents e2ab9ae + 1f9a759 commit 6e5cab8
Showing 21 changed files with 100 additions and 76 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -30,5 +30,5 @@ Add any other context about the problem here.
Please complete the following information:
- OS: [e.g. Ubuntu 20.04 LTS]
- Java Version: [e.g. openjdk version "11.0.9.1"]
- OSHDB Version: [e.g. 1.2.0]
- OSHDB Version: [e.g. 1.2.1]
- Maven version: [e.g. 3.6.3]
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,13 @@ Changelog
## 1.3.0-SNAPSHOT (current master)


## 1.2.1

* Fix performance degradation in the streaming endpoints when running on Ignite using the `AFFINITY_CALL` backend ([#516])

[#516]: https://github.com/GIScience/oshdb/pull/516


## 1.2.0

### new features
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -1,4 +1,4 @@
cff-version: 1.2.0
cff-version: 1.2.1
message: "If you use this software, please cite it as below."
authors:
- family-names: "Raifer"
@@ -12,7 +12,7 @@ authors:
- family-names: "Schott"
given-names: "Moritz"
title: "OSHDB - OpenStreetMap History Data Analysis"
version: 1.2.0
version: 1.2.1
doi: 10.5281/zenodo.4146990
date-released: 2021-07-22
url: "https://github.com/GIScience/oshdb"
4 changes: 2 additions & 2 deletions README.md
@@ -65,14 +65,14 @@ The API is based on the MapReduce programming model and offers powerful methods
Installation
------------

The OSHDB is available as a pre-compiled maven library and can be incorporated easily in any maven project. If you're starting a new project, take a look at how your IDE handles maven projects (for example, here you find instructions how to create a new maven project using [IntelliJ](https://www.jetbrains.com/help/idea/maven-support.html#maven_create_project)). Our [first steps tutorial](https://github.com/GIScience/oshdb/tree/1.2.0/documentation/first-steps#2-add-maven-dependency) includes further information about how to add the OSHDB as a maven dependency to your projects.
The OSHDB is available as a pre-compiled maven library and can be incorporated easily in any maven project. If you're starting a new project, take a look at how your IDE handles maven projects (for example, here you find instructions how to create a new maven project using [IntelliJ](https://www.jetbrains.com/help/idea/maven-support.html#maven_create_project)). Our [first steps tutorial](https://github.com/GIScience/oshdb/tree/1.2.1/documentation/first-steps#2-add-maven-dependency) includes further information about how to add the OSHDB as a maven dependency to your projects.

Documentation
-------------

* [first steps tutorial](documentation/first-steps/README.md)
* [User Manual](documentation/manual/README.md)
* [OSHDB Javadoc](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/)
* [OSHDB Javadoc](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/)

Examples
--------
2 changes: 1 addition & 1 deletion documentation/README.md
@@ -8,5 +8,5 @@ Here you find OSHDB related documentation material:
Explains the design of the OSHDB data model and shows the different features of the OSHDB API and how they can be used to efficiently query the OSM history data.
* [Examples](https://gitlab.gistools.geog.uni-heidelberg.de/giscience/big-data/ohsome/oshdb-examples) <br>
Contains some example code for how to use the OSHDB to analyze the OSM history data.
* [OSHDB Javadoc](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/) <br>
* [OSHDB Javadoc](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/) <br>
This lists all methods offered by the various OSHDB modules, packages and classes.
4 changes: 2 additions & 2 deletions documentation/first-steps/README.md
@@ -25,7 +25,7 @@ If you already have an existing Java maven project, the OSHDB-API can be added t
<dependency>
<groupId>org.heigit.ohsome</groupId>
<artifactId>oshdb-api</artifactId>
<version>1.2.0</version>
<version>1.2.1</version>
</dependency>
```

@@ -80,7 +80,7 @@ In our example, we only want to look at OSM way objects which have the `building
.filter("type:way and building=*")
```

There are a variety of available filter selectors which can be combined into a [filter](https://github.com/GIScience/oshdb/tree/1.2.0/documentation/first-steps) string: each one specifies a property which OSM objects can have. These selectors can be combined into a filter string using boolean operators and parentheses. If multiple `filter`s are set, the result will contain only the OSM objects which match all given filters.
There are a variety of available filter selectors which can be combined into a [filter](https://github.com/GIScience/oshdb/tree/1.2.1/documentation/first-steps) string: each one specifies a property which OSM objects can have. These selectors can be combined into a filter string using boolean operators and parentheses. If multiple `filter`s are set, the result will contain only the OSM objects which match all given filters.

## 7. Calculating intermediate results

2 changes: 1 addition & 1 deletion documentation/first-steps/example-pom.xml
@@ -8,7 +8,7 @@
<dependency>
<groupId>org.heigit.ohsome</groupId>
<artifactId>oshdb-api</artifactId>
<version>1.2.0</version>
<version>1.2.1</version>
</dependency>
</dependencies>

12 changes: 6 additions & 6 deletions documentation/manual/aggregation.md
@@ -5,12 +5,12 @@ Often, when querying OSM history data one is interested in getting multiple resu

The OSHDB API provides a flexible and powerful way to produce aggregated results that are calculated for arbitrary subsets of the data. This `aggregateBy` functionality also supports the combination of multiple such grouping functions chained after each other.

When executing any of the below listed aggregateBy methods, the query's MapReducer is transformed into a [`MapAggregator`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapAggregator.html) object which is (mostly) functionally equivalent to a MapReducer, with the difference that instead of returning single result values when calling any [reduce](map-reduce.md#reduce) method, an associative list of multiple values is returned instead: The result contains one entry for each requested grouping.
When executing any of the below listed aggregateBy methods, the query's MapReducer is transformed into a [`MapAggregator`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapAggregator.html) object which is (mostly) functionally equivalent to a MapReducer, with the difference that instead of returning single result values when calling any [reduce](map-reduce.md#reduce) method, an associative list of multiple values is returned instead: The result contains one entry for each requested grouping.

aggregateBy
-----------

This is the most generic grouping method, that allows to produce aggregated results that refer to arbitrary subsets of the input data. The [`aggregateBy`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateBy(org.heigit.ohsome.oshdb.util.function.SerializableFunction)) method accepts a function that must return an “index” value by which the respective result should be grouped by. For example, when one wants to group results by OSM type, the aggregateBy method should simply return the OSM type value, as in the following example using the OSHDB snapshot view:
This is the most generic grouping method, that allows to produce aggregated results that refer to arbitrary subsets of the input data. The [`aggregateBy`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateBy(org.heigit.ohsome.oshdb.util.function.SerializableFunction)) method accepts a function that must return an “index” value by which the respective result should be grouped by. For example, when one wants to group results by OSM type, the aggregateBy method should simply return the OSM type value, as in the following example using the OSHDB snapshot view:

```java
Map<OSMType, Integer> countBuildingsByType = OSMEntitySnapshotView.on(…)
@@ -21,7 +21,7 @@ Map<OSMType, Integer> countBuildingsByType = OSMEntitySnapshotView.on(…)
.count();
```

Optionally, the [`aggregateBy`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateBy(org.heigit.ohsome.oshdb.util.function.SerializableFunction,java.util.Collection)) method allows to specify a collection of groups which are expected to be present in the result. If for a particular group, no matching OSM entities are found in the query, the result will then still contain this key, filled with a “zero” value (e.g. `[]` for a set).
Optionally, the [`aggregateBy`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateBy(org.heigit.ohsome.oshdb.util.function.SerializableFunction,java.util.Collection)) method allows to specify a collection of groups which are expected to be present in the result. If for a particular group, no matching OSM entities are found in the query, the result will then still contain this key, filled with a “zero” value (e.g. `[]` for a set).

> For example, if the count reducer is used in a query, the result contains `0` integer values in entries for which no results were found. If instead the collect reduce method is used, empty lists are used to fill no-data entries.
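The zero-filling behaviour described above can be illustrated with a minimal sketch in plain JDK code. It does not use the OSHDB API: the class `ZeroFillSketch` and the method `countBy` are hypothetical names chosen for this illustration, and keeping groups that are not in the expected collection is an assumption of the sketch, not a statement about how OSHDB handles them.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Illustration only: plain JDK code, not part of the OSHDB API. */
class ZeroFillSketch {

  /**
   * Counts items per group. Every key in expectedKeys is present in the
   * result, with a count of 0 if no item was mapped to it.
   */
  static <T, K extends Comparable<K>> Map<K, Integer> countBy(
      Collection<T> items, Function<T, K> indexer, Collection<K> expectedKeys) {
    Map<K, Integer> result = new TreeMap<>();
    // Pre-fill all expected groups with the "zero" value of the count reducer.
    for (K key : expectedKeys) {
      result.put(key, 0);
    }
    // Add 1 to the group that each item is mapped to by the indexer.
    for (T item : items) {
      result.merge(indexer.apply(item), 1, Integer::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = countBy(
        List.of("way", "way", "node"),
        type -> type,
        List.of("node", "way", "relation"));
    // The "relation" group has no items but still appears with a count of 0.
    System.out.println(counts);
  }
}
```

With a collecting reducer instead of a counting one, the pre-filled value would be an empty list rather than `0`.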
@@ -40,12 +40,12 @@ This is a specialized method for grouping results by timestamps. Depending on th
> For example, when in a query the following three timestamps are set: `2014-01-01`, `2015-01-01` and `2016-01-01`, then a contribution happening at `2015-03-14` will be associated to the time interval between `2015-01-01` and `2016-01-01` (which is represented in the output as the starting time of the interval: `2015-01-01`).

There are two variants that allow this grouping by a timestamp: [`aggregateByTimestamp`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateByTimestamp()) tries to automatically fetch the timestamps from the queried data (i.e. the snapshot, or the contribution objects), while the second variant of [`aggregateByTimestamp`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateByTimestamp(org.heigit.ohsome.oshdb.util.function.SerializableFunction)) takes a callback function that returns an arbitrary timestamp value. The second variant has to be used in some cases where the automatic matching of objects to its timestamps isn't possible, for example when using the [groupByEntity](views.md#groupbyentity) option in a query, or when using multiple [aggregateBy](#combining-multiple-aggregateby)s in a query.
There are two variants that allow this grouping by a timestamp: [`aggregateByTimestamp`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateByTimestamp()) tries to automatically fetch the timestamps from the queried data (i.e. the snapshot, or the contribution objects), while the second variant of [`aggregateByTimestamp`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateByTimestamp(org.heigit.ohsome.oshdb.util.function.SerializableFunction)) takes a callback function that returns an arbitrary timestamp value. The second variant has to be used in some cases where the automatic matching of objects to its timestamps isn't possible, for example when using the [groupByEntity](views.md#groupbyentity) option in a query, or when using multiple [aggregateBy](#combining-multiple-aggregateby)s in a query.
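The association of a contribution with the start of its time interval, as in the example with the timestamps `2014-01-01`, `2015-01-01` and `2016-01-01`, can be sketched in plain JDK code. This is only an illustration of the lookup, not the OSHDB implementation: `TimestampIntervalSketch` and `intervalStart` are hypothetical names, the handling of dates that fall exactly on a query timestamp is an assumption, and the sketch does not check the upper bound (dates at or after the last timestamp are assumed to be filtered out beforehand).

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

/** Illustration only: plain JDK code, not part of the OSHDB API. */
class TimestampIntervalSketch {

  /**
   * Returns the start of the interval that contains the given date, or an
   * empty Optional if the date lies before the first query timestamp.
   */
  static Optional<LocalDate> intervalStart(List<LocalDate> queryTimestamps, LocalDate date) {
    TreeSet<LocalDate> sorted = new TreeSet<>(queryTimestamps);
    // floor() yields the greatest timestamp that is less than or equal to the date.
    return Optional.ofNullable(sorted.floor(date));
  }

  public static void main(String[] args) {
    List<LocalDate> timestamps = List.of(
        LocalDate.parse("2014-01-01"),
        LocalDate.parse("2015-01-01"),
        LocalDate.parse("2016-01-01"));
    // prints "Optional[2015-01-01]"
    System.out.println(intervalStart(timestamps, LocalDate.parse("2015-03-14")));
  }
}
```

A callback passed to the second variant of `aggregateByTimestamp` has to return such a timestamp itself, which is why it can be used where the automatic matching is not possible.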

aggregateByGeometry
-------------------

Calculating results for multiple sub-regions of an area of interest at once is possible through [`aggregateByGeometry`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateByGeometry(java.util.Map)). It accepts an associative list of polygonal geometries with corresponding index values. The result will then use these index values to represent the individual sub-region results.
Calculating results for multiple sub-regions of an area of interest at once is possible through [`aggregateByGeometry`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#aggregateByGeometry(java.util.Map)). It accepts an associative list of polygonal geometries with corresponding index values. The result will then use these index values to represent the individual sub-region results.

When using the aggregateByGeometry functionality, any OSM entity geometry that is contained in multiple sub-regions will be split and clipped to the respective geometries.

@@ -54,7 +54,7 @@ The given grouping geometries are allowed to overlap each other, but they should
combining multiple aggregateBy
------------------------------

When writing an OSHDB query, it is possible to perform multiple of the above mentioned aggregateBy operations. For example, it is possible to write a query that returns results that are aggregated by timestamps and by OSM type. In this case, the final result will contain one entry for each possible combination of the specified groupings. These combined indices are encoded as [`OSHDBCombinedIndex`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/generic/OSHDBCombinedIndex.html) objects in the final result map.
When writing an OSHDB query, it is possible to perform multiple of the above mentioned aggregateBy operations. For example, it is possible to write a query that returns results that are aggregated by timestamps and by OSM type. In this case, the final result will contain one entry for each possible combination of the specified groupings. These combined indices are encoded as [`OSHDBCombinedIndex`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/generic/OSHDBCombinedIndex.html) objects in the final result map.

```java
Map<OSHDBCombinedIndex<OSHDBTimestamp, OSMType>, Integer> countBuildingsByTimeAndType = OSMEntitySnapshotView.on(…)
10 changes: 5 additions & 5 deletions documentation/manual/database-backends.md
@@ -8,27 +8,27 @@ Database backends can implement different algorithms that control how a query is
OSHDBJdbc / OSHDBH2
-------------------

The [`ODHSBJDBC`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/db/OSHDBJdbc.html) backend is often used in the `OSHDBH2` variant, which expects data to be stored in a single H2 database file. A few example OSHDB extracts in the H2 format are available as download from [downloads.ohsome.org](https://downloads.ohsome.org/OSHDB/v1.0/).
The [`ODHSBJDBC`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/db/OSHDBJdbc.html) backend is often used in the `OSHDBH2` variant, which expects data to be stored in a single H2 database file. A few example OSHDB extracts in the H2 format are available as download from [downloads.ohsome.org](https://downloads.ohsome.org/OSHDB/v1.0/).

Alternatively, the OSHDB data can also be stored in any JDBC compatible database (e.g. a [PostgreSQL](https://www.postgresql.org/) database). The OSHDB data is however always processed and analyzed locally on the machine from which the OSHDB query is started. It is therefore advisable to keep the OSHDB data as local as possible in order to minimize network traffic when using the OSHDBJdbc backend.

OSHDBIgnite
-----------

The [`OSHDBIgnite`](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/db/OSHDBIgnite.html) backend executes computations on a distributed cluster of computers running the [Apache Ignite](https://ignite.apache.org/) big data platform. Each of the computers of the cluster only holds a subset of the global OSHDB data set and can therefore execute its part of an OSHDB query more quickly than a single computer having to process the whole data set.
The [`OSHDBIgnite`](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/db/OSHDBIgnite.html) backend executes computations on a distributed cluster of computers running the [Apache Ignite](https://ignite.apache.org/) big data platform. Each of the computers of the cluster only holds a subset of the global OSHDB data set and can therefore execute its part of an OSHDB query more quickly than a single computer having to process the whole data set.

There are currently three different [compute modes](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/db/OSHDBIgnite.html#computeMode()) available in the OSHDBIgnite backend:
There are currently three different [compute modes](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/db/OSHDBIgnite.html#computeMode()) available in the OSHDBIgnite backend:

* *LOCAL_PEEK* - (default) is optimized for small to mid scale queries.
* *SCAN_QUERY* - works better for large scale (e.g. global) analysis queries.
* *AFFINITY_CALL* - is generally slower than the other two compute modes, but supports [streaming](https://docs.ohsome.org/java/oshdb/1.2.0/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#stream()) of results.
* *AFFINITY_CALL* - is generally slower than the other two compute modes, but supports [streaming](https://docs.ohsome.org/java/oshdb/1.2.1/aggregated/org/heigit/ohsome/oshdb/api/mapreducer/MapReducer.html#stream()) of results.

In order to use the OSHDB Ignite backend, it is necessary to add the maven module `oshdb-api-ignite` to your project's maven dependencies:

```xml
<dependency>
<groupId>org.heigit.ohsome</groupId>
<artifactId>oshdb-api-ignite</artifactId>
<version>1.2.0</version>
<version>1.2.1</version>
</dependency>
```