another format fix
robstoll authored Nov 12, 2020
1 parent a24771c commit f1093b3
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion docs/components.md
@@ -14,7 +14,9 @@ Data importation (see “data importer” in figure below) into the data structu
3. Provide missing metadata (e.g. city = ‘Geneva’ or water_quality = ‘excellent’)
4. Set the authority level of the datasource (Zurich OGD is of higher authority than Wikidata), relevant for the merging process
5. Provide information on the estimated accuracy of the fountain coordinates (e.g. +/- 1 m)
-##Data exporting/merging
+
+## Data exporting/merging
+
The data served to the web app must meet certain quality standards (no duplicates, certain fields required). The data export step improves the data quality and formats the data as JSON for the web app:
1. Merge duplicates:
- The rows of the data structure are grouped by similarity of location and given name. For the location, a distance threshold can be defined. For the comparison of names, many algorithms are available: Hamming distance, Levenshtein distance, Damerau–Levenshtein distance, Jaro–Winkler distance. A smart combination of the two distances must be designed (e.g. if the name matches perfectly, then the location doesn’t matter as much). It would be clever to normalize the geometric distance with the estimated accuracy of the coordinates. Warning: two empty names must have a non-zero distance.
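
One possible way to combine the two distances is sketched below. It is only an illustration under stated assumptions, not the project's implementation: the field names (`lat`, `lon`, `name`, `accuracy_m`), the thresholds `strict_sigma` and `relaxed_sigma`, and the linear blending rule are all hypothetical. The sketch uses the Levenshtein distance for names, normalizes the geometric distance by the combined coordinate accuracy, and treats two empty names as maximally distant.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def name_distance(a, b) -> float:
    """Name distance scaled to [0, 1]. A missing or empty name never matches."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    if not a or not b:
        return 1.0  # two empty names must have a non-zero distance
    return levenshtein(a, b) / max(len(a), len(b))


def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(h))


def is_duplicate(f1, f2, strict_sigma=3.0, relaxed_sigma=10.0) -> bool:
    """Decide whether two fountain records likely describe the same fountain.

    Hypothetical rule: the allowed (accuracy-normalized) distance grows
    linearly from strict_sigma to relaxed_sigma as the names get more similar.
    """
    distance_m = haversine_m(f1["lat"], f1["lon"], f2["lat"], f2["lon"])
    # Combine both accuracies; the floor avoids division by zero.
    accuracy_m = max(math.hypot(f1["accuracy_m"], f2["accuracy_m"]), 1e-6)
    normalized = distance_m / accuracy_m
    similarity = 1.0 - name_distance(f1.get("name"), f2.get("name"))
    allowed = strict_sigma + (relaxed_sigma - strict_sigma) * similarity
    return normalized <= allowed
```

With this rule, two records about 5 m apart with an accuracy of +/- 1 m each are merged when their names match, but kept separate when both names are empty, because the location alone is then not convincing enough.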