From f1093b3f428d97fa45324d934d872550d359302d Mon Sep 17 00:00:00 2001
From: Robert Stoll
Date: Thu, 12 Nov 2020 09:31:45 +0100
Subject: [PATCH] another format fix

---
 docs/components.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/components.md b/docs/components.md
index 2a7b876d..a4d0c1c3 100644
--- a/docs/components.md
+++ b/docs/components.md
@@ -14,7 +14,9 @@ Data importation (see “data importer” in figure below) into the data structu
 3. Provide missing metadata (e.g. city = ‘Geneva’ or water_quality=’excellent’)
 4. Set the authority level of the datasource (Zurich OGD is of higher authority than Wikidata), relevant for the merging process
 5. Provide information on the estimated accuracy of the fountain coordinates (e.g. +/- 1 m)
-##Data exporting/merging
+
+## Data exporting/merging
+
 The data served to the web app must meet certain quality standards (no duplicates, certain fields required). The data export step polishes the data quality and formats the data as a json for the web app:
 1. Merge duplicates:
    - The rows of the data structure are grouped by similarity of location and given name. For the location, a distance threshold can be defined. For the comparison of names, many algorithms are available: Hamming distance, Levenshtein distance, Damerau–Levenshtein distance, Jaro–Winkler distance. A smart combination of the two distances must be designed (e.g. if the name matches perfectly, then the location doesn’t matter as much). It would be clever to normalize the geometric distance with the estimated accuracy of the coordinates. Warning: two empty names must have a non-zero distance.
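
Not part of the patch itself, but as a rough illustration of the duplicate-merging idea described in the hunk above, the sketch below combines a normalized Levenshtein name distance with a geometric distance normalized by the estimated coordinate accuracy, and treats two empty names as non-matching. All names here (Fountain, combinedDistance, the accuracy field, the weighting) are hypothetical assumptions, not the project's actual API.

```typescript
// Illustrative sketch only: one possible combined distance for grouping
// duplicate fountain records, per the description in docs/components.md.

interface Fountain {
  name: string;     // given name, may be empty
  lat: number;      // latitude in degrees
  lon: number;      // longitude in degrees
  accuracy: number; // estimated coordinate accuracy in metres, e.g. 1
}

// Plain Levenshtein distance between two strings (dynamic programming).
function levenshtein(a: string, b: string): number {
  const d: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  return d[a.length][b.length];
}

// Approximate geometric distance in metres (equirectangular approximation,
// adequate at the few-metre scale relevant for fountains).
function metres(a: Fountain, b: Fountain): number {
  const R = 6371000;
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const x = toRad(b.lon - a.lon) * Math.cos(toRad((a.lat + b.lat) / 2));
  const y = toRad(b.lat - a.lat);
  return R * Math.sqrt(x * x + y * y);
}

// Combined distance: geometric distance normalized by the summed coordinate
// accuracies, name distance normalized by the longer name. A perfect name
// match reduces the influence of location; two empty names count as unknown
// (distance 1), never as a perfect match.
function combinedDistance(a: Fountain, b: Fountain): number {
  const geo = metres(a, b) / Math.max(a.accuracy + b.accuracy, 1);
  let name: number;
  if (a.name === '' && b.name === '') {
    name = 1; // warning from the docs: two empty names must have non-zero distance
  } else {
    name = levenshtein(a.name, b.name) / Math.max(a.name.length, b.name.length);
  }
  const locationWeight = name === 0 ? 0.5 : 1.0; // example weighting, assumption
  return locationWeight * geo + name;
}
```

The 0.5 weighting and the choice of Levenshtein over Jaro–Winkler or Damerau–Levenshtein are arbitrary placeholders; the docs only require that some such combination be designed and that the thresholds remain tunable.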