Fix cities.json generation - Remove canton from cities' names #51

Draft: kira0269 wants to merge 1 commit into main from feature/fix-cities-dataset

kira0269
Contributor

Many cities in cities.json end with a canton abbreviation, and that abbreviation does not always match the canton recorded for the city.
This change also improves the search by zipcode and city name.

This fix removes every potential canton abbreviation from the city names.

kira0269 force-pushed the feature/fix-cities-dataset branch from d4b5aff to e43a658 on June 21, 2024 12:33
@stefanzweifel
Owner

Thanks for the PR.
I'm not sure what exact problem this solves. In my eyes it currently just introduces new errors.

For example, there are 4 cities called "Buchs" in Switzerland. One in AG, one in LU, one in SG and one in ZH.
Wouldn't removing the canton abbreviation from the name cause more issues?

Could you maybe provide a code example where you run into an issue?
Maybe we need to change the "search by name" to use str_contains instead of requiring an exact match?
https://github.com/stefanzweifel/php-swiss-cantons/blob/main/src/CantonManager.php#L99
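
To make the difference concrete, here is a minimal sketch of exact versus substring matching. It is written in Python purely for illustration and is not the library's PHP code; the record layout and the function names `find_exact` and `find_containing` are assumptions made up for this example.

```python
# Hypothetical records; the field names are illustrative, not the library's schema.
CITIES = [
    {"city": "Buchs AG", "canton": "AG"},
    {"city": "Buchs LU", "canton": "LU"},
    {"city": "Buchs SG", "canton": "SG"},
    {"city": "Buchs ZH", "canton": "ZH"},
    {"city": "Opfikon", "canton": "ZH"},
]


def find_exact(name):
    """Return the records whose city name equals `name` exactly."""
    return [c for c in CITIES if c["city"] == name]


def find_containing(name):
    """Return the records whose city name contains `name`, ignoring case."""
    needle = name.casefold()
    return [c for c in CITIES if needle in c["city"].casefold()]


exact = find_exact("Buchs")       # no record is named exactly "Buchs"
loose = find_containing("Buchs")  # all four "Buchs <canton>" records
cantons = sorted({c["canton"] for c in loose})
print(len(exact), len(loose), cantons)
```

With the suffix kept in the name, an exact search for "Buchs" finds nothing, while a substring search returns all four candidates and leaves the choice of canton to the caller.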

@kira0269
Contributor Author

Do you agree that Buchs is the real name of the city, and not Buchs LU or Buchs ZH?
Moreover, I removed the cantons because of the case below:
image

@kira0269
Contributor Author

Oh! I found that Biel-Benken's real canton is Bâle... So even this record is wrong :-/
This dataset looks really weird, and I don't know where we could find a better one.

kira0269 marked this pull request as draft on June 21, 2024 13:47
@stefanzweifel
Owner

stefanzweifel commented Jun 21, 2024

Yeah, just wanted to mention that. :)

In this specific example with "Biel-Benken" the dataset by geo.admin.ch seems wrong.
The city exists twice in the CSV.

Screenshot 2024-06-21 at 15 51 31

Maybe we can somehow get a dataset from Swiss Post. Theirs is usually very up to date, but Swiss Post seems to have taken down the public site.


As you can see, not all datasets are equal.
At work, we built our own API endpoints that return cities for a given zipcode or city name. We actually have 2 different API endpoints built with different datasets as their source, because the results differ depending on what they are used for. 🙃

For example here are results for cities that are used when calculating insurance premiums.

This is the city I live in. I don't know why the data source has "Rümlang" in here, as it is a neighbouring city that has nothing to do with mine.

{
    "data":
    [
        {
            "name": "Glattbrugg",
            "zipcode": 8152,
            "admin_area_level_1": "ZH",
            "admin_area_level_2": "Bülach",
            "admin_area_level_3": "Opfikon",
            "federal_statistical_office_number": 66,
            "alternate_names":
            []
        },
        {
            "name": "Glattbrugg",
            "zipcode": 8152,
            "admin_area_level_1": "ZH",
            "admin_area_level_2": "Zürich",
            "admin_area_level_3": "Zürich",
            "federal_statistical_office_number": 261,
            "alternate_names":
            []
        },
        {
            "name": "Glattbrugg",
            "zipcode": 8152,
            "admin_area_level_1": "ZH",
            "admin_area_level_2": "Dielsdorf",
            "admin_area_level_3": "Rümlang",
            "federal_statistical_office_number": 97,
            "alternate_names":
            []
        },
        {
            "name": "Glattpark (Opfikon)",
            "zipcode": 8152,
            "admin_area_level_1": "ZH",
            "admin_area_level_2": "Bülach",
            "admin_area_level_3": "Opfikon",
            "federal_statistical_office_number": 66,
            "alternate_names":
            []
        },
        {
            "name": "Opfikon",
            "zipcode": 8152,
            "admin_area_level_1": "ZH",
            "admin_area_level_2": "Bülach",
            "admin_area_level_3": "Opfikon",
            "federal_statistical_office_number": 66,
            "alternate_names":
            []
        }
    ]
}

Or here are the results for "Buchs".
Some cities even have alternate names for the different languages spoken there or because they have merged with other cities in the past.

{
    "data":
    [
        {
            "name": "Buchs AG",
            "zipcode": 5033,
            "admin_area_level_1": "AG",
            "admin_area_level_2": "Aarau",
            "admin_area_level_3": "Buchs (AG)",
            "federal_statistical_office_number": 4003,
            "alternate_names":
            []
        },
        {
            "name": "Buchs LU",
            "zipcode": 6211,
            "admin_area_level_1": "LU",
            "admin_area_level_2": "Willisau",
            "admin_area_level_3": "Dagmersellen",
            "federal_statistical_office_number": 1125,
            "alternate_names":
            []
        },
        {
            "name": "Buchs SG",
            "zipcode": 9470,
            "admin_area_level_1": "SG",
            "admin_area_level_2": "Werdenberg",
            "admin_area_level_3": "Buchs (SG)",
            "federal_statistical_office_number": 3271,
            "alternate_names":
            [
                {
                    "name": "Burgerau",
                    "name_short": "Burgerau",
                    "zipcode": 9470
                },
                {
                    "name": "Räfis",
                    "name_short": "Räfis",
                    "zipcode": 9470
                }
            ]
        },
        {
            "name": "Buchs SG",
            "zipcode": 9470,
            "admin_area_level_1": "SG",
            "admin_area_level_2": "Werdenberg",
            "admin_area_level_3": "Sevelen",
            "federal_statistical_office_number": 3275,
            "alternate_names":
            [
                {
                    "name": "Burgerau",
                    "name_short": "Burgerau",
                    "zipcode": 9470
                },
                {
                    "name": "Räfis",
                    "name_short": "Räfis",
                    "zipcode": 9470
                }
            ]
        },
        {
            "name": "Buchs ZH",
            "zipcode": 8107,
            "admin_area_level_1": "ZH",
            "admin_area_level_2": "Dielsdorf",
            "admin_area_level_3": "Buchs (ZH)",
            "federal_statistical_office_number": 83,
            "alternate_names":
            []
        },
        {
            "name": "Buchs ZH",
            "zipcode": 8107,
            "admin_area_level_1": "ZH",
            "admin_area_level_2": "Dielsdorf",
            "admin_area_level_3": "Otelfingen",
            "federal_statistical_office_number": 94,
            "alternate_names":
            []
        },
        {
            "name": "Büchslen",
            "zipcode": 3215,
            "admin_area_level_1": "FR",
            "admin_area_level_2": "See",
            "admin_area_level_3": "Murten",
            "federal_statistical_office_number": 2275,
            "alternate_names":
            [
                {
                    "name": "Buchillon",
                    "name_short": "Buchillon",
                    "zipcode": 3215
                }
            ]
        }
    ]
}

I'm currently torn on whether we should go a step further and introduce the "admin area levels" and/or ISO 3166 and ISO 3166-2 to this project as well.

Maybe we can include also the data from other projects like

@kira0269
Contributor Author

Would it be acceptable to you if, when a city name ends with a canton abbreviation, we keep only the records where the canton found in the city name matches the canton in the record?
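
A rough sketch of that filter, in Python for illustration only (the PR itself is PHP). The record layout, the helper names `canton_suffix` and `keep_record`, and the "Musterdorf" entry are invented for this example; only the rule itself comes from the proposal above.

```python
import re

# The 26 official two-letter canton abbreviations.
CANTONS = {
    "AG", "AI", "AR", "BE", "BL", "BS", "FR", "GE", "GL", "GR", "JU", "LU", "NE",
    "NW", "OW", "SG", "SH", "SO", "SZ", "TG", "TI", "UR", "VD", "VS", "ZG", "ZH",
}

# A trailing two-letter uppercase token, separated from the name by whitespace.
SUFFIX = re.compile(r"\s([A-Z]{2})$")


def canton_suffix(city):
    """Return the canton abbreviation a city name ends with, or None."""
    match = SUFFIX.search(city)
    if match and match.group(1) in CANTONS:
        return match.group(1)
    return None


def keep_record(record):
    """Keep records without a suffix, or whose suffix matches their canton."""
    suffix = canton_suffix(record["city"])
    return suffix is None or suffix == record["canton"]


# Hypothetical records; "Musterdorf" is an invented mismatch.
records = [
    {"city": "Buchs ZH", "canton": "ZH"},
    {"city": "Musterdorf BL", "canton": "BE"},
    {"city": "Opfikon", "canton": "ZH"},
]
kept = [r for r in records if keep_record(r)]
print([r["city"] for r in kept])
```

Records whose name carries no canton suffix pass through untouched, so the filter only drops rows that contradict themselves.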

@stefanzweifel
Owner

stefanzweifel commented Jun 21, 2024

@kira0269 Yeah I think that could work. We would remove cases like "Biel-Benken" from the dataset.

On the other hand, I'm not the biggest fan of manipulating source files in that way. But you seem to be the only one using the library or this specific class; so feel free to submit a PR / update this PR.


Is there any way you could share / describe how you use this package in your project? Hearing how others use the package would help immensely when making decisions or shaping features.

@kira0269
Contributor Author

> Is there any way you could share / describe how you use this package in your project? Hearing how others use the package would help immensely when making decisions or shaping features.

Sure. In my application, we generate documents for customers. I need to find the customer's canton, so I use this library to match a canton against the customer's data: first the zipcode, then the city name if the zipcode lookup fails.

I really need to find the canton as reliably as possible, and I want to avoid false positives. I decide myself what to do when no specific canton is found.
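
The lookup order described above could be sketched roughly as follows. This is Python for illustration, not the library's API; `find_canton`, `unique_canton` and the record layout are assumptions, and the names are shown with the canton suffix already stripped. Returning `None` whenever the candidates disagree is one way to avoid false positives.

```python
def unique_canton(records):
    """Return the canton if all records agree on exactly one, else None."""
    cantons = {r["canton"] for r in records}
    return cantons.pop() if len(cantons) == 1 else None


def find_canton(records, zipcode=None, city=None):
    """Try the zipcode first, then the city name; None means 'not sure'."""
    if zipcode is not None:
        canton = unique_canton([r for r in records if r["zipcode"] == zipcode])
        if canton is not None:
            return canton
    if city:
        needle = city.casefold()
        return unique_canton([r for r in records if r["city"].casefold() == needle])
    return None


# Illustrative records using zipcodes that appear earlier in this thread.
RECORDS = [
    {"zipcode": 5033, "city": "Buchs", "canton": "AG"},
    {"zipcode": 8107, "city": "Buchs", "canton": "ZH"},
    {"zipcode": 8152, "city": "Opfikon", "canton": "ZH"},
]

print(find_canton(RECORDS, zipcode=8107))                  # zipcode is unambiguous
print(find_canton(RECORDS, zipcode=9999, city="Opfikon"))  # falls back to the name
print(find_canton(RECORDS, city="Buchs"))                  # ambiguous, so None
```

The caller then decides what to do with a `None` result, for example asking the customer to confirm the canton.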

@stefanzweifel
Owner

A quick update on this. I just sent an email to the Open Data team at Swiss Post to ask whether there is a way to get access to the zipcode data.

I think switching back to their dataset would solve a couple of problems that popped up after we switched to the Swisstopo dataset.

I hope we get an answer soon.

@stefanzweifel
Owner

The email bounced / I've got the following automated message back:

Dear User
The OpenDataSoft site is no longer available since Friday 17th November 2023. In addition to its actual postal services, Swiss Post provided information for interested parties on this page, such as a zip code directory, surnames per zip code or street names and house numbers.

Why is the site no longer available?
We have noticed that usage has stagnated at a very low level in recent months. Due to the low demand, Swiss Post has decided not to renew the expiring licenses and to deactivate the site.

However, if you still need certain information, the following offers for business customers can help you:
The following offer is available for address data: Swiss addresses: no one knows them better | Swiss Post
You will also find the following information in the Digital Commerce section: Personalized API integration | Swiss Post and Plugins für Onlineshops | Die Post.
Developer Portal: Swiss Post Developer Portal

The mailbox [email protected] is no longer actively maintained. In case of questions please forward your request to the corresponding offer above.

It looks like their public APIs do not offer what we need. I will forward my email to another department.
