Better source for Swedish data? #20

Open
PierreMesure opened this issue Dec 22, 2023 · 3 comments

@PierreMesure

Hi,

Amazing project! I actually found out about it after I had built one on the exact same principle, using Swedish data, for my own needs. I just published the code here.

I'm both frustrated and happy I found your project (as well as name-dataset) because I couldn't find anything when I first looked and felt like I had to write my own code. But now that I've done it, I'm bummed someone implemented it better and with more data. Oh well... 😊

Anyway, I'm reaching out because I saw that you seem to be using newborn data for Sweden. I've been using a different dataset which I think works better. SCB has a list of all the names borne by at least two people living in Sweden (first, middle and last names). They can be found on this page (the files called Namnsök 2021 and 2022).

I did the math, and this covers 98% of the population (i.e. the remaining 2% have a unique name and are hence not in this list). So it's far more exhaustive than the lists of newborns, even if you go back a few decades. In total, there are 97386 unique first names, compared with the 1518 in your newborn dataset.
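To make the idea concrete, here is a minimal sketch of how a name-frequency list like this could drive gender inference. It assumes each record can be reduced to a (name, gender, number of bearers) triple; the function names, the 0.9 threshold and the sample counts are all made up for illustration and are neither the real SCB file layout nor genderComputer's actual code.

```python
from collections import defaultdict


def build_gender_lookup(rows):
    """Aggregate (name, gender, bearers) triples into per-name counts.

    `gender` is expected to be "female" or "male" (an assumption about
    how the source files would be labelled after loading).
    """
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for name, gender, bearers in rows:
        counts[name.strip().lower()][gender] += bearers
    return dict(counts)


def infer_gender(lookup, name, threshold=0.9):
    """Return "female", "male", "unisex", or None if the name is unknown.

    A name is assigned a gender only when at least `threshold` of its
    bearers share that gender; the threshold is a hypothetical choice.
    """
    entry = lookup.get(name.strip().lower())
    if entry is None:
        return None
    total = entry["female"] + entry["male"]
    if total == 0:
        return None
    female_share = entry["female"] / total
    if female_share >= threshold:
        return "female"
    if 1 - female_share >= threshold:
        return "male"
    return "unisex"


# Made-up sample counts, NOT real SCB figures.
sample_rows = [
    ("Maria", "female", 100),
    ("Maria", "male", 1),
    ("Kim", "female", 40),
    ("Kim", "male", 60),
    ("Erik", "male", 200),
]
sample_lookup = build_gender_lookup(sample_rows)
for candidate in ("Maria", "Kim", "Erik", "Xyzzy"):
    print(candidate, infer_gender(sample_lookup, candidate))
```

Keeping the bearer counts rather than a plain name-to-gender map lets the threshold be tuned later, which matters for names used by both genders.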

Would you be interested in a PR to use this dataset instead?

@aserebrenik
Member

Dear Pierre, sure! While we are not actively working on genderComputer, it is great to see that it is still useful :)

@salgo60

salgo60 commented Mar 27, 2024

@PierreMesure I guess you could also use Wikidata and linked data.

I helped add the names of all Swedish PMs as Wikidata entities with sources, and I also tried to get ISOF interested in linked data for names; see Koppla Namnobjekt - ISOF <-> Wikidata?
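As a rough illustration of what pulling name data from Wikidata could look like, the sketch below builds a SPARQL query that counts given names (P735) by sex or gender (P21) for humans (Q5) with Swedish citizenship (P27 = Q34). The helper names, the citizenship filter and the limit are assumptions made for this example, not something used in this thread, and a query this broad may need narrowing to stay within the query service's time limits.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def given_name_query(country_qid="Q34", limit=100):
    """Build a SPARQL query counting given names per gender.

    Hypothetical helper: the citizenship filter (P27) and the default
    limit are illustrative choices.
    """
    return f"""
SELECT ?name ?nameLabel ?gender ?genderLabel (COUNT(?person) AS ?bearers) WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P27 wd:{country_qid} ;
          wdt:P735 ?name ;
          wdt:P21 ?gender .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "sv,en". }}
}}
GROUP BY ?name ?nameLabel ?gender ?genderLabel
ORDER BY DESC(?bearers)
LIMIT {int(limit)}
""".strip()


def build_request_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Print the query and URL only; nothing is sent over the network here.
print(given_name_query(limit=10))
print(build_request_url(given_name_query(limit=10)))
```

The resulting URL can be fetched with any HTTP client; the counts reflect only people who have Wikidata entries, so they are not comparable to population statistics such as SCB's.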


Wikidata properties for name

see also #129

All properties used for Swedish PMs in Wikidata - 510 different properties used

All external properties used for Swedish PMs in Wikidata - 377

All external Swedish properties used for Swedish PMs in Wikidata - 39 in total

All external properties related to authority files according to Wikidata - for Swedish members of parliament

All properties related to names (Q19643892) according to Wikidata - for Swedish members of parliament

@salgo60
Copy link

salgo60 commented Mar 27, 2024

Off-topic: interesting work has been done with names in Wikidata by OCCRP; see Wikidata and OCCRP - minutes of session

The Organized Crime and Corruption Reporting Project (OCCRP) is using Wikidata to find synonyms for people's names. In this short talk we will present how we use Wikidata's data to support reporting on crime and corruption
