Better source for Swedish data? #20
Dear Pierre, sure! While we are not actively working on genderComputer, it is great to see that it is still useful :)
@PierreMesure I guess you could use Wikidata and Linked Data. I have been part of adding the names of all Swedish PMs (members of the Riksdag) as Wikidata entities with sources, and I have also tried to get ISOF interested in Linked Data for names; see Koppla Namnobjekt - ISOF <-> Wikidata? ("Linking name objects: ISOF <-> Wikidata?") and Wikidata properties for name. See also #129.
All properties used for Swedish PMs in Wikidata - 510 different properties used
All external properties used for Swedish PMs in Wikidata - 377
All external Swedish properties found on Swedish PMs in Wikidata - 39 of them
All external properties related to authority files according to Wikidata - for Swedish members of parliament
All properties related to names (Q19643892) according to Wikidata - for Swedish members of parliament
Off-topic: OCCRP has done interesting work with names in Wikidata; see Wikidata and OCCRP - minutes of session: "The Organized Crime and Corruption Reporting Project (OCCRP) is using Wikidata to find synonyms for people's names. In this short talk we will present how we use Wikidata's data to support reporting on crime and corruption."
Hi,
Amazing project! I actually found out about it after I had built one on exactly the same principle, using Swedish data, for my own needs. I just published the code here.
I'm both frustrated and happy I found your project (as well as name-dataset) because I couldn't find anything when I first looked and felt like I had to write my own code. But now that I've done it, I'm bummed someone implemented it better and with more data. Oh well... 😊
Anyway, I'm reaching out because I saw that you seem to be using newborn data for Sweden. I've been using a different dataset that I think works better: SCB has a list of all names borne by at least two people living in Sweden (first, middle and last names). They can be found on this page (the files called Namnsök 2021 and 2022).
I did the math, and these names cover 98% of the population (i.e. 2% of the population have a unique name and are hence not in this list). So it's far more exhaustive than the lists of newborns, even if you go back a few decades. In total, there are 97,386 unique first names, compared with 1,518 in your newborn dataset.
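To make the idea concrete, here is a minimal Python sketch of a frequency-based lookup over a list like this. It assumes a hypothetical CSV layout with one row per first name and the number of women and men bearing it (the actual Namnsök files may be laid out differently); the sample rows and the 0.9 threshold are made up for illustration. It shows the general name-list lookup principle, not genderComputer's actual implementation.

```python
import csv
import io

# Hypothetical layout (assumption): name, number of women, number of men.
# The sample rows are made up; the real SCB Namnsök files may differ.
SAMPLE_CSV = """name,women,men
Anna,100,0
Kim,40,60
Erik,1,200
"""


def load_counts(f):
    """Read a mapping of lowercased name -> (women, men) from a CSV file object."""
    counts = {}
    for row in csv.DictReader(f):
        counts[row["name"].strip().lower()] = (int(row["women"]), int(row["men"]))
    return counts


def guess_gender(name, counts, threshold=0.9):
    """Return 'female', 'male', 'unisex', or None if the name is unknown.

    threshold (hypothetical) is the share of bearers of one sex required
    before the name is labelled with that sex.
    """
    entry = counts.get(name.strip().lower())
    if entry is None:
        return None  # name not in the list (e.g. a unique name)
    women, men = entry
    total = women + men
    if total == 0:
        return None
    if women / total >= threshold:
        return "female"
    if men / total >= threshold:
        return "male"
    return "unisex"


counts = load_counts(io.StringIO(SAMPLE_CSV))
print(guess_gender("Anna", counts), guess_gender("Kim", counts))
```

A larger list mainly reduces the number of `None` results, since more of the population's names are covered; the threshold only decides how mixed names are labelled.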
Would you be interested in a PR to use this dataset instead?