A university project for the Digital Humanities course at Ben Gurion University.
The data initially came from the website "Ishim", a vast database of Israeli media productions. Among other things, it contains data about dubbed movies and who dubbed them. The details available there, however, are pretty scarce: each actor page contains very little data, and the way the site is built prevents us from drawing any meaningful conclusions from the data in it.
My final data is available here in two file formats: a CSV file and an OpenRefine project. Each entry in the table represents a dubber and includes the following data:
- Number of movies the actor dubbed in
- A list of those movies, and the year the (dubbed) movie was released
Aside from those fields (which were gathered directly from the "Ishim" site), some of the entries have additional data obtained using OpenRefine and WikiData:
- Additional occupations
- Date of birth
- Place of birth
- Sex or gender
With these files, the data can now be used in a more convenient way to reach conclusions based on what we are looking for. For example, we can see in which years more movies were dubbed, or how many male actors with a long list of movies there are compared to female ones. In addition, the OpenRefine project file also includes a reconciled version of the movies, for further use.
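As a quick illustration, the CSV can be explored with pandas. This is only a sketch: the file name and column names below ("ordered_data.csv", "gender", "movie_count", "name") are assumptions and should be adjusted to match the actual header of the final file.

```python
import pandas as pd

# Load the final dataset; the file and column names here are assumptions
# and should be adjusted to match the actual CSV produced by the project.
dubbers = pd.read_csv("ordered_data.csv")
print(dubbers.columns.tolist())  # check the real column names first

# Example 1: how many dubbers of each gender appear in the data
# (assumes a "gender" column reconciled from WikiData).
print(dubbers["gender"].value_counts(dropna=False))

# Example 2: the dubbers with the largest number of dubbed movies
# (assumes a numeric "movie_count" column).
top = dubbers.sort_values("movie_count", ascending=False).head(10)
print(top[["name", "gender", "movie_count"]])
```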
The code was written in Python and is mostly used for the initial data collection from "Ishim".
The code parses the HTML from "Ishim" to produce a csv file named "parsed_data", where each entry is a movie and the list of dubbers who participated in it.
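The outline below is not the project's actual parser, only a minimal sketch of the same idea. It assumes a locally saved "Ishim" page ("ishim_page.html") and hypothetical CSS selectors; the site's real HTML structure will require different selectors.

```python
import csv
from bs4 import BeautifulSoup

# Minimal sketch of the parsing idea, not the project's actual code.
# "ishim_page.html" and the CSS selectors below are hypothetical and
# must be replaced with the site's real structure.
with open("ishim_page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

rows = []
for movie in soup.select("div.movie"):          # hypothetical selector
    title = movie.select_one("h2").get_text(strip=True)
    dubbers = [a.get_text(strip=True) for a in movie.select("a.dubber")]
    rows.append([title, ", ".join(dubbers)])

with open("parsed_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["movie", "dubbers"])
    writer.writerows(rows)
```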
To run it, simply perform:
python html_parse.py
Main dependencies:
Dependencies for extra features:
Running the ordering step will make another csv file named "ordered_data", where each entry is a dubber with the following data:
- Number of movies they participated in
- A list of those movies
- A URL to their WikiData page, if one exists (see the lookup sketch after this list)
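The lookup itself can be done through WikiData's public search API. The snippet below is only a rough sketch of that step (a name search via `wbsearchentities`), not the project's exact implementation; the Hebrew language setting and the example name are assumptions.

```python
import time
import requests

# Rough sketch of looking up a dubber's WikiData page by name using the
# public wbsearchentities endpoint; not the project's exact implementation.
API_URL = "https://www.wikidata.org/w/api.php"

def wikidata_url_for(name):
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": "he",      # assumes the dubbers' names are in Hebrew
        "format": "json",
        "limit": 1,
    }
    response = requests.get(API_URL, params=params, timeout=10)
    results = response.json().get("search", [])
    if results:
        return "https://www.wikidata.org/wiki/" + results[0]["id"]
    return ""  # no matching WikiData entity found

# Hypothetical usage over a list of dubber names.
for name in ["example dubber name"]:
    print(name, wikidata_url_for(name))
    time.sleep(0.5)  # be polite to the API; this is why the step is slow
```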
To run it, simply perform:
python html_parse.py order=="true"
Note: Performing this step will make the process significantly slower because it accesses the WikiData API.
Initially, I planned to display the data I collected as some kind of graph. While I personally didn't end up using it, I decided to keep the code for anyone who might find it useful. Enabling this step will create a graphml file named "graph", which contains a multi-graph where every node is a dubber. Each edge in the graph connects two dubbers who worked together on a movie, and the edge label is the name of that movie.
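Conceptually, the graph step does something like the following networkx sketch (the project's real code differs; the input assumed here is the hypothetical movie/dubbers CSV layout sketched above):

```python
import csv
from itertools import combinations

import networkx as nx

# Sketch of the co-dubbing multi-graph: nodes are dubbers, and each edge
# connects two dubbers who worked on the same movie, labelled with its name.
# Assumes the "parsed_data.csv" layout sketched earlier (movie, dubbers).
graph = nx.MultiGraph()

with open("parsed_data.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        dubbers = [d.strip() for d in row["dubbers"].split(",") if d.strip()]
        for a, b in combinations(dubbers, 2):
            graph.add_edge(a, b, label=row["movie"])

nx.write_graphml(graph, "graph.graphml")
```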
To run it, simply perform:
python html_parse.py graph=="true"
Note: Most common graphing applications can open and manipulate graphml files. But keep in mind that the code as-is uses the entire parsed database to build the graph, so the resulting graph is very large. When opening this file in such an application, it might not be easy to read or manipulate, depending on the application you use and your computer's performance.
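If the full graph is too heavy for your tool, one option is to thin it out in Python before opening it. The sketch below keeps only the most connected dubbers; the file name "graph.graphml" and the degree threshold are assumptions, not values used by the project.

```python
import networkx as nx

# Load the full co-dubbing graph and keep only well-connected dubbers,
# so the file is easier to open in a graphing application.
# The file name and degree threshold below are arbitrary example values.
graph = nx.read_graphml("graph.graphml")
busy_dubbers = [node for node, degree in graph.degree() if degree >= 20]
nx.write_graphml(graph.subgraph(busy_dubbers), "graph_filtered.graphml")
```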
The way I decided to display the data is to contribute to WikiData. Out of the 1478 dubber entries I worked with, only about 34% had a WikiData page. Thus OpenRefine was used once again to push the following changes into WikiData:
- Creating a WikiData page for each dubber who didn't already have one
- Adding the occupation "dub actor" to every new page and to every existing page that didn't have any occupation listed before
It ended up being more than 1200 edits, which are reflected on my contributions page in WikiData.
Starting from a website that is an archive meant for human readers, the dubbers' data evolved into a proper dataset. But the dubbing industry does not stop, and so the tools and methods described here can be used to update the collection as needed.