Israeli movie dubbers collection as data (BGU project).

TsUNaMyWaVe/israeli-dubbers

Israeli Dubbers Collection as Data

A university project for the Digital Humanities course at Ben Gurion University.

Initial Data Description

The data originally came from the website "Ishim", a vast database of Israeli media creations. As such, it also contains data about dubbed movies and who dubbed them. The details available there, however, are scarce: each actor page contains very little data, and the way the site is built makes it hard to draw any meaningful conclusions from the data in it.

Current Data Description

My final data is available here in two file types: a CSV file and an OpenRefine project. Each entry in the table represents a dubber. Every entry includes the following data:

  • Number of movies the actor dubbed in
  • A list of those movies, and the year the (dubbed) movie was released

Aside from those (which were gathered directly from the "Ishim" site), some of the entries have additional data, obtained using OpenRefine and WikiData:

  • Additional occupations
  • Date of birth
  • Place of birth
  • Sex or gender

With these files we can work with the data more conveniently and reach conclusions based on what we are looking for. For example, we can see in which years more movies were dubbed, how many male actors have a long list of movies compared with female ones, and so on. In addition, the OpenRefine project file also includes a reconciled version of the movies, for further use.
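As a rough illustration of the kind of questions mentioned above, here is a minimal sketch using only the Python standard library. The column names (`name`, `movie_count`, `movies`, `sex_or_gender`), the `;` separator, the "Title (year)" format and the sample rows are all assumptions made up for this example; check the actual CSV header before adapting it.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample rows; the real CSV columns and values may differ.
SAMPLE_CSV = """name,movie_count,movies,sex_or_gender
Dubber One,2,Movie A (2001);Movie B (2005),male
Dubber Two,1,Movie B (2005),female
Dubber Three,3,Movie A (2001);Movie B (2005);Movie C (2005),female
"""

# Assumes each movie is written as "Title (YYYY)".
YEAR = re.compile(r"\((\d{4})\)\s*$")


def load_rows(text):
    """Read the CSV text into a list of dictionaries, one per dubber."""
    return list(csv.DictReader(io.StringIO(text)))


def movies_per_year(rows):
    """Count distinct movies per release year across all dubbers."""
    year_of = {}
    for row in rows:
        for movie in row["movies"].split(";"):
            match = YEAR.search(movie)
            if match:
                year_of[movie.strip()] = int(match.group(1))
    return Counter(year_of.values())


def prolific_by_gender(rows, min_movies):
    """Count dubbers with at least min_movies movies, grouped by gender."""
    return Counter(
        row["sex_or_gender"]
        for row in rows
        if int(row["movie_count"]) >= min_movies
    )


rows = load_rows(SAMPLE_CSV)
per_year = movies_per_year(rows)
by_gender = prolific_by_gender(rows, min_movies=2)
```

Each movie is counted once per year even if several dubbers list it, which is why the sketch first collects distinct titles before counting.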

Code

The code is written in Python and is mostly used for the initial data collection from "Ishim". It parses the HTML of "Ishim" to produce a CSV file named "parsed_data", where each entry is a movie together with the list of dubbers who participated in it. To run it, simply perform: python html_parse.py
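The parsing step can be pictured roughly as follows. This is only a sketch, not the code in html_parse.py: the HTML snippet, the tag layout, the `;` separator and the column names are hypothetical, since the real "Ishim" markup is not described here. It uses the standard library's `html.parser` and `csv` modules.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup; the real "Ishim" pages are structured differently.
SAMPLE_HTML = """
<div class="movie"><h2>Movie A (2001)</h2>
  <ul><li>Dubber One</li><li>Dubber Two</li></ul></div>
<div class="movie"><h2>Movie B (2005)</h2>
  <ul><li>Dubber Two</li></ul></div>
"""


class MovieParser(HTMLParser):
    """Collect (movie title, [dubbers]) pairs from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._tag = None  # the h2 or li element currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "li"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "h2":
            # A heading starts a new movie entry.
            self.movies.append((text, []))
        elif self.movies:
            # A list item is a dubber of the most recent movie.
            self.movies[-1][1].append(text)


def parse_movies(html):
    parser = MovieParser()
    parser.feed(html)
    return parser.movies


def to_csv(movies):
    """Serialize one row per movie, with dubbers joined by ';'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["movie", "dubbers"])
    for title, dubbers in movies:
        writer.writerow([title, ";".join(dubbers)])
    return buffer.getvalue()


movies = parse_movies(SAMPLE_HTML)
csv_text = to_csv(movies)
```

Writing `csv_text` to a file would give a table shaped like the "parsed_data" output described above: one movie per row, followed by its dubbers.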

Required Libraries

Main dependencies:

Dependencies for extra features:

Extra Features

Order

This option creates another CSV file named "ordered_data", where each entry is a dubber with the following data:

  • Number of movies they participated in
  • A list of those movies
  • A URL to their WikiData page (if one exists)

To run it, simply perform: python html_parse.py order=="true"

Note: Enabling this step makes the process significantly slower, because it queries the WikiData API.
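The reordering itself, from one row per movie to one row per dubber, can be sketched as below. This is an illustration under assumptions rather than the project's actual code: the function name `order_by_dubber`, the field names and the sample data are hypothetical, and the WikiData lookup is left out because it requires network access.

```python
from collections import defaultdict


def order_by_dubber(movies):
    """Invert (movie, [dubbers]) pairs into one record per dubber.

    Records are sorted by number of movies (descending), then by name.
    """
    by_dubber = defaultdict(list)
    for title, dubbers in movies:
        for dubber in dubbers:
            by_dubber[dubber].append(title)

    ordered = sorted(by_dubber.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"dubber": name, "movie_count": len(titles), "movies": titles}
        for name, titles in ordered
    ]


# Hypothetical input in the shape of "parsed_data": movie -> dubbers.
sample_movies = [
    ("Movie A", ["Dubber One", "Dubber Two"]),
    ("Movie B", ["Dubber Two"]),
]
ordered = order_by_dubber(sample_movies)
```

A WikiData URL column would then be filled in per dubber, which is the part that makes the real step slow.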

Graph

Initially, I planned to display the data I collected as some kind of graph. Although I didn't end up using it myself, I decided to keep the code for anyone who might find it useful. Enabling this step creates a graphml file named "graph", which holds a multigraph where every node is a dubber. Each edge connects two dubbers who worked together on a movie, and the edge label is the name of that movie.

To run it, simply perform: python html_parse.py graph=="true"

Note: Most common graphing applications can open and manipulate graphml files. Keep in mind, however, that the code as-is uses the entire parsed database to build the graph, so the graph is very large. Depending on the application you use and your computer's performance, the file may be hard to read or manipulate.
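To make the graph structure concrete, here is a minimal sketch that builds the co-dubbing edges and serializes them as GraphML using only the standard library. It is not the project's implementation (which may rely on a graph library); the function names, the `d0` key id and the sample data are assumptions for illustration.

```python
import itertools
import xml.etree.ElementTree as ET


def build_edges(movies):
    """Return one (dubber_a, dubber_b, movie) edge per pair of co-dubbers."""
    edges = []
    for title, dubbers in movies:
        for a, b in itertools.combinations(sorted(set(dubbers)), 2):
            edges.append((a, b, title))
    return edges


def to_graphml(movies):
    """Serialize dubbers as nodes and shared movies as labeled edges."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    # Declare an edge attribute named "movie" that holds the edge label.
    ET.SubElement(root, "key", {
        "id": "d0", "for": "edge",
        "attr.name": "movie", "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", edgedefault="undirected")

    all_dubbers = sorted({d for _, dubbers in movies for d in dubbers})
    for name in all_dubbers:
        ET.SubElement(graph, "node", id=name)

    # Parallel edges are allowed: two dubbers can share several movies.
    for a, b, title in build_edges(movies):
        edge = ET.SubElement(graph, "edge", source=a, target=b)
        ET.SubElement(edge, "data", key="d0").text = title

    return ET.tostring(root, encoding="unicode")


# Hypothetical input in the shape of "parsed_data": movie -> dubbers.
sample_movies = [
    ("Movie A", ["X", "Y", "Z"]),
    ("Movie B", ["X", "Y"]),
]
edges = build_edges(sample_movies)
graphml_text = to_graphml(sample_movies)
```

A movie with n dubbers contributes n(n-1)/2 edges, which helps explain why a graph built from the whole database grows so large.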

Display

I decided to display the data by contributing it to WikiData. Out of the 1478 dubber entries I worked with, only about 34% had a WikiData page. OpenRefine was therefore used once again, this time to push the following changes into WikiData:

  • Creating a WikiData page for each dubber who didn't already have one
  • Adding the occupation "dub actor" to every new page and to every existing page that didn't have any occupation listed before

This came to more than 1200 edits, which are reflected in my contribution page on WikiData.

In Conclusion

Starting from a website that is an archive meant for human readers, the dubbers' data has evolved into a proper dataset. The dubbing industry does not stop, however, so the tools and methods described here can be used to update the collection as needed.
