
winhows5/filmsInTenYears


Films in Ten Years / 观影十年

Data visualization (2017/08/21)

Data cleaning (2017/08/20)

Most of the data cleaning was already done while collecting the data (see the Data source section), but a few issues still had to be handled:

  • Records dated before the release date have a negative day count. These records are removed, and their box-office takings are added to the next record.
  • The country field is inconsistent across films, so each film's nationality is determined from its production company, producer, screenwriter, and director.
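A minimal Python sketch of the first rule, assuming each film's records carry a day count relative to the release date. The `DailyRecord` type, its fields, and `merge_pre_release` are hypothetical names, not taken from this repository. When several pre-release records occur in a row, their takings accumulate and land on the first record on or after the release date:

```python
from dataclasses import dataclass


@dataclass
class DailyRecord:
    """One box-office record for a film (hypothetical schema)."""
    days_since_release: int  # negative for records dated before the release date
    box_office: float        # box-office takings for this record


def merge_pre_release(records: list[DailyRecord]) -> list[DailyRecord]:
    """Drop records with a negative day count and add their takings to the next record."""
    cleaned: list[DailyRecord] = []
    carry = 0.0  # takings from dropped records not yet added to a later record
    for rec in sorted(records, key=lambda r: r.days_since_release):
        if rec.days_since_release < 0:
            carry += rec.box_office  # remove this record, keep its takings
            continue
        # First record on/after release absorbs everything carried so far.
        cleaned.append(DailyRecord(rec.days_since_release, rec.box_office + carry))
        carry = 0.0
    if carry:
        # Assumption: a film with only pre-release records has no "next record".
        raise ValueError("no record on or after the release date to absorb pre-release takings")
    return cleaned
```

Sorting by day count first makes "the next record" well defined even if the crawled rows arrive out of order; total takings are preserved.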

Data source (2017/08/18)

  • Main data: 中国票房 (China Box Office)
    Crawled data on the top 25 films by box office for each year from 2008 to 2017, including titles, genres, countries, box-office takings, ticket prices, and dates.
  • Cross-check data: Mtimes (时光网)
    To ensure the data is reliable, each film's total box office is compared between 中国票房 and Mtimes. If the two differ by more than 8%, the box-office data is checked manually against a third site (e.g., IMDb).
    Drawback of Mtimes: it provides no weekly box-office data.
  • Correction data: IMDb
    In some cases, especially for foreign films, weekly data is missing; IMDb data on those films is then used to fill the gaps.
    Drawback of IMDb: data for some Chinese films is missing.
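The 8% cross-check rule can be sketched in Python as follows. This is an illustration only: `needs_manual_check` and `flag_films` are hypothetical helpers, the totals are assumed to be keyed by film title, and the difference is assumed to be measured relative to the 中国票房 figure (the README does not say which source is the baseline). Films missing from the Mtimes data are also flagged for manual checking, which is likewise an assumption.

```python
def needs_manual_check(main_total: float, check_total: float, threshold: float = 0.08) -> bool:
    """True if the totals differ by more than `threshold`, relative to main_total (assumed baseline)."""
    if main_total <= 0:
        raise ValueError("main_total must be positive")
    return abs(main_total - check_total) / main_total > threshold


def flag_films(main: dict[str, float], check: dict[str, float], threshold: float = 0.08) -> list[str]:
    """Titles whose totals disagree by more than `threshold` or that are absent from `check`."""
    flagged = []
    for title, total in main.items():
        other = check.get(title)
        if other is None or needs_manual_check(total, other, threshold):
            flagged.append(title)
    return sorted(flagged)
```

Matching by title assumes both sites name films identically; in practice, titles may need normalizing before they are compared.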
