TopicModelling in R

Read the 10-K reports of 30 companies from 2005 to 2014 to build an intuition about the top few areas these companies were heading towards at the time, and cross-validate that against the data available now.

  1. Read the 10-K reports for tech firms between 2005 and 2014.
  2. Clean the text: remove stop words and apply stemming.
  3. Lemmatize the tokens and use chunking to pick out nouns.
  4. Create a document-term matrix (DTM).
  5. Create a DTM for each year.
  6. Visualize each year's DTM as a word cloud.
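
The steps above can be sketched in a few lines of base R. This is a minimal illustration rather than the repository's actual code: the `filings` data frame, the tiny `stop_words` list, and the helper names `tokenize` and `top_terms` are all hypothetical, and stemming and lemmatization are left out because base R does not provide them.

```r
# Toy stand-ins for 10-K text (hypothetical; the real filings are not shown here).
filings <- data.frame(
  year = c(2010, 2010, 2011),
  text = c("The iPad and iOS drive growth.",
           "NFC is an emerging technology; DDoS attacks are a risk.",
           "Cloud services such as iCloud and cloud platforms expand."),
  stringsAsFactors = FALSE
)

# A tiny illustrative stop-word list (an assumption, not the project's list).
stop_words <- c("the", "and", "is", "an", "are", "a", "such", "as")

# Lower-case, replace non-letters with spaces, split on spaces, drop stop words.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z ]", " ", tolower(text))
  tokens <- strsplit(cleaned, " +")[[1]]
  tokens[nzchar(tokens) & !(tokens %in% stop_words)]
}

tokens_by_doc <- lapply(filings$text, tokenize)
vocab <- sort(unique(unlist(tokens_by_doc)))

# Document-term matrix: one row per filing, one column per term.
dtm <- t(vapply(
  tokens_by_doc,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

# Per-year DTM: sum the rows that share a year.
dtm_by_year <- rowsum(dtm, group = filings$year)

# Most frequent terms for one year; these counts are what a word cloud would draw.
top_terms <- function(year, n = 3) {
  counts <- dtm_by_year[as.character(year), ]
  head(sort(counts[counts > 0], decreasing = TRUE), n)
}

print(top_terms(2011))
```

If the `wordcloud` package happens to be installed, the named counts returned by `top_terms` could be passed to `wordcloud::wordcloud(names(counts), counts)` to draw one cloud per year.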

The code base contains the R files as well as the generated HTML that explains the code flow.

To run the code, download the R file and execute it in RStudio.

Basic analysis of the 10-K documents shows:

Key takeaways, year by year:

2006: Highlights the year of the MacBook, YouTube, RSA, and Zune.

2007: MOTORAZR and GoToWebinar, Citrix's online webinar tool, see a mention.

2009: The tech players are talking cloud with Azure. Omniture, an online marketing and business analytics unit, was acquired by Adobe this year. Smartbooks get a worthy mention, as does MOTOBLUR, Motorola's user system on remote servers.

2010: The iPad and iOS have taken center stage, NFC appears as an emerging technology, and DDoS attacks are highlighted.

2011: Cloud is center stage, with iCloud and Heroku.

2012: Talks about cloud platforms, BYOD, "Kaggle :)", and cloud evangelism!

2013: Sees the advent of wearables, with Chromecast getting lots of mentions.

2014: Big data platforms are being talked about: Cloudera!!, geospatial, and Bluemix (IBM's cloud platform). Interesting captures here are autonomic and Mojang, which was acquired by Microsoft in 2014.