Requirements:
- GoogleNews-vectors-negative300.bin data file
- SMSSpamCollection dataset
Output format: for each tweet (SMS_data object), the following attributes exist:
- label: indicates spam or ham
- words: list of words extracted from the tweet
- vectors: list of vectors, one per word in the tweet. Each word vector has 300 dimensions and is constructed using Word2Vec (gensim), WordNet, and ConceptNet
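
The output format above can be sketched as a small Python record. This is only an illustration, not the project's actual code: the class name `SMSData`, the helper `parse_line`, the whitespace tokenisation, the zero-vector fallback for unknown words, and the toy `toy_embeddings` lookup are all assumptions made for the example. The real vectors would come from the Word2Vec, WordNet, and ConceptNet sources listed above, and the sketch assumes the SMSSpamCollection layout of one `label<TAB>text` record per line.

```python
from dataclasses import dataclass, field
from typing import Dict, List

VECTOR_DIM = 300  # dimensionality stated in the output format


@dataclass
class SMSData:
    """Hypothetical stand-in for the SMS_data object."""
    label: str                 # "spam" or "ham"
    words: List[str]           # words extracted from the message text
    vectors: List[List[float]] = field(default_factory=list)  # one vector per word


def parse_line(line: str, embeddings: Dict[str, List[float]]) -> SMSData:
    """Build one record from a tab-separated 'label<TAB>text' line."""
    label, text = line.rstrip("\n").split("\t", 1)
    words = text.lower().split()   # assumption: simple whitespace tokenisation
    zero = [0.0] * VECTOR_DIM      # assumption: unknown words get a zero vector
    vectors = [embeddings.get(word, zero) for word in words]
    return SMSData(label=label, words=words, vectors=vectors)


# Toy lookup table standing in for the real 300-dimensional embeddings.
toy_embeddings = {"free": [0.1] * VECTOR_DIM}
record = parse_line("spam\tFree entry now", toy_embeddings)
print(record.label, record.words, len(record.vectors))
```

Keeping `words` and `vectors` as parallel lists means the i-th vector always belongs to the i-th word, which is the relationship the attribute list describes.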