How well might psychedelic drugs work to treat mental illness, compared with prescription psych meds? What insights can be drawn from anonymously submitted psychedelic experience reports?
Please see a summary of findings and an explanation of methodological judgements in the final report. For a breakdown of how to explore the underlying work throughout this project, read on:
Functions used throughout the notebooks are saved in the source folder and tested here.
In notebook 1 and notebook 2, I wrangle data from multiple scientific studies measuring the efficacy of prescription psych meds based on user ratings with accompanying reviews. In notebook 3, I begin exploratory data analysis comparing the target variable "rating" with other variables such as "drug," "condition," or the "date" the review was submitted, and I begin parsing the text of the narrative reviews. In notebook 4, I conduct natural language processing to remove symbols other than those associated with sentiment (emoji or !!!), correct spelling, remove stopwords other than those associated with sentiment (e.g. 'not', 'very'), and lemmatize the text.
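As a rough illustration of the notebook 4 cleaning step, here is a minimal sketch in plain Python. The function name `clean_review`, the tiny stopword list, and the regular expressions are hypothetical and are not taken from the project's source folder. Spelling correction, lemmatization, and emoji handling are left out to keep the sketch short.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# fuller list from an NLP library.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "this", "was", "i", "to", "of", "not", "very"}
# Words kept even though they are stopwords, because they carry sentiment.
SENTIMENT_WORDS = {"not", "very"}


def clean_review(text: str) -> list[str]:
    """Lowercase, strip symbols except '!', and drop non-sentiment stopwords."""
    text = text.lower()
    # Keep letters, whitespace, and exclamation marks; replace the rest with spaces.
    text = re.sub(r"[^a-z!\s]", " ", text)
    # Tokenize into words and runs of exclamation marks.
    tokens = re.findall(r"[a-z]+|!+", text)
    return [t for t in tokens if t not in STOPWORDS or t in SENTIMENT_WORDS]


tokens = clean_review("This drug was NOT very helpful!!! (2 stars)")
```

Keeping 'not' and 'very' matters because dropping them would make "not very helpful" look the same as "helpful" to any later sentiment feature.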
In notebook 5, I engineer features based on the text of the drug reviews, for example, length, complexity, subjectivity, and polarity. The feature most correlated with the target variable "rating" is text sentiment polarity, as seen below. In notebook 6, I explore options for quantifying the words of the text itself for each review and settle upon creating a sparse matrix using CountVectorizer.
In notebook 7, I use a randomized grid search to select the best model and hyperparameters. Evaluation metrics for the final ComplementNB model include an F1 score of 0.58, a log loss of 1.27, a ROC AUC of 0.75, and an accuracy of 0.73 with best k=2.
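A randomized search over a ComplementNB pipeline might look roughly like the sketch below. The toy data, the two-class labels, the parameter ranges, and the `f1_macro` scoring choice are all assumptions made for illustration; they are not the settings used in notebook 7.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration only.
texts = [
    "calm happy relief", "happy sleep better", "relief calm better",
    "better mood happy", "calm sleep relief", "happy calm mood",
    "sick anxious awful", "awful nausea worse", "anxious worse sick",
    "nausea sick awful", "worse anxious nausea", "awful sick worse",
]
labels = ["high"] * 6 + ["low"] * 6

pipeline = Pipeline([
    ("vec", CountVectorizer()),
    ("clf", ComplementNB()),
])

# Hypothetical search space; the real ranges are not given in this README.
param_distributions = {
    "vec__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 0.5, 1.0],
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=4,            # sample 4 of the 6 possible combinations
    scoring="f1_macro",  # assumed metric
    cv=2,
    random_state=0,
)
search.fit(texts, labels)

best_params = search.best_params_
prediction = search.predict(["calm and happy"])
```

Putting the vectorizer inside the pipeline means it is refit on each training fold, so vocabulary from the held-out fold does not leak into model selection.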
In notebook 8, I scrape psychedelic experience reports from erowid.org. In notebook 9, I clean these reports and engineer features to mimic those of the prescription psych med reviews from the trained model. In notebook 10, I apply the model to the new scraped data and use predicted ratings to compare psychedelic and prescription drugs.
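One detail behind "mimic those of the prescription psych med reviews" is that new text has to be encoded with the vocabulary the model was trained on. The sketch below shows the idea under stated assumptions: the toy reviews, ratings, and report are invented, and the project's actual feature set includes more than word counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB

# Toy training data standing in for the prescription drug reviews.
train_reviews = [
    "felt calm and happy",
    "felt sick and anxious",
    "calm happy sleep",
    "anxious sick awful",
]
train_ratings = [9, 2, 9, 2]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)
model = ComplementNB().fit(X_train, train_ratings)

# Toy report standing in for a scraped experience report.
new_reports = ["colors everywhere, felt calm and happy"]
# transform (not fit_transform) reuses the training vocabulary, so the columns
# line up with what the model expects; unseen words are simply dropped.
X_new = vectorizer.transform(new_reports)
predicted = model.predict(X_new)
```

Calling `fit_transform` on the new reports instead would build a different vocabulary, and the resulting columns would no longer match the trained model.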
This is likely because more of the psychedelic experience reports express positive sentiment.
Please see the project proposal for citations and links to the original data sources.
Project based on the cookiecutter data science project template.