NB_TimeAndMentions.Rmd

---
title: "Time as an important variable"
output: html_notebook
---

```{r setup, include = FALSE}
library(tidyverse)
library(lubridate)
articles <- read_csv("data/master.csv")
```

This notebook focuses on time as an important variable.

# Does the mentions increase with the time since publication
```{r mentions with time}
articles_since_publication <- mutate(
  filter(articles, print_publication_date > "2013-01-01"), 
  days_since_publication = ymd("2018-07-13")-as_date(print_publication_date) 
)

ggplot(data = articles_since_publication) +
  geom_point(mapping = aes(x = days_since_publication, y = total_mentions))


```

It turns out that the Scopus publication dates and Altmetric dates do not match. The Scopus data should all have been published from 1/1/13 but the Altmetric data shows a number of articles have been published before this date including 1965!


# average time to first news mention in open versus closed articles

The below filters out just the news mentions, groups them and summarises on the min(date_posted) to give us the earliest news mention. This is a direct copy of the average time to first policy mention code but with 'news' replacing 'policy'.

Then it subtracts the date of publication from the first news mention date to give us the number of days between publication and mention in news.

And then it takes and average and standard deviation of those times for open and closed papers.


```{r Average time to first news mention - used in paper}

open_closed_with_first_news_mention <- filter(open_closed_with_mentions, source == 'news') %>%
                                          group_by(article_title, journal_title, print_publication_date, known_to_be_open) %>%
                                          summarise(first_news_pub_date = min(date_posted)) %>%
                                          mutate(
                                             days_to_first_news_mention = as_date(first_news_pub_date) - as_date(print_publication_date) 
                                          )

summarise(
  group_by(open_closed_with_first_news_mention, known_to_be_open),
  mean_average_days_to_first_news_mention = mean(days_to_first_news_mention, na.rm = TRUE),
  sd_average_days_to_first_news_mention = sd(days_to_first_news_mention, na.rm = TRUE)
)


```

# Simple how many are open v how many are closed and have a news mention
```{r open v closed for news - used in paper}
summarise(
group_by(open_closed_with_first_news_mention, known_to_be_open),
total = n()
)
```

# average time to first blog mention in open versus closed articles

The below filters out just the blog mentions, groups them and summarises on the min(date_posted) to give us the earliest blog mention. This is a direct copy of the average time to first policy mention code but with 'blog' replacing 'policy'.

Then it subtracts the date of publication from the first news mention date to give us the number of days between publication and mention in blogs.

And then it takes and average and standard deviation of those times for open and closed papers.


```{r Average time to first blogs mention - used in paper}

open_closed_with_first_blogs_mention <- filter(open_closed_with_mentions, source == 'blogs') %>%
                                          group_by(article_title, journal_title, print_publication_date, known_to_be_open) %>%
                                          summarise(first_blogs_pub_date = min(date_posted)) %>%
                                          mutate(
                                             days_to_first_blogs_mention = as_date(first_blogs_pub_date) - as_date(print_publication_date) 
                                          )

summarise(
  group_by(open_closed_with_first_blogs_mention, known_to_be_open),
  mean_average_days_to_first_blogs_mention = mean(days_to_first_blogs_mention, na.rm = TRUE),
  sd_average_days_to_first_blogs_mention = sd(days_to_first_blogs_mention, na.rm = TRUE)
)


```

# Simple how many are open v how many are closed and have a blogs mention
```{r open v closed for blogs - used in paper}
summarise(
group_by(open_closed_with_first_blogs_mention, known_to_be_open),
total = n()
)
```