# TidyTutorial 3: Plotting with ggplot (and writing a function!)

In this tutorial we will load the data created in the second tutorial and make some plots.

Plotting in the tidyverse is handled by the ggplot2 package.

Load the library

``library(tidyverse)``

###Read in the data we created in TidyTutorial 2

``dat<-read.csv("Tidy2_dat.csv")``

###This data contains five variables: id, group, time, score, and score_2

###We will use this data to go over the basics of plotting with ggplot

``head(dat)``
``````##   id group time score score_2
## 1  1     0    0  0.30    0.30
## 2  1     0    1  0.26    0.76
## 3  1     0    2  0.53    1.53
## 4  1     0    3  1.90    3.40
## 5  1     0    4 -1.77    0.23
## 6  1     0    5 -3.61   -1.11``````
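
If `Tidy2_dat.csv` is not at hand, a stand-in with the same five columns can be simulated so the plotting code below still runs. This is only a sketch: the name `dat_demo`, the 10 ids by 10 time points layout, and the 50/50 group split are assumptions. The `score_2 = score + 0.5 * time` rule is inferred from the six rows printed above, not taken from tutorial 2.

```r
library(tidyverse)

set.seed(1) # arbitrary seed so the stand-in data are reproducible

# Hypothetical stand-in for the tutorial data (NOT the real file)
dat_demo <- expand_grid(id = 1:10, time = 0:9) %>%   # assumed 10 ids x 10 time points
  mutate(group = if_else(id <= 5, 0, 1),             # assumed group split
         score = round(rnorm(n()), 2),               # random scores
         score_2 = score + 0.5 * time) %>%           # linear trend inferred from head(dat)
  select(id, group, time, score, score_2)

head(dat_demo)
```

Swapping `dat_demo` for `dat` in the examples that follow should give plots with the same structure, though not the same values.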

#Tidyverse part 1: ggplot (it’s for plotting)

##We can “initiate” a plot with the ggplot function

``dat %>% ggplot()`` ``#Here we have a "blank" plot``

##We can then add some ‘aesthetics’ with aes

``dat %>% ggplot(aes(x=time, y=score))`` ``#This sets the x and y axes, but we still have not added any data to the plot``

##Now we can add a ‘layer’ to the plot using ‘+ geom’

``dat %>% ggplot(aes(x=time, y=score))+geom_point()`` ``#This is a scatter plot showing the time series of the "score" variable ``

##Here we can try and connect the dots using “geom_line”

``dat %>% ggplot(aes(x=time, y=score))+geom_point()+geom_line()`` ``#Hmm... that seems wrong: without a group, geom_line joins every point into one line``

##To get what we want we need to add a ‘group’ variable

``````dat %>%
ggplot(aes(x=time, y=score, group=id))+ #set group to id
geom_point()+
geom_line()`````` ``#Person-specific time-series of scores``
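
Why grouping matters can be checked on a tiny example. The sketch below uses a made-up tibble (`toy` is hypothetical, not the tutorial data) and `layer_data()` to count how many separate lines ggplot would draw with and without `group`.

```r
library(tidyverse)

# Hypothetical toy data: two ids, three time points each
toy <- tibble(id    = rep(1:2, each = 3),
              time  = rep(0:2, times = 2),
              score = c(0, 1, 2, 5, 6, 7))

p_no_group <- ggplot(toy, aes(x = time, y = score)) + geom_line()
p_grouped  <- ggplot(toy, aes(x = time, y = score, group = id)) + geom_line()

# layer_data() returns the data ggplot computed for a layer,
# including the group each row was assigned to
n_lines_no_group <- n_distinct(layer_data(p_no_group)$group)
n_lines_grouped  <- n_distinct(layer_data(p_grouped)$group)

n_lines_no_group # a single line through all six points
n_lines_grouped  # one line per id
```

Without `group`, all six points land in one group, so the line zig-zags between ids; with `group = id`, each id gets its own line.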

##We can set the lines to have a different color for each ID. This will automatically add a legend

``````dat %>%
ggplot(aes(x=time, y=score, group=id, color=factor(id)))+geom_point()+geom_line()`````` ``#Person-specific time-series``

##We can also modify the ‘theme’ of the plot; themes change how a plot looks

``````dat %>%
ggplot(aes(x=time, y=score, group=id, color=factor(id)))+
geom_point()+
geom_line()+
theme_bw() #This is my favorite theme; there are also lots of others `````` ``#Person-specific time-series with black+white theme``

##Let’s get rid of the legend on the right side of the plot

``````dat %>%
ggplot(aes(x=time, y=score, group=id, color=factor(id)))+
geom_point()+
geom_line()+
theme_bw()+
theme(legend.position = "none") #This makes legends go away`````` ``#Person-specific time-series with black+white theme and no legend``
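
Since the same two theme calls are repeated in most of the plots that follow, one option is to store them in a single object and add that instead. A minimal sketch, assuming a made-up data set `toy`; the name `theme_tutorial` is hypothetical.

```r
library(tidyverse)

# Bundle the black+white theme and the "no legend" setting into one object
theme_tutorial <- theme_bw() + theme(legend.position = "none") # hypothetical name

# Hypothetical toy data: two ids, three time points each
toy <- tibble(id    = rep(1:2, each = 3),
              time  = rep(0:2, times = 2),
              score = c(0.3, 0.2, 0.5, -0.1, 0.4, 0.0))

p <- toy %>%
  ggplot(aes(x = time, y = score, group = id, color = factor(id))) +
  geom_point() +
  geom_line() +
  theme_tutorial # one object instead of two theme calls

p
```

Whether this is worth it is a matter of taste; spelling out both calls each time, as the tutorial does, keeps every plot self-explanatory.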

##Now let’s produce the same plot for score 2

``````dat %>%
ggplot(aes(x=time, y=score_2, group=id, color=factor(id)))+ #note the change in the 'y=' input
geom_point()+
geom_line()+
theme_bw()+
theme(legend.position = "none")`````` ``#Person-specific time-series with black+white theme and no legend``

All of these seem to increase in a linear fashion (which we set them to do in the previous tutorial).
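
Rather than writing one plot per score, the two score columns can also be stacked into long format and shown side by side with `facet_wrap`. This is a sketch on made-up data, not part of the original workflow; `toy`, `toy_long`, `measure`, and `value` are hypothetical names.

```r
library(tidyverse)

# Hypothetical toy data with the same column names as the tutorial data
toy <- tibble(id    = rep(1:2, each = 3),
              time  = rep(0:2, times = 2),
              score = c(0.3, 0.2, 0.5, -0.1, 0.4, 0.0)) %>%
  mutate(score_2 = score + 0.5 * time)

# Stack score and score_2 into one column, with a label column saying which is which
toy_long <- toy %>%
  pivot_longer(c(score, score_2), names_to = "measure", values_to = "value")

toy_long %>%
  ggplot(aes(x = time, y = value, group = id, color = factor(id))) +
  geom_point() +
  geom_line() +
  facet_wrap(~measure) + # one panel per score variable
  theme_bw() +
  theme(legend.position = "none")
```

Both panels share the same axes by default, which makes the upward trend in `score_2` easy to compare against `score`.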

##Now we can add a regression line

``````dat %>%
ggplot(aes(x=time, y=score_2))+ #no group here, so geom_smooth fits one line across all ids
geom_point()+
geom_smooth(method = "lm")+ #This adds a linear regression line
geom_line(aes(group=factor(id), color=factor(id)), alpha=0.75)+
theme_bw()+
theme(legend.position = "none")``````
``## `geom_smooth()` using formula 'y ~ x'`` ``#Person-specific time-series with an overall regression line``

##We can also fit a linear model with lm (it’s in base R; tidy options are available, but I use base for lm)

``dat %>% lm(score_2 ~ time, data=.) %>% summary()``
``````##
## Call:
## lm(formula = score_2 ~ time, data = .)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.7459 -0.9966  0.0721  0.9601  4.2048
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.50915    0.29216   1.743   0.0845 .
## time         0.42535    0.05473   7.772 7.75e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.572 on 98 degrees of freedom
## Multiple R-squared:  0.3813, Adjusted R-squared:  0.375
## F-statistic: 60.41 on 1 and 98 DF,  p-value: 7.745e-12``````

This regression model shows that score_2 increases significantly over time: on average, about 0.43 units per time point (t = 7.77, p < .001).
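
The numbers in the summary can also be pulled out programmatically. The sketch below fits the same kind of model on simulated data (`demo` and its true slope of 0.4 are made up for illustration) and extracts the coefficients with base R's `coef()`. It then shows one of the tidy options, `broom::tidy()`, assuming the broom package is installed (it is installed alongside the tidyverse but not attached by `library(tidyverse)`).

```r
library(tidyverse)

set.seed(2) # arbitrary seed for reproducibility

# Hypothetical data: 10 repeats of time 0..9 with a linear trend plus noise
demo <- tibble(time    = rep(0:9, times = 10),
               score_2 = 0.5 + 0.4 * time + rnorm(100))

fit <- lm(score_2 ~ time, data = demo)

coef(fit)        # named vector: (Intercept) and time

broom::tidy(fit) # the coefficient table as a tibble, one row per term
```

The `time` row corresponds to the slope of the line that `geom_smooth(method = "lm")` draws for the same data.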

##We could also plot the distribution of score_2 across all time points for each ID

``````dat %>%
ggplot(aes(x=score_2))+ #score_2 goes on the x axis; geom_density computes the y values
geom_density(aes(group=factor(id), fill=factor(id)), alpha=0.8)+
theme_bw()+
theme(legend.position = "none")+
facet_wrap(~id, ncol = 5)``````

##We could also plot the bivariate relationship between ‘score’ and ‘score_2’

``````dat %>%
ggplot(aes(x=score, y=score_2))+
geom_point(aes(group=factor(id), color=factor(id)))+
theme_bw()+
theme(legend.position = "none")``````

##Now we can write our first function. Functions are great if you plan to make the same kind of plot a bunch of times

``````bivariate_plot_function<-function(df){ #df is the data frame to plot
df %>%
ggplot(aes(x=score, y=score_2))+
geom_point(aes(group=factor(id), color=factor(id)))+
theme_bw()+
theme(legend.position = "none")}``````
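
The function above hard-codes `score` and `score_2`. As a side note, one possible generalisation takes the column names as strings and looks them up with the `.data` pronoun. This is a sketch, not the tutorial's function: `scatter_by_id`, `x_var`, `y_var`, and `toy` are hypothetical names, and the data are assumed to contain an `id` column.

```r
library(tidyverse)

# Hypothetical generalisation: column names are passed as strings
scatter_by_id <- function(data, x_var, y_var) {
  data %>%
    ggplot(aes(x = .data[[x_var]], y = .data[[y_var]])) + # look columns up by name
    geom_point(aes(color = factor(id))) +                 # assumes an 'id' column exists
    labs(x = x_var, y = y_var) +                          # readable axis labels
    theme_bw() +
    theme(legend.position = "none")
}

# Hypothetical toy data with the tutorial's column names
toy <- tibble(id    = rep(1:2, each = 3),
              time  = rep(0:2, times = 2),
              score = c(0.3, 0.2, 0.5, -0.1, 0.4, 0.0)) %>%
  mutate(score_2 = score + 0.5 * time)

p_scores <- scatter_by_id(toy, "score", "score_2") # same layout as the function above
p_time   <- scatter_by_id(toy, "time", "score")    # a different pair of columns

p_scores
```

The trade-off is a slightly less readable `aes()` call in exchange for reusing one function across any pair of columns.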

##Let’s apply this function to our data set

``bivariate_plot_function(dat)``

##Alright, no data to write out for tutorial 3. But what if we wanted to save one of these plots?

##To save a plot, first assign it to an object and then use ‘ggsave’

``````plot_1<-dat %>%
ggplot(aes(x=time, y=score_2))+ #same plot as the regression example above
geom_point()+
geom_smooth(method = "lm")+ #This adds a linear regression line
geom_line(aes(group=factor(id), color=factor(id)), alpha=0.75)+
theme_bw()+
theme(legend.position = "none")

#Here is our person-specific time-series plot with a regression line``````

##Let’s view the plot object we just assigned

``plot_1``
``## `geom_smooth()` using formula 'y ~ x'``

##Alright, now we can save it using ggsave. This will be saved as a pdf, but you can also use .png or other formats

``ggsave("plot_1.pdf", plot = plot_1, width = 12, height = 8)`` ``#width and height are in inches by default``
``## `geom_smooth()` using formula 'y ~ x'``
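
For a raster format such as .png, the resolution matters as well as the size. The sketch below saves a made-up plot to a temporary folder and confirms the file was written; `toy_plot` and `out_file` are hypothetical names, and the 300 dpi value is only an illustrative choice.

```r
library(tidyverse)

# Hypothetical toy plot to save
toy_plot <- tibble(x = 1:5, y = c(2, 4, 3, 5, 6)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  theme_bw()

# Write to a temporary folder so nothing in the working directory is overwritten
out_file <- file.path(tempdir(), "toy_plot.png")

# ggsave picks the file format from the extension; dpi sets the raster resolution
ggsave(out_file, plot = toy_plot, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out_file) # check that the file was written
```

Passing `plot =` explicitly avoids relying on the most recently displayed plot, which is what `ggsave` falls back to when no plot is given.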

###Plots are useful for looking at trends in your data. Modeling these trends statistically will be covered in tutorial #4.