12 Answer Key

12.1 Chapter 4 - Object Types in R Programming

  1. This object was an array.
  2. This object was a vector.
  3. This would output a vector.
  4. This would output a data frame.
  5. To output a factor, you would run the following code:
data(mtcars)
factor(mtcars$gear)
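
The answers above name object types without showing how to confirm them. The following is a minimal sketch of how class() and the is.*() helpers could be used to check each type. The objects v and a are hypothetical stand-ins, since the original exercise objects from Chapter 4 are not reproduced in this answer key.

```r
data(mtcars)

# Hypothetical stand-ins for the exercise objects (not from the chapter)
v <- c(1, 2, 3)                      # a plain numeric vector
a <- array(1:24, dim = c(2, 3, 4))   # a three-dimensional array

# The factor from answer 5
gear_factor <- factor(mtcars$gear)

# class() reports the type of each object
class(v)
class(a)
class(mtcars)
class(gear_factor)

# levels() lists the distinct categories stored in the factor
levels(gear_factor)
```

If an object is not the type you expected, class() is usually the quickest way to see what a function actually returned.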

12.2 Chapter 5 - How to Filter and Transform Data in Base R

  1. Filter the following vector to values greater than 2.
  q1 <- seq(1,20,2)
  q1[q1 > 2]
  2. Filter the following vector to values between 20 and 30, keeping only the first three entries that meet that criterion. (Hint: add [n:n] for the range of values after you determine which values meet the criterion.)
  q2 <- round(rnorm(20,32,7),0)
  q2 >= 20 & q2 <= 30
  q2[q2 >= 20 & q2 <= 30][1:3]
  3. Multiply the following matrices together.
  q3_1 <- matrix(round(seq(1,40,3.27),0),3)
  q3_2 <- matrix(seq(1,8,1),4)
  q3_1 %*% q3_2
  4. Subtract 41 from every entry in the second column of the following matrix. Replace the column with those new values.
  q4 <- matrix(seq(1,120,4),10,3)
  q4
  q4[,2] <- q4[,2] - 41
  q4
  5. Select the second row of each matrix in the following array. Subtract 5 from those rows.
  q5 <- array(data=c(matrix(seq(1,15,1),5,3),
                  matrix(seq(4,60,4),5,3),
                  matrix(seq(2,30,2),5,3)),
                dim=c(5,3,3))
  q5
  q5[2,,]-5
  6. Filter the following data frame to Bond films starring Roger Moore.
  bond[bond$actor=="Roger Moore",]
  7. Filter the following data frame to Bond films starring Sean Connery made after 1966.
  bond[bond$actor=="Sean Connery" & bond$year > 1966,]
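
The two data frame filters refer to a bond data frame that is not defined in this answer key. The sketch below builds a small illustrative subset so the filters can be run on their own. The column names actor and year come from the answers above; the film column and the choice of six films are assumptions made for illustration, and the real exercise data is likely larger.

```r
# Illustrative subset only; the chapter's actual bond data frame is not shown here
bond <- data.frame(
  film  = c("Dr. No", "Goldfinger", "You Only Live Twice",
            "Diamonds Are Forever", "Live and Let Die",
            "The Spy Who Loved Me"),
  year  = c(1962, 1964, 1967, 1971, 1973, 1977),
  actor = c("Sean Connery", "Sean Connery", "Sean Connery",
            "Sean Connery", "Roger Moore", "Roger Moore"),
  stringsAsFactors = FALSE
)

# Rows where the actor is Roger Moore
moore <- bond[bond$actor == "Roger Moore", ]
moore

# Rows where the actor is Sean Connery and the film was made after 1966
connery_late <- bond[bond$actor == "Sean Connery" & bond$year > 1966, ]
connery_late
```

The trailing comma inside the brackets matters: the condition selects rows, and leaving the column position empty keeps every column.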

12.3 Chapter 6 - How to Filter and Transform Data with the Dplyr Package

  1. You would use the %>% notation, the filter() function, and the operators |, ==, and > to accomplish this.
  data(mtcars)
  library(dplyr)
  mtcars %>% filter(gear==4 | hp > 115)
  2. Starting from the same script as above, you would add the select() function to reduce the columns.
  data(mtcars)
  library(dplyr)
  mtcars %>% 
    filter(gear==4 | hp > 115) %>%
    select(mpg,cyl,gear,hp)
  3. Instead of using select() in the previous script, you would use transmute(). This function both transforms columns and keeps only the columns you name.
  data(mtcars)
  library(dplyr)
  mtcars %>% 
    filter(gear==4 | hp > 115) %>%
    transmute(mpg_log=log(mpg),cyl,gear,hp)
  4. You would use the filter(), group_by(), and summarize() functions to pull this summary data.
  data(mtcars)
  library(dplyr)
  mtcars %>% 
    filter(wt > 2) %>%
    group_by(gear) %>%
    summarize(avg_mpg=mean(mpg))
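
One way to sanity-check the grouped summary in the last answer is to reproduce it in base R. This is a sketch of a cross-check using subset() and aggregate(), not the dplyr approach the chapter teaches. The names heavy and base_summary are hypothetical.

```r
data(mtcars)

# Same filter as the dplyr answer: keep cars with wt greater than 2
heavy <- subset(mtcars, wt > 2)

# Mean mpg for each value of gear, using the base R formula interface
base_summary <- aggregate(mpg ~ gear, data = heavy, FUN = mean)

# Rename the result column to match the dplyr output
names(base_summary)[2] <- "avg_mpg"
base_summary
```

The result should contain one row per gear value, with the same averages that the summarize() call returns.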

12.4 Chapter 7 - Understanding and Using R Packages

  1. To install the tidyverse set of packages, run the script install.packages("tidyverse").
  2. To load the dplyr package, run the script library(dplyr).
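
Installing and loading are separate steps, and a script that calls install.packages() on every run repeats work it does not need to do. The following is a minimal sketch of a helper that installs a package only when it is missing and then loads it. The name ensure_package is hypothetical and not part of R or the tidyverse; the helper assumes a CRAN mirror is configured when an install is actually needed.

```r
# Hypothetical helper: install a package only if it is not already available,
# then attach it with library()
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE when the package cannot be found
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

Calling ensure_package("dplyr") would then install dplyr on the first run only and load it on every run.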

12.5 Chapter 8 - How to Write Functions

  1. Modify the simple standard deviation function we wrote and change it to calculate the mean. Do this without using the built-in mean function.
  avg.simple <- function(data,field) {
    field <- data[,paste(field)]
    sum(field)/length(field)
  }
  2. Alter the summary.group function to include median, minimum, and maximum values.
  summary.group <- function(data,group,field) {
    groups <- levels(factor(data[,paste(group)]))
    output <- data.frame(group=character(),
                         mean=numeric(),
                         sd=numeric(),
                         median=numeric(),
                         minimum=numeric(),
                         maximum=numeric())
    for(i in seq_along(groups)) {
      subdata <- data[data[,paste(group)]==groups[i],
                      paste(field)]
      output[i,1:6] <- data.frame(groups[i],
                                  mean(subdata),
                                  sd(subdata),
                                  median(subdata),
                                  min(subdata),
                                  max(subdata))
    }
    output
  }
  3. Write a function for the Fibonacci Sequence, which ends at a number you choose. You’ll need to use a control-flow construct to accomplish this and a default value for the end of the sequence. (Hint: You won’t use the for(var in seq) expr control flow. Execute ?Control to use a different version.)
  fib <- function(end=55){
    x <- c(0,1)
    n <- length(x)
    # Append the next term only while it does not exceed end
    while(x[n]+x[n-1] <= end){
      x[n+1] <- x[n]+x[n-1]
      n <- length(x)
    }
    x
  }
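
The hint in the Fibonacci exercise points to ?Control, which also documents repeat and break. As an alternative sketch of the same idea, the function below uses repeat and leaves the loop before a term would exceed end. The name fib_repeat is hypothetical and does not come from the chapter.

```r
# Hypothetical alternative using repeat/break instead of while
fib_repeat <- function(end = 55) {
  # The two seed values are always returned, even for a very small end
  x <- c(0, 1)
  repeat {
    n <- length(x)
    next_term <- x[n] + x[n - 1]
    # Stop before adding a term larger than the chosen end value
    if (next_term > end) break
    x <- c(x, next_term)
  }
  x
}

fib_repeat()     # uses the default end of 55
fib_repeat(50)   # an end value that is not itself a Fibonacci number
```

With repeat, the stopping condition sits inside the loop body, which makes it easy to test the next term before it is stored.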

12.6 Chapter 10 - How to Plot Data in R

  1. Use the ggplot(), aes(), and geom_point() functions to construct a plot.
  library(ggplot2)
  data(mtcars)
  ggplot(data=mtcars,
          mapping=aes(x=hp,y=mpg)) +
    geom_point(size=3)
  2. Map factor(cyl) to the color argument in the aes() function.
  library(ggplot2)
  data(mtcars)
  ggplot(data=mtcars,
          mapping=aes(x=hp,y=mpg,color=factor(cyl))) +
    geom_point(size=3)
  3. Use the x, y, and color arguments in the labs() function to use a more intuitive naming convention.
  library(ggplot2)
  data(mtcars)
  ggplot(data=mtcars,
          mapping=aes(x=hp,y=mpg,color=factor(cyl))) +
    geom_point(size=3) +
    labs(x="Horsepower",
         y="Miles Per Gallon",
         color="Cylinders")
  4. Use the title argument in the labs() function.
  library(ggplot2)
  data(mtcars)
  ggplot(data=mtcars,
          mapping=aes(x=hp,y=mpg,color=factor(cyl))) +
    geom_point(size=3) +
    labs(x="Horsepower",
         y="Miles Per Gallon",
         color="Cylinders",
         title="Car Performance")
  5. Use the theme_few() theme from the ggthemes package.
  library(ggplot2)
  library(ggthemes)
  data(mtcars)
  ggplot(data=mtcars,
          mapping=aes(x=hp,y=mpg,color=factor(cyl))) +
    geom_point(size=3) +
    labs(x="Horsepower",
         y="Miles Per Gallon",
         color="Cylinders",
         title="Car Performance") +
    theme_few()
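
Each answer in this section extends the previous one by adding a layer with +. A sketch of that idea is to store the plot in an object and build it up step by step, so nothing has to be retyped. The object names p_base, p_points, and p_final are hypothetical, and the sketch leaves out theme_few() so that it depends only on ggplot2.

```r
library(ggplot2)
data(mtcars)

# Step 1: data and aesthetic mappings only, no layers yet
p_base <- ggplot(data = mtcars,
                 mapping = aes(x = hp, y = mpg, color = factor(cyl)))

# Step 2: add the point layer
p_points <- p_base + geom_point(size = 3)

# Step 3: add the axis labels, legend title, and plot title
p_final <- p_points +
  labs(x = "Horsepower",
       y = "Miles Per Gallon",
       color = "Cylinders",
       title = "Car Performance")

# Printing the object draws the plot
print(p_final)
```

A theme such as theme_few() could be added to p_final in the same way once ggthemes is loaded.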

12.7 Chapter 11 - Statistical Functions in R

  1. After loading your data with data(), use the lm() function to build a model.
  data(iris)
  PracticeModel <- lm(Petal.Length~Sepal.Length+Sepal.Width,data=iris)
  2. Use the summary() function on your model to determine model performance, such as p-values.
  summary(PracticeModel)
  3. Use the predict() function to make model predictions on a new data set.
  NewData <- data.frame(Sepal.Length=5,Sepal.Width=3.25)
  predict(PracticeModel,NewData)
  4. Use the confint() function to determine the confidence intervals for a model.
  confint(PracticeModel)
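
The printed summary is convenient to read but awkward to reuse in later code. The sketch below shows one way to pull individual pieces out of the same model: the coefficient table, the p-values, the R-squared value, and a prediction with its confidence interval. The names model_summary, coef_table, and pred are hypothetical; the model and new data are the ones from the answers above.

```r
data(iris)
PracticeModel <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

# summary() returns an object whose parts can be extracted
model_summary <- summary(PracticeModel)

# Coefficient table: estimate, standard error, t value, and p-value per term
coef_table <- coef(model_summary)
coef_table[, "Pr(>|t|)"]

# Proportion of variance explained by the model
model_summary$r.squared

# Prediction for one new observation, with a 95% confidence interval
NewData <- data.frame(Sepal.Length = 5, Sepal.Width = 3.25)
pred <- predict(PracticeModel, NewData, interval = "confidence", level = 0.95)
pred
```

Note that confint() describes uncertainty in the coefficients, while the interval argument of predict() describes uncertainty in the fitted value for the new observation.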