# 4 Object Types in R Programming

R uses objects to store and interact with data and there are various object types. That probably means little to you now, but understanding these differences will make R programming easier – whatever your R programming goals.

In fact, I’d say it’s better to understand how these objects interact with one another than to memorize every base function and package out there.

That’s different from the approach I took to learning R. When I learned R, I went straight to learning the base functions. You know, the cool stuff that does the regression analysis and confidence intervals and whatnot.

That didn’t work out well for me. I was coming from a SQL background and thought data worked in a similar way with R.

Had I started by learning about object types, I would’ve saved a lot of time. I would’ve done less data manipulation in SQL or Excel and written simpler, more scalable R code.

## 4.1 Why Do Objects Matter?

Almost everything you program in R does one of the following:

• Modifies an object
• Produces an object
• Calls upon a pre-existing object

For example, the simple code below utilizes five different object types:

  confint.lm(happy_model)

This code for calculating confidence intervals calls upon a base function, evaluates an existing list, creates several vectors and an array to perform the analysis, and then outputs a matrix. All five are objects. (You can see the function’s script by pasting stats::confint.lm in your console.)

Understanding this will help you understand how R can seem to “guess” what it’s supposed to do based on the data inputs.
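To see that each piece really is an object, here is a minimal sketch. It assumes a stand-in model called `fit`, built on the `mtcars` data set that ships with R, because `happy_model` is not defined in this chapter.

```r
# Hypothetical stand-in for happy_model: a linear model on built-in data
fit <- lm(mpg ~ wt, data = mtcars)

class(confint)      # "function": the function itself is an object
is.list(fit)        # TRUE: an lm model is stored as a list
ci <- confint(fit)  # the result is yet another object
is.matrix(ci)       # TRUE: the confidence intervals come back as a matrix
dim(ci)             # 2 rows (intercept, wt) by 2 columns (lower, upper)
```

Checking an object with `class()` or one of the `is.*()` functions is a quick way to find out what you are working with before you try to select anything from it.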

## 4.2 Understanding Object Types Makes It Easier to Transform and Analyze

Pulling data from one object type is different than pulling data from another. This makes it confusing for people who learned about data through SQL, as opposed to other programming languages.

For example, the following code will select most data types in SQL:

  SELECT
    Field1,
    Field2,
    Field3,
    Field4
  FROM
    Data_Set

That’s different from R. Data selection in R depends on the object type.

For example, using the command [6] next to the object name will select a single value from a vector object…

  money[6]
## [1] 175

But that won’t return a single value from a list object like the one below…

  happy_model[6]
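The contrast can be sketched with toy objects. The names `money_example` and `model_like` and their values are made up for illustration; they are not the `money` and `happy_model` objects used above.

```r
# Made-up values for illustration only
money_example <- c(100, 125, 150, 175)
model_like <- list(coefs = c(1.5, 2.5), n = 30L, label = "demo")

money_example[4]        # a single number, still a length-one vector
model_like[2]           # single brackets on a list return a smaller list
model_like[[2]]         # double brackets return the element itself
class(model_like[2])    # "list"
class(model_like[[2]])  # "integer"
```

Single brackets keep the container and double brackets reach inside it, which is why the same `[6]` behaves differently on a vector and on a list.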

  bond$filmname
##  [1] "Skyfall"                         "Thunderball"
##  [3] "Goldfinger"                      "Spectre"
##  [5] "Live and Let Die"                "You Only Live Twice"
##  [7] "The Spy Who Loved Me"            "Casino Royale"
##  [9] "Moonraker"                       "Diamonds Are Forever"
## [11] "Quantum of Solace"               "From Russia with Love"
## [13] "Die Another Day"                 "Goldeneye"
## [15] "On Her Majesty's Secret Service" "The World is Not Enough"
## [17] "For Your Eyes Only"              "Tomorrow Never Dies"
## [19] "The Man with the Golden Gun"     "Dr. No"
## [21] "Octopussy"                       "The Living Daylights"
## [23] "A View to a Kill"                "Licence to Kill"

If you want to select a single column and maintain the data frame object type, you have to use the following code:

  bond[1]
##                           filmname
## 1                          Skyfall
## 2                      Thunderball
## 3                       Goldfinger
## 4                          Spectre
## 5                 Live and Let Die
## 6              You Only Live Twice
## 7             The Spy Who Loved Me
## 8                    Casino Royale
## 9                        Moonraker
## 10            Diamonds Are Forever
## 11               Quantum of Solace
## 12           From Russia with Love
## 13                 Die Another Day
## 14                       Goldeneye
## 15 On Her Majesty's Secret Service
## 16         The World is Not Enough
## 17              For Your Eyes Only
## 18             Tomorrow Never Dies
## 19     The Man with the Golden Gun
## 20                          Dr. No
## 21                       Octopussy
## 22            The Living Daylights
## 23                A View to a Kill
## 24                 Licence to Kill

We’ll go into further detail about selecting, transforming, and analyzing data frames later on. The way you go about it depends on whether you want to make efficient code or you want to make “readable” code for other analysts.

## 4.7 Factors

Factors take vectors (or data frame columns) and create categories to group the values.

Confused? It’s actually fairly simple. Think back to the data frame we built for the Bond films. If you use the code below, you’ll see the first six rows:

  head(bond)
##              filmname year        actor     gross
## 1             Skyfall 2012 Daniel Craig 1108.5610
## 2         Thunderball 1965 Sean Connery 1014.9411
## 3          Goldfinger 1964 Sean Connery  912.2575
## 4             Spectre 2015 Daniel Craig  880.6692
## 5    Live and Let Die 1973  Roger Moore  825.1108
## 6 You Only Live Twice 1967 Sean Connery  756.5444

Now let’s say you want a short list of the Bond actors. If you’ll notice in the data set, the actor names like “Daniel Craig” and “Sean Connery” are used repeatedly. These are basically ways to group the data frame with a common field name - the actor who played Bond.

If we tried to get a list of these actors using the levels() function, it wouldn’t work.

  levels(bond$actor)
## NULL

That’s because it hasn’t been factored yet.

This is a really simple fix. Use the factor() function and assign the result back to the column in the data frame.

  bond$actor <- factor(bond$actor)
  levels(bond$actor)
## [1] "Daniel Craig"   "George Lazenby" "Pierce Brosnan" "Roger Moore"
## [5] "Sean Connery"   "Timothy Dalton"

And this will also show up in the Environment tab in the top right of RStudio.

R used to convert character columns to factors automatically when you built or imported a data frame. That default changed in R 4.0.0, where stringsAsFactors became FALSE.

You may see factors as a not-so-important object type, but that’s not true. They come in handy with regression analysis, especially if your categorical variables are stored as numbers.

For example, our Bond data frame may not include the actor name. It could simply have a number between 1 and 6 for the actor, with Sean Connery as 1 and Daniel Craig as 6. That means a regression analysis would’ve analyzed the actor as a continuous variable by default!

This also comes up with experiments that analyze the impact of medicine. It’s not uncommon to label one drug as 1 and another drug as 2. That means you’d have to factor those drug codes so that your analysis reads them correctly.

## 4.8 Lists

Lists are objects that usually store other objects in a nice bundle. Those objects could be vectors, other lists, data frames, etc.

Many of the more complex R base functions produce lists. A common one is produced by the lm() function. Use the code below to build a model with the James Bond data:

  bondmodel <- lm(gross ~ actor, data = bond)

Now you can see the list this produces in the Environment tab. As you can see, there’s a lot in this list. You can also see what’s in the list using the following code:

  names(bondmodel)
##  [1] "coefficients"  "residuals"     "effects"       "rank"
##  [5] "fitted.values" "assign"        "qr"            "df.residual"
##  [9] "contrasts"     "xlevels"       "call"          "terms"
## [13] "model"

This gets to the heart of why it’s important to know when you’re dealing with a list. It changes the way you select components of that list.

For example, let’s say you want just the coefficients from a model you had built. You can use the same $ symbol as before.

  bondmodel$coefficients
##         (Intercept) actorGeorge Lazenby actorPierce Brosnan    actorRoger Moore
##           820.31651          -314.41673          -309.37854          -269.48336
##   actorSean Connery actorTimothy Dalton
##           -95.43409          -487.19349

However, if you want to select a single coefficient, you have to use a number value afterwards.

  bondmodel$coefficients[4]
## actorRoger Moore
##        -269.4834

That’s why it’s important to know if you’re pulling from a list or not. It changes the way that you select key parts of the data.
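A position such as `[4]` only works if you know the order of the coefficients, so selecting by name can be easier to read. The sketch below assumes a stand-in model on the built-in `mtcars` data, since the `bond` data frame is not rebuilt here; `fit` and `coefs` are illustrative names.

```r
# Stand-in model on built-in data; not the book's bondmodel
fit <- lm(mpg ~ factor(cyl), data = mtcars)

coefs <- fit$coefficients    # $ pulls one component out of the list
is.numeric(coefs)            # TRUE: the component is a named numeric vector
names(coefs)                 # the names you can select by

coefs["factor(cyl)6"]        # select by name; the name is kept
coefs[["factor(cyl)6"]]      # double brackets drop the name
all.equal(coef(fit), coefs)  # coef() returns the same values
```

Once `$` has pulled the coefficients out of the list, you are working with an ordinary named vector, so the usual vector selection rules apply again.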

## 4.9 Functions

Functions are objects too. Most of the time, you’ll be using a built-in function that is in the R base code or in a package you loaded.

However, you may find yourself building your own functions, which is handy if you don’t want to search for a pre-existing one or need something unique to your situation.
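As a small preview, here is a sketch of a home-made function. The name `spread` and what it computes are made up for illustration and are not taken from a later chapter.

```r
# Hypothetical helper: the gap between the largest and smallest value
spread <- function(x) {
  max(x) - min(x)
}

class(spread)       # "function": it is stored like any other object
spread(c(3, 9, 4))  # 6

# Because it is an object, it can be handed to another function
sapply(list(a = 1:5, b = c(2, 10)), spread)
```

The function lives in your environment under the name `spread`, just like a vector or a data frame would.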

We’ll go into more detail about functions in a later chapter because of their complexity.

## 4.10 Things to Remember

• The key to understanding R is understanding objects
• The object type changes the way you’ll read, transform, and produce data
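The regression point from the Factors section is one place where the second takeaway shows up. A minimal sketch, assuming the built-in `mtcars` data in place of the Bond or drug-trial examples; `fit_numeric` and `fit_factor` are illustrative names.

```r
# mtcars$cyl stores the number of cylinders as the numbers 4, 6, and 8
fit_numeric <- lm(mpg ~ cyl, data = mtcars)          # cyl treated as continuous
fit_factor  <- lm(mpg ~ factor(cyl), data = mtcars)  # cyl treated as categories

length(coef(fit_numeric))   # 2: intercept plus one slope
length(coef(fit_factor))    # 3: intercept plus one term per non-baseline level
levels(factor(mtcars$cyl))  # "4" "6" "8"
```

The data are identical in both models. Only the object type of the predictor changes, and that alone changes what the regression estimates.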

## 4.11 Exercises

Try and see if you can answer the following questions. Answers are in the back of the book!

1. Down below is the printed output of an object. What type of object is it?
## , , 1
##
##           [,1]     [,2]
## [1,] 15.864838 11.23700
## [2,]  9.306884 22.23843
##
## , , 2
##
##          [,1]     [,2]
## [1,] 23.72055 19.57388
## [2,] 18.21969 21.93573
##
## , , 3
##
##           [,1]     [,2]
## [1,]  9.503213 9.667211
## [2,] 10.788929 9.026587
2. Down below is the printed output of an object. What type of object is it?
## [1]  5.106328 11.586182  9.622173 15.599097
3. We want to select the column mpg from the data frame mtcars. Would the following code output a vector or data frame?
  data(mtcars)
  mtcars$mpg
4. We want to select the column mpg from the data frame mtcars. Would the following code output a vector or data frame?
  data(mtcars)
  mtcars[1]
5. We want to select the column gear from the data frame mtcars, but we want to treat it as a categorical variable. How would you turn this column into a factor? (You can load the data set using data(mtcars))