5 How to Filter and Transform Objects in Base R

In the last chapter, I explained the various object types in R. Now we want to learn how to filter and transform those objects. Notice how I didn’t say filter and transform “data”? That’s because the methods you use to handle data in R depend heavily on the object type.

Before I explain those methods though, we need to cover operators.

5.1 What Are Operators?

If you’re new to programming, then you’re probably not familiar with the term operator. Operators, in plain English, modify or evaluate data. That’s important to data transformation and filtering.

This chapter uses two types of operators in R: arithmetic and logical.

Arithmetic operators cover tasks like addition and subtraction. They’re useful for data transformation and appear in several examples later.

Here are the common arithmetic operators:

label                   symbol
addition                +
subtraction             -
multiplication          *
division                /
exponent                ^
matrix multiplication   %*%
integer division        %/%

Logical operators take the data and generate a TRUE or FALSE output, based on whether the data meets your requirement. They’re more helpful for filtering data than for transforming it.

Here are the common logical operators:

label                   symbol
less than               <
greater than            >
less than or equal      <=
greater than or equal   >=
equal                   ==
does not equal          !=
and                     &
or                      |
in                      %in%

Don’t worry if you’re unsure of how to use these just yet. You’ll see examples for these in the next few sections. This is just for your easy reference later.
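To make the tables a little more concrete, here is a minimal sketch of a few of these operators in action. The vector x is a made-up example, not one of the objects from this book.

```r
x <- c(3, 8, 12)    # hypothetical example vector

x + 1               # addition on every entry: 4 9 13
x %/% 5             # integer division, remainder dropped: 0 1 2
x > 5               # comparison: FALSE TRUE TRUE
x > 5 & x < 10      # "and" of two comparisons: FALSE TRUE FALSE
x %in% c(8, 20)     # is each entry found in c(8, 20)? FALSE TRUE FALSE
```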

5.2 How to Filter and Transform Data From a Vector

Vectors are the easiest object type to filter. Same with transforming the data within them.

If you want to reference or view the entire vector, you simply enter the name you assigned the object.

  v5 <- c(1,5,5,2,1,4)
  v5
## [1] 1 5 5 2 1 4

(Remember: the <- operator assigns a name to any object. In RStudio, you can use Option + “-” on a Mac and Alt + “-” on Windows as a shortcut for <-.)

You also can select a single entry from a vector using the [n] notation.

  v5 <- c(1,5,5,2,1,4)
  v5[3]
## [1] 5

As you can see, the script above selected the third value from the vector.

You can select a range of entries by using the [n:n] notation.

  v5 <- c(1,5,5,2,1,4)
  v5[3:4]
## [1] 5 2

And, as we’ve seen before, you can create a new vector by referencing old vectors!

  v2 <- c("Hola","Howdy","Hello")
  v7 <- c(2:4)
  v8 <- c(v2,v7)
  v8
## [1] "Hola"  "Howdy" "Hello" "2"     "3"     "4"

You can also use other base R functions to filter data.

For example, you may want to see the minimum or maximum value in a vector. You can use the max() and min() functions to do so.

  v5
## [1] 1 5 5 2 1 4
  max(v5) # Max value
## [1] 5
  min(v5) # Min value
## [1] 1

And you can use logical operators as well. In the example below, I use the >= and & operators to filter values.

  v5 >= 2 # Values greater than or equal to 2
## [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE
  v5 >= 3 & v5 <= 5 # Values between 3 and 5
## [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE

We can also use the | (“or”) operator to find values that meet either of two conditions. For example, I check which entries of the vector below are “Hola” or “Howdy.”

  v2
## [1] "Hola"  "Howdy" "Hello"
  v2 == "Hola" | v2 == "Howdy"
## [1]  TRUE  TRUE FALSE

You probably noticed that these logical operators only return a TRUE or FALSE statement. That makes sense since it is a logical argument that’s evaluated. However, we may want to see the actual values that meet the argument. This isn’t important in an example like this, but it does come up later on for more complex objects.

To show the actual values where the logical argument is true, you use the object_name[argument] notation.

In the next few examples, I filter the vectors down to values that meet the arguments used in the last few examples:

  v5[v5>=2]
## [1] 5 5 2 4
  v5[v5 >= 3 & v5 <= 5]
## [1] 5 5 4
  v2[v2 == "Hola" | v2 == "Howdy"]
## [1] "Hola"  "Howdy"

In the examples above, I simply took the logical argument and plugged it into the brackets.
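As a side note, here is a small sketch of two related tools, assuming the same v2 and v5 vectors from above. The %in% operator from the operator table can stand in for a chain of | comparisons, and the base R which() function returns the positions of the matching entries instead of the values. The names greetings and positions are made up for this illustration.

```r
v2 <- c("Hola","Howdy","Hello")
v5 <- c(1,5,5,2,1,4)

# Same result as v2[v2 == "Hola" | v2 == "Howdy"]
greetings <- v2[v2 %in% c("Hola","Howdy")]
greetings

# Positions (not values) of the entries that are 2 or more
positions <- which(v5 >= 2)
positions
```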

You can also change data easily when it comes to numeric vectors. For example, down below is a vector of box office revenue for James Bond films.

Copy and paste this script into your R console and execute:

  gross <- 
    c(1108561008,1014941117,912257512,880669186,
      825110761,756544419,692713752,669789482,
      655872400,648514469,622246378,576277964,
      543639638,529548711,505899782,491617153,
      486468881,478946402,448249281,440759072,
      426244352,381088866,321172633,285157191)
  gross

As you can see, the values are very large. To make our analysis easier, we can use one of the arithmetic operators that I showed earlier.

In this scenario, I want to make the values smaller. So I’m going to divide it using the / operator.

  gross <- gross/1000000
  gross

You may have noticed that I reassigned the vector the same name “gross.” You can do this if you want to overwrite the previous object assignment. Careful though. Those changes are permanent. That means to reverse them, you may need to run an earlier script to re-generate the original data or re-import your data set.
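If you’d rather not overwrite your data, one option is to save the result under a new name. Here is a minimal sketch that uses only the first three revenue figures. The names gross_sample and gross_millions are ones I made up for this illustration.

```r
gross_sample <- c(1108561008,1014941117,912257512)  # first three values of gross
gross_millions <- gross_sample/1000000               # stored under a new name

gross_sample     # original values are untouched
gross_millions   # values in millions
```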

You can calculate individual values this way too.

  gross <- 
    c(1108561008,1014941117,912257512,880669186,
      825110761,756544419,692713752,669789482,
      655872400,648514469,622246378,576277964,
      543639638,529548711,505899782,491617153,
      486468881,478946402,448249281,440759072,
      426244352,381088866,321172633,285157191)
  gross[4]  
## [1] 880669186
  gross[4]/100000
## [1] 8806.692

And you can re-assign the value to a particular part of a vector using the methods we described above and the <- notation.

For example, we can see below how we re-assign values based on the location:

  v8 <- c(1,5,5,2,1,4) # Creates the original vector
  v8[6] <- 8 # Replaces the sixth value with an 8
  v8[1:3] <- c(4,3,1) # Replaces the first three values
  v8
## [1] 4 3 1 2 1 8
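You can also combine replacement with the logical filters from earlier. The sketch below, which uses a made-up vector named v9, replaces every value of 5 or more with a 0.

```r
v9 <- c(1,5,5,2,1,4)   # hypothetical example vector
v9[v9 >= 5] <- 0       # replaces only the entries where the condition is TRUE
v9                     # 1 0 0 2 1 4
```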

5.3 How to Filter and Transform Data From a Matrix

Filtering the data within a matrix is both similar to and different from a vector.

It’s similar because we can use the [n] notation to select a single entry. We had done this before with a vector.

  v5[2]
## [1] 5

You can do the same for a matrix. If you run the code below, you’ll re-create and view the matrix we used in the last chapter:

  matrix1 <- matrix(c(2,0,1,3),nrow=2,ncol=2)
  matrix1
##      [,1] [,2]
## [1,]    2    1
## [2,]    0    3

And here you’ll select the fourth value from that matrix using the [4] command:

  matrix1[4]
## [1] 3

Now that isn’t very practical for a matrix. You may need to select a value from a specific row or column instead. This is where matrices are different from a vector. You’ll want to use the [r,c] command to determine which values you want.

In the example below, I select the second row and first column of the matrix:

  matrix1[2,1]
## [1] 0
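If you leave the row or the column position blank, R returns the whole row or column. This is a small sketch using the same matrix1 as above. The drop=FALSE argument is a base R option that keeps the result as a matrix instead of a vector.

```r
matrix1 <- matrix(c(2,0,1,3),nrow=2,ncol=2)

matrix1[1,]                  # entire first row, as a vector: 2 1
matrix1[,2]                  # entire second column, as a vector: 1 3
matrix1[,2,drop=FALSE]       # same column, kept as a 2 x 1 matrix
```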

We can make this easier on ourselves. Instead of specifying row or column numbers, we can give them names. That way, we can use the [row_name,column_name] notation to select data from a matrix.

Down below, I give our previously created matrix row and column names:

  colnames(matrix1) <- c("Col1","Col2")
  rownames(matrix1) <- c("Row1","Row2")
  matrix1["Row2","Col1"]
## [1] 0

We can also apply vector filtering methods to matrices. For example, I want to see what values are greater than 0.

  matrix1 > 0 # Returns true or false
##       Col1 Col2
## Row1  TRUE TRUE
## Row2 FALSE TRUE

Funny enough though, you can’t return the actual values that meet these criteria in matrix form. The result turns into a vector. That’s because the values that pass the filter may not fill the same number of columns and rows as the original matrix. So R returns a one-dimensional object.

  matrix1[matrix1 > 0]
## [1] 2 1 3
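If you need to know where those values sit in the matrix, one option is the base R which() function with arr.ind=TRUE. This is a sketch on a fresh copy of the matrix without row or column names. The names m and hits are made up for the example.

```r
m <- matrix(c(2,0,1,3),nrow=2,ncol=2)   # same values as matrix1

# One line per matching entry, with its row and column number
hits <- which(m > 0, arr.ind=TRUE)
hits

# The values themselves still come back as a vector
m[m > 0]
```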

You can use the same techniques we outlined before with the vectors to transform the data within a matrix. Copy and paste the code below to your R console and see the results. Feel free to play around with the inputs to see what happens.

  matrix1 <- matrix(c(2,0,1,3),nrow=2,ncol=2)  
  matrix1
##      [,1] [,2]
## [1,]    2    1
## [2,]    0    3
  matrix1[3] <- 5
  matrix1
##      [,1] [,2]
## [1,]    2    5
## [2,]    0    3
  matrix1[,2] <- 2
  matrix1
##      [,1] [,2]
## [1,]    2    2
## [2,]    0    2
  matrix1[2,2] <- 0
  matrix1
##      [,1] [,2]
## [1,]    2    2
## [2,]    0    0

Like the vectors, you can transform the data within the matrix using the arithmetic operators we discussed earlier.

  matrix1
##      [,1] [,2]
## [1,]    2    2
## [2,]    0    0
  matrix1 + 2
##      [,1] [,2]
## [1,]    4    4
## [2,]    2    2
  matrix1 - 4
##      [,1] [,2]
## [1,]   -2   -2
## [2,]   -4   -4
  matrix1 ^ 3
##      [,1] [,2]
## [1,]    8    8
## [2,]    0    0
  matrix1 * 5
##      [,1] [,2]
## [1,]   10   10
## [2,]    0    0

You can also use these operators to combine matrices. We’ll need a few matrices to illustrate these examples though. Take the code I have below and execute it in your console, if you want to follow along with my examples.

  matrix1 <- matrix(c(2,0,1,3),nrow=2,ncol=2)
  matrix1
  matrix2 <- matrix(c(5,7),nrow=2)
  matrix2
  matrix6 <- matrix(c(4,3,1,3),nrow=2,ncol=2)
  matrix6

It’s important to remember the dimensions of your matrices. Attempting to use addition on two matrices without the same dimensions won’t work.

Matrix 1 and Matrix 2 do not have the same dimensions, so adding them returns an error:

  matrix1 + matrix2
## Error in matrix1 + matrix2: non-conformable arrays

However, Matrix 1 and Matrix 6 do have the same dimensions and will execute:

  matrix1 + matrix6
##      [,1] [,2]
## [1,]    6    2
## [2,]    3    6

Multiplying two matrices together can be misleading. For example, using the simple * operator will merely multiply the corresponding values in two matrices with the same dimensions. Confused? Look at the two matrices below and then look at the output:

  matrix1
##      [,1] [,2]
## [1,]    2    1
## [2,]    0    3
  matrix6
##      [,1] [,2]
## [1,]    4    1
## [2,]    3    3
  matrix1 * matrix6
##      [,1] [,2]
## [1,]    8    1
## [2,]    0    9

Entry [1,1] of the first matrix is 2. Entry [1,1] of the second matrix is 4. 2 x 4 = 8. That shows us that the multiplication used here is not true matrix multiplication.

If you attempt to use the same * operator for Matrix 1 and Matrix 2 though, you will get an error:

  matrix1
##      [,1] [,2]
## [1,]    2    1
## [2,]    0    3
  matrix2
##      [,1]
## [1,]    5
## [2,]    7
  matrix1 * matrix2
## Error in matrix1 * matrix2: non-conformable arrays

That’s because these two matrices do not share the same dimensions.

However, we can generate a single matrix from these two matrices using matrix algebra. To do so, we use the %*% operator:

  matrix7 <- matrix1 %*% matrix2  
  matrix7
##      [,1]
## [1,]   17
## [2,]   21
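Matrix multiplication has its own dimension rule: the number of columns in the first matrix must equal the number of rows in the second. Here is a short sketch that checks the rule and reproduces the first entry of matrix7 by hand, assuming the same matrix1 and matrix2 as above.

```r
matrix1 <- matrix(c(2,0,1,3),nrow=2,ncol=2)
matrix2 <- matrix(c(5,7),nrow=2)

dim(matrix1)                     # 2 2
dim(matrix2)                     # 2 1
ncol(matrix1) == nrow(matrix2)   # TRUE, so matrix1 %*% matrix2 works

# Entry [1,1] by hand: row 1 of matrix1 times column 1 of matrix2
sum(matrix1[1,] * matrix2[,1])   # 2*5 + 1*7 = 17
```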

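You can also divide the entries of one matrix by the matching entries of another with the %/% operator. This is integer division: R divides entry by entry and drops any remainder. It is not the reverse of matrix multiplication: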

  matrix7 %/% matrix2
##      [,1]
## [1,]    3
## [2,]    3

Just remember the difference: the *, /, and %/% operators work entry by entry, while %*% performs true matrix multiplication.
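To see how / and %/% compare, here is a minimal sketch. The matrices a and b are made-up names that hold the same values as matrix7 and matrix2.

```r
a <- matrix(c(17,21),nrow=2)   # same values as matrix7
b <- matrix(c(5,7),nrow=2)     # same values as matrix2

a / b      # entry-by-entry division: 3.4 and 3
a %/% b    # entry-by-entry integer division: 3 and 3
```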

5.4 How to Filter and Transform Data from Arrays

Selecting data from arrays is similar to what we did before. You can select an individual entry using the [n] command. If you look down below, we re-create the array we used in the last chapter. Copy and paste this code into your R console.

  matrix3 <- matrix(c(2,0,1,4,5,2,3,4),nrow=4,ncol=2)
  matrix4 <- matrix(c(4,3,5,2,1,6,4,5),nrow=4,ncol=2)
  matrix5 <- matrix(c(1,3,1,2,3,5,6,2),nrow=4,ncol=2)
  array1 <- array(c(matrix3,matrix4,matrix5),
                  dim=c(4,2,3))  
  array1

This script will create three separate matrices, like you see below:

And then stacks them into an array:

You can then select the first and twenty-second entries with the scripts below:

  array1[1]
## [1] 2
  array1[22]
## [1] 5

To help you visualize this, I highlighted the first and twenty-second values from the array in the illustration below:

With arrays, selecting particular columns or rows gets difficult because arrays can have multiple dimensions. That introduces the [r,c,d] notation.

Down below, we select the entire second row of every matrix in our array:

  array1[2,,]
##      [,1] [,2] [,3]
## [1,]    0    3    3
## [2,]    2    6    5

You may have noticed that this “flipped” the direction. R isn’t trying to confuse you. It’s merely displaying the previous, individual matrix rows as columns. So column 1 shows the results from matrix 1 in the previous array. Keep this in mind as you interact with arrays. The output may not always be intuitive.

Down below is an illustration of how R processes this command.

First, R finds the second row for each level of the array…

And R then pivots and places those rows into a new array, with each column representing the level of the array…
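If you want to confirm the shape of that output, or flip it back so each row stands for one matrix, here is a small sketch. It assumes the same array1 as above. The name second_rows is made up, and t() is the base R function that transposes a matrix.

```r
matrix3 <- matrix(c(2,0,1,4,5,2,3,4),nrow=4,ncol=2)
matrix4 <- matrix(c(4,3,5,2,1,6,4,5),nrow=4,ncol=2)
matrix5 <- matrix(c(1,3,1,2,3,5,6,2),nrow=4,ncol=2)
array1 <- array(c(matrix3,matrix4,matrix5),dim=c(4,2,3))

second_rows <- array1[2,,]
dim(second_rows)   # 2 3: one column per matrix in the array
t(second_rows)     # transposed: one row per matrix in the array
```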

Just like rows, we can also select the second column of every matrix within the array:

  array1[,2,]
##      [,1] [,2] [,3]
## [1,]    5    1    3
## [2,]    2    6    5
## [3,]    3    4    6
## [4,]    4    5    2

Down below is an illustration of how R processes this command. R takes the second column from each level of the array and outputs it. Each column of the output represents the level of the array.

We can also select a single level of the array by specifying the third position, but leaving the row and column numbers blank:

  array1[,,3]
##      [,1] [,2]
## [1,]    1    3
## [2,]    3    5
## [3,]    1    6
## [4,]    2    2

And finally, we can even get a specific entry by selecting the row, column, and level of the array:

  array1[2,2,3]
## [1] 5
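This is the same entry we selected earlier with array1[22]. If you ever need to translate a position like 22 into its row, column, and level, base R’s arrayInd() function can do that. Here is a sketch, assuming the same array1 as above. The name location is made up for the example.

```r
matrix3 <- matrix(c(2,0,1,4,5,2,3,4),nrow=4,ncol=2)
matrix4 <- matrix(c(4,3,5,2,1,6,4,5),nrow=4,ncol=2)
matrix5 <- matrix(c(1,3,1,2,3,5,6,2),nrow=4,ncol=2)
array1 <- array(c(matrix3,matrix4,matrix5),dim=c(4,2,3))

# Translate position 22 into row, column, and level
location <- arrayInd(22, dim(array1))
location          # row 2, column 2, level 3

array1[22]        # 5
array1[2,2,3]     # 5, the same entry
```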

You can transform arrays in the same way as the other objects. Down below are some examples of how we can use those same techniques.

Copy and paste the code below to your own computer to get an idea of how this works:

  array1 
  #Changes the 2nd value of the entire array to a 2
  array1[2] <- 2
  array1
  #Changes the 2nd column of each matrix to a 2
  array1[,2,] <- 2
  array1
  #Changes the 2nd row, 1st column of the 3rd matrix to 0
  array1[2,1,3] <- 0
  array1
  #Divides the 3rd matrix only by 3
  array1[,,3] <- array1[,,3] / 3
  array1

5.5 How to Filter and Transform Data from a Data Frame

There are two approaches you can take to selecting data from a data frame. There’s the “classic” approach, which I’ll show you in this section, and then there’s the “data plier” approach.

The classic approach uses base R to interact with data frames. The “data plier” approach uses a package called dplyr to transform the data. The dplyr syntax is far more readable, which is handy for longer scripts.

If you don’t want to confuse yourself by learning two different methods, feel free to skip this section and go to the chapter on dplyr. This section will show the classical approach.

Personally though, I still recommend learning both methods. As you get more hands-on experience with R programming, you’ll find different situations call for different techniques.

With that out of the way, let’s get started with the classical approach.

Down below is a script to load the James Bond data frame from the last chapter:

  bond <- read.csv("https://raw.githubusercontent.com/taylorrodgers/bond/main/bond.csv")

As you may recall, a data frame is basically composed of individual vectors of equal lengths.

You can re-select individual vectors back out of a data frame using the $ notation:

  bond$filmname
##  [1] "Skyfall"                         "Thunderball"                    
##  [3] "Goldfinger"                      "Spectre"                        
##  [5] "Live and Let Die"                "You Only Live Twice"            
##  [7] "The Spy Who Loved Me"            "Casino Royale"                  
##  [9] "Moonraker"                       "Diamonds Are Forever"           
## [11] "Quantum of Solace"               "From Russia with Love"          
## [13] "Die Another Day"                 "Goldeneye"                      
## [15] "On Her Majesty's Secret Service" "The World is Not Enough"        
## [17] "For Your Eyes Only"              "Tomorrow Never Dies"            
## [19] "The Man with the Golden Gun"     "Dr. No"                         
## [21] "Octopussy"                       "The Living Daylights"           
## [23] "A View to a Kill"                "Licence to Kill"

You can also produce a vector using the [,c] notation. Leaving the row position blank selects every row, so you only need to specify a column number. We generate the same result as our last example with this method.

  bond[,1]
##  [1] "Skyfall"                         "Thunderball"                    
##  [3] "Goldfinger"                      "Spectre"                        
##  [5] "Live and Let Die"                "You Only Live Twice"            
##  [7] "The Spy Who Loved Me"            "Casino Royale"                  
##  [9] "Moonraker"                       "Diamonds Are Forever"           
## [11] "Quantum of Solace"               "From Russia with Love"          
## [13] "Die Another Day"                 "Goldeneye"                      
## [15] "On Her Majesty's Secret Service" "The World is Not Enough"        
## [17] "For Your Eyes Only"              "Tomorrow Never Dies"            
## [19] "The Man with the Golden Gun"     "Dr. No"                         
## [21] "Octopussy"                       "The Living Daylights"           
## [23] "A View to a Kill"                "Licence to Kill"

If you use the [n] notation, you’ll select the same column as before, but you’re keeping it in a data frame structure.

  bond[1]
##                           filmname
## 1                          Skyfall
## 2                      Thunderball
## 3                       Goldfinger
## 4                          Spectre
## 5                 Live and Let Die
## 6              You Only Live Twice
## 7             The Spy Who Loved Me
## 8                    Casino Royale
## 9                        Moonraker
## 10            Diamonds Are Forever
## 11               Quantum of Solace
## 12           From Russia with Love
## 13                 Die Another Day
## 14                       Goldeneye
## 15 On Her Majesty's Secret Service
## 16         The World is Not Enough
## 17              For Your Eyes Only
## 18             Tomorrow Never Dies
## 19     The Man with the Golden Gun
## 20                          Dr. No
## 21                       Octopussy
## 22            The Living Daylights
## 23                A View to a Kill
## 24                 Licence to Kill

As you may have noticed, the [n] notation in this context works differently than other object types. For data frames, [n] selects the nth column and NOT the nth data entry.
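To check which notation gives you a vector and which gives you a data frame, you can use the class() function. The sketch below uses a small stand-in data frame, which I’m calling bond_small, built from the first four rows of the Bond data so it runs without a download.

```r
# Stand-in for the bond data frame (first four rows only)
bond_small <- data.frame(
  filmname = c("Skyfall","Thunderball","Goldfinger","Spectre"),
  year = c(2012,1965,1964,2015),
  actor = c("Daniel Craig","Sean Connery","Sean Connery","Daniel Craig"),
  gross = c(1108.5610,1014.9411,912.2575,880.6692),
  stringsAsFactors = FALSE
)

class(bond_small$year)      # "numeric": a vector
class(bond_small[,2])       # "numeric": a vector
class(bond_small[2])        # "data.frame"
class(bond_small["year"])   # "data.frame": column names work too
```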

You can use the [n:n] or the [,n:n] notation to select multiple columns. Both will be presented as a data frame.

  bond[1:3]
##                           filmname year          actor
## 1                          Skyfall 2012   Daniel Craig
## 2                      Thunderball 1965   Sean Connery
## 3                       Goldfinger 1964   Sean Connery
## 4                          Spectre 2015   Daniel Craig
## 5                 Live and Let Die 1973    Roger Moore
## 6              You Only Live Twice 1967   Sean Connery
## 7             The Spy Who Loved Me 1977    Roger Moore
## 8                    Casino Royale 2006   Daniel Craig
## 9                        Moonraker 1979    Roger Moore
## 10            Diamonds Are Forever 1971   Sean Connery
## 11               Quantum of Solace 2008   Daniel Craig
## 12           From Russia with Love 1963   Sean Connery
## 13                 Die Another Day 2002 Pierce Brosnan
## 14                       Goldeneye 1995 Pierce Brosnan
## 15 On Her Majesty's Secret Service 1969 George Lazenby
## 16         The World is Not Enough 1999 Pierce Brosnan
## 17              For Your Eyes Only 1981    Roger Moore
## 18             Tomorrow Never Dies 1997 Pierce Brosnan
## 19     The Man with the Golden Gun 1974    Roger Moore
## 20                          Dr. No 1962   Sean Connery
## 21                       Octopussy 1983    Roger Moore
## 22            The Living Daylights 1987 Timothy Dalton
## 23                A View to a Kill 1985    Roger Moore
## 24                 Licence to Kill 1989 Timothy Dalton
  bond[,1:3]
##                           filmname year          actor
## 1                          Skyfall 2012   Daniel Craig
## 2                      Thunderball 1965   Sean Connery
## 3                       Goldfinger 1964   Sean Connery
## 4                          Spectre 2015   Daniel Craig
## 5                 Live and Let Die 1973    Roger Moore
## 6              You Only Live Twice 1967   Sean Connery
## 7             The Spy Who Loved Me 1977    Roger Moore
## 8                    Casino Royale 2006   Daniel Craig
## 9                        Moonraker 1979    Roger Moore
## 10            Diamonds Are Forever 1971   Sean Connery
## 11               Quantum of Solace 2008   Daniel Craig
## 12           From Russia with Love 1963   Sean Connery
## 13                 Die Another Day 2002 Pierce Brosnan
## 14                       Goldeneye 1995 Pierce Brosnan
## 15 On Her Majesty's Secret Service 1969 George Lazenby
## 16         The World is Not Enough 1999 Pierce Brosnan
## 17              For Your Eyes Only 1981    Roger Moore
## 18             Tomorrow Never Dies 1997 Pierce Brosnan
## 19     The Man with the Golden Gun 1974    Roger Moore
## 20                          Dr. No 1962   Sean Connery
## 21                       Octopussy 1983    Roger Moore
## 22            The Living Daylights 1987 Timothy Dalton
## 23                A View to a Kill 1985    Roger Moore
## 24                 Licence to Kill 1989 Timothy Dalton

To select an individual or range of rows, you use the same [r,c] notation as before:

  bond[1:3,] # Selects first three rows
##      filmname year        actor     gross
## 1     Skyfall 2012 Daniel Craig 1108.5610
## 2 Thunderball 1965 Sean Connery 1014.9411
## 3  Goldfinger 1964 Sean Connery  912.2575

You can also exclude rows or columns using the negative - sign before the row or column numbers:

  bond[-1:-20,] # Excludes rows 1 through 20
##                filmname year          actor    gross
## 21            Octopussy 1983    Roger Moore 426.2444
## 22 The Living Daylights 1987 Timothy Dalton 381.0889
## 23     A View to a Kill 1985    Roger Moore 321.1726
## 24      Licence to Kill 1989 Timothy Dalton 285.1572
  bond[3,-4] # Row 3, excluding the 4th column
##     filmname year        actor
## 3 Goldfinger 1964 Sean Connery

Just like the other object types, you can use logical and arithmetic operators, which makes it easy to filter to what you need.
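For instance, arithmetic on a data frame column works just like arithmetic on a vector. This is a minimal sketch on a small stand-in data frame (bond_small, a name I made up) built from the first three rows shown above. The gross_billions column is also hypothetical.

```r
# Stand-in for the bond data frame (first three rows only)
bond_small <- data.frame(
  filmname = c("Skyfall","Thunderball","Goldfinger"),
  year = c(2012,1965,1964),
  gross = c(1108.5610,1014.9411,912.2575),   # in millions
  stringsAsFactors = FALSE
)

# Adds a new column: gross in billions instead of millions
bond_small$gross_billions <- bond_small$gross/1000
bond_small
```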

Note: this is where things start to get complicated with data frame filtering and why I suggest learning the dplyr package.

Let’s say we want to filter by year. We want only Bond films made in 1990 or later.

First, we’ll generate our TRUE / FALSE output. Go ahead and take the script below and run it in your own console:

  bond["year"]>=1990

Now, you’ll notice that I kept this as a data frame selection, so the result is a one-column table of TRUE / FALSE values. Had I used the bond$year notation, the result would’ve been a plain vector of TRUE / FALSE values instead. Both work as a row filter once we add the comma shown later in this section.

Next, we need to plug this into another script:

  bond[bond["year"]>=1990]
##  [1] "Skyfall"                 "Spectre"                
##  [3] "Casino Royale"           "Quantum of Solace"      
##  [5] "Die Another Day"         "Goldeneye"              
##  [7] "The World is Not Enough" "Tomorrow Never Dies"    
##  [9] "2012"                    "2015"                   
## [11] "2006"                    "2008"                   
## [13] "2002"                    "1995"                   
## [15] "1999"                    "1997"                   
## [17] "Daniel Craig"            "Daniel Craig"           
## [19] "Daniel Craig"            "Daniel Craig"           
## [21] "Pierce Brosnan"          "Pierce Brosnan"         
## [23] "Pierce Brosnan"          "Pierce Brosnan"         
## [25] "1108.5610"               " 880.6692"              
## [27] " 669.7895"               " 622.2464"              
## [29] " 543.6396"               " 529.5487"              
## [31] " 491.6172"               " 478.9464"

If you notice though, this doesn’t cleanly give us the information we need.

We need to make sure we preserve the column structure. We can do this by adding a comma after the filter and, if we like, specifying the column numbers:

  bond[bond["year"]>=1990,] # Includes all columns
##                   filmname year          actor     gross
## 1                  Skyfall 2012   Daniel Craig 1108.5610
## 4                  Spectre 2015   Daniel Craig  880.6692
## 8            Casino Royale 2006   Daniel Craig  669.7895
## 11       Quantum of Solace 2008   Daniel Craig  622.2464
## 13         Die Another Day 2002 Pierce Brosnan  543.6396
## 14               Goldeneye 1995 Pierce Brosnan  529.5487
## 16 The World is Not Enough 1999 Pierce Brosnan  491.6172
## 18     Tomorrow Never Dies 1997 Pierce Brosnan  478.9464
  bond[bond["year"]>=1990,1:3] # Columns 1 through 3
##                   filmname year          actor
## 1                  Skyfall 2012   Daniel Craig
## 4                  Spectre 2015   Daniel Craig
## 8            Casino Royale 2006   Daniel Craig
## 11       Quantum of Solace 2008   Daniel Craig
## 13         Die Another Day 2002 Pierce Brosnan
## 14               Goldeneye 1995 Pierce Brosnan
## 16 The World is Not Enough 1999 Pierce Brosnan
## 18     Tomorrow Never Dies 1997 Pierce Brosnan

Adding more filters would become more difficult from here. It would make the script a lot harder to read. I’ll go into better techniques for multiple filters in the next chapter on dplyr.
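In the meantime, here is a rough sketch of how a second condition can be added in base R with the & operator. It uses the $ notation inside the brackets, which also works as a row filter, and a small stand-in data frame (bond_small, a made-up name) built from four of the rows shown above. The names recent and older_hits are hypothetical too.

```r
# Stand-in for the bond data frame (first four rows only)
bond_small <- data.frame(
  filmname = c("Skyfall","Thunderball","Goldfinger","Spectre"),
  year = c(2012,1965,1964,2015),
  actor = c("Daniel Craig","Sean Connery","Sean Connery","Daniel Craig"),
  gross = c(1108.5610,1014.9411,912.2575,880.6692),
  stringsAsFactors = FALSE
)

# One condition: films from 1990 or later
recent <- bond_small[bond_small$year >= 1990,]
recent

# Two conditions: films before 1990 with a gross above 1000
older_hits <- bond_small[bond_small$year < 1990 & bond_small$gross > 1000,]
older_hits
```

Each extra condition repeats the data frame name, which is why longer filters get hard to read in this style.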

5.6 How to Filter a List

Unlike the other object types, I won’t go into detail about transforming a list. The reason is that lists are usually reserved as an output of various functions. Or they’re a handy way of bunching other objects together. If you wanted to transform an object within a list, you’d probably transform that object directly.

Filtering a list is a useful skill to have though.

The script below creates a model using our James Bond data. That creates a list of the various calculations in a regression analysis. The names() function then shows you all the objects contained within the list:

  bondmodel <- lm(gross ~ actor,data=bond)
  names(bondmodel)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "contrasts"     "xlevels"       "call"          "terms"        
## [13] "model"

We can select any of these objects within the list with the $, ["object_name"], or [n] notations:

  bondmodel$coefficients
##         (Intercept) actorGeorge Lazenby actorPierce Brosnan    actorRoger Moore 
##           820.31651          -314.41673          -309.37854          -269.48336 
##   actorSean Connery actorTimothy Dalton 
##           -95.43409          -487.19349
  bondmodel["coefficients"]
## $coefficients
##         (Intercept) actorGeorge Lazenby actorPierce Brosnan    actorRoger Moore 
##           820.31651          -314.41673          -309.37854          -269.48336 
##   actorSean Connery actorTimothy Dalton 
##           -95.43409          -487.19349
  bondmodel[1]
## $coefficients
##         (Intercept) actorGeorge Lazenby actorPierce Brosnan    actorRoger Moore 
##           820.31651          -314.41673          -309.37854          -269.48336 
##   actorSean Connery actorTimothy Dalton 
##           -95.43409          -487.19349
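Notice that the last two outputs start with $coefficients. That’s because single brackets return a smaller list, not the object inside it. Base R’s double brackets, [[ ]], pull the object itself out, just like $. Here’s a sketch with a made-up list named example_list:

```r
example_list <- list(numbers = c(1,2,3),
                     words = c("Hola","Howdy"))   # hypothetical list

class(example_list["numbers"])     # "list": single brackets keep the list wrapper
class(example_list[["numbers"]])   # "numeric": double brackets pull the object out
example_list[["numbers"]][2]       # 2: now you can filter it like any vector

# $ and [[ ]] return the same object
identical(example_list$numbers, example_list[["numbers"]])   # TRUE
```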

Now here’s where things get tricky. Let’s say we want to filter down to a smaller value within the objects of the list. That changes depending on those object types. Confused?

It’s better if we go with a simpler example than the list generated by the lm() function.

Down below, I create a list using some of the other objects we made in this lesson:

  v1 <- c(TRUE,FALSE,TRUE)
  v2 <- c("Hola","Howdy","Hello")
  v3 <- c(1:3)
  list1 <- list(vector1=v1,
                vector2=v2,
                vector3=v3,
                matrix1=matrix1,
                array1=array1,
                bond=bond)

I can select any one of those objects from the list using the $ notation:

  list1$vector1
## [1]  TRUE FALSE  TRUE
  list1$matrix1
##      [,1] [,2]
## [1,]    2    1
## [2,]    0    3

Now let’s say I want to select a specific data point from the list. Let’s say I want to know the Bond actors. I know that information was stored within a data frame within the list. To pull that data, I use a combination of the filtering techniques we’ve covered this chapter.

First, I have to pull the data frame from the list. I do that with the $ notation. Then, we treat the object type as a normal data frame.

Here’s what I mean:

  list1$bond[3]
##             actor
## 1    Daniel Craig
## 2    Sean Connery
## 3    Sean Connery
## 4    Daniel Craig
## 5     Roger Moore
## 6    Sean Connery
## 7     Roger Moore
## 8    Daniel Craig
## 9     Roger Moore
## 10   Sean Connery
## 11   Daniel Craig
## 12   Sean Connery
## 13 Pierce Brosnan
## 14 Pierce Brosnan
## 15 George Lazenby
## 16 Pierce Brosnan
## 17    Roger Moore
## 18 Pierce Brosnan
## 19    Roger Moore
## 20   Sean Connery
## 21    Roger Moore
## 22 Timothy Dalton
## 23    Roger Moore
## 24 Timothy Dalton
  list1$bond$actor
##  [1] "Daniel Craig"   "Sean Connery"   "Sean Connery"   "Daniel Craig"  
##  [5] "Roger Moore"    "Sean Connery"   "Roger Moore"    "Daniel Craig"  
##  [9] "Roger Moore"    "Sean Connery"   "Daniel Craig"   "Sean Connery"  
## [13] "Pierce Brosnan" "Pierce Brosnan" "George Lazenby" "Pierce Brosnan"
## [17] "Roger Moore"    "Pierce Brosnan" "Roger Moore"    "Sean Connery"  
## [21] "Roger Moore"    "Timothy Dalton" "Roger Moore"    "Timothy Dalton"
  list1$bond["actor"]
##             actor
## 1    Daniel Craig
## 2    Sean Connery
## 3    Sean Connery
## 4    Daniel Craig
## 5     Roger Moore
## 6    Sean Connery
## 7     Roger Moore
## 8    Daniel Craig
## 9     Roger Moore
## 10   Sean Connery
## 11   Daniel Craig
## 12   Sean Connery
## 13 Pierce Brosnan
## 14 Pierce Brosnan
## 15 George Lazenby
## 16 Pierce Brosnan
## 17    Roger Moore
## 18 Pierce Brosnan
## 19    Roger Moore
## 20   Sean Connery
## 21    Roger Moore
## 22 Timothy Dalton
## 23    Roger Moore
## 24 Timothy Dalton

I do the same with the other object types here:

  list1$vector1[3]
## [1] TRUE
  list1$vector3[list1$vector3>=2]
## [1] 2 3
  list1$array1[,2,]
##      [,1] [,2]      [,3]
## [1,]    2    2 0.6666667
## [2,]    2    2 0.6666667
## [3,]    2    2 0.6666667
## [4,]    2    2 0.6666667

5.7 Things to Remember

  • How you filter and transform data depends upon the object type

5.8 Exercises

Try to see if you can complete the following exercises. Answers are in the back of the book!

  1. Filter the following vector to values greater than 2. Display the actual number (not TRUE / FALSE).
  q1 <- seq(1,20,2)
  2. Filter the following vector to values between 20 and 30, but only for the first three entries that meet that criteria. (Hint: add [n:n] for the range of values after you determine which values meet that criteria)
  q2 <- round(rnorm(20,32,7),0)
  3. Multiply the following matrices together.
  q3_1 <- matrix(round(seq(1,40,3.27),0),3)
  q3_2 <- matrix(seq(1,8,1),4)
  4. Subtract 41 from every entry in the second column of the following matrix. Replace the column with those new values.
  q4 <- matrix(seq(1,120,4),10,3)
  5. Select the second row from each matrix in the following array. Subtract 5 from those rows.
  q5 <- array(data=c(matrix(seq(1,15,1),5,3),
                    matrix(seq(4,60,4),5,3),
                    matrix(seq(2,30,2),5,3)),
              dim=c(5,3,3))
  6. Filter the James Bond data frame to only films starring Roger Moore.
  bond
  7. Filter the James Bond data frame to films starring Sean Connery made after 1966.
  bond