Data Structure: Data Frames

Topics covered in this section are

  ➤ Creating data frames

  ➤ Finding data frame attributes

  ➤ Adding rows and columns to the data frame

  ➤ Changing Variable Name

  ➤ Deleting rows from the data frame

  ➤ Deleting columns from the data frame

  ➤ Extracting rows and columns from the data frame

  ➤ Editing the data frame on a spreadsheet-style editor

↪ Creating data frames

A data frame is a table-like structure, similar to a matrix, in which columns can be of different types. For example, one column can be numeric while another holds character strings.

Data Frame Properties

  • All columns contain the same number of data items
  • Row names must be unique
  • Every column has a name
  • Columns can be of different types: numeric, character, logical

The data.frame() function constructs a data frame.

          name = c("Micky", "Minny", "Goofy", "Donald", "Daisy")
          age = c(10,8,12,6,7)
          frame = data.frame(name,age)
          frame
    
          ---Output---
              name age
          1  Micky  10
          2  Minny   8
          3  Goofy  12
          4 Donald   6
          5  Daisy   7

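    In R versions before 4.0.0, data.frame() converts character vectors into factors by default. To suppress this behavior, pass the argument stringsAsFactors = FALSE. Since R 4.0.0, FALSE is the default, so the argument is only needed for older versions.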

          name = c("Micky", "Minny", "Goofy", "Donald", "Daisy")
          age = c(10,8,12,6,7)
          frame = data.frame(name,age,stringsAsFactors = FALSE)
          typeof(frame)
    
          ---Output---
          [1] "list"
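
    The effect of stringsAsFactors is easiest to see by checking the class of the name column. The sketch below builds the same data twice; the variable names as_chr and as_fct are illustrative and not part of the original example.

```r
name <- c("Micky", "Minny")
age <- c(10, 8)

# Keep the character vector as character
as_chr <- data.frame(name, age, stringsAsFactors = FALSE)

# Explicitly ask for conversion to factor
as_fct <- data.frame(name, age, stringsAsFactors = TRUE)

class(as_chr$name)   # "character"
class(as_fct$name)   # "factor"
```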

    ↪ Finding data frame attributes

    The str() function is used to display the data frame's structure.

          str(frame)
    
          ---Output---
          'data.frame':   5 obs. of  2 variables:
           $ name: chr  "Micky" "Minny" "Goofy" "Donald" ...
           $ age : num  10 8 12 6 7

    The summary() function displays a statistical summary of each column, along with its type.

          summary(frame)
    
          ---Output---
               name                age
           Length:5           Min.   : 6.0
           Class :character   1st Qu.: 7.0
           Mode  :character   Median : 8.0
                              Mean   : 8.6
                              3rd Qu.:10.0
                              Max.   :12.0

    The nrow(), ncol(), and dim() functions return the number of rows, the number of columns, and both dimensions of the data frame.

          nrow(frame)     # Prints 5
          ncol(frame)     # Prints 2
          dim(frame)      # Prints 5 2
    
    

    The rownames(), colnames(), and dimnames() functions are used for displaying row and column names.

          rownames(frame)     # Prints [1] "1" "2" "3" "4" "5" 
          colnames(frame)     # Prints [1] "name"   "age" 
          dimnames(frame)     # Prints a list holding the row names, then the column names
    
    

    ↪ Adding rows and columns to the data frame

    A column can be added to the data frame with the cbind() function. The following code adds a height column to frame.

          height = c(70.2, 60.5,110,45.2,35)
          frame = cbind(frame,height)
          frame
    
          ---Output---
              name age height
          1  Micky  10   70.2
          2  Minny   8   60.5
          3  Goofy  12  110.0
          4 Donald   6   45.2
          5  Daisy   7   35.0

    A column can also be added by assigning to a new column name, written as frame$weight or frame['weight']. The following code adds a weight column to frame.

          frame$weight = c(25,22,40,15,10)
          frame
    
          ---Output---
              name age height weight
          1  Micky  10   70.2     25
          2  Minny   8   60.5     22
          3  Goofy  12  110.0     40
          4 Donald   6   45.2     15
          5  Daisy   7   35.0     10
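
    The bracket forms of column assignment work the same way as the $ form. A minimal sketch, using a small hypothetical data frame named pets so that frame is left untouched:

```r
pets <- data.frame(name = c("Micky", "Minny"),
                   age = c(10, 8),
                   stringsAsFactors = FALSE)

# Single brackets with a column name
pets["weight"] <- c(25, 22)

# Double brackets with a column name
pets[["height"]] <- c(70.2, 60.5)

names(pets)   # "name" "age" "weight" "height"
```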

    A row can be added to the data frame using the rbind() function. The following code adds a sixth row to frame.

          new_row = list(name="Albert",age=45,height=70,weight=26)
          frame = rbind(frame,new_row, stringsAsFactors=FALSE)
          frame
    
          ---Output---
              name age height weight
          1  Micky  10   70.2     25
          2  Minny   8   60.5     22
          3  Goofy  12  110.0     40
          4 Donald   6   45.2     15
          5  Daisy   7   35.0     10
          6 Albert  45   70.0     26

    A row can also be appended by assigning to the row index nrow(frame) + 1. The following code adds a seventh row to frame.

          frame[nrow(frame) + 1,] <- list(name="Gammie",age=50,height=35,weight=35)
          frame
    
          ---Output---
              name age height weight
          1  Micky  10   70.2     25
          2  Minny   8   60.5     22
          3  Goofy  12  110.0     40
          4 Donald   6   45.2     15
          5  Daisy   7   35.0     10
          6 Albert  45   70.0     26
          7 Gammie  50   35.0     35

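    Two data frames can be merged using the merge() function. By default, merge() keeps only the rows whose key value appears in both data frames and sorts the result by the key, so Albert and Gammie, who are absent from frame2, do not appear in the output.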

          name <- c("Micky", "Minny", "Goofy", "Donald", "Daisy")
          gender <- c("M", "F", "M", "M", "F")
          frame2 <- data.frame(name, gender, stringsAsFactors = FALSE)
          frame3 <- merge(frame, frame2, by="name")
          frame3
    
          ---Output---
              name age height weight gender
          1  Daisy   7   35.0     10      F
          2 Donald   6   45.2     15      M
          3  Goofy  12  110.0     40      M
          4  Micky  10   70.2     25      M
          5  Minny   8   60.5     22      F
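
    To keep rows from the first data frame that have no match in the second, merge() accepts the argument all.x = TRUE; unmatched cells are filled with NA. A minimal sketch, using two small hypothetical data frames named left and right:

```r
left <- data.frame(name = c("Micky", "Albert"),
                   age = c(10, 45),
                   stringsAsFactors = FALSE)
right <- data.frame(name = "Micky",
                    gender = "M",
                    stringsAsFactors = FALSE)

# Default merge: only names present in both data frames
inner <- merge(left, right, by = "name")

# all.x = TRUE: every row of left is kept
kept <- merge(left, right, by = "name", all.x = TRUE)

nrow(inner)   # 1
nrow(kept)    # 2
is.na(kept$gender[kept$name == "Albert"])   # TRUE
```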

    ↪ Changing Variable Name

    Variable names can be changed interactively or programmatically. The fix() function opens an interactive editor in which a variable name can be changed.

          fix(frame3)
    
    

    A variable name can be changed programmatically using the names() function. For example, the following code renames the first column from name to Name.

          names(frame3)[1] <- "Name"
          frame3
    
          ---Output---
              Name age height weight gender
          1  Daisy   7   35.0     10      F
          2 Donald   6   45.2     15      M
          3  Goofy  12  110.0     40      M
          4  Micky  10   70.2     25      M
          5  Minny   8   60.5     22      F
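
    Renaming by position breaks if the column order changes. One alternative is to look the column up by its current name. A minimal sketch, using a small hypothetical data frame named pets:

```r
pets <- data.frame(name = c("Micky", "Minny"),
                   age = c(10, 8),
                   stringsAsFactors = FALSE)

# Select the column whose current name is "age" and rename it
names(pets)[names(pets) == "age"] <- "Age"

names(pets)   # "name" "Age"
```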

    ↪ Deleting rows from the data frame

    Using an exclude vector

          # Reference frame
          frame
    
          ---Output---
              name age height weight
          1  Micky  10   70.2     25
          2  Minny   8   60.5     22
          3  Goofy  12  110.0     40
          4 Donald   6   45.2     15
          5  Daisy   7   35.0     10
          6 Albert  45   70.0     26
          7 Gammie  50   35.0     35

          frame[-c(7),]    # Returns frame without the 7th row; frame itself is not modified
    
    

    Using a logical vector

          frame[c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE),]
                            # One value per row: keeps the first 5 of the 7 rows
    
    

    Using a comparison operator

          frame[frame$age > 9,]   # Keeps the rows where age > 9
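
    None of the expressions above changes frame; each one returns a new data frame. To make a deletion permanent, the result has to be assigned back. A minimal sketch, using a small hypothetical data frame named pets:

```r
pets <- data.frame(name = c("Micky", "Minny", "Goofy"),
                   age = c(10, 8, 12),
                   stringsAsFactors = FALSE)

# Indexing alone returns a copy and leaves pets untouched
without_third <- pets[-3, ]
nrow(pets)            # 3
nrow(without_third)   # 2

# Assigning the result back makes the deletion permanent
pets <- pets[pets$age > 9, ]
nrow(pets)            # 2
pets$name             # "Micky" "Goofy"
```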
    
    

    ↪ Deleting columns from the data frame

    Using a column index

          # Reference frame
          frame
    
          ---Output---
              name age height weight
          1  Micky  10   70.2     25
          2  Minny   8   60.5     22
          3  Goofy  12  110.0     40
          4 Donald   6   45.2     15
          5  Daisy   7   35.0     10
          6 Albert  45   70.0     26
          7 Gammie  50   35.0     35

          frame[,-c(4)]     # Returns frame without the 4th column (weight)
          frame[,-c(2)]     # Returns frame without the 2nd column (age)
    
    

    Using NULL

          frame$weight <- NULL  # Removes column - weight
          frame
    
    

    Using the subset() function

          frame$weight <- c(25,22,40,15,10,26,35)   # Restores the weight column
          frame
          subset(frame, select = -c(2))       # Returns frame without the 2nd column (age)
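
    Columns can also be dropped by name rather than by position, which stays correct if the column order changes. A minimal sketch, using a small hypothetical data frame named pets:

```r
pets <- data.frame(name = c("Micky", "Minny"),
                   age = c(10, 8),
                   height = c(70.2, 60.5),
                   stringsAsFactors = FALSE)

# Keep every column whose name is not "age"
no_age <- pets[, !(names(pets) %in% "age")]
names(no_age)      # "name" "height"

# subset() accepts a negated column name in select
no_height <- subset(pets, select = -height)
names(no_height)   # "name" "age"
```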
    
    

    ↪ Extracting rows and columns from the data frame

    Extracting a row

          # Reference frame
          frame
    
          ---Output---
              name age height weight
          1  Micky  10   70.2     25
          2  Minny   8   60.5     22
          3  Goofy  12  110.0     40
          4 Donald   6   45.2     15
          5  Daisy   7   35.0     10
          6 Albert  45   70.0     26
          7 Gammie  50   35.0     35

          frame[1,]       # Prints first row
    
    

    Extracting a column

          frame[,1]       # Prints first column as a vector
    
    

    Extracting everything from the first 3 rows

          frame[1:3,]    # Prints first 3 rows
    
    

    Extracting everything from the first two columns

          frame[,1:2]    # Prints first two columns
    
    

    Extracting everything from 1st, 2nd and 5th row

          frame[c(1,2,5),]
    
    

    Extracting everything from 1st and 3rd column

          frame[,c(1,3)]
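
    Extracting a single column with frame[,1] returns a plain vector, while extracting several columns returns a data frame. The drop = FALSE argument keeps the result a data frame in both cases. A minimal sketch, using a small hypothetical data frame named pets:

```r
pets <- data.frame(name = c("Micky", "Minny"),
                   age = c(10, 8),
                   stringsAsFactors = FALSE)

is.data.frame(pets[, 1])                 # FALSE: a character vector
is.data.frame(pets[, 1, drop = FALSE])   # TRUE: a one-column data frame
is.data.frame(pets["name"])              # TRUE: list-style indexing keeps the data frame

# Row condition and column name combined
older <- pets[pets$age > 9, "name"]
older                                    # "Micky"
```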
    
    

    ↪ Editing the data frame on a spreadsheet-style editor

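    The edit() function opens a data frame in a spreadsheet-style editor and returns the edited copy.

          frame2 = edit(data.frame())   # Starts from an empty data frame; the result is stored in frame2
          frame3 = edit(frame)          # Edits a copy of frame; frame itself is unchanged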