Data Structure: Data Frames
Topics covered in this section:
➤ Creating data frames
➤ Finding data frame attributes
➤ Adding rows and columns to the data frame
➤ Changing Variable Name
➤ Deleting rows from the data frame
➤ Deleting columns from the data frame
➤ Extracting rows and columns from the data frame
➤ Editing the data frame on a spreadsheet-style editor
↪ Creating data frames
A data frame is a table-like structure, similar to a matrix, except that its columns can be of different types. For example, one column can hold numbers while another holds character strings. Every column must have the same number of rows.
Data Frame Properties
The data.frame() function constructs a data frame from one or more vectors.
name = c("Micky", "Minny", "Goofy", "Donald", "Daisy")
age = c(10, 8, 12, 6, 7)
frame = data.frame(name, age)
frame
---Output---
    name age
1  Micky  10
2  Minny   8
3  Goofy  12
4 Donald   6
5  Daisy   7
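In R versions before 4.0.0, data.frame() converts character vectors into factors by default. To suppress this behavior, pass the argument stringsAsFactors = FALSE. Since R 4.0.0 this is already the default, so the argument is only needed for older versions or to make the intent explicit.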
name = c("Micky", "Minny", "Goofy", "Donald", "Daisy")
age = c(10, 8, 12, 6, 7)
frame = data.frame(name, age, stringsAsFactors = FALSE)
typeof(frame)
---Output---
[1] "list"
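typeof() reports the underlying storage type, which is why the result above is "list": a data frame is stored as a list of equal-length columns. The following is a minimal sketch, reusing the same name and age vectors, of how class() and sapply() expose the data frame class and the per-column types. The variable names chars and factors are only illustrative.

```r
name <- c("Micky", "Minny", "Goofy", "Donald", "Daisy")
age <- c(10, 8, 12, 6, 7)

# Keep strings as character vectors
chars <- data.frame(name, age, stringsAsFactors = FALSE)
# Ask explicitly for strings to be converted to factors
factors <- data.frame(name, age, stringsAsFactors = TRUE)

class(chars)            # "data.frame"
sapply(chars, class)    # name is "character", age is "numeric"
sapply(factors, class)  # name is "factor", age is "numeric"
```

Passing stringsAsFactors explicitly in both calls makes the sketch behave the same way regardless of the R version's default.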
↪ Finding data frame attributes
The str() function is used to display the data frame's structure.
str(frame)
---Output---
'data.frame':	5 obs. of  2 variables:
 $ name: chr  "Micky" "Minny" "Goofy" "Donald" ...
 $ age : num  10 8 12 6 7
The summary() function displays a per-column summary: length, class, and mode for character columns, and the minimum, quartiles, median, mean, and maximum for numeric columns.
summary(frame)
---Output---
     name                age
 Length:5           Min.   : 6.0
 Class :character   1st Qu.: 7.0
 Mode  :character   Median : 8.0
                    Mean   : 8.6
                    3rd Qu.:10.0
                    Max.   :12.0
The nrow(), ncol(), and dim() functions return the number of rows and columns in the data frame.
nrow(frame) # Prints 5
ncol(frame) # Prints 2
dim(frame) # Prints 5 2
The rownames(), colnames(), and dimnames() functions display the row and column names.
rownames(frame) # Prints [1] "1" "2" "3" "4" "5"
colnames(frame) # Prints [1] "name" "age"
dimnames(frame) # Prints a list holding the row names, then the column names
↪ Adding rows and columns to the data frame
A column can be added to the data frame with the cbind() function. The following code adds a height column to frame.
height = c(70.2, 60.5, 110, 45.2, 35)
frame = cbind(frame, height)
frame
---Output---
    name age height
1  Micky  10   70.2
2  Minny   8   60.5
3  Goofy  12  110.0
4 Donald   6   45.2
5  Daisy   7   35.0
A column can also be added by assigning to a new column name, written as frame$weight or frame['weight']. The following code adds a weight column to frame.
frame$weight = c(25, 22, 40, 15, 10)
frame
---Output---
    name age height weight
1  Micky  10   70.2     25
2  Minny   8   60.5     22
3  Goofy  12  110.0     40
4 Donald   6   45.2     15
5  Daisy   7   35.0     10
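The bracket form mentioned above is not shown in the example, so here is a minimal sketch of it on a small two-row data frame. The double-bracket form is added for comparison and is not part of the original text.

```r
frame <- data.frame(name = c("Micky", "Minny"),
                    age = c(10, 8),
                    stringsAsFactors = FALSE)

# Single-bracket form: equivalent to frame$weight <- c(25, 22)
frame["weight"] <- c(25, 22)

# Double-bracket form (an addition for comparison) works the same way
frame[["height"]] <- c(70.2, 60.5)

names(frame)  # "name" "age" "weight" "height"
```

The bracket forms are handy when the column name is held in a variable, since frame$... only accepts a literal name.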
A row can be added to the data frame using the rbind() function. The following code appends a sixth row to frame.
new_row = list(name="Albert", age=45, height=70, weight=26)
frame = rbind(frame, new_row, stringsAsFactors=FALSE)
frame
---Output---
    name age height weight
1  Micky  10   70.2     25
2  Minny   8   60.5     22
3  Goofy  12  110.0     40
4 Donald   6   45.2     15
5  Daisy   7   35.0     10
6 Albert  45   70.0     26
A row can also be added by assigning to the index one past the last row, which nrow(frame) + 1 computes. The following code appends a seventh row to frame.
frame[nrow(frame) + 1,] <- list(name="Gammie", age=50, height=35, weight=35)
frame
---Output---
    name age height weight
1  Micky  10   70.2     25
2  Minny   8   60.5     22
3  Goofy  12  110.0     40
4 Donald   6   45.2     15
5  Daisy   7   35.0     10
6 Albert  45   70.0     26
7 Gammie  50   35.0     35
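Two data frames can be merged on a shared column using the merge() function. By default only rows whose key appears in both data frames are kept, and the result is sorted by the key. This is why Albert and Gammie, who have no entry in frame2, are missing from the output below.
name <- c("Micky", "Minny", "Goofy", "Donald", "Daisy")
gender <- c("M", "F", "M", "M", "F")
frame2 <- data.frame(name, gender, stringsAsFactors = FALSE)
frame3 <- merge(frame, frame2, by="name")
frame3
---Output---
    name age height weight gender
1  Daisy   7   35.0     10      F
2 Donald   6   45.2     15      M
3  Goofy  12  110.0     40      M
4  Micky  10   70.2     25      M
5  Minny   8   60.5     22      F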
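Rows without a match in the other data frame are dropped by a default merge(). The sketch below uses two small illustrative data frames (people and genders are made-up names, not from the original) to show how the all.x and all arguments of merge() keep the unmatched rows and fill the gaps with NA.

```r
people <- data.frame(name = c("Micky", "Minny", "Albert"),
                     age = c(10, 8, 45),
                     stringsAsFactors = FALSE)
genders <- data.frame(name = c("Micky", "Minny", "Goofy"),
                      gender = c("M", "F", "M"),
                      stringsAsFactors = FALSE)

# Default: only names present in both data frames (Micky, Minny)
inner <- merge(people, genders, by = "name")

# all.x = TRUE: keep every row of people; Albert gets gender NA
left <- merge(people, genders, by = "name", all.x = TRUE)

# all = TRUE: keep every row of both; Albert and Goofy get NA where data is missing
full <- merge(people, genders, by = "name", all = TRUE)

nrow(inner)  # 2
nrow(left)   # 3
nrow(full)   # 4
```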
↪ Changing Variable Name
Variable names can be changed interactively or programmatically. The fix() function opens an interactive editor in which a variable name can be changed.
fix(frame3)
A variable name can be changed programmatically using the names() function. For example, the following code renames the first column, name, to Name.
names(frame3)[1] <- "Name"
frame3
---Output---
    Name age height weight gender
1  Daisy   7   35.0     10      F
2 Donald   6   45.2     15      M
3  Goofy  12  110.0     40      M
4  Micky  10   70.2     25      M
5  Minny   8   60.5     22      F
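Renaming by position depends on the column order staying the same. As a minimal sketch, a column can instead be located by its current name, so the code keeps working if columns are reordered. The small data frame here is illustrative only.

```r
frame3 <- data.frame(name = c("Daisy", "Donald"),
                     age = c(7, 6),
                     stringsAsFactors = FALSE)

# Find the column called "age" and rename it to "Age"
names(frame3)[names(frame3) == "age"] <- "Age"

names(frame3)  # "name" "Age"
```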
↪ Deleting rows from the data frame
Using exclude vector
# Reference frame
frame
---Output---
    name age height weight
1  Micky  10   70.2     25
2  Minny   8   60.5     22
3  Goofy  12  110.0     40
4 Donald   6   45.2     15
5  Daisy   7   35.0     10
6 Albert  45   70.0     26
7 Gammie  50   35.0     35
frame[-c(7),] # Returns a copy without the 7th row; frame itself is unchanged
frame # Still prints all 7 rows
Using a logical vector
frame[c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE),] # Keeps the first 5 of the 7 rows
Using comparison operator
frame[frame$age > 9,] # Keeps the rows where age > 9
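Each of the expressions above returns a new data frame and leaves frame untouched. The sketch below, on a small illustrative data frame, shows that the result has to be assigned back for the deletion to persist, and shows subset() as an alternative way to write the comparison.

```r
frame <- data.frame(name = c("Micky", "Minny", "Goofy", "Donald"),
                    age = c(10, 8, 12, 6),
                    stringsAsFactors = FALSE)

# Indexing alone produces a copy; frame still has 4 rows afterwards
dropped <- frame[-4, ]
nrow(frame)    # 4
nrow(dropped)  # 3

# subset() expresses the same row filter without repeating frame$
older <- subset(frame, age > 9)

# Assign back to make the deletion permanent
frame <- frame[frame$age > 9, ]
frame$name     # "Micky" "Goofy"
```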
↪ Deleting columns from the data frame
Using column index
# Reference frame
frame
---Output---
    name age height weight
1  Micky  10   70.2     25
2  Minny   8   60.5     22
3  Goofy  12  110.0     40
4 Donald   6   45.2     15
5  Daisy   7   35.0     10
6 Albert  45   70.0     26
7 Gammie  50   35.0     35
frame[,-c(4)] # Returns a copy without the 4th column (weight)
frame[,-c(2)] # Returns a copy without the 2nd column (age)
Using NULL
frame$weight <- NULL # Removes the weight column from frame itself
frame
Using the subset() function
frame$weight <- c(25,22,40,15,10,26,35) # Restores the weight column
frame
subset(frame, select = -c(2)) # Returns a copy without the 2nd column (age)
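Column positions shift as columns are added or removed, so dropping by name is often safer than dropping by index. The following is a minimal sketch of three ways to do that on a small illustrative data frame. Each returns a copy and leaves frame unchanged.

```r
frame <- data.frame(name = c("Micky", "Minny"),
                    age = c(10, 8),
                    height = c(70.2, 60.5),
                    stringsAsFactors = FALSE)

# Keep every column whose name is not "age"
no_age <- frame[, !(names(frame) %in% "age")]

# subset() accepts unquoted column names in select
no_age2 <- subset(frame, select = -age)

# Drop several columns at once; drop = FALSE keeps the result a data frame
# even when only one column remains
only_name <- frame[, setdiff(names(frame), c("age", "height")), drop = FALSE]

names(no_age)     # "name" "height"
names(only_name)  # "name"
```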
↪ Extracting rows and columns from the data frame
Extracting a row
# Reference frame
frame
---Output---
    name age height weight
1  Micky  10   70.2     25
2  Minny   8   60.5     22
3  Goofy  12  110.0     40
4 Donald   6   45.2     15
5  Daisy   7   35.0     10
6 Albert  45   70.0     26
7 Gammie  50   35.0     35
frame[1,] # Prints first row
Extracting a column
frame[,1] # Prints first column
Extracting everything from the first 3 rows
frame[1:3,] # Prints first 3 rows
Extracting everything from the first two columns
frame[,1:2] # Prints first two columns
Extracting everything from the 1st, 2nd, and 5th rows
frame[c(1,2,5),]
Extracting everything from the 1st and 3rd columns
frame[,c(1,3)]
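The examples above select by position. As a minimal sketch on a small illustrative data frame, columns can also be selected by name, and rows by a condition. Note that extracting a single column returns a plain vector unless drop = FALSE is given.

```r
frame <- data.frame(name = c("Micky", "Minny", "Goofy"),
                    age = c(10, 8, 12),
                    height = c(70.2, 60.5, 110),
                    stringsAsFactors = FALSE)

frame$age                     # numeric vector: 10 8 12
frame[["age"]]                # same vector
frame[, "age"]                # also a vector, because drop defaults to TRUE
frame[, "age", drop = FALSE]  # one-column data frame

# Rows where age > 9, restricted to two named columns
tall <- frame[frame$age > 9, c("name", "height")]
tall$name                     # "Micky" "Goofy"
```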
↪ Editing the data frame on a spreadsheet-style editor
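frame2 = edit(data.frame()) # Opens an empty data frame in the editor
frame3 = edit(frame) # Opens a copy of frame; the edited result is stored in frame3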