Reading and Writing data
Topics covered in this section:
➤ Reading tabular data from a file
➤ Writing Data Frame to a file
➤ Reading CSV file
➤ Modifying the Data Frame
➤ Writing the Data Frame into the CSV file
➤ Reading Excel file
➤ Writing object to a file in ASCII text
➤ Serializing Objects
➤ Reading data from the connection interface
↪ Reading tabular data from a file
The read.table() function reads tabular data with various delimiters, such as TSV (Tab-Separated Values) and CSV (Comma-Separated Values), into a Data Frame. By default it splits fields on white space, so other delimiters must be given with the sep argument.
dtable <- read.table("students.csv", sep = ",") # Notice sep value
dtable
---Output---
- V1 V2 V3 V4
1 Name Class Age Gender
2 Mark 5 10 M
3 Mary 5 10 F
4 Arjun 6 11 M
5 Aruna 7 12 F
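Because read.table() defaults to header = FALSE, the header line is read as an ordinary data row and R assigns default column names (V1, V2, ...). As a side effect, numeric columns such as Class and Age are read as text, because they now also contain the strings "Class" and "Age".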
The argument header=TRUE tells R that the first row consists of variable names rather than actual data.
dtable <- read.table("students.csv", header=TRUE, sep = ",")
dtable
---Output---
- Name Class Age Gender
1 Mark 5 10 M
2 Mary 5 10 F
3 Arjun 6 11 M
4 Aruna 7 12 F
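A self-contained sketch of both calls. It assumes a file with the same contents as students.csv; the rows are copied from the output above and written to a temporary file so the example runs anywhere. It also shows that a tab-separated copy of the same data only needs sep = "\t".

```r
# Recreate the students.csv contents in a temporary file (assumed sample data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Class,Age,Gender",
             "Mark,5,10,M",
             "Mary,5,10,F",
             "Arjun,6,11,M",
             "Aruna,7,12,F"), csv_path)

# header = FALSE (the default): the header line becomes data row 1, columns V1..V4
no_header <- read.table(csv_path, sep = ",")

# header = TRUE: the first line supplies the column names
students <- read.table(csv_path, header = TRUE, sep = ",")

# The same data as TSV: only the delimiter changes
tsv_path <- tempfile(fileext = ".tsv")
writeLines(gsub(",", "\t", readLines(csv_path)), tsv_path)
students_tsv <- read.table(tsv_path, header = TRUE, sep = "\t")

str(students)
```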
↪ Writing Data Frame to a file
The write.table() function writes the specified Data Frame to a file. The sep argument specifies the delimiter that separates the columns; popular choices are comma (sep = ","), space (sep = " "), and tab (sep = "\t"). By default, write.table() also writes the row names and quotes character values.
write.table(dtable,"students2.txt",sep=",")
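A minimal sketch of how the sep, row.names and quote arguments change the written file. The small data frame stands in for dtable and is hypothetical; the file goes to a temporary path and is read back with readLines() so the raw text can be inspected.

```r
# Small hypothetical data frame standing in for dtable
students <- data.frame(Name = c("Mark", "Mary"), Class = c(5L, 5L),
                       Age = c(10L, 10L), Gender = c("M", "F"))
out_path <- tempfile(fileext = ".txt")

# Defaults: row names are written and character values are quoted
write.table(students, out_path, sep = ",")
default_lines <- readLines(out_path)

# Tab-delimited, without row names or quotes
write.table(students, out_path, sep = "\t", row.names = FALSE, quote = FALSE)
tab_lines <- readLines(out_path)

print(default_lines)
print(tab_lines)
```

In the default output the header line has one field fewer than the data lines; read.table() with header = TRUE recognises this layout and uses the first column as row names.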
↪ Reading CSV file
The read.csv() function is a special case of read.table() with sep = "," and header = TRUE as defaults. Unless header = FALSE is passed, the first line of the file is therefore treated as the header, i.e. the names of the variables. Reading a .csv file into a Data Frame:
stud <- read.csv("students.csv")
stud
str(stud)
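The argument stringsAsFactors = FALSE keeps text columns as character vectors instead of converting them into factors. Since R 4.0.0 this is the default; earlier versions converted text columns to factors unless told otherwise.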
stud <- read.csv("students.csv",stringsAsFactors = FALSE)
stud
str(stud)
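A short sketch comparing the two settings on a temporary copy of a few sample rows (the file contents are assumed, mirroring students.csv). Only text columns are affected; numeric columns are converted the same way in both cases.

```r
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Class,Age,Gender",
             "Mark,5,10,M",
             "Mary,5,10,F"), csv_path)

as_factors <- read.csv(csv_path, stringsAsFactors = TRUE)
as_strings <- read.csv(csv_path, stringsAsFactors = FALSE)

levels(as_factors$Gender)  # factor levels, sorted: "F" "M"
class(as_strings$Gender)   # "character"
class(as_strings$Age)      # "integer": numeric columns are unaffected
```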
The argument header = FALSE should be passed when the input file does not contain a header line.
stud <- read.csv("students.csv",header = FALSE)
stud
---Output---
- V1 V2 V3 V4 # automatically generated column names
1 Name Class Age Gender
2 Mark 5 10 M
3 Mary 5 10 F
4 Arjun 6 11 M
5 Aruna 7 12 F
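When a file genuinely has no header line, the col.names argument can supply meaningful names instead of V1..V4. A minimal sketch, assuming a hypothetical headerless file written to a temporary path:

```r
# Hypothetical headerless file containing data rows only
data_path <- tempfile(fileext = ".csv")
writeLines(c("Mark,5,10,M",
             "Mary,5,10,F",
             "Arjun,6,11,M"), data_path)

# header = FALSE plus explicit column names
stud_nohdr <- read.csv(data_path, header = FALSE,
                       col.names = c("Name", "Class", "Age", "Gender"))
stud_nohdr
```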
↪ Modifying the Data Frame
Adding a row to a Data Frame:
stud <- read.csv("students.csv")
stud[nrow(stud) + 1, ] <- list(Name = "Peter", Class = 7, Age = 12, Gender = "M")
stud
---Output---
- Name Class Age Gender
1 Mark 5 10 M
2 Mary 5 10 F
3 Arjun 6 11 M
4 Aruna 7 12 F
5 Peter 7 12 M # new row from the list above added
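An alternative way to append rows is rbind(), which matches data frame columns by name rather than by position. A small sketch with a hypothetical two-row frame; the new row's columns are deliberately listed in a different order to show the name matching.

```r
stud <- data.frame(Name = c("Mark", "Mary"), Class = c(5L, 5L),
                   Age = c(10L, 10L), Gender = c("M", "F"))

# New row with the same column names, in a different order
new_row <- data.frame(Gender = "M", Name = "Peter", Age = 12L, Class = 7L)

# rbind() on data frames matches columns by name
stud <- rbind(stud, new_row)
stud
```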
Adding a column to a Data Frame:
stud$Score <- c(79, 80, 52, 63, 75) # adds a Score column to stud
stud
---Output---
- Name Class Age Gender Score # Score column added
1 Mark 5 10 M 79
2 Mary 5 10 F 80
3 Arjun 6 11 M 52
4 Aruna 7 12 F 63
5 Peter 7 12 M 75
↪ Writing the Data Frame into the CSV file
The write.csv() function writes the Data Frame back to a CSV file. By default it writes the row names as an extra, unnamed first column; row.names = FALSE suppresses them.
write.csv(stud, file="students2.csv", # saves stud data frame
row.names = FALSE) # into a file students2.csv
stud2 <- read.csv("students2.csv",stringsAsFactors = FALSE)
stud2
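To see what row.names = FALSE avoids, the sketch below writes the same hypothetical data frame with and without row names to temporary files and reads both back. With the default, the unnamed row-name column returns as an extra column named X.

```r
stud <- data.frame(Name = c("Mark", "Mary"), Score = c(79L, 80L))
with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(stud, with_rn)                        # row names written (default)
write.csv(stud, without_rn, row.names = FALSE)  # row names suppressed

back_default <- read.csv(with_rn)
back_clean <- read.csv(without_rn)

names(back_default)  # "X" "Name" "Score": row names came back as column X
names(back_clean)    # "Name" "Score"
```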
↪ Reading Excel file
The 'readxl' package reads Excel worksheets directly and supports both .xls and .xlsx files. Its read_excel() function imports a worksheet into a data frame (a tibble). In read_excel(path, sheet), path is the path to the Excel file and sheet is the worksheet to import, given by position or by name; by default the first row of the worksheet supplies the variable names. The read_xlsx() function used here is the .xlsx-specific variant of read_excel().
install.packages("readxl")
library(readxl)
stud_data_frame <- read_xlsx("students2.xlsx", 1)
stud_data_frame
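The range argument of read_excel() (and read_xlsx()) imports only a specific cell range of a worksheet. It accepts Excel-style references such as "A1:C5"; a sheet name can be included, as in "students2!A1:C5", in which case it takes precedence over the sheet argument.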
library(readxl)
stud_data_frame <- read_xlsx("students2.xlsx", 1, range="students2!A1:C5")
stud_data_frame
↪ Writing object to a file in ASCII text
The dput() function writes an ASCII text representation of an R object to the console or, when file is given, to a file. Unlike writing objects out to a CSV file, dput() preserves metadata such as column classes and factor levels. The saved file can be read back with the dget() function.
dput(dtable) # Writes 'dtable' contents into the console
dput(dtable, file="student3.dput") # Writes 'dtable' contents into a file
dp <- dget("student3.dput") # Reads the data from a file
dp
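A sketch of the metadata difference, using a hypothetical data frame whose factor has a non-alphabetical level order. The dput()/dget() round trip returns an identical object, while the CSV round trip returns the factor as plain text. Both files are written to temporary paths.

```r
# Hypothetical data frame; the factor levels are deliberately not alphabetical
stud <- data.frame(Name = c("Mark", "Mary"),
                   Gender = factor(c("M", "F"), levels = c("M", "F")))

dput_path <- tempfile(fileext = ".dput")
dput(stud, file = dput_path)
restored <- dget(dput_path)

csv_path <- tempfile(fileext = ".csv")
write.csv(stud, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

levels(restored$Gender)  # "M" "F": factor and level order kept
class(from_csv$Gender)   # "character": the factor information is gone
```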
The dump() function takes a character vector of object names and writes R code that recreates those objects, either to the console (file = "") or to a file. Like dput(), dump() preserves the metadata. The saved file can be read back with the source() function, which runs the code and recreates the objects under their original names.
rb <- c("violet", "indigo", "blue", "green", "yellow",
"orange", "red")
dtable <- read.table("students.csv", sep = ",") # Notice sep value
rb
dtable
dump(c("rb", "dtable"), "") # dump both objects to the console
dump(c("rb", "dtable"), file = "student4.dump") # dump both objects to a file
rm(rb, dtable) # remove rb and dtable objects
rb # Error: object 'rb' not found
dtable # Error: object 'dtable' not found
source("student4.dump") # recreate the objects from the file
rb # vector rb displayed
dtable # data frame dtable displayed
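A self-contained version of the dump()/source() round trip, using a temporary file and two small hypothetical objects in place of students.csv. Printing the file shows that it is ordinary R code. local = TRUE makes source() recreate the objects in the environment it is called from.

```r
rb <- c("violet", "indigo", "blue")
scores <- c(Mark = 79L, Mary = 80L)
saved_rb <- rb
saved_scores <- scores

dump_path <- tempfile(fileext = ".R")
dump(c("rb", "scores"), file = dump_path)
cat(readLines(dump_path), sep = "\n")  # the file is plain R code

rm(rb, scores)
source(dump_path, local = TRUE)  # recreate rb and scores in this environment
identical(rb, saved_rb)
identical(scores, saved_scores)
```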
↪ Saving objects in binary format
Files written by dput() and dump() can be used to transfer objects, but text representations are not the best way to move objects between R sessions. The save() and saveRDS() functions store data in a binary format and are designed for transporting R data.
Writing numeric data as text and reading it back can lose precision, because numbers are written with a limited number of significant digits. Saving such data in a binary format avoids this.
The save() function writes an external representation of R objects to the specified file. The objects can be read back from the file at a later stage by using the load() function.
rb <- c("violet", "indigo", "blue", "green", "yellow",
"orange", "red")
dtable <- read.table("students.csv", sep = ",")
rb # vector rb displayed
dtable # data frame dtable displayed
save(rb,dtable, file = "student5.rda") # save data in binary format
rm(rb,dtable) # remove rb and dtable objects
rb # Error: object 'rb' not found
dtable # Error: object 'dtable' not found
load("student5.rda") # read the data from file
rb
dtable
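The saveRDS() function mentioned above works on a single object: it writes one object, and readRDS() returns it so the caller chooses the name, unlike load(), which restores the original variable names. A minimal sketch; bundling two objects in a list and the temporary file path are illustrative choices.

```r
rb <- c("violet", "indigo", "blue", "green", "yellow", "orange", "red")
precise <- c(1/3, pi, exp(1))

rds_path <- tempfile(fileext = ".rds")
saveRDS(list(colours = rb, numbers = precise), file = rds_path)

# readRDS() returns the object; assign it to any name
restored <- readRDS(rds_path)
identical(restored$numbers, precise)  # binary storage keeps full precision
```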
↪ Serializing Objects
The serialize() function serializes the object to the specified connection. If connection is NULL then the object is serialized to a raw vector, which is returned as the result of serialize. The unserialize() function reads a serialized object from the connection or a raw vector.
rb <- c("violet", "indigo", "blue", "green", "yellow",
"orange","red")
rb
serialize(rb, NULL) # serialized data displayed to console
rv <- serialize(rb, NULL) # serialized data assigned to a raw vector
rv
rm(rb) # remove rb object
rb # Error: object 'rb' not found
unserialize(rv, refhook = NULL) # unserialize rv
rb <- unserialize(rv, refhook = NULL)
rb
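serialize() can also write to a real connection instead of returning a raw vector. A sketch using a temporary file; the connection must be opened in binary mode ("wb" for writing, "rb" for reading) because the default serialization format is binary.

```r
rb <- c("violet", "indigo", "blue")
ser_path <- tempfile(fileext = ".bin")

out_con <- file(ser_path, open = "wb")  # binary write mode
serialize(rb, out_con)                  # writes to the connection, returns NULL
close(out_con)

in_con <- file(ser_path, open = "rb")   # binary read mode
rb_back <- unserialize(in_con)
close(in_con)

raw_bytes <- serialize(rb, NULL)        # NULL connection: result is a raw vector
identical(rb_back, rb)
identical(unserialize(raw_bytes), rb)
```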
↪ Reading data from the connection interface
The data can be read using connection interfaces. Connections can be made to files, compressed files, or URLs.
The file() function is used for making connections to the text files.
con <- file("students.txt") # Create a connection to a file
open(con, "r") # Open connection in a read-only mode
dat <- read.csv(con) # Read from the connection
dat
close(con) # Close the connection
The above sequence is equivalent to passing the file name directly:
dat <- read.csv("students.txt")
dat
Text files can be read line by line using the readLines() function.
con <- file("students.txt") # Create a connection to a file
open(con, "r") # Open connection in a read-only mode
dat <- readLines(con, 3) # Read first 3 lines from the file
dat
close(con) # Close the connection
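Opening the connection explicitly matters when reading in steps: on an open connection, each readLines() call continues where the previous one stopped. The sketch below uses a temporary file with assumed sample rows; read_first_lines() is a hypothetical helper showing the on.exit() idiom, which closes the connection even if reading fails.

```r
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Name,Class,Age,Gender", "Mark,5,10,M", "Mary,5,10,F"), txt_path)

# Successive reads on one open connection
con <- file(txt_path, open = "r")
header <- readLines(con, n = 1)  # first line only
rest <- readLines(con)           # the remaining lines
close(con)

# Hypothetical helper: on.exit() guarantees the connection is closed
read_first_lines <- function(path, n) {
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = n)
}
first_two <- read_first_lines(txt_path, 2)
```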
The gzfile() function is used for creating a connection to a gzip-compressed file.
con <- gzfile("students.txt.gz") # Create a connection to a file
open(con, "r") # Open connection in a read-only mode
dat <- read.csv(con) # Read the CSV data from the compressed file
dat
close(con) # Close the connection
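A self-contained sketch that first creates a small gzip-compressed CSV (assumed sample rows, temporary path) through a gzfile() connection and then reads it back. When read.csv() receives an unopened connection it opens and closes it itself, and for reading, a plain file path also works because file() detects gzip compression.

```r
gz_path <- tempfile(fileext = ".txt.gz")

# Write a small gzip-compressed CSV through a gzfile() connection
gz_out <- gzfile(gz_path, open = "w")
writeLines(c("Name,Class,Age,Gender", "Mark,5,10,M", "Mary,5,10,F"), gz_out)
close(gz_out)

# Unopened connection: read.csv() opens it and closes it when done
dat <- read.csv(gzfile(gz_path))

# A plain path works too; compression is detected when reading
dat_auto <- read.csv(gz_path)
dat
```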
The url() function is used for creating a connection to a URL. The readLines() function can be used for reading the lines of the URL.
con <- url("https://cran.r-project.org/") # Create a connection to a webpage
dat <- readLines(con, 5) # Read first 5 lines
head(dat)
dat
close(con) # Close the connection
↪ Summary
