Reading and Writing data
Topics covered in this section are
➤ Reading tabular data from a file
➤ Writing Data Frame to a file
➤ Reading CSV file
➤ Modifying the Data Frame
➤ Writing the Data Frame into the CSV file
➤ Reading Excel file
➤ Writing object to a file in ASCII text
➤ Serializing Objects
➤ Reading data from the connection interface
Reading and Writing data
↪ Reading tabular data from a file
The read.table() function can read tabular data with various delimitations such as TSV (Tab Separated Values), CSV (Comma Separated Values), and so on into a Data Frame.
dtable <- read.table("students.csv", sep = ",") # Notice sep value dtable
---Output--- - V1 V2 V3 V4 1 Name Class Age Gender 2 Mark 5 10 M 3 Mary 5 10 F 4 Arjun 6 11 M 5 Aruna 7 12 F
In the above statement, the headers are also treated as data columns and new names for the columns are assigned.
The argument header=TRUE tells R that the first row consists of variable names rather than actual data.
dtable <- read.table("students.csv", header=TRUE, sep = ",") dtable
---Output--- - Name Class Age Gender 1 Mark 5 10 M 2 Mary 5 10 F 3 Arjun 6 11 M 4 Aruna 7 12 F
Reading and Writing data
↪ Writing Data Frame to a file
The write.table() function writes the specified Data Frame to a file. The sep argument is used for specifying a delimiter to separate the columns. The popular choices are comma (sep=”,”), space (sep=” “), and tab (sep=” “).
write.table(dtable,"students2.txt",sep=",")
Reading and Writing data
↪ Reading CSV file
The read.csv() function is a special case of the read.table() function. The read.csv() function with the argument header = TRUE or without a header argument indicates that the first line of the file is a 'header' or names of the variables.Reading a .csv file into a Data Frame.
stud <- read.csv("students.csv") stud str(stud)
The argument stringsAsFactors = FALSE prevents R from converting all text variables into factors.
stud <- read.csv("students.csv",stringsAsFactors = FALSE) stud str(stud)
The argument header = FALSE should be passed when the input file does not contain a header line.
stud <- read.csv("students.csv",header = FALSE) stud
---Output--- - V1 V2 V3 V4 # automatically generated row header 1 Name Class Age Gender 2 Mark 5 10 M 3 Mary 5 10 F 4 Arjun 6 11 M 5 Aruna 7 12 F
Reading and Writing data
↪ Modifying the Data Frame
Adding a row into Data Frame.
stud <- read.csv("students.csv") stud[nrow(stud) + 1,] <- list(Name="Peter",Class=7,Age=12,Genger="M") stud
---Output--- - Name Class Age Gender 1 Mark 5 10 M 2 Mary 5 10 F 3 Arjun 6 11 M 4 Aruna 7 12 F 5 Peter 7 12 M # new row from the list above added
Adding a column into Data Frame.
stud$Score = c(79,80,52,63,75) # adds Score column to the frame stud stud
---Output--- - Name Class Age Gender Score # Score column added 1 Mark 5 10 M 79 2 Mary 5 10 F 80 3 Arjun 6 11 M 52 4 Aruna 7 12 F 63 5 Peter 7 12 M 75
Reading and Writing data
↪ Writing the Data Frame into the CSV file
Writing data back to CSV file. The row.names parameter overrides the default row names in the CSV file.
write.csv(stud, file="students2.csv", # saves stud data frame row.names = FALSE) # into a file students2.csv stud2 <- read.csv("students2.csv",stringsAsFactors = FALSE) stud2
Reading and Writing data
↪ Reading Excel file
Reading excel worksheets directly using the 'readxl' package. The readxl package can be used to read both .xls and .xlsx files. The read_excel() function imports a worksheet into a data frame. The read_excel(file, n), where the file is the path to an excel file, n is the number of the worksheet to be imported, and the first line of the worksheet contains the variable names.
install.packages("readxl") library(readxl) stud_data_frame <- read_xlsx("students2.xlsx", 1) stud_data_frame
The read_excel function's range option helps to import data from specific cell ranges from excel sheet.
library(readxl) stud_data_frame <- read_xlsx("students2.xlsx", 1, range="students2!A1:C5") stud_data_frame
Reading and Writing data
↪ Writing object to a file in ASCII text
The function dput() writes an ASCII text representation of an R object to a file. Unlike writing objects out in a CSV file, dput() preserves the metadata. The saved file can be read back in using the function dget().
dput(dtable) # Writes 'dtable' contents into the console dput(dtable, file="student3.dput") # Writes 'dtable' contents into a file dp <- dget("student3.dput") # Reads the data from a file dp
The dump() function takes a vector of R objects and produces an ACSII text representations of the objects into a file. Similar to dput(), the dump() preserve the metadata. The saved file can be read back in using the function source().
rb <- c("violet", "indigo", "blue", "green", "yellow", "orange", "red") dtable <- read.table("students.csv", sep = ",") # Notice sep value rb dtable dump(c("rb", "dtable"),"") # dumping vector to console dump(c("rb", "dtable"), file = "student4.dump") # dumping vector to file rm(rb,dtable) # remove rb and dtable objects rb # Error: object 'rb' not found dtable # Error: object 'dtable' not found source("student4.dump") # read the data from file rb # vector rb displayed dtable # vector dtable displayed
Reading and Writing data
↪ Writing object to a file in ASCII text
While the files saved by dput() and dump() functions can be used for transferring objects, this is not a good way to transfer objects between R sessions. The save() and saveRDS() functions save data in a binary format and are designed to be used for transporting R data.
While saving the data in a textual format and reading from it, the numerical data could often lose precision. Saving those data in binary format works better.
The save() function writes an external representation of R objects to the specified file. The objects can be read back from the file at a later stage by using the load() function.
rb <- c("violet", "indigo", "blue", "green", "yellow", "orange", "red") dtable <- read.table("students.csv", sep = ",") rb # vector rb displayed dtable # vector dtable displayed save(rb,dtable, file = "student5.rda") # save data in binary format rm(rb,dtable) # remove rb and dtable objects rb # Error: object 'rb' not found dtable # Error: object 'dtable' not found load("student5.rda") # read the data from file rb dtable
Reading and Writing data
↪ Serializing Objects
The serialize() function serializes the object to the specified connection. If connection is NULL then the object is serialized to a raw vector, which is returned as the result of serialize. The unserialize() function reads a serialized object from the connection or a raw vector.
rb <- c("violet", "indigo", "blue", "green", "yellow", "orange","red") rb serialize(rb, NULL) # serialized data displayed to console rv <- serialize(rb, NULL) # serilized data assigned to vector rv rm(rb) # remove rb object rb # Error: object 'rb' not found unserialize(rv, refhook = NULL) # unserialize rv rb <- unserialize(rv, refhook = NULL) rb
Reading and Writing data
↪ Reading data from the connection interface
The data can be read using connection interfaces. Connections can be made to files, compressed files, or URLs
The file() function is used for making connections to the text files.
con <- file("students.txt") # Create a connection to a file open(con, "r") # Open connection in a read-only mode dat <- read.csv(con) # Read from the connection dat close(con) # Close the connection
The above sequence is the same as
dat <- read.csv("students.txt") dat
Text files can be read line by line using the readLines() function.
con <- file("students.txt") # Create a connection to a file open(con, "r") # Open connection in a read-only mode dat <- readLines(con, 3) # Read first 3 lines from the file dat close(con) # Close the connection
The gzfiles() function is used for creating a connection to the gzipped file.
con <- gzfile("students.txt.gz") # Create a connection to a file open(con, "r") # Open connection in a read-only mode dat <- read.csv(con) # Read the first 3 lines from the file dat close(con) # Close the connection
The url() function is used for creating a connection to a URL. The readLines() function can be used for reading the lines of the URL.
con <- url("https://cran.r-project.org/") # Create a connection to a webpage dat <- readLines(con, 5) # Read first 5 lines head(dat) dat close(con) # Close the connection
Reading and Writing data
↪ Summary