Reading and Writing data

Topics covered in this section are

  ➤ Reading tabular data from a file

  ➤ Writing Data Frame to a file

  ➤ Reading CSV file

  ➤ Modifying the Data Frame

  ➤ Writing the Data Frame into the CSV file

  ➤ Reading Excel file

  ➤ Writing object to a file in ASCII text

  ➤ Serializing Objects

  ➤ Reading data from the connection interface

Reading and Writing data

↪ Reading tabular data from a file

The read.table() function can read tabular data with various delimitations such as TSV (Tab Separated Values), CSV (Comma Separated Values), and so on into a Data Frame.

      dtable <- read.table("students.csv", sep = ",") # Notice sep value
      dtable

      ---Output---       - V1 V2 V3 V4       1 Name Class Age Gender       2 Mark 5 10 M       3 Mary 5 10 F       4 Arjun 6 11 M       5 Aruna 7 12 F

In the above statement, the headers are also treated as data columns and new names for the columns are assigned.

The argument header=TRUE tells R that the first row consists of variable names rather than actual data.

      dtable <- read.table("students.csv", header=TRUE, sep = ",")
      dtable

      ---Output---       - Name Class Age Gender       1 Mark 5 10 M       2 Mary 5 10 F       3 Arjun 6 11 M       4 Aruna 7 12 F

Reading and Writing data

↪ Writing Data Frame to a file

The write.table() function writes the specified Data Frame to a file. The sep argument is used for specifying a delimiter to separate the columns. The popular choices are comma (sep=”,”), space (sep=” “), and tab (sep=” “).

      write.table(dtable,"students2.txt",sep=",")

Reading and Writing data

↪ Reading CSV file

The read.csv() function is a special case of the read.table() function. The read.csv() function with the argument header = TRUE or without a header argument indicates that the first line of the file is a 'header' or names of the variables.

Reading a .csv file into a Data Frame.

      stud <- read.csv("students.csv")
      stud
      str(stud)

The argument stringsAsFactors = FALSE prevents R from converting all text variables into factors.

      stud <- read.csv("students.csv",stringsAsFactors = FALSE)
      stud
      str(stud)

The argument header = FALSE should be passed when the input file does not contain a header line.

      stud <- read.csv("students.csv",header = FALSE)
      stud

      ---Output---       - V1 V2 V3 V4 # automatically generated row header       1 Name Class Age Gender       2 Mark 5 10 M       3 Mary 5 10 F       4 Arjun 6 11 M       5 Aruna 7 12 F

Reading and Writing data

↪ Modifying the Data Frame

Adding a row into Data Frame.

      stud <- read.csv("students.csv")
      stud[nrow(stud) + 1,] <- list(Name="Peter",Class=7,Age=12,Genger="M")
      stud

      ---Output---       - Name Class Age Gender       1 Mark 5 10 M       2 Mary 5 10 F       3 Arjun 6 11 M       4 Aruna 7 12 F       5 Peter 7 12 M # new row from the list above added

Adding a column into Data Frame.

      stud$Score = c(79,80,52,63,75)       # adds Score column to the frame stud
      stud

      ---Output---       - Name Class Age Gender Score # Score column added       1 Mark 5 10 M 79       2 Mary 5 10 F 80       3 Arjun 6 11 M 52       4 Aruna 7 12 F 63       5 Peter 7 12 M 75

Reading and Writing data

↪ Writing the Data Frame into the CSV file

Writing data back to CSV file. The row.names parameter overrides the default row names in the CSV file.

      write.csv(stud, file="students2.csv",   # saves stud data frame
                row.names = FALSE)            # into a file students2.csv
      stud2 <- read.csv("students2.csv",stringsAsFactors = FALSE)
      stud2

Reading and Writing data

↪ Reading Excel file

Reading excel worksheets directly using the 'readxl' package. The readxl package can be used to read both .xls and .xlsx files. The read_excel() function imports a worksheet into a data frame. The read_excel(file, n), where the file is the path to an excel file, n is the number of the worksheet to be imported, and the first line of the worksheet contains the variable names.

      install.packages("readxl")
      library(readxl)
      stud_data_frame <- read_xlsx("students2.xlsx", 1)
      stud_data_frame

The read_excel function's range option helps to import data from specific cell ranges from excel sheet.

      library(readxl)
      stud_data_frame <- read_xlsx("students2.xlsx", 1, range="students2!A1:C5")
      stud_data_frame

Reading and Writing data

↪ Writing object to a file in ASCII text

The function dput() writes an ASCII text representation of an R object to a file. Unlike writing objects out in a CSV file, dput() preserves the metadata. The saved file can be read back in using the function dget().

      dput(dtable)                         # Writes 'dtable' contents into the console
      dput(dtable, file="student3.dput")   # Writes 'dtable' contents into a file
      dp <- dget("student3.dput")          # Reads the data from a file
      dp

The dump() function takes a vector of R objects and produces an ACSII text representations of the objects into a file. Similar to dput(), the dump() preserve the metadata. The saved file can be read back in using the function source().

      rb <- c("violet", "indigo", "blue", "green", "yellow", 
              "orange", "red")
      dtable <- read.table("students.csv", sep = ",")  # Notice sep value
      rb
      dtable
      dump(c("rb", "dtable"),"")                       # dumping vector to console
      dump(c("rb", "dtable"), file = "student4.dump")  # dumping vector to file
      rm(rb,dtable)                                    # remove rb and dtable objects
      rb                                               # Error: object 'rb' not found
      dtable                                           # Error: object 'dtable' not found
      source("student4.dump")                          # read the data from file
      rb                                               # vector rb displayed
      dtable                                           # vector dtable displayed

Reading and Writing data

↪ Writing object to a file in ASCII text

While the files saved by dput() and dump() functions can be used for transferring objects, this is not a good way to transfer objects between R sessions. The save() and saveRDS() functions save data in a binary format and are designed to be used for transporting R data.

While saving the data in a textual format and reading from it, the numerical data could often lose precision. Saving those data in binary format works better.

The save() function writes an external representation of R objects to the specified file. The objects can be read back from the file at a later stage by using the load() function.

      rb <- c("violet", "indigo", "blue", "green", "yellow", 
              "orange", "red")
      dtable <- read.table("students.csv", sep = ",")
      rb                                             # vector rb displayed
      dtable                                         # vector dtable displayed
      save(rb,dtable, file = "student5.rda")         # save data in binary format
      rm(rb,dtable)                                  # remove rb and dtable objects
      rb                                             # Error: object 'rb' not found
      dtable                                         # Error: object 'dtable' not found
      load("student5.rda")                           # read the data from file
      rb
      dtable

Reading and Writing data

↪ Serializing Objects

The serialize() function serializes the object to the specified connection. If connection is NULL then the object is serialized to a raw vector, which is returned as the result of serialize. The unserialize() function reads a serialized object from the connection or a raw vector.

      rb <- c("violet", "indigo", "blue", "green", "yellow", 
              "orange","red")
      rb
      serialize(rb, NULL)                          # serialized data displayed to console
      rv <- serialize(rb, NULL)                    # serilized data assigned to vector
      rv
      rm(rb)                                       # remove rb object
      rb                                           # Error: object 'rb' not found
      unserialize(rv, refhook = NULL)              # unserialize rv
      rb <- unserialize(rv, refhook = NULL)
      rb

Reading and Writing data

↪ Reading data from the connection interface

The data can be read using connection interfaces. Connections can be made to files, compressed files, or URLs

The file() function is used for making connections to the text files.

      con <- file("students.txt")  # Create a connection to a file
      open(con, "r")               # Open connection in a read-only mode
      dat <- read.csv(con)         # Read from the connection
      dat
      close(con)                   # Close the connection

The above sequence is the same as

      dat <- read.csv("students.txt")
      dat

Text files can be read line by line using the readLines() function.

      con <- file("students.txt")  # Create a connection to a file
      open(con, "r")               # Open connection in a read-only mode
      dat <- readLines(con, 3)     # Read first 3 lines from the file
      dat
      close(con)                   # Close the connection

The gzfiles() function is used for creating a connection to the gzipped file.

      con <- gzfile("students.txt.gz")  # Create a connection to a file
      open(con, "r")                    # Open connection in a read-only mode
      dat <- read.csv(con)              # Read the first 3 lines from the file
      dat
      close(con)                        # Close the connection

The url() function is used for creating a connection to a URL. The readLines() function can be used for reading the lines of the URL.

      con <- url("https://cran.r-project.org/") # Create a connection to a webpage
      dat <- readLines(con, 5)             # Read first 5 lines
      head(dat)
      dat
      close(con)                           # Close the connection

Reading and Writing data

↪ Summary

  • The read.table() and read.csv() functions are used for reading tabular data from a text file.
  • The write.table() and write.csv functions are used for writing tabular data into a file.
  • The xlsx file format can be read from read_xlsx() function which is part of 'readxl' package.
  • The dput() and dget() functions are used for reading and writing ASCII text representation of an R object to and from a file.
  • The save() and saveRDS() functions save data in a binary format and are designed to be used for transporting R data.
  • The serialize() function serializes the object to the specified connection. The unserialize() function reads a serialized object from the connection or a raw vector.