The apply() family of functions

The apply family of functions in R repeatedly applies a function, optionally with extra arguments, to the elements or margins of matrices, arrays, lists, and data frames. These functions are an alternative to explicit loop constructs: they are compact, require less code, and are often at least as fast as an equivalent loop, although a speed advantage is not guaranteed.
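To make the comparison with loops concrete, the sketch below computes the same per-element totals twice, once with an explicit for loop and once with sapply(). The list `values` and the variable names are illustrative assumptions, not part of the original material.

```r
# Hypothetical data: a named list of three integer vectors.
values <- list(a = 1:3, b = 4:6, c = 7:9)

# Explicit loop: pre-allocate the result, then fill it element by element.
totals_loop <- numeric(length(values))
names(totals_loop) <- names(values)
for (i in seq_along(values)) {
  totals_loop[i] <- sum(values[[i]])
}

# Apply-style equivalent: one call, with names carried over from the list.
totals_apply <- sapply(values, sum)

totals_loop
totals_apply
```

Both versions hold the totals 6, 15, and 24 under the names a, b, and c; the apply-style call simply removes the bookkeeping.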

The function applied can be:

An aggregating function, such as mean() or sum(), which returns a single number (a scalar);

A transforming or subsetting function;

A vectorized function, which yields a more complex structure such as a list, vector, matrix, or array.

Topics covered in this section are

➤ apply(): Apply a function over the margins of an array

➤ lapply(): Loop over a list and evaluate a function on each element

➤ sapply(): Same as the lapply() function but tries to simplify the result

➤ vapply(): Same as the sapply() function but has a pre-specified type of return value.

➤ rapply(): a recursive version of the lapply() function with flexibility in how the result is structured

➤ mapply(): Multivariate version of the sapply() function

➤ tapply(): Apply a function over subsets of a vector

↪ The apply() function

The apply() function takes an array, matrix, or data frame as input and returns a vector, array, or list.

The syntax is

      apply(X, MARGIN, FUN, ..., simplify = TRUE)
      Where:
        X: an array, data frame, or matrix
        MARGIN: for a matrix, a value of 1, 2, or c(1, 2) that defines where to apply the function:
          =1: the function is applied to each row
          =2: the function is applied to each column
          =c(1,2): the function is applied to each individual element (every row-column cell)
        FUN: the function to apply
        ...: optional arguments to FUN
        simplify: boolean value indicating whether results should be simplified if possible
                  (available from R 4.1.0)

Example: The following block creates a data frame named fruitsframe.

      fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
      jan <- c(10, 3, 12, 5, 8)
      feb <- c(12.5, 5, 14, 3, 10)
      mar <- c(10.5, 6, 10, 4, 13)
      fruitsframe <- data.frame(fruits,jan,feb,mar,stringsAsFactors = FALSE)
      fruitsframe

      ---Output---
              fruits jan  feb  mar
      1        apple  10 12.5 10.5
      2       banana   3  5.0  6.0
      3       cherry  12 14.0 10.0
      4 dragon fruit   5  3.0  4.0
      5   elderberry   8 10.0 13.0

The following code block computes the sum of each row across the columns jan, feb, and mar.

      apply(fruitsframe[2:4], 1, sum)  # X is the data frame fruitsframe[2:4] (columns jan, feb, mar)
                                       # MARGIN is 1, i.e. apply 'sum' to each row
                                       # FUN is the aggregating function 'sum'

      ---Output---
      [1] 33 14 36 12 31
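The other MARGIN values can be sketched in the same way. The small matrix `sales` below is a hypothetical stand-in for the numeric columns of fruitsframe, used so that the block is self-contained.

```r
# Hypothetical 3 x 2 matrix of monthly sales, with row and column names.
sales <- matrix(c(10, 3, 12, 12.5, 5, 14), nrow = 3,
                dimnames = list(c("apple", "banana", "cherry"),
                                c("jan", "feb")))

# MARGIN = 2: one total per column.
col_totals <- apply(sales, 2, sum)

# MARGIN = 1: the largest value in each row.
row_max <- apply(sales, 1, max)

# MARGIN = c(1, 2): the function sees one cell at a time,
# so the result has the same shape as the input.
doubled <- apply(sales, c(1, 2), function(v) v * 2)

col_totals
row_max
doubled
```

For whole-matrix arithmetic such as doubling, plain `sales * 2` is the usual choice; MARGIN = c(1, 2) is shown here only to illustrate what the setting does.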

↪ The lapply() function

The lapply() function takes a list, vector, or data frame as input and returns a list. The returned list has the same length as the input object, and each of its elements is the result of applying FUN to the corresponding element of the input.

The syntax is

        lapply(X, FUN, ...)
        Where
          X: list, vector, or data frame
          FUN: the function to be applied to each element of X
          ...: optional arguments to FUN

FUN is applied to each element of X in turn, and the result is always a list of the same length as X.

In the example below, the fruits vector is passed to lapply() to find the number of characters in each element.

      fruits <- c("apple", "banana", "dragon fruit")
      lapply(fruits, nchar)

      ---Output---
      [[1]]
      [1] 5
 
      [[2]]
      [1] 6
 
      [[3]]
      [1] 12

If the input X is not a list, it is coerced to a list using as.list(). In the example below, a matrix is coerced to a list of its four elements, and sqrt is applied to each one.

      m <- matrix(7:10, nrow = 2, ncol = 2)
      lapply(m, sqrt)

      ---Output---
      [[1]]
      [1] 2.645751
 
      [[2]]
      [1] 2.828427
 
      [[3]]
      [1] 3
 
      [[4]]
      [1] 3.162278
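The same coercion explains how lapply() treats a data frame: a data frame is a list of columns, so FUN is called once per column. The sketch below uses a hypothetical two-column data frame `sales` to illustrate this.

```r
# Hypothetical data frame with two numeric columns.
sales <- data.frame(jan = c(10, 3, 12), feb = c(12.5, 5, 14))

# FUN receives one whole column at a time.
col_means <- lapply(sales, mean)

names(col_means)   # the column names are kept as list names
col_means$feb      # mean of the feb column
```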

Here is another example, in which sqrt is applied to each element of a list.

      m1 <- matrix(9:10, nrow = 2, ncol = 1)
      x <- list(a = m1, b = 100)
      lapply(x, sqrt)

      ---Output---
      $a
               [,1]
      [1,] 3.000000
      [2,] 3.162278
 
      $b
      [1] 10

The function FUN can be a user-defined function. The following example passes an anonymous function to lapply() to square each element of the list.

      m1 <- matrix(9:10, nrow = 2, ncol = 1)
      x <- list(a = m1, b = 100)
      lapply(x, function(f) {f^2})

      ---Output---
      $a
           [,1]
      [1,]   81
      [2,]  100
 
      $b
      [1] 10000

The function FUN can also be a named function, as in the example below.

      FUN <- function(f) {f^2}
      lapply(x, FUN)

When a function FUN is passed to lapply(), each element of X is passed in turn as the first argument of FUN. If FUN takes more than one argument, the remaining arguments are supplied through the ... parameter of lapply(). In the example below, the value 5 is passed to y, the second argument of FUN.

      x <- c(5,10,100)
      lapply(x, function(f,y) {(f^2)/y}, 5)

      ---Output---
      [[1]]
      [1] 5
 
      [[2]]
      [1] 20
 
      [[3]]
      [1] 2000
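Extra arguments can also be passed by name, which is often clearer than relying on position. The following sketch, using a hypothetical list `readings`, forwards na.rm = TRUE to mean() through the ... parameter.

```r
# Hypothetical readings; one vector contains a missing value.
readings <- list(north = c(4, NA, 8), south = c(1, 2, 3))

# na.rm = TRUE is not an argument of lapply(); it is forwarded to mean().
means <- lapply(readings, mean, na.rm = TRUE)

means$north
means$south
```

Without na.rm = TRUE, the mean of the north readings would be NA.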

↪ The sapply() function

The sapply() function is a user-friendly wrapper around lapply() that differs in its return value. It tries to simplify the result of lapply() and, where possible, returns a vector, a matrix, or an array instead of a list.

      The syntax is
         sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
         Where
           X: list, vector, or data frame
           FUN: the function to be applied to each element of X
           ...: optional arguments to FUN
           simplify: boolean or character string; should the result be simplified to a vector,
                     matrix, or higher-dimensional array if possible? The default value, TRUE,
                     returns a vector or matrix as appropriate.
           USE.NAMES: boolean; if TRUE and X is a character vector, use X as the names for the
                      result unless it already has names

In the example below, the fruits vector is passed to sapply() to find the number of characters in each element.

      fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
      sapply(fruits, nchar)

      ---Output---
             apple       banana       cherry dragon fruit   elderberry 
                 5            6            6           12           10 

Compare this with the output of lapply(fruits, nchar). When simplify = FALSE and USE.NAMES = FALSE are passed to sapply(), its output is the same as that of lapply().

      fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
      sapply(fruits, nchar, simplify = FALSE, USE.NAMES = FALSE)
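This equivalence can be checked directly with identical(). The sketch below uses a shortened, hypothetical version of the fruits vector and also shows the effect of leaving USE.NAMES at its default.

```r
fruits <- c("apple", "banana", "dragon fruit")

# With both options switched off, sapply() returns exactly what lapply() does.
same_as_lapply <- identical(
  sapply(fruits, nchar, simplify = FALSE, USE.NAMES = FALSE),
  lapply(fruits, nchar)
)
same_as_lapply

# With USE.NAMES left at TRUE, the list elements are named after the input strings.
named_list <- sapply(fruits, nchar, simplify = FALSE)
names(named_list)
```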

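In the next example, sqrt returns four values for each list element, so sapply() simplifies the result to a matrix with four rows and one column per element.

      m1 <- matrix(7:10, nrow = 2, ncol = 2)
      m2 <- matrix(107:110, nrow = 2, ncol = 2)
      m <- list(a = m1, b = m2)
      sapply(m, sqrt)

      ---Output---
                  a        b
      [1,] 2.645751 10.34408
      [2,] 2.828427 10.39230
      [3,] 3.000000 10.44031
      [4,] 3.162278 10.48809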

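In the following example, the results have different lengths (four values for a, one for b), so sapply() cannot simplify them and returns a list, even with simplify = "array".

      m1 <- matrix(7:10, nrow = 2, ncol = 2)
      x <- list(a = m1, b = 100)
      sapply(x, sqrt, simplify = "array", USE.NAMES = TRUE)

      ---Output---
      $a
               [,1]     [,2]
      [1,] 2.645751 3.000000
      [2,] 2.828427 3.162278
 
      $b
      [1] 10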
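How sapply() simplifies depends on what FUN returns for each element. As a small sketch with a hypothetical list `scores`: a function returning one value per element gives a vector, while a function returning two values per element gives a two-row matrix.

```r
# Hypothetical scores; the vectors differ in length, but FUN's results do not.
scores <- list(math = c(70, 85, 90), art = c(60, 95))

# length() returns one value per element, so the result is a named vector.
counts <- sapply(scores, length)

# range() returns two values per element, so the result is a 2 x 2 matrix
# with one column per list element.
ranges <- sapply(scores, range)

counts
ranges
```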

↪ The vapply() function

The vapply() function is similar to the sapply() function, but has a pre-specified type of return value.

      The syntax is
        vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
          Where
            X: list, vector, or data frame
            FUN: the function to be applied to each element of X
            ...: optional arguments to FUN
            FUN.VALUE: a vector; a template for the return value from FUN
            USE.NAMES: boolean; if TRUE and X is a character vector, use X as the names for the
                       result unless it already has names

While simplification is only attempted in the sapply() function, it is always performed in the vapply() function. The vapply() function checks that every value returned by FUN is compatible with FUN.VALUE, in that it must have the same length and type.

      fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
      x <- c(1)
      vapply(fruits, nchar, FUN.VALUE = x)

      ---Output---
             apple       banana       cherry dragon fruit   elderberry 
                 5            6            6           12           10 

The vapply() function is most useful when the data type of the result is known in advance and the function is expected to return only that type. In such cases, vapply() helps to prevent silent errors.

For example, here is a list that contains both numeric and character vectors. The max() function works on both, so sapply() raises no error and silently coerces all results to character.

      m <- list(a = c(3, 27, 9), b = c(64, 4, 16), c = c("A", "B", "C"))
      sapply(m,max)

      ---Output---
         a    b    c 
      "27" "64"  "C" 

The list is not type-safe. If every element of the list is expected to be numeric, applying max() with sapply() hides the problem, because the unexpected character vector produces no error. The type checking in vapply() turns this silent error into an explicit one, which makes it easier to debug.

      vapply(m, max, FUN.VALUE = c(1))

      ---Output---
      Error in vapply(m, max, FUN.VALUE = c(1)) : values must be type 'double',
      but FUN(X[[3]]) result is type 'character'
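Two further properties of FUN.VALUE are worth sketching: the template may have more than one element, and it fixes the result type even when the input is empty. The list `samples` and the helper names below are illustrative assumptions.

```r
# Hypothetical numeric samples.
samples <- list(a = c(3, 27, 9), b = c(64, 4, 16))

# A two-element template: FUN must return exactly two doubles per element.
# The result is a matrix with one row per template entry.
stats <- vapply(samples,
                function(v) c(min = min(v), max = max(v)),
                FUN.VALUE = c(min = 0, max = 0))
stats

# On empty input, vapply() still returns the promised type,
# whereas sapply() falls back to an empty list.
empty_vapply <- vapply(list(), length, FUN.VALUE = integer(1))
empty_sapply <- sapply(list(), length)
```

This predictability on empty input is one reason vapply() is often preferred inside functions and packages.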

↪ The rapply() function

The rapply() function is a recursive version of the lapply() function with flexibility in how the result is structured.

      The syntax is
        rapply(X, FUN, classes = "ANY", deflt = NULL,
             how = c("unlist", "replace", "list"), ...)
        Where
           X:  a list or expression
           FUN: a function of one argument; further arguments are passed via ...
           classes: a character vector of class names, or "ANY" to match any class.
           deflt: the default result (not used if how = "replace").
           how: character string partially matching one of "unlist", "replace", or "list"
           ...: additional arguments passed to the call to FUN

Here is an example in which squares of each element are calculated.

      m <- list(a = c(2, 3, 4), b = c(5, 6, 7))
      out <- rapply(m, function(f){f^2})
      out

      ---Output---
      a1 a2 a3 b1 b2 b3 
       4  9 16 25 36 49 

Notice that the result is a vector. The original list structure can be preserved by passing how = "list".

      m <- list(a = c(2, 3, 4), b = c(5, 6, 7))
      out <- rapply(m, function(f){f^2}, how="list")
      out

      ---Output---
      $a
      [1]  4  9 16
      $b
      [1] 25 36 49

The classes argument specifies the classes to which FUN is applied. In the following example, classes = "numeric" means that FUN is applied only to numeric elements.

      m <- list(a = c(2, 3, 4), b = c(5, 6, 7), c=c("a", "b", "c"))
      out <- rapply(m, function(f){f^2}, classes = "numeric")
      out

      ---Output---
      a1 a2 a3 b1 b2 b3 
       4  9 16 25 36 49 

In the above example, the function is not applied to the character vector 'c'. Because deflt is NULL by default, that component is dropped from the result 'out', as str() confirms. The original list 'm' is unchanged.

      str(out)

      ---Output---
      Named num [1:6] 4 9 16 25 36 49
      - attr(*, "names")= chr [1:6] "a1" "a2" "a3" "b1" ...

The deflt argument supplies a default value for the parts of the list to which the function is not applied.

      m <- list(a = c(2, 3, 4), b = c(5, 6, 7), c=c("a", "b", "c"))
      out <- rapply(m, function(f){f^2}, classes = "numeric", deflt = "Hello")
      out

      ---Output---
           a1      a2      a3      b1      b2      b3       c 
          "4"     "9"    "16"    "25"    "36"    "49" "Hello" 

With how = "replace", elements that do not match classes are left unchanged in the returned list, so the character vector is kept as it is. Any deflt value is ignored with how = "replace".

      m <- list(a = c(2, 3, 4), b = c(5, 6, 7), c=c("a", "b", "c"))
      out <- rapply(m, function(f){f^2}, classes = "numeric", how = "replace")
      out

      ---Output---
      $a
      [1]  4  9 16
      $b
      [1] 25 36 49
      $c
      [1] "a" "b" "c"
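The examples above use flat lists, so the recursive behaviour is not visible. The sketch below, built on a hypothetical nested list, shows rapply() descending into a sub-list and leaving non-numeric leaves alone.

```r
# Hypothetical nested list: 'b' is itself a list with a numeric and a character leaf.
nested <- list(a = c(1, 2, 3),
               b = list(c = c(4, 5), d = "text"))

# FUN is applied to every numeric leaf at any depth;
# how = "replace" keeps the nesting and the untouched leaves.
doubled <- rapply(nested, function(v) v * 2,
                  classes = "numeric", how = "replace")

doubled$a
doubled$b$c
doubled$b$d
```

A plain lapply() call would instead hand the whole sub-list b to FUN, and the multiplication would fail on it.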

↪ The mapply() function

The mapply() function is a multivariate version of the sapply() function. The mapply() function applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary. The mapply() function, similar to the sapply() function, tries to return a vector result when possible.

      The syntax is
        mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

In the following example, the mapply() function is used for multiplying each element of the list 'm1' with the corresponding element of the list 'm2'.

      m1 <- list(a=c(2, 3, 4), b=c(5, 6, 7))
      m2 <- list(a=c(20, 30, 40), b=c(50, 60, 70))
      mapply(function(x1, x2) {x1*x2}, m1, m2)

      ---Output---
             a   b
      [1,]  40 250
      [2,]  90 360
      [3,] 160 490

With SIMPLIFY = FALSE, the result is returned as a list instead of a matrix.

      mapply(function(x1, x2) {x1*x2}, m1, m2, SIMPLIFY=FALSE)

      ---Output---
      $a
      [1]  40  90 160
      $b
      [1] 250 360 490

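Note: Argument names are case-sensitive, and mapply() spells this argument in upper case (SIMPLIFY), unlike the lower-case simplify of sapply(). A lower-case simplify = FALSE in a mapply() call is not recognized as this option and is instead treated as one of the ... arguments passed to FUN.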
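Recycling and the MoreArgs parameter can be sketched as follows. The functions and values are illustrative assumptions; MoreArgs holds arguments that are passed unchanged to every call of FUN rather than being iterated over.

```r
# Recycling: the single exponent 2 is reused for each base.
powers <- mapply(function(base, exponent) base^exponent, 2:4, 2)
powers

# MoreArgs: 'factor' is a fixed argument, supplied whole to every call,
# while x and y are walked through element by element.
scaled <- mapply(function(x, y, factor) (x + y) * factor,
                 1:3, 4:6,
                 MoreArgs = list(factor = 10))
scaled
```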

↪ The tapply() function

The tapply() function is used to apply a function over subsets of a vector.

The syntax is

        tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
        Where
           X: a vector
           INDEX: a factor or a list of factors (or else they are coerced to factors)
           FUN: the function to be applied
           ...: other arguments to be passed to FUN
           default: the value used to fill cells of the result that have no data (NA by default)
           simplify: boolean; should the result be simplified?

In the example below, the tapply() function is used to calculate the mean height of the female and male characters in the data frame disney.

      name <- c("Mickey", "Minnie", "Goofy", "Donald", "Daisy")
      gender <- c("M", "F", "M", "M", "F")
      height <- c(30, 20, 45, 15, 10)
      disney <- data.frame(name, gender, height, stringsAsFactors = FALSE)
      disney

      ---Output---
          name gender height
      1 Mickey      M     30
      2 Minnie      F     20
      3  Goofy      M     45
      4 Donald      M     15
      5  Daisy      F     10

      tapply(disney$height, disney$gender, mean)

      ---Output---
       F  M 
      15 30 
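One way to read tapply() is as "split the vector by group, then apply the function to each piece". The sketch below checks this reading by comparing tapply() with an explicit split() followed by sapply(); the vectors repeat the height and gender values from the example above so that the block is self-contained.

```r
height <- c(30, 20, 45, 15, 10)
gender <- c("M", "F", "M", "M", "F")

# Grouped mean in a single call.
by_tapply <- tapply(height, gender, mean)

# The same computation spelled out: split into groups, then apply.
groups <- split(height, gender)
by_split <- sapply(groups, mean)

by_tapply
by_split

# Any other summary function can be used in the same way.
tallest <- tapply(height, gender, max)
tallest
```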

↪ Summary

The apply() family of functions

➤ offers a compact and often efficient alternative to explicit loops over large data structures;

➤ is made up of the apply(), lapply(), sapply(), vapply(), mapply(), rapply(), and tapply() functions;

➤ gives a sense of the functional programming paradigm in R.

The choice of apply() function depends on the structure of the data that the program operates on and on the expected format of the output.