The apply() family of functions

The apply family of functions in R repeatedly applies a named function, with one or more optional arguments, to data held in matrices, arrays, lists, and data frames. These functions are an alternative to explicit loop constructs: they are compact and usually need less code. They are not guaranteed to run faster than a well-written loop, so any speed gain should be measured rather than assumed.
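
As a minimal sketch of the loop-versus-apply comparison, the block below computes the same per-element means twice: once with an explicit for loop and once with sapply(). The list `scores` and its values are made up for illustration.

```r
# Hypothetical data: a list of numeric vectors of different lengths
scores <- list(a = c(1, 2, 3), b = c(2.5, 7.5), c = 10)

# Explicit loop: pre-allocate the result, then fill it element by element
means_loop <- numeric(length(scores))
names(means_loop) <- names(scores)
for (i in seq_along(scores)) {
  means_loop[i] <- mean(scores[[i]])
}

# apply-family equivalent: a single call, no manual bookkeeping
means_apply <- sapply(scores, mean)

means_loop
means_apply
```

Both versions should give the same named numeric vector; the sapply() call simply hides the indexing and pre-allocation.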

The named function could be:

  • An aggregating function, such as mean or sum, that returns a single number (a scalar);
  • A transforming or subsetting function;
  • A vectorized function, which yields more complex structures such as lists, vectors, matrices, and arrays.

Topics covered in this section are:

      ➤ apply(): Apply a function over the margins of an array

      ➤ lapply(): Loop over a list and evaluate a function on each element

      ➤ sapply(): Same as the lapply() function, but tries to simplify the result

      ➤ vapply(): Same as the sapply() function, but with a pre-specified type of return value

      ➤ rapply(): A recursive version of the lapply() function with flexibility in how the result is structured

      ➤ mapply(): Multivariate version of the sapply() function

      ➤ tapply(): Apply a function over subsets of a vector

    ↪ The apply() function

    The apply() function takes an array, matrix, or data frame as input and returns a vector, list, or array.

    The syntax is

          apply(X, MARGIN, FUN, ..., simplify = TRUE)
          Where:
            X: an array, data frame, or matrix
            MARGIN: a vector giving the dimensions over which the function is applied; for a matrix:
              =1: the function is applied to each row
              =2: the function is applied to each column
              =c(1,2): the function is applied to rows and columns, i.e. to each individual element
            FUN: the function to apply
            ...: optional arguments to FUN
            simplify: logical value indicating whether results should be simplified if possible
    
    

    Example: The following block creates a data frame named fruitsframe.

          fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
          jan <- c(10, 3, 12, 5, 8)
          feb <- c(12.5, 5, 14, 3, 10)
          mar <- c(10.5, 6, 10, 4, 13)
          fruitsframe <- data.frame(fruits,jan,feb,mar,stringsAsFactors = FALSE)
          fruitsframe
    
          ---Output---
                  fruits jan  feb  mar
          1        apple  10 12.5 10.5
          2       banana   3  5.0  6.0
          3       cherry  12 14.0 10.0
          4 dragon fruit   5  3.0  4.0
          5   elderberry   8 10.0 13.0

    The following code block computes the sum of each row across the columns jan, feb, and mar.

          apply(fruitsframe[2:4], 1, sum)  # X is the data frame fruitsframe[2:4] (columns jan, feb, mar)
                                           # MARGIN is 1, i.e. apply 'sum' to each row
                                           # FUN is the aggregating function 'sum'
    
          ---Output---
          [1] 33 14 36 12 31
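
    To make the three MARGIN settings concrete, here is a small sketch on a matrix. The matrix `sales` is a hypothetical cut-down version of the fruit data (three fruits, two months), chosen so the totals are easy to check by hand.

```r
# Hypothetical 3 x 2 matrix: rows are fruits, columns are months
sales <- matrix(c(10, 3, 12, 12.5, 5, 14), nrow = 3,
                dimnames = list(c("apple", "banana", "cherry"),
                                c("jan", "feb")))

row_totals <- apply(sales, 1, sum)        # MARGIN = 1: one value per row
col_totals <- apply(sales, 2, sum)        # MARGIN = 2: one value per column
doubled    <- apply(sales, c(1, 2),       # MARGIN = c(1, 2): FUN sees each
                    function(v) v * 2)    # element on its own

row_totals
col_totals
doubled
```

    With MARGIN = c(1, 2) the result keeps the shape and dimnames of the input, because FUN is called once per cell.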

    ↪ The lapply() function

    The lapply() function takes a list, vector, or data frame as input and always returns a list.

    The syntax is

            lapply(X, FUN, ...)
            Where
              X: list, vector, or data frame
              FUN: the function to be applied to each element of X
              ...: optional arguments to FUN
    
    
    FUN is applied to each element of the input X, and the returned list has the same length as X: each of its elements is the result of applying FUN to the corresponding element of X.

    In the example below, a fruits vector is passed to the lapply() function to find the number of characters in each element of the vector.

          fruits <- c("apple", "banana", "dragon fruit")
          lapply(fruits, nchar)
    
          ---Output---
          [[1]]
          [1] 5

          [[2]]
          [1] 6

          [[3]]
          [1] 12

    If the input X is not a list, it is coerced to a list using as.list(). In the example below, the square root function sqrt is applied to each element of a matrix. Note that the matrix dimensions are dropped and the result is a plain list of four elements.

          m <- matrix(7:10, nrow = 2, ncol = 2)
          lapply(m, sqrt)
    
          ---Output---
          [[1]]
          [1] 2.645751

          [[2]]
          [1] 2.828427

          [[3]]
          [1] 3

          [[4]]
          [1] 3.162278

    Here is another example, in which sqrt is applied to each element of a list. Each element is processed as a whole, so the matrix in 'a' keeps its shape.

          m1 <- matrix(9:10, nrow = 2, ncol = 1)
          x <- list(a = m1, b = 100)
          lapply(x, sqrt)
    
          ---Output---
          $a
                   [,1]
          [1,] 3.000000
          [2,] 3.162278

          $b
          [1] 10

    The function FUN can be a user-defined function. The following example passes an anonymous function to lapply() to find the square of each element of the list.

          m1 <- matrix(9:10, nrow = 2, ncol = 1)
          x <- list(a = m1, b = 100)
          lapply(x, function(f) {f^2})
    
          ---Output---
          $a
               [,1]
          [1,]   81
          [2,]  100

          $b
          [1] 10000

    The same function can also be defined separately and passed by name, as in the example below.

          FUN <- function(f) {f^2}
          lapply(x, FUN)
    
    

    When a function FUN is passed to lapply(), each element of X is passed as the first argument of FUN. If FUN takes more than one argument, the remaining arguments are supplied through the ... parameter of lapply(). In the example below, the value 5 is passed to y, the second argument of FUN.

          x <- c(5,10,100)
          lapply(x, function(f,y) {(f^2)/y}, 5)
    
          ---Output---
          [[1]]
          [1] 5

          [[2]]
          [1] 20

          [[3]]
          [1] 2000
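
    The same mechanism works with named extra arguments and with data frames, which lapply() treats as a list of columns. The sketch below is an illustration only: the data frame `scores` is hypothetical, and na.rm = TRUE is forwarded to mean() through the ... parameter.

```r
# Hypothetical data frame with one missing value
scores <- data.frame(math = c(90, 80, NA), art = c(70, 60, 50))

# Each column is one list element; na.rm = TRUE is passed on to mean()
col_means <- lapply(scores, mean, na.rm = TRUE)

col_means
```

    Naming the extra argument (na.rm = TRUE) rather than relying on position makes it clear which parameter of FUN it is meant for.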

    ↪ The sapply() function

    The sapply() function is a user-friendly wrapper around the lapply() function that differs in its return value. It tries to simplify the result of lapply() and returns a vector, a matrix, or an array when possible; otherwise it returns a list.

          The syntax is
             sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
             Where
               X: list, vector, or data frame
               FUN: the function to be applied to each element of X
               ...: optional arguments to FUN
               simplify: logical or character string; should the result be simplified to a vector,
                         matrix, or higher-dimensional array if possible? The default value, TRUE,
                         returns a vector or matrix as appropriate, whereas simplify = "array"
                         may return a higher-dimensional array.
               USE.NAMES: logical; if TRUE and X is a character vector, use X as the names of the
                          result unless it already has names
    
    

    In the example below, a fruits vector is passed to the sapply() function to find the number of characters in each element of the vector.

          fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
          sapply(fruits, nchar)
    
          ---Output---
                 apple       banana       cherry dragon fruit   elderberry
                     5            6            6           12           10

    Compare this with the output of lapply(fruits, nchar). When the arguments simplify = FALSE and USE.NAMES = FALSE are passed to sapply(), its output is the same as that of lapply().

          fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
          sapply(fruits, nchar, simplify = FALSE, USE.NAMES = FALSE)
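
    One way to check this equivalence is to compare the two results directly with identical(). The variable names `as_list`, `unsimplified`, and `named_list` below are only illustrative.

```r
fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")

as_list      <- lapply(fruits, nchar)
unsimplified <- sapply(fruits, nchar, simplify = FALSE, USE.NAMES = FALSE)

# TRUE when both calls return exactly the same unnamed list
same <- identical(as_list, unsimplified)
same

# Leaving USE.NAMES at its default keeps the list but adds names
named_list <- sapply(fruits, nchar, simplify = FALSE)
names(named_list)
```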
    
    

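    In the next example, each element of the list is a 2 x 2 matrix. sqrt returns four values per element, so sapply() simplifies the result to a 4 x 2 matrix with one column per list element.

          m1 <- matrix(7:10, nrow = 2, ncol = 2)
          m2 <- matrix(107:110, nrow = 2, ncol = 2)
          m <- list(a = m1, b = m2)
          sapply(m, sqrt)
    
          ---Output---
                      a        b
          [1,] 2.645751 10.34408
          [2,] 2.828427 10.39230
          [3,] 3.000000 10.44031
          [4,] 3.162278 10.48809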

    When the results have different lengths, no simplification is possible and sapply() returns a list, even with simplify = "array".

          m1 <- matrix(7:10, nrow = 2, ncol = 2)
          x <- list(a = m1, b = 100)
          sapply(x, sqrt, simplify = "array", USE.NAMES = TRUE)
    
          ---Output---
          $a
                   [,1]     [,2]
          [1,] 2.645751 3.000000
          [2,] 2.828427 3.162278

          $b
          [1] 10
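
    The three possible shapes of an sapply() result can be seen side by side in the sketch below. The anonymous functions are hypothetical and chosen only so that the length of each individual result is obvious.

```r
# Every result has length 1: simplified to a vector
squares <- sapply(1:3, function(i) i^2)

# Every result has length 2: simplified to a matrix with 2 rows, 3 columns
pairs <- sapply(1:3, function(i) c(i, i^2))

# Results have lengths 1, 2 and 3: cannot be simplified, stays a list
ragged <- sapply(1:3, function(i) seq_len(i))

squares
pairs
ragged
```

    Because the shape depends on what FUN happens to return, code that relies on a particular shape is usually safer with vapply() or with lapply().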

    ↪ The vapply() function

    The vapply() function is similar to the sapply() function, but has a pre-specified type of return value.

          The syntax is
            vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
              Where
                X: list, vector, or data frame
                FUN: the function to be applied to each element of X
                FUN.VALUE: a vector; a template for the return value from FUN
                ...: optional arguments to FUN
                USE.NAMES: logical; if TRUE and X is a character vector, use X as the names of the
                           result unless it already has names
    
    

    While sapply() only attempts simplification, vapply() always simplifies. It checks that every value returned by FUN is compatible with FUN.VALUE, that is, that it has the same length and type.

          fruits <- c("apple", "banana", "cherry", "dragon fruit", "elderberry")
          x <- c(1)   # template: a single numeric value (integer results from nchar are promoted to double)
          vapply(fruits, nchar, FUN.VALUE = x)
    
          ---Output---
                 apple       banana       cherry dragon fruit   elderberry
                     5            6            6           12           10

    The vapply() function is most useful when the return type is known in advance and FUN is expected to produce only that type. In such cases, vapply() helps to prevent silent errors.

    For example, here is a list containing both numeric and character vectors; the max() function works on both.

          m <- list(a = c(3, 27, 9), b = c(64, 4, 16), c = c("A", "B", "C"))
          sapply(m, max)
    
          ---Output---
             a    b    c
          "27" "64"  "C"

    The list is not type-safe. If it is expected to contain only numeric values, applying max() with sapply() fails silently: every result is quietly converted to a character string. The type checking in vapply() turns this silent failure into an explicit error, which makes the problem easier to debug.

          vapply(m, max, FUN.VALUE = c(1))
    
          ---Output---
          Error in vapply(m, max, FUN.VALUE = c(1)) : values must be type 'double',
           but FUN(X[[3]]) result is type 'character'
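
    FUN.VALUE also fixes the length of each result, not only its type. The sketch below uses a length-2 template for range(); the list `samples` and the template names min and max are assumptions made for illustration. It also shows one way to catch the type error with tryCatch() instead of stopping the script.

```r
samples <- list(a = c(3, 27, 9), b = c(64, 4, 16))

# Each call to range() must return exactly two doubles
limits <- vapply(samples, range, FUN.VALUE = c(min = 0, max = 0))
limits

# A character element violates the numeric(1) template and raises an error,
# which is converted to a message string here
checked <- tryCatch(
  vapply(list(1, "x"), max, FUN.VALUE = numeric(1)),
  error = function(e) "type mismatch caught"
)
checked
```

    Templates such as numeric(1), character(1), or logical(1) are a common way to state the expected type without inventing a dummy value.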

    ↪ The rapply() function

    The rapply() function is a recursive version of the lapply() function with flexibility in how the result is structured.

          The syntax is
            rapply(object, f, classes = "ANY", deflt = NULL,
                 how = c("unlist", "replace", "list"), ...)
            Where
               object: a list or expression
               f: a function of one argument; further arguments are passed via ...
               classes: a character vector of class names, or "ANY" to match any class
               deflt: the default result (not used if how = "replace")
               how: character string partially matching one of the three possibilities given
               ...: additional arguments passed to the call to f
    
    

    Here is an example in which the square of each element is calculated.

          m <- list(a = c(2, 3, 4), b = c(5, 6, 7))
          out <- rapply(m, function(f){f^2})
          out
    
          ---Output---
          a1 a2 a3 b1 b2 b3
           4  9 16 25 36 49

    Notice that the result is a vector. The original list structure can be preserved by passing how = "list".

          m <- list(a = c(2, 3, 4), b = c(5, 6, 7))
          out <- rapply(m, function(f){f^2}, how = "list")
          out
    
          ---Output---
          $a
          [1]  4  9 16

          $b
          [1] 25 36 49

    The classes parameter names the classes to which the function should be applied. In the following example, classes = "numeric" means the function is applied only to numeric elements.

          m <- list(a = c(2, 3, 4), b = c(5, 6, 7), c = c("a", "b", "c"))
          out <- rapply(m, function(f){f^2}, classes = "numeric")
          out
    
          ---Output---
          a1 a2 a3 b1 b2 b3
           4  9 16 25 36 49

    In the above example, the function is not applied to the character vector 'c'. Because deflt is NULL, that element is dropped when the result is unlisted, so it does not appear in 'out'; the original list 'm' is unchanged.

          str(out)
    
          ---Output---
           Named num [1:6] 4 9 16 25 36 49
           - attr(*, "names")= chr [1:6] "a1" "a2" "a3" "b1" ...

    The deflt parameter supplies a default value for the parts of the list to which the function is not applied. Because the default here is a string, the unlisted result is a character vector.

          m <- list(a = c(2, 3, 4), b = c(5, 6, 7), c = c("a", "b", "c"))
          out <- rapply(m, function(f){f^2}, classes = "numeric", deflt = "Hello")
          out
    
          ---Output---
               a1      a2      a3      b1      b2      b3       c
              "4"     "9"    "16"    "25"    "36"    "49" "Hello"

    With how = "replace", elements that do not match are left in place, so the character vector stays in the list unchanged. Any value given for deflt is ignored with how = "replace".

          m <- list(a = c(2, 3, 4), b = c(5, 6, 7), c = c("a", "b", "c"))
          out <- rapply(m, function(f){f^2}, classes = "numeric", how = "replace")
          out
    
          ---Output---
          $a
          [1]  4  9 16

          $b
          [1] 25 36 49

          $c
          [1] "a" "b" "c"
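
    The examples above use flat lists; the recursive behaviour shows up with nested ones. The following sketch assumes a hypothetical nested list `nested` and scales only its numeric leaves, at any depth.

```r
# Hypothetical nested list: 'b' is itself a list
nested <- list(a = c(1, 2, 3),
               b = list(c = c(4, 5), d = "text"))

# how = "replace": same nesting, numeric leaves scaled, others untouched
scaled <- rapply(nested, function(v) v * 10,
                 classes = "numeric", how = "replace")
str(scaled)

# how = "unlist": flattened to a vector; the non-numeric leaf is dropped
flat <- rapply(nested, function(v) v * 10,
               classes = "numeric", how = "unlist")
flat
```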

    ↪ The mapply() function

    The mapply() function is a multivariate version of the sapply() function. The mapply() function applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary. The mapply() function, similar to the sapply() function, tries to return a vector result when possible.

          The syntax is
            mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
    
    

    In the following example, the mapply() function is used to multiply each element of the list 'm1' by the corresponding element of the list 'm2'.

          m1 <- list(a = c(2, 3, 4), b = c(5, 6, 7))
          m2 <- list(a = c(20, 30, 40), b = c(50, 60, 70))
          mapply(function(x1, x2) {x1*x2}, m1, m2)
    
          ---Output---
                 a   b
          [1,]  40 250
          [2,]  90 360
          [3,] 160 490

    Passing SIMPLIFY = FALSE keeps the result as a list.

          mapply(function(x1, x2) {x1*x2}, m1, m2, SIMPLIFY = FALSE)
    
          ---Output---
          $a
          [1]  40  90 160

          $b
          [1] 250 360 490
    
    

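    Note: The argument name SIMPLIFY is case-sensitive. mapply() spells it in upper case, unlike the lower-case simplify argument of sapply(); a lower-case simplify would be passed on to FUN through ... instead.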
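
    Two further features of mapply() mentioned in the syntax are MoreArgs and recycling. The sketch below is illustrative only: the argument name `scale` and the input vectors are made up. MoreArgs holds arguments that stay fixed for every call, while shorter vectors passed through ... are recycled.

```r
# 'scale' is the same for every call, so it goes in MoreArgs
scaled <- mapply(function(x, y, scale) (x + y) * scale,
                 1:3, 4:6,
                 MoreArgs = list(scale = 10))
scaled    # (1+4)*10, (2+5)*10, (3+6)*10

# The second vector is shorter and is recycled: 1, 2, 1, 2
recycled <- mapply(function(x, y) x + y, 1:4, 1:2)
recycled
```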

    ↪ The tapply() function

    The tapply() function is used to apply a function over subsets of a vector.

    The syntax is
            tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
            Where
               X: a vector
               INDEX: a factor or a list of factors (or else they are coerced to factors)
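               FUN: a function to be applied
               ...: optional arguments to be passed to FUN
               default: the value used for cells with no data when the result is simplified to an array
               simplify: logical; should the result be simplified to an array where possible?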
    
    

    In the example below, the tapply() function is used to calculate the mean height of females and males from the data frame disney.

          name <- c("Mickey", "Minnie", "Goofy", "Donald", "Daisy")
          gender <- c("M", "F", "M", "M", "F")
          height <- c(30, 20, 45, 15, 10)
          disney <- data.frame(name, gender, height, stringsAsFactors = FALSE)
          disney
    
          ---Output---
              name gender height
          1 Mickey      M     30
          2 Minnie      F     20
          3  Goofy      M     45
          4 Donald      M     15
          5  Daisy      F     10

          tapply(disney$height, disney$gender, mean)
    
          ---Output---
           F  M
          15 30
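
    INDEX can also be a list of grouping vectors, in which case the result is a table with one dimension per grouping. In the sketch below, the `species` vector is a hypothetical extra column added purely for illustration; combinations with no data are filled with the default, NA.

```r
height  <- c(30, 20, 45, 15, 10)
gender  <- c("M", "F", "M", "M", "F")
# Hypothetical second grouping variable
species <- c("mouse", "mouse", "dog", "duck", "duck")

# Mean height for every gender x species combination
by_both <- tapply(height, list(gender, species), mean)
by_both
```

    Rows correspond to the levels of the first grouping and columns to the levels of the second, both in sorted order.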

    ↪ Summary

    The apply() family of functions

    1. is a compact and often efficient way of looping over large data structures.
    2. is made up of the apply(), lapply(), sapply(), vapply(), mapply(), rapply(), and tapply() functions.
    3. gives a sense of the functional programming paradigm in R.

    The use of the apply() functions depends on the structure of the data that the program is required to operate on and on the expected format of the output.