3. Language Facilities¶
3.1. Operator precedence rules¶
- Function calls and grouping expressions: (), {}
- Index and lookup operators:
- Indexing: [], [[]]
- Namespace access: ::, ::
- Component, slot extraction: $, @
- Arithmetic:
- Exponentiation: ^ (right to left)
- Unary plus, minus: +, -
- Sequence operator: :
- Special operators: %any%, %%, %/%, %*%, %o%
- Multiply, Divide: *, /
- add, subtract : +, -
- Comparison: <, > , <=, >=, ==, !=
- Negation : !
- And: &, &&
- Or: | , ||
- Formulas: ~
- Right wise Assignment: ->, ->>
- Assignment =
- Assignment (right to left) <-, <<-
- Help: ?
3.2. Expressions¶
One expression per line:
> x <-1
> y <- x+2
> z <- y + x
> x
[1] 1
> y
[1] 3
> z
[1] 4
Multiple expressions in single line:
> x <- 1; y <- x+2; z <- y + x
> x; y; z
[1] 1
[1] 3
[1] 4
Series of expressions followed by a value:
> {x <- 1; y <- x+2; z <- y + x; z}
[1] 4
3.3. Flow Control¶
Help about control flow:
?Control
if, else, ifelse
if
> if (T) c(1,2)
[1] 1 2
> if (F) c(1,2)
if else:
> if (T) c(1,2,3) else matrix(c(1,2,3, 4), nrow=2)
[1] 1 2 3
> if (F) c(1,2,3) else matrix(c(1,2,3, 4), nrow=2)
[,1] [,2]
[1,] 1 3
[2,] 2 4
vectorized ifelse:
> v1 <- c(1,2,3,4)
> v2 <- c(5,6,7,8)
> cond <- c(T,F,F,T)
> ifelse(cond, v1, v2)
[1] 1 6 7 4
Logical operations:
> T && F
[1] FALSE
> T || F
[1] TRUE
Element wise logical operations:
> v1 <- c(T,T,F,F)
> v2 <- c(T, F, T, F)
> v1 | v2
[1] TRUE TRUE TRUE FALSE
> v1 & v2
[1] TRUE FALSE FALSE FALSE
repeat
A simple repeat loop
> x <- 10
> repeat { if (x == 0) break ; x = x - 1; print(x)}
[1] 9
[1] 8
[1] 7
[1] 6
[1] 5
[1] 4
[1] 3
[1] 2
[1] 1
[1] 0
If your repeat loop is stuck in an infinite loop, press ESC key.
for
Simple for loops:
> for (i in seq(1,10)) print(i)
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
> for (i in seq(1,10, by=2)) print(i)
[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
Results are not printed inside a loop without using the print
function as above:
> for (i in seq(1,10)) i
>
For loop for computing sum of squares:
ul <- rnorm(30)
usq <- 0
for (i in 1:10){
usq <- ul[i] * ul[i]
}
Off course a better solution is sum(ul^2)
.
Nested for loops:
nrow <- 10
ncol <- 10
m <- matrix(nrow=nrow, ncol=ncol)
for (i in 1:nrow){
for (j in 1:ncol){
m[i, j] <- i + j
}
}
while
A simple while loop:
> i <- 10; while ( i < 20 ) {i <- i +1; print(i)}
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
While loop with next and break
> i <- 10; while (T) {i <- i +1; if (i == 20) break; if ( i %% 2 == 0) next; print(i);}
[1] 11
[1] 13
[1] 15
[1] 17
[1] 19
..rubric:: iterators
Installing the package:
> install.packages('iterators')
Loading the package:
> library(iterators)
Creating an iterator:
> ii <- iter(1:4)
Using the iterator:
> nextElem(ii)
[1] 1
> nextElem(ii)
[1] 2
> nextElem(ii)
[1] 3
> nextElem(ii)
[1] 4
> nextElem(ii)
Error: StopIteration
An iterator recycling the elements:
> ii <- iter(1:4, recycle = T)
> for (i in 1:10) print(nextElem(ii))
[1] 1
[1] 2
[1] 3
[1] 4
[1] 1
[1] 2
[1] 3
[1] 4
[1] 1
[1] 2
foreach
Installing the package:
> install.packages('foreach')
Loading the library:
> library(foreach)
Checking the variation on growth of income with compounded interest rate:
> unlist(foreach(i=1:10) %do% {100 * (1 + i/100)^5})
[1] 105.1010 110.4081 115.9274 121.6653 127.6282 133.8226 140.2552 146.9328 153.8624 161.0510
It works with iterators too:
> unlist(foreach(i=iter(1:10)) %do% {100 * (1 + i/100)^5})
[1] 105.1010 110.4081 115.9274 121.6653 127.6282 133.8226 140.2552 146.9328 153.8624 161.0510
3.4. Functions¶
Calling an function:
> b = c(2,3,5)
> m = mean(x=b)
> s = sum(c(4,5,8,11))
Computing variance by combining multiple functions:
> x <- c(rnorm(10000))
> sum((x-mean(x))^2)/(length(x)-1)
[1] 0.992163
Defining a function:
function_name <- function (arglist){
body
}
Defining our own mean function:
my_mean <- function(x){
s <- sum(x)
n <- length(x)
s / n
}
Using the function:
> my_mean(rivers)
[1] 591.1844
Verifying against built-in implementation of mean:
> mean(rivers)
[1] 591.1844
A log-sum-exp function:
log_sum_exp <- function(x){
xx <- exp(x)
xxx <- sum(xx)
log(xxx)
}
Let us store its definition into a file named my_functions.R
.
Loading the function definition:
> source('my_functions.R')
Calling the function:
> log_sum_exp(10)
[1] 10
> log_sum_exp(c(10, 12))
[1] 12.12693
> log_sum_exp(sample(1:100, 100, replace=T))
[1] 100.4429
Recursive Functions
Let us solve the Tower of Hanoi problem in R:
hanoi <- function(num_disks, from, to, via, disk_num=num_disks){
if (num_disks == 1){
cat("move disk", disk_num, "from ", from, "to", to, "\n")
}else{
hanoi(num_disks-1, from, via, to)
hanoi(1, from, to, via, disk_num)
hanoi(num_disks-1, via, to, from)
}
}
Let’s see this in action:
> hanoi(1,'a', 'b', 'c')
move disk 1 from a to b
> hanoi(2,'a', 'b', 'c')
move disk 1 from a to c
move disk 2 from a to b
move disk 1 from c to b
> hanoi(3,'a', 'b', 'c')
move disk 1 from a to b
move disk 2 from a to c
move disk 1 from b to c
move disk 3 from a to b
move disk 1 from c to a
move disk 2 from c to b
move disk 1 from a to b
3.4.1. Closure in Lexical Scope¶
Accessing variable in the lexical scope:
fourth_power <- function(n){
sq <- function() n* n
sq() * sq()
}
Let’s see this function in action:
> fourth_power(2)
[1] 16
> fourth_power(3)
[1] 81
Let’s create a counter generator function:
counter <- function(n){
list(
increase = function(){
n <<- n+1
},
decrease = function(){
n <<- n-1
},
value = function(){
n
}
)
}
The value n
is the initial value of the counter. This gets stored
in the closure for the function. The function returns a list
whose members are functions which manipulate the value of
n
sitting in the closure.
The operator <<-
is used to update a variable in lexical scope.
Let’s now construct a counter object:
> v <- counter(10)
> v$value()
[1] 10
Let’s increase and decrease counter values:
> v$increase()
> v$increase()
> v$value()
[1] 12
> v$decrease()
> v$decrease()
> v$value()
[1] 10
3.5. Packages¶
A library is a collection of packages. Libraries are local to an R installation. Typically, there is a global library with the R installation and a user specific library.
List of library paths:
> .libPaths()
[1] "C:/Users/Shailesh/R/win-library/3.4" "C:/Program Files/R/R-3.4.2/library"
List of installed packages in all libraries:
> library()
Installing a package:
> install.packages("geometry")
Loading a package:
> library("geometry")
Installing a package if it is not installed:
> if(!require(psych)){install.packages("psych")}
List of currently installed packages:
> search()
[1] ".GlobalEnv" "package:foreach" "package:iterators" "package:MASS"
[5] "package:ggplot2" "package:e1071" "tools:rstudio" "package:stats"
[9] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[13] "package:methods" "Autoloads" "package:base"
This may vary in your setup.
List of loaded namespaces:
> loadedNamespaces()
[1] "Rcpp" "codetools" "grDevices" "class" "foreach" "MASS"
[7] "grid" "plyr" "gtable" "e1071" "datasets" "scales"
[13] "ggplot2" "rlang" "utils" "lazyeval" "graphics" "base"
[19] "labeling" "iterators" "tools" "munsell" "compiler" "stats"
[25] "colorspace" "methods" "tibble"
3.7. Logical Tests¶
Checking for missing values:
> x <- c(1, 4, NA, 5, 0/0)
> is.na(x)
[1] FALSE FALSE TRUE FALSE TRUE
Checking for not a number values:
> is.nan(x)
[1] FALSE FALSE FALSE FALSE TRUE
Checking for vectors:
> is.vector(1:3)
[1] TRUE
> is.vector("133")
[1] TRUE
> is.vector(matrix(1:4, nrow=2))
[1] FALSE
Checking for matrices:
> is.matrix(1:3)
[1] FALSE
> is.matrix(matrix(1:4, nrow=2))
[1] TRUE
3.8. Introspection¶
The mode of an object is the basic type of its fundamental constituents:
> x <- 1:10
> mode(x)
[1] "numeric"
- Class of an object::
- > class(x) [1] “integer”
- Type of an object::
- > typeof(x) [1] “integer”
- Length of an object::
- > length(x) [1] 10
Mode of a list:
> l <- list(1, '2', 3.4, TRUE)
> mode(l)
[1] "list"
Mode of a sublist is also list:
> mode(l[1])
[1] "list"
But individual elements in the list have different modes:
> mode(l[[1]])
[1] "numeric"
> mode(l[[2]])
[1] "character"
List of attributes
> l <- list("1", 2, TRUE, NA)
> attributes(l)
NULL
Setting an attribute:
> attr(l, 'color') <- 'red'
> attributes(l)
$color
[1] "red"
> attr(l, 'color')
[1] "red"
The class of an object enables object oriented programming and allows same function to behave differently for different classes.
Querying the class of an object:
> class(1:10)
[1] "integer"
> class(matrix(1:10, nrow=2))
[1] "matrix"
> class(list(1,2,3))
[1] "list"
Removing the class of an object (temporarily):
> unclass(object)
3.9. Coercion¶
Integers to strings:
> as.character(10:14)
[1] "10" "11" "12" "13" "14"
Strings to integers:
> as.integer(c("10", "11", "12", "13"))
[1] 10 11 12 13
Convert an array to a vector:
> as.vector(arr)
3.10. Sorting and Searching¶
Searching in a vector:
> which (v == 5)
[1] 5
> which (v > 5)
[1] 6 7 8 9 10
> which (v > 5 & v < 8)
[1] 6 7
Searching in a matrix:
> m <- matrix(1:10, nrow=2)
> m == 4
[,1] [,2] [,3] [,4] [,5]
[1,] FALSE FALSE FALSE FALSE FALSE
[2,] FALSE TRUE FALSE FALSE FALSE
> which(m == 4)
[1] 4
Sorting a vector in ascending order:
> x = sample(1:10)
> x
[1] 6 5 8 10 2 4 1 3 7 9
> sort(x)
[1] 1 2 3 4 5 6 7 8 9 10
Finding unique elements:
> v <- c(1, 4, 4, 3, 4, 4, 3, 3, 1, 2, 3, 4, 2, 3, 1, 3, 5, 6)
> unique(v)
[1] 1 4 3 2 5 6
3.11. Basic Mathematical Functions¶
Trigonometric functions:
> theta = pi/2
> sin(theta)
[1] 1
> cos(theta)
[1] 6.123032e-17
> tan(theta)
[1] 1.633124e+16
> asin(1)
[1] 1.570796
> acos(1)
[1] 0
> atan(1)
[1] 0.7853982
> atan(1) * 2
[1] 1.570796
Exponentiation:
> exp(1)
[1] 2.718282
Logarithms:
> log(exp(1))
[1] 1
> log(exp(4))
[1] 4
> log10(10^4)
[1] 4
> log2(8)
[1] 3
> log2(c(8,16,256,1024, 2048))
[1] 3 4 8 10 11
Square root:
> sqrt(4)
[1] 2
> sqrt(-4)
[1] NaN
Warning message:
In sqrt(-4) : NaNs produced
> sqrt(-4+0i)
[1] 0+2i
3.12. Built-in Constants¶
\(\pi\):
> pi
[1] 3.141593 >
Month names:
> month.name
[1] "January" "February" "March" "April" "May" "June" "July" "August"
[9] "September" "October" "November" "December"
Month name abbreviations:
> month.abb
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
English letters:
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
3.13. Converting Numerical Data to Factor¶
Numerical data may need to be binned into a sequence of intervals.
Breaking data into intervals of equal length:
> data <- sample(0:20, 10, replace = TRUE)
> data
[1] 10 0 20 3 13 13 16 2 1 10
> cut (data, breaks=4)
[1] (5,10] (-0.02,5] (15,20] (-0.02,5] (10,15] (10,15] (15,20] (-0.02,5] (-0.02,5] (5,10]
Levels: (-0.02,5] (5,10] (10,15] (15,20]
Each interval is by default open on left side and closed on right side. Closed on left and open on right intervals can be created by using the parameter right=FALSE.
Frequency of categories:
> table(cut (data, breaks=4))
(-0.02,5] (5,10] (10,15] (15,20]
4 2 2 2
Making sure that the factors are ordered:
> cut (data, breaks=4, ordered_result = TRUE)
[1] (5,10] (-0.02,5] (15,20] (-0.02,5] (10,15] (10,15] (15,20] (-0.02,5] (-0.02,5] (5,10]
Levels: (-0.02,5] < (5,10] < (10,15] < (15,20]
Using our own labels for the factors:
> cut (data, breaks=4, labels=c("a", "b", "c", "d"))
[1] b a d a c c d a a b
Levels: a b c d
Specifying our own break-points (intervals) for cutting:
> cut (data, breaks=c(-1, 5,10, 20))
[1] (5,10] (-1,5] (10,20] (-1,5] (10,20] (10,20] (10,20] (-1,5] (-1,5] (5,10]
Levels: (-1,5] (5,10] (10,20]
Including the lowest value in the first interval:
> cut (data, breaks=c(0, 5,10, 20), include.lowest = TRUE)
[1] (5,10] [0,5] (10,20] [0,5] (10,20] (10,20] (10,20] [0,5] [0,5] (5,10]
Levels: [0,5] (5,10] (10,20]
3.14. Apply Family of Functions¶
Sample data:
> m <- matrix(1:8, nrow=2)
> m
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
Summing a matrix over rows:
> apply(m, 1, sum)
[1] 16 20
Summing a matrix over columns:
> apply(m, 2, sum)
[1] 3 7 11 15
Median for each row and column:
> apply(m, 1, median)
[1] 4 5
> apply(m, 2, median)
[1] 1.5 3.5 5.5 7.5
lapply
applies a function on each
element of a list and returns the values
as a list.
Let’s prepare a list of matrices:
> A <- matrix(c(1,1,1,3,0,2), nrow=3)
> B <- matrix(c(0,7,2,0,5,1), nrow=3)
> l <- list(A, B)
> l
[[1]]
[,1] [,2]
[1,] 1 3
[2,] 1 0
[3,] 1 2
[[2]]
[,1] [,2]
[1,] 0 0
[2,] 7 5
[3,] 2 1
Extracting first row from each matrix:
> lapply(l, '[', 1,)
[[1]]
[1] 1 3
[[2]]
[1] 0 0
Extracting second column from each matrix:
> lapply(l, '[', , 2)
[[1]]
[1] 3 0 2
[[2]]
[1] 0 5 1
Extracting the element at position [1,2] from each matrix:
> lapply(l, '[', 1,2)
[[1]]
[1] 3
[[2]]
[1] 0
> unlist(lapply(l, '[', 1,2))
[1] 3 0
Computing the mean of each column in the mtcars dataset:
> lapply(mtcars, 'mean')
$mpg
[1] 20.09062
$cyl
[1] 6.1875
$disp
[1] 230.7219
$hp
[1] 146.6875
$drat
[1] 3.596563
$wt
[1] 3.21725
$qsec
[1] 17.84875
$vs
[1] 0.4375
$am
[1] 0.40625
$gear
[1] 3.6875
$carb
[1] 2.8125
sapply
can help achieve the combination of unlist and lapply
easily:
> sapply(l, '[', 1,2)
[1] 3 0
It basically attempts to simplify the result of lapply
as much as possible.
Computing the mean of each column in mtcars:
> sapply(mtcars, 'mean')
mpg cyl disp hp drat wt qsec vs am
20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750 0.437500 0.406250
gear carb
3.687500 2.812500
The same for iris dataset:
> sapply(iris, 'mean')
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.843333 3.057333 3.758000 1.199333 NA
Warning message:
In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
Printing class of each column in a data frame:
> sapply(iris, class)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"numeric" "numeric" "numeric" "numeric" "factor"
mapply
applies a function repetitively to elements
from a pair of lists or vectors:
> v1 <- c(1,2,3)
> v2 <- c(3,4,5)
> mapply(v1, v2, sum)
[1] 4 6 8
Applying rep
to each element of a vector
and constructing a matrix of repeated rows:
> mapply(rep,1:4,4)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 1 2 3 4
[3,] 1 2 3 4
[4,] 1 2 3 4
This is equivalent to:
> matrix(c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4)),4,4)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 1 2 3 4
[3,] 1 2 3 4
[4,] 1 2 3 4
Repeating a list of characters into a matrix:
> l <- list("a", "b", "c", "d")
> mode(l)
[1] "list"
> class(l)
[1] "list"
> mode(l[[1]])
[1] "character"
> class(l[[1]])
[1] "character"
> m <- mapply(rep, l, 4)
> m
[,1] [,2] [,3] [,4]
[1,] "a" "b" "c" "d"
[2,] "a" "b" "c" "d"
[3,] "a" "b" "c" "d"
[4,] "a" "b" "c" "d"
> mode(m)
[1] "character"
> class(m)
[1] "matrix"
One more example:
> l <- list("aa", "bb", "cc", "dd")
> m <- mapply(rep, l, 4)
> m
[,1] [,2] [,3] [,4]
[1,] "aa" "bb" "cc" "dd"
[2,] "aa" "bb" "cc" "dd"
[3,] "aa" "bb" "cc" "dd"
[4,] "aa" "bb" "cc" "dd"
Coercion is applied when necessary:
> l <- list(1, "bb", T, 4.5)
> m <- mapply(rep, l, 4)
> m
[,1] [,2] [,3] [,4]
[1,] "1" "bb" "TRUE" "4.5"
[2,] "1" "bb" "TRUE" "4.5"
[3,] "1" "bb" "TRUE" "4.5"
[4,] "1" "bb" "TRUE" "4.5"
3.15. Missing Data¶
R has extensive support for missing data.
A vector with missing values:
> x <- c(1, -1, 1, NA, -2, 1, -3, 4, NA, NA, 3, 2, -4, -3, NA)
Identifying entries in x
which are missing:
> is.na(x)
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE
Extracting non-missing values from x
:
> x[!is.na(x)]
[1] 1 -1 1 -2 1 -3 4 3 2 -4 -3
By defaulting summing NA values gives us NA:
> sum(x)
[1] NA
We can ignore missing values while calculating the sum:
> sum(x, na.rm = T)
[1] -1
Ignoring missing values for calculating mean:
> mean(x)
[1] NA
> mean(x, na.rm = T)
[1] -0.09090909
Ignoring missing values for calculating variance:
> var(x)
[1] NA
> var(x, na.rm = T)
[1] 7.090909
Recording a missing value:
> x[1] <- NA
Creating a new dataset without the missing data:
> y<-na.omit(x)
> y
[1] -1 1 -2 1 -3 4 3 2 -4 -3
attr(,"na.action")
[1] 1 4 9 10 15
attr(,"class")
[1] "omit"
Failing and error out in presence of missing values:
> na.fail(x)
Error in na.fail.default(x) : missing values in object
> na.fail(y)
[1] -1 1 -2 1 -3 4 3 2 -4 -3
attr(,"na.action")
[1] 1 4 9 10 15
attr(,"class")
[1] "omit"
3.16. Classes¶
A generic function performs a task or action on its arguments specific to the class of the argument itself. If the argument doesn’t have a class attribute, then the default version of the generic function is called.
Various versions of the generic function plot
:
> methods(plot)
[1] plot.acf* plot.bclust* plot.data.frame* plot.decomposed.ts* plot.default
[6] plot.dendrogram* plot.density* plot.ecdf plot.factor* plot.formula*
[11] plot.function plot.hclust* plot.histogram* plot.HoltWinters* plot.ica*
[16] plot.isoreg* plot.lm* plot.medpolish* plot.mlm* plot.ppr*
[21] plot.prcomp* plot.princomp* plot.profile.nls* plot.raster* plot.SOM*
[26] plot.somgrid* plot.spec* plot.stepfun plot.stft* plot.stl*
[31] plot.svm* plot.table* plot.ts plot.tskernel* plot.TukeyHSD*
[36] plot.tune*
Generic methods associated with matrix
class:
> methods(class="matrix")
[1] anyDuplicated as.data.frame as.raster boxplot coerce determinant duplicated
[8] edit head initialize isSymmetric Math Math2 Ops
[15] relist subset summary tail unique
Generic methods associated with table
class:
> methods(class="table")
[1] [ aperm as.data.frame Axis coerce head initialize
[8] lines plot points print show slotsFromS3 summary
[15] tail
Some of the functions may not be visible. They are marked with *:
> methods(coef)
[1] coef.aov* coef.Arima* coef.default* coef.listof* coef.maov* coef.nls*
3.17. Defining our own operators¶
Let us define a distance operator:
> `%d%` <- function(x, y) { sqrt(sum((x-y)^2)) }
Let us use the operator for calculating distances between points:
> c(1,0, 0) %d% c(0,1,0)
[1] 1.414214
> c(1,1, 0) %d% c(0,1,0)
[1] 1
> c(1,1, 1) %d% c(0,1,0)
[1] 1.414214