Programming basics: control flow (if and for)

This belongs to more advanced programming and you will have a chance to revisit it during this course. Here is just a brief overview. For extra info:

Classes PowerPoint Presentation

Key insights and activities

In this section we cover conditional expressions and control flow. In R, we can perform quite a bit of data analysis without conditionals, but they do come up occasionally. Control flow means setting actions to occur only if a condition or a set of conditions is met *(if)*. For example, we might want to perform an action (e.g. print the name) only for those patients in a dataset who are older than 80. Alternatively, we can set an action to occur a particular number of times *(for)*.

As a refresher, we have seen that there are three main types of operations you can perform on variables. Two of them (logical and comparison) result in a value with a logical data type (TRUE or FALSE), which is the kind of input we need for conditional workflows.

  • arithmetic: (+/-/*)

  • logical: AND operator (&), OR operator (|), NOT operator (!)

  • comparison:

    • > (greater than)

    • >= (greater than or equal to)

    • < (less than)

    • <= (less than or equal to)

    • == (equal to)

    • != (not equal to)

We will not describe logical operators in detail here, but feel free to look them up. They are often covered under Boolean algebra.

So for example you can compare:

x <- 5

x > 10
[1] FALSE
x == 5
[1] TRUE
x <= 3
[1] FALSE
is.character(x)
[1] FALSE
!is.character(x)
[1] TRUE

You can apply comparison and logical operators elementwise to vectors or matrices.

x <- c(-1, 0, 1)

# Check each element of x against the condition (elementwise)
x <= 0             
[1]  TRUE  TRUE FALSE

Also:

# Check if EACH element of x is equal to ANY of
# the elements of the object on the right
x %in% c(0, 1, 2)
[1] FALSE  TRUE  TRUE
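The logical operators listed earlier (&, |, !) combine comparisons in the same elementwise way. The following is a minimal sketch using the same x as above; the variable names in_range, non_zero and not_zero are made up for illustration. The functions any() and all() collapse a logical vector into a single TRUE or FALSE.

```r
x <- c(-1, 0, 1)

# AND: TRUE only where both comparisons hold
in_range <- x >= 0 & x < 1
print(in_range)
# [1] FALSE  TRUE FALSE

# OR: TRUE where at least one comparison holds
non_zero <- x < 0 | x > 0
print(non_zero)
# [1]  TRUE FALSE  TRUE

# NOT: flips each logical value
not_zero <- !(x == 0)
print(not_zero)
# [1]  TRUE FALSE  TRUE

# Collapse a logical vector into a single value
any_negative <- any(x < 0)   # is at least one element negative?
all_negative <- all(x < 0)   # are all elements negative?
print(any_negative)
# [1] TRUE
print(all_negative)
# [1] FALSE
```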

If-else

For conditional statements, the most commonly used approaches are the constructs:

# if
if (condition is true) {
  perform action
}

# if ... else
if (condition is true) {
  perform action
} else {  # that is, if the condition is false,
  perform alternative action
}

Say, for example, that we want R to print a message if a variable x has a particular value:

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
}

x
[1] 8

The message is not printed to the console because x is not greater than or equal to 10. To print a different message for numbers less than 10, we can add an else statement.

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else {
  print("x is less than 10")
}
[1] "x is less than 10"

You can also test multiple conditions by using else if.

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else if (x > 5) {
  print("x is greater than 5, but less than 10")
} else {
  print("x is less than or equal to 5")
}
[1] "x is greater than 5, but less than 10"

Important: when R evaluates the condition inside if() statements, it is looking for a logical element, i.e., TRUE or FALSE. This can cause some headaches for beginners. For example:

x  <-  4 == 3
if (x) {
  "4 equals 3"
} else {
  "4 does not equal 3"          
}
[1] "4 does not equal 3"

As we can see, the "does not equal" message was printed because x is FALSE:

x <- 4 == 3
x
[1] FALSE
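Because if() expects a single TRUE or FALSE, a comparison on a whole vector cannot be used directly as the condition. In R 4.2.0 and later this is an error, while earlier versions used only the first element and gave a warning. A minimal sketch of one way around this, using any() and all() on a made-up vector of ages:

```r
ages <- c(45, 82, 67)   # made-up values, for illustration only

# ages > 80 has length 3, so collapse it to one logical value first
any_over_80 <- any(ages > 80)
all_over_80 <- all(ages > 80)

if (any_over_80) {
  print("At least one age is above 80")
}

if (all_over_80) {
  print("Every age is above 80")
} else {
  print("Not every age is above 80")
}
```

With these values, the first and the last message are printed.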

Tip: Built-in ifelse() function

R accepts if() and else if() statements structured as outlined above, and it also provides the built-in ifelse() function. This function accepts both single values and vectors and is structured as follows:

# ifelse function 
ifelse(condition is true, perform action, perform alternative action) 

where the first argument is the condition or set of conditions to be tested, the second argument is the value returned where the condition is TRUE, and the third argument is the value returned where the condition is FALSE.

y <- -3
ifelse(y < 0, "y is a negative number", "y is either positive or zero")
[1] "y is a negative number"
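Since ifelse() also accepts vectors, the same call can label every element at once. A small sketch, with y extended to a made-up vector and sign_label as an illustrative name:

```r
y <- c(-3, 0, 2)

# The condition is checked for each element, and the matching label is returned
sign_label <- ifelse(y < 0, "negative", "positive or zero")
print(sign_label)
# [1] "negative"         "positive or zero" "positive or zero"
```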

for: repeating operations

Sometimes you want to carry out the same procedure multiple times. This is called iteration and the simplest form of iteration is the for loop.

The basic structure of a for() loop is:

for (iterator in set of values) {
  do a thing
}

For example:

for (i in 1:10) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

The 1:10 bit creates a vector on the fly; you can iterate over any other vector as well.
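As a sketch of both ideas, the first loop below iterates directly over the values of a character vector, and the second uses the loop to build up a result across iterations. The names fruit and total are made up for illustration.

```r
# Iterate directly over the values of a character vector
for (fruit in c("apple", "banana", "cherry")) {
  print(fruit)
}

# Accumulate a result across iterations: add up the numbers 1 to 10
total <- 0
for (i in 1:10) {
  total <- total + i
}
print(total)
# [1] 55
```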

We can use a for() loop nested within another for() loop to iterate over two things at once.

for (i in 1:5) {
  for (j in c('a', 'b', 'c', 'd', 'e')) {
    print(paste(i,j))
  }
}
[1] "1 a"
[1] "1 b"
[1] "1 c"
[1] "1 d"
[1] "1 e"
[1] "2 a"
[1] "2 b"
[1] "2 c"
[1] "2 d"
[1] "2 e"
[1] "3 a"
[1] "3 b"
[1] "3 c"
[1] "3 d"
[1] "3 e"
[1] "4 a"
[1] "4 b"
[1] "4 c"
[1] "4 d"
[1] "4 e"
[1] "5 a"
[1] "5 b"
[1] "5 c"
[1] "5 d"
[1] "5 e"

We notice in the output that when the first index (i) is set to 1, the second index (j) iterates through its full set of indices. Once the indices of j have been iterated through, then i is incremented. This process continues until the last index has been used for each for() loop.

Another example:

x <- c("A", "B", "C")

for(i in 1:length(x)) {
  print(x[i])
}
[1] "A"
[1] "B"
[1] "C"

The syntax is that you define an index (in this case, the letter i) and starting (1) and stopping (length(x)) values. R sets the index to the first value, then runs the code between the { } brackets. Then it iterates, moving to the next value of the index, and re-running the code. This example also demonstrates the use of the length() function, which returns the number of elements in an object.
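One caveat with 1:length(x): if x is empty, 1:0 counts down and gives c(1, 0), so the loop body still runs twice. A common alternative is seq_along(x), which returns an empty sequence in that case. A brief sketch, where empty and n_runs are illustrative names:

```r
x <- c("A", "B", "C")

# seq_along(x) gives 1, 2, 3 here, the same as 1:length(x)
for (i in seq_along(x)) {
  print(x[i])
}

empty <- character(0)
print(1:length(empty))    # 1 0      : a loop over this would run twice
print(seq_along(empty))   # integer(0): a loop over this would not run

# Count how many times the loop body runs for an empty vector
n_runs <- 0
for (i in seq_along(empty)) {
  n_runs <- n_runs + 1
}
print(n_runs)
# [1] 0
```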

Here is a similar loop but with a more complicated piece of code inside. At each iteration, the paste() function combines text from the ith element of x and the ith element of y into a sentence.

x <- c("A", "B", "C")
y <- c(10, 18, 7)

for(i in 1:length(x)) {
  print(paste("Item", x[i], "weighs", y[i], "lbs.", sep = " "))
}
[1] "Item A weighs 10 lbs."
[1] "Item B weighs 18 lbs."
[1] "Item C weighs 7 lbs."

Loops like this are not very computationally efficient in R. Other options are vectorized functions (paste() itself is vectorized) and iteration tools such as those in the purrr package.
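For instance, the item-weight loop above can be written without a loop at all, because paste() works on whole vectors. This is a sketch of the base R version only; sentences is an illustrative name.

```r
x <- c("A", "B", "C")
y <- c(10, 18, 7)

# paste() pairs up the elements of x and y and builds all sentences in one call
sentences <- paste("Item", x, "weighs", y, "lbs.")
print(sentences)
```

This returns a character vector with the same three sentences that the loop printed one at a time.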

Final challenge

Write a script that loops through the gapminder data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.

Step 1: We want to make sure we can extract all the unique values of the continent vector:

library(gapminder)
gapminder <- gapminder
unique(gapminder$continent)

Step 2: We also need to loop over each of these continents and calculate the average life expectancy for each subset of data. We can do that as follows:

  1. Loop over each of the unique values of ‘continent’
  2. For each value of continent, create a temporary variable storing that subset
  3. Return the calculated life expectancy to the user by printing the output:

for (iContinent in unique(gapminder$continent)) {
  tmp <- gapminder[gapminder$continent == iContinent, ]
  cat(iContinent, mean(tmp$lifeExp, na.rm = TRUE), "\n")
}

Step 3: The exercise asks us to report whether the average life expectancy is smaller or larger than 50, not the average itself. So we need to add an if() condition before printing, which evaluates whether the calculated average life expectancy is above or below a threshold and prints an output conditional on the result. We need to amend (3) from above:

3a. If the calculated life expectancy is less than some threshold (50 years), return the continent and a statement that life expectancy is less than threshold, otherwise return the continent and a statement that life expectancy is greater than threshold:

thresholdValue <- 50

for (iContinent in unique(gapminder$continent)) {
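   # Extract lifeExp as a plain numeric vector. gapminder is a tibble, so
   # gapminder[rows, "lifeExp"] would return a tibble and mean() would give NA.
   tmp <- mean(gapminder$lifeExp[gapminder$continent == iContinent])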
   
   if (tmp < thresholdValue){
       cat("Average Life Expectancy in", iContinent, "is less than", thresholdValue, "\n")
   } else {
       cat("Average Life Expectancy in", iContinent, "is greater than", thresholdValue, "\n")
   } # end if else condition
   
} # end for loop


As we have seen many times in these lectures, there are many different ways this could have been done in R too!
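As one possible alternative, here is a sketch that avoids the explicit loop by combining tapply() and ifelse(). To keep it self-contained it runs on a tiny made-up data frame called toy rather than on gapminder; the continent labels and values are invented for illustration, and the same calls should apply to gapminder$lifeExp and gapminder$continent.

```r
# Toy stand-in for gapminder (values are made up, for illustration only)
toy <- data.frame(
  continent = c("A", "A", "B", "B"),
  lifeExp   = c(40, 50, 60, 70)
)
thresholdValue <- 50

# Mean life expectancy per continent, computed in one call
mean_by_continent <- tapply(toy$lifeExp, toy$continent, mean)

# Label every continent at once instead of using if ... else inside a loop
verdict <- ifelse(mean_by_continent < thresholdValue, "less than", "greater than")

cat(paste("Average Life Expectancy in", names(mean_by_continent),
          "is", verdict, thresholdValue), sep = "\n")
```

With the toy values, continent A averages 45 (less than 50) and continent B averages 65 (greater than 50).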
