x <- 5
x > 10
[1] FALSE
x == 5
[1] TRUE
x <= 3
[1] FALSE
is.character(x)
[1] FALSE
!is.character(x)
[1] TRUE
This topic belongs to more advanced programming, and you will have a chance to revisit it later in this course. What follows is just a brief overview. For extra info:
What we will learn now is conditional expressions and control flow. In R, we can perform quite a bit of data analysis without conditionals, but they do come up occasionally. Control flow means setting actions to occur only if a condition or a set of conditions is met *(if)*. For example, we might want to perform an action (e.g., print the name) only for those patients in a dataset who are older than 80. Alternatively, we can set an action to occur a particular number of times (for).
As a refresher, we have seen that there are three main kinds of operations you can perform on variables. Two of them, logical and comparison, return a value of the logical data type (TRUE or FALSE), which is the kind of input we need for conditional workflows.
arithmetic: (+, -, *)
logical: AND operator (&), OR operator (|), NOT operator (!)
comparison:
> (greater than)
>= (greater than or equal to)
< (less than)
<= (less than or equal to)
== (equal to)
!= (not equal to)
We will not describe the logical operators in detail here, but feel free to look them up. They are often covered under Boolean algebra.
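So, for example, you can compare a value against a number or test its type, as the x <- 5 example at the start of this section does; each of those expressions returns a single TRUE or FALSE.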
You can apply logical operators elementwise to vectors or matrices.
[1] TRUE TRUE FALSE
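The snippet that produced this output is not shown. As a minimal sketch of elementwise evaluation, using hypothetical vectors and a hypothetical matrix chosen only for illustration:

```r
# Hypothetical inputs, not taken from the original lesson
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, TRUE, TRUE)

both   <- a & b             # elementwise AND: TRUE TRUE FALSE
either <- a | b             # elementwise OR:  TRUE TRUE TRUE
small  <- c(1, 5, 12) < 10  # comparisons are elementwise too: TRUE TRUE FALSE

# The same applies to matrices: the result has the same shape as the input
m   <- matrix(1:4, nrow = 2)
big <- m > 2

print(both)
print(big)
```

Each result has one logical value per element of the input, which is why these operators are useful for filtering but need care inside if().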
Also:
For conditional statements, the most commonly used approaches are the constructs:
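The constructs themselves are not shown here. A minimal sketch of the two basic shapes, if and if ... else, might look like the following; the variable names and the age value are hypothetical and simply echo the patients-older-than-80 scenario mentioned earlier:

```r
age <- 85   # hypothetical patient age

# if: the body runs only when the condition is TRUE
if (age > 80) {
  message_if <- "older than 80"
  print(message_if)
}

# if ... else: exactly one of the two bodies runs
if (age > 80) {
  group <- "over 80"
} else {
  group <- "80 or under"
}
print(group)
```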
Say, for example, that we want R to print a message if a variable x has a particular value:
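The original snippet is not shown at this point. A minimal sketch consistent with the surrounding text, assuming x is 8 (as in the later examples) and using a hypothetical message string:

```r
x <- 8   # assumed value, matching the later examples

# The body is skipped because the condition evaluates to FALSE
if (x > 10) {
  print("x is greater than 10")
}
```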
The print statement does not appear in the console because x is not greater than 10. To print a different message for numbers less than 10, we can add an else statement.
x <- 8
if (x >= 10) {
  print("x is greater than or equal to 10")
} else {
  print("x is less than 10")
}
[1] "x is less than 10"
You can also test multiple conditions by using else if.
x <- 8
if (x >= 10) {
  print("x is greater than or equal to 10")
} else if (x > 5) {
  print("x is greater than 5, but less than 10")
} else {
  print("x is less than or equal to 5")
}
[1] "x is greater than 5, but less than 10"
Important: when R evaluates the condition inside if() statements, it is looking for a logical element, i.e., TRUE or FALSE. This can cause some headaches for beginners. For example:
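The example itself is missing from this point in the text. One way to illustrate the idea, as a sketch with hypothetical message strings, is to store the result of a comparison in x and pass x directly to if():

```r
x <- 4 == 3   # x now holds the single logical value FALSE

if (x) {
  result <- "4 equals 3"
} else {
  result <- "4 does not equal 3"
}
print(result)
```

Here if() never sees the numbers 4 and 3, only the logical value stored in x, so the else branch runs.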
As we can see, the "not equal" message was printed because the logical vector x is FALSE.
Tip: Built-in ifelse() function
R accepts both if() and else if() statements structured as outlined above, but also statements using R's built-in ifelse() function. This function accepts both singular and vector inputs and is structured as ifelse(test, yes, no), where the first argument is the condition or set of conditions to be met, the second argument is the value returned when the condition is TRUE, and the third argument is the value returned when the condition is FALSE.
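As a short sketch of ifelse() on both a vector and a single value (the inputs and labels are hypothetical):

```r
y <- c(-3, 0, 4)   # hypothetical input vector

# Vector input: one label is chosen per element
sign_label <- ifelse(y > 0, "positive", "not positive")
print(sign_label)

# Single input: behaves like a compact if ... else
single <- ifelse(8 >= 10, "at least 10", "less than 10")
print(single)
```

Unlike if(), ifelse() returns a vector of the same length as the condition, which makes it convenient for recoding a whole column at once.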
Sometimes you want to carry out the same procedure multiple times. This is called iteration and the simplest form of iteration is the for loop.
The basic structure of a for() loop is:
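The structure is not shown here. In outline it is for (index in values) { body }, which the following sketch spells out with a hypothetical vector and counter:

```r
# General shape: for (index in values) { body }
fruits <- c("apple", "pear", "plum")   # hypothetical vector of values
n_seen <- 0                            # hypothetical counter

for (fruit in fruits) {   # fruit takes each value of fruits in turn
  print(fruit)            # the body runs once per value
  n_seen <- n_seen + 1
}
```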
For example:
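The example code is missing at this point. Judging from the reference to 1:10 in the text, it was probably close to this sketch, where the print() body is an assumption:

```r
# Print each value from 1 to 10 in turn
for (i in 1:10) {
  print(i)
}
```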
The 1:10 bit creates a vector on the fly; you can iterate over any other vector as well.
We can use a for() loop nested within another for() loop to iterate over two things at once.
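The code for this example is not shown, but the output that follows is consistent with a sketch like this one, which pairs the numbers 1 to 5 with the letters a to e:

```r
for (i in 1:5) {                           # outer loop: numbers
  for (j in c("a", "b", "c", "d", "e")) {  # inner loop: letters
    print(paste(i, j))                     # e.g. "1 a"
  }
}
```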
[1] "1 a"
[1] "1 b"
[1] "1 c"
[1] "1 d"
[1] "1 e"
[1] "2 a"
[1] "2 b"
[1] "2 c"
[1] "2 d"
[1] "2 e"
[1] "3 a"
[1] "3 b"
[1] "3 c"
[1] "3 d"
[1] "3 e"
[1] "4 a"
[1] "4 b"
[1] "4 c"
[1] "4 d"
[1] "4 e"
[1] "5 a"
[1] "5 b"
[1] "5 c"
[1] "5 d"
[1] "5 e"
We notice in the output that when the first index (i) is set to 1, the second index (j) iterates through its full set of indices. Once the indices of j have been iterated through, then i is incremented. This process continues until the last index has been used for each for() loop.
Another example:
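The code for this example is missing. A sketch that matches the description (an index i running from 1 to length(x)) could be the following, with the contents of x borrowed from the later paste() example as an assumption:

```r
x <- c("A", "B", "C")   # assumed contents

for (i in 1:length(x)) {
  print(x[i])           # the i-th element of x
}
```

One caveat with this pattern: if x is empty, 1:length(x) evaluates to c(1, 0) and the body still runs twice, so seq_along(x) is often a safer choice of index.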
The syntax is that you define an index (in this case, the letter i) and starting (1) and stopping (length(x)) values. R sets the index to the first value, then runs the code between the { } brackets. Then it iterates, moving to the next value of the index, and re-running the code. This example also demonstrates the use of the length() function, which returns the number of elements in an object.
Here is a similar loop, but with a more complicated piece of code inside. At each iteration, the paste() function combines text from the ith element of x and the ith element of y into a sentence.
x <- c("A", "B", "C")
y <- c(10, 18, 7)
for (i in 1:length(x)) {
  print(paste("Item", x[i], "weighs", y[i], "lbs.", sep = " "))
}
[1] "Item A weighs 10 lbs."
[1] "Item B weighs 18 lbs."
[1] "Item C weighs 7 lbs."
Loops like this are often not the most computationally efficient option in R. Alternatives include vectorized functions and functional-programming tools such as the purrr package.
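As a sketch of the vectorized alternative, base R's paste() already works elementwise, so the sentences from the loop above can be built in a single call. The variable name sentences is hypothetical:

```r
x <- c("A", "B", "C")
y <- c(10, 18, 7)

# paste() pairs up the elements of x and y without an explicit loop
sentences <- paste("Item", x, "weighs", y, "lbs.")
print(sentences)
```

With purrr, the same pairing could be written with map2_chr(), but the base R call is usually enough for a case this simple.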
Final challenge
Write a script that loops through the gapminder data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.
Step 1: We want to make sure we can extract all the unique values of the continent vector
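A minimal sketch of this step is shown below. So that it runs on its own, it uses a small toy data frame, gapminder_toy, as a stand-in for the real gapminder data; the name and all values are invented purely for illustration.

```r
# Toy stand-in for gapminder; the values are made up, not real data
gapminder_toy <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe", "Asia"),
  lifeExp   = c(45, 49, 70, 74, 60),
  stringsAsFactors = FALSE
)

# unique() drops repeated values, keeping the order of first appearance
continents <- unique(gapminder_toy$continent)
print(continents)
```

With the real data, the equivalent call would be unique(gapminder$continent).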
Step 2: We also need to loop over each of these continents and calculate the average life expectancy for each subset of data. We can do that as follows:
- Loop over each of the unique values of ‘continent’
- For each value of continent, create a temporary variable storing that subset
- Return the calculated life expectancy to the user by printing the output:
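The three bullet points above can be sketched as follows. As an assumption for illustration, the code again uses a toy data frame, gapminder_toy, with invented values in place of the real gapminder data; tmp and meanLifeExp are hypothetical variable names.

```r
# Toy stand-in for gapminder; the values are made up, not real data
gapminder_toy <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe", "Asia"),
  lifeExp   = c(45, 49, 70, 74, 60),
  stringsAsFactors = FALSE
)

# 1. Loop over each unique value of continent
for (iContinent in unique(gapminder_toy$continent)) {
  # 2. Temporary variable holding the rows for this continent
  tmp <- gapminder_toy[gapminder_toy$continent == iContinent, ]
  # 3. Calculate and print the mean life expectancy for the subset
  meanLifeExp <- mean(tmp$lifeExp)
  cat(iContinent, meanLifeExp, "\n")
}
```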
Step 3: The exercise only wants the output printed if the average life expectancy is less than 50 or greater than 50. So we need to add an if() condition before printing, which evaluates whether the calculated average life expectancy is above or below a threshold, and prints an output conditional on the result. We need to amend (3) from above:
3a. If the calculated life expectancy is less than some threshold (50 years), return the continent and a statement that life expectancy is less than threshold, otherwise return the continent and a statement that life expectancy is greater than threshold:
thresholdValue <- 50
for (iContinent in unique(gapminder$continent)) {
  # Mean life expectancy for the rows belonging to this continent
  tmp <- mean(gapminder$lifeExp[gapminder$continent == iContinent])
  if (tmp < thresholdValue) {
    cat("Average Life Expectancy in", iContinent, "is less than", thresholdValue, "\n")
  } else {
    cat("Average Life Expectancy in", iContinent, "is greater than", thresholdValue, "\n")
  } # end if else condition
} # end for loop
As we have seen many times in these lectures, there are many different ways this could have been done in R too!