Getting comfortable with R and RStudio

Author

Dr Ashwini Kalantri

What This Session Is (and Isn’t)

Warning

This session is not an R programming course. You do not need to memorise anything here.

The goal is to build enough familiarity that when you see R code in the following sessions, you can:

  • Recognise what it is doing
  • Know where to change an input value
  • Run it yourself and see the output
  • Not be intimidated by it

Think of this as learning to read a new language well enough to follow a conversation — not to write poetry.

The RStudio Interface

When you open RStudio, you see four panels. Understanding what each one does will help you follow along in every session.

  • Source Editor (top-left): Where you write and edit code files (.R, .qmd)
  • Console (bottom-left): Where code runs and output appears
  • Environment (top-right): Shows all variables and data currently in memory
  • Files / Plots / Help (bottom-right): File browser, plot viewer, package documentation

For this workshop, you will mostly open a .qmd file in the Source panel, run code chunks by clicking the green ▶ button, and see results in the Console or Plots tab.

Important: Always open the project file first

Always open the workshop by double-clicking the .Rproj file in the workshop folder, not by opening .qmd files directly. This sets R's working directory to the project folder, so the relative file paths used in the sessions resolve correctly. If you skip this step, data import commands will fail with a "cannot open file" error.
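
A quick way to confirm that the project opened correctly is to ask R where it is looking for files. The following is a minimal sketch using base R only. The path files/data.csv is borrowed from the import examples in this session and is assumed to exist in your workshop folder.

```r
# Show the folder R currently treats as its working directory
wd <- getwd()
print(wd)

# Check whether a workshop data file is visible from that folder
# (assumed path, borrowed from the import examples in this session)
found <- file.exists("files/data.csv")
print(found)   # TRUE means relative paths will resolve as expected
```

If the second line prints FALSE, close RStudio and reopen the workshop through the .Rproj file.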

The Workflow

Every session in this workshop follows the same four steps. Come back to this whenever you feel unsure about what to do next.

Code
flowchart LR
  A["<b>READ</b><br/>the code<br/><i>(understand logic)</i>"]:::blue --> B["<b>MODIFY</b><br/>a parameter<br/><i>(change a number)</i>"]:::green --> C["<b>RUN</b><br/>the chunk<br/><i>(click play)</i>"]:::orange --> D["<b>INTERPRET</b><br/>the output<br/><i>(what changed?)</i>"]:::red
  D -. "Try again" .-> B

classDef blue   fill:#4e79a7,color:#fff
classDef green  fill:#59a14f,color:#fff
classDef orange fill:#f28e2b,color:#fff
classDef red    fill:#e15759,color:#fff

Working with .qmd Files

All workshop materials are in Quarto (.qmd) files. These combine three things in one document:

  • Text — plain-language explanations, like this paragraph
  • Code chunks — grey blocks containing R code, with a green ▶ button to run them
  • Outputs — results, tables, and plots that appear directly below each chunk

To work with a .qmd file, open it in RStudio, read the text explanations, and run code chunks one at a time using the ▶ button. Modify a value and re-run to see what changes. You can also click Render in the toolbar to produce a complete HTML document with everything formatted together.

Your First Code

Run the chunk below to confirm everything is working. Click the green ▶ button.

Code
print("Coding for better health decisions.")
[1] "Coding for better health decisions."

If you see the text printed below the chunk, your setup is working correctly.

Comments

In R, the # symbol marks a comment — anything after it on that line is ignored by R. Comments are notes for the human reading the code, not instructions for the computer. You will see them throughout every session. Read them — they are often the clearest explanation of what a section of code is doing.

Code
# Print the text
print("Coding for better health decisions.")

print("Coding for better health decisions.") # Print the text

# Multi-line comment
# about printing the text
print("Coding for better health decisions.")

Assignment

In R, you store a value under a name using the assignment operator <-. Once stored, you can use the name anywhere in place of the value.

Code
text <- "Coding for better health decisions."
print(text)
[1] "Coding for better health decisions."

When you see <- in workshop code, you know: this is where a value is stored, and this is what you change to explore a different scenario.

You may occasionally see -> or = used for assignment in older code. Prefer <- in your own work.
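
To illustrate the "store, then change" pattern, here is a small sketch. The names cost_per_visit and n_visits and their values are made up for this example and do not come from the workshop models.

```r
# Store two input values under names (hypothetical figures, in rupees)
cost_per_visit <- 500
n_visits       <- 4

# Reuse the names in a calculation
total_cost <- cost_per_visit * n_visits
print(total_cost)   # 2000

# Change one input and re-run the same calculation to explore a new scenario
cost_per_visit <- 650
total_cost <- cost_per_visit * n_visits
print(total_cost)   # 2600
```

Note that total_cost does not update by itself when cost_per_visit changes; the calculation line has to be run again.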

Operators

Arithmetic

R handles all standard arithmetic. These operators appear throughout model calculations for costs, probabilities, and outcomes.

Code
2 + 5    # Addition
[1] 7
Code
73 - 32  # Subtraction
[1] 41
Code
47 * 7   # Multiplication
[1] 329
Code
86 / 3   # Division
[1] 28.66667
Code
8 ^ 2    # Exponentiation
[1] 64

Relational

Relational operators compare two values and return TRUE or FALSE. You will encounter these inside conditional checks, for example when testing whether an incremental cost-effectiveness ratio (ICER) is below a cost-effectiveness threshold.

Code
5 > 6    # Greater than
[1] FALSE
Code
5 < 6    # Less than
[1] TRUE
Code
6 == 6   # Equal to — note the double == for comparison, not single =
[1] TRUE
Code
8 >= 5   # Greater than or equal to
[1] TRUE
Code
7 <= 10  # Less than or equal to
[1] TRUE
Code
9 != 10  # Not equal to
[1] TRUE
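
As a sketch of how a comparison is used in practice, the result can be stored under a name like any other value, and a comparison applied to a vector checks every element at once. The ICER and threshold values below are illustrative only.

```r
icer <- 21429     # illustrative ICER
wtp  <- 100000    # illustrative willingness-to-pay threshold

# Store the result of a comparison
below_threshold <- icer < wtp
print(below_threshold)   # TRUE

# Comparing a whole vector returns one TRUE/FALSE per element
icers   <- c(21429, 150000, 98000)
results <- icers < wtp
print(results)           # TRUE FALSE TRUE
```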

Objects

In R, everything is stored as an object. The three types you will encounter most in this workshop are vectors, matrices, and data frames.

Vectors

A vector is an ordered collection of values. Create one with c() — short for “combine”.

Code
vec <- c(2, 4, 6, 8, 3, 5.5)
vec
[1] 2.0 4.0 6.0 8.0 3.0 5.5

Vectors can also have named elements. This is a pattern you will see constantly in health technology assessment (HTA) models, where each value corresponds to a health state or treatment strategy:

Code
costs <- c(treatment_A = 5000, treatment_B = 3000, treatment_C = 7000)
costs
treatment_A treatment_B treatment_C 
       5000        3000        7000 

You can retrieve a specific element by its position in the vector:

Code
vec[5]   # Fifth element of vec
[1] 3
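
Named vectors can also be indexed by name, and arithmetic on a vector applies to every element. The sketch below reuses the costs vector from above; the 10% uplift is an arbitrary figure chosen for illustration.

```r
costs <- c(treatment_A = 5000, treatment_B = 3000, treatment_C = 7000)

costs["treatment_B"]       # Retrieve one element by name
costs[c(1, 3)]             # Retrieve several elements by position

inflated <- costs * 1.1    # Arithmetic applies to every element (arbitrary 10% uplift)
total    <- sum(costs)     # 15000

# Name of the cheapest option
cheapest <- names(costs)[which.min(costs)]
print(cheapest)            # "treatment_B"
```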

Matrix

A matrix is a two-dimensional object with rows and columns. In HTA, you will encounter matrices as transition probability tables for Markov models, where each cell holds the probability of moving from one health state to another.

Code
mat <- matrix(1:36, nrow = 6, ncol = 6, byrow = FALSE)
mat
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    7   13   19   25   31
[2,]    2    8   14   20   26   32
[3,]    3    9   15   21   27   33
[4,]    4   10   16   22   28   34
[5,]    5   11   17   23   29   35
[6,]    6   12   18   24   30   36

You can access a specific element, an entire column, or an entire row:

Code
mat[3, 5]   # Element at row 3, column 5
[1] 27
Code
mat[, 5]    # Entire column 5
[1] 25 26 27 28 29 30
Code
mat[5, ]    # Entire row 5
[1]  5 11 17 23 29 35

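Matrix multiplication

The %*% operator performs matrix multiplication: each cell of the result is a row of the first matrix multiplied element by element with a column of the second, then summed. A plain * would instead multiply the two matrices cell by cell.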

Code
mat1 <- matrix(c(3, 5, 1, 9), nrow = 2, ncol = 2, byrow = FALSE)
mat2 <- matrix(c(7, 4, 2, 8), nrow = 2, ncol = 2, byrow = FALSE)

mat1 %*% mat2
     [,1] [,2]
[1,]   25   14
[2,]   71   82
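
To connect this to Markov models, the sketch below multiplies a cohort distribution by a transition probability matrix to move the cohort forward one cycle. The state names and probabilities are hypothetical and chosen only so that each row sums to 1; the object names tm and start are likewise assumptions for this example.

```r
states <- c("Healthy", "Sick", "Dead")

# Hypothetical transition probabilities: rows = from, columns = to
tm <- matrix(c(0.80, 0.15, 0.05,
               0.00, 0.70, 0.30,
               0.00, 0.00, 1.00),
             nrow = 3, byrow = TRUE,
             dimnames = list(from = states, to = states))

row_totals <- rowSums(tm)   # Each row of probabilities should sum to 1
tm["Healthy", "Sick"]       # Look up one probability by state names

# The whole cohort starts in the Healthy state
start <- c(Healthy = 1, Sick = 0, Dead = 0)

# One model cycle: cohort distribution times transition matrix
after_one <- start %*% tm
print(after_one)
```

Because everyone starts in Healthy, the distribution after one cycle equals the first row of tm.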

Data Frames

A data frame is R’s equivalent of a spreadsheet table — rows are observations, columns are variables.

Code
age    <- c(12, 24, NA, 23, 65, 33)          # NA = missing value

gender <- c("M", "F", "F", "M", "M", "F")

occu   <- factor(c(1, 4, 3, 2, 4, 5),        # factor() = categorical variable
                 levels = c(1:5),
                 labels = c("Unemp", "Service", "Student", "Business", "Prof"))

dob    <- c(as.Date("1993-01-16"),            # as.Date() converts text to a date
            as.Date("1963-12-24"),
            as.Date("1971-01-05"),
            as.Date("1982-11-11"),
            as.Date("1984-05-15"),
            as.Date("1999-03-07"))

df <- data.frame(age, gender, occu, dob)
df
  age gender     occu        dob
1  12      M    Unemp 1993-01-16
2  24      F Business 1963-12-24
3  NA      F  Student 1971-01-05
4  23      M  Service 1982-11-11
5  65      M Business 1984-05-15
6  33      F     Prof 1999-03-07

You access a specific column using $:

Code
df$age
[1] 12 24 NA 23 65 33
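
Beyond $, square brackets select rows and columns in the same [row, column] style used for matrices. The sketch below rebuilds a two-column version of the data frame so that it runs on its own; the age_months column is added purely for illustration.

```r
df <- data.frame(
  age    = c(12, 24, NA, 23, 65, 33),
  gender = c("M", "F", "F", "M", "M", "F")
)

n_rows <- nrow(df)                    # Number of observations: 6
df[2, ]                               # Entire second row
females <- df[df$gender == "F", ]     # Rows where gender is "F"
n_missing <- sum(is.na(df$age))       # Count of missing ages: 1

# Add a new column calculated from an existing one
df$age_months <- df$age * 12
head(df)
```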

Functions

A function takes one or more inputs (called arguments), does something with them, and returns an output. The general syntax is:

Code
function_name(argument1 = value1, argument2 = value2, ...)

For example, mean() calculates the average of a vector. The na.rm = TRUE argument tells it to remove missing values before calculating — without it, a single NA in the data will make the result NA.

Code
mean(x = df$age, na.rm = TRUE)
[1] 31.4
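
The effect of na.rm is easiest to see side by side. This sketch uses the same ages as the data frame above, stored in a stand-alone vector so that it runs on its own.

```r
age <- c(12, 24, NA, 23, 65, 33)

without_narm <- mean(age)                 # NA, because one value is missing
with_narm    <- mean(age, na.rm = TRUE)   # 31.4, the missing value is dropped first

print(without_narm)
print(with_narm)

# Arguments can be matched by name or by position; these two calls are equivalent
round(x = with_narm, digits = 0)
round(with_narm, 0)
```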
Tip: Getting help

Type ? followed by any function name in the Console to open its documentation — arguments, return values, and worked examples:

?mean
?round
?ggplot   # works once ggplot2 has been loaded with library(ggplot2)

Packages

R’s base installation covers the basics. Packages are collections of additional functions written by the R community that extend what R can do. Install a package once, load it at the start of each session.

Code
# Install a package — only need to do this once
install.packages("dplyr")
Code
# Load a package — need to do this each session
library(dplyr)

dplyr::glimpse(df)
Rows: 6
Columns: 4
$ age    <dbl> 12, 24, NA, 23, 65, 33
$ gender <chr> "M", "F", "F", "M", "M", "F"
$ occu   <fct> Unemp, Business, Student, Service, Business, Prof
$ dob    <date> 1993-01-16, 1963-12-24, 1971-01-05, 1982-11-11, 1984-05-15, 199…

Analogy: Installing is like downloading an app. Loading is like opening it. In this workshop, the packages you need are listed at the top of each session file — you do not need to find or choose them yourself.

Key packages used in this workshop:

Package               Purpose
ggplot2               Graphs and visualisation
dplyr                 Data manipulation
readxl, haven, rio    Importing data
heemod                Markov cohort models
BCEA, dampack         Cost-effectiveness analysis
flexsurv              Survival modelling
shiny                 Interactive web applications
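
If you are unsure whether a package is already installed, you can check before loading it. The following is a sketch rather than part of the workshop setup: it only reports what is missing, and the install line is left commented out so nothing is downloaded unless you choose to.

```r
# A few of the workshop packages, used here as an example list
pkgs <- c("ggplot2", "dplyr", "readxl")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  cat("Not yet installed:", paste(missing_pkgs, collapse = ", "), "\n")
  # install.packages(missing_pkgs)   # uncomment to install them
} else {
  cat("All listed packages are installed.\n")
}
```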

Pipes

A pipe passes the output of one function directly into the next, letting you chain steps in a readable left-to-right sequence without creating intermediate objects. R has two pipe operators: |> (built into R since version 4.1) and %>% (from the magrittr package, made available when you load dplyr or the tidyverse). They behave the same way in most situations.

Code
df |>
  select(age, dob, occu) |>
  summarise(mean_age = mean(age, na.rm = TRUE))
  mean_age
1     31.4

Read this as: take df, then select the age, dob, and occu columns, then calculate the mean age. select() picks columns, summarise() calculates summaries. The pipe replaces deeply nested function calls with a sequence that reads the way you think.
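
To see what "replacing nested calls" means, the sketch below writes the same calculation both ways using only base R functions. The numbers are arbitrary, and the |> operator assumes R version 4.1 or later.

```r
values <- c(4, 9, 16)

# Nested form: read from the inside out
nested <- round(sum(sqrt(values)), 1)

# Piped form: read from left to right
piped <- values |> sqrt() |> sum() |> round(1)

print(piped)               # 9
identical(nested, piped)   # TRUE, both forms give the same result
```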

Importing Data

Using the GUI

RStudio has a point-and-click import tool. Go to File → Import Dataset and choose your file type. RStudio generates the import code for you — a useful way to learn the syntax before writing it yourself.

Importing with code

Once you know the syntax, code-based import is faster and fully reproducible:

Code
# CSV
data <- read.csv("files/data.csv")

# Excel
library(readxl)
data <- read_excel("files/data.xlsx")

# SPSS (.sav) and Stata (.dta)
library(haven)
data <- read_sav("files/data.sav")   # SPSS
data <- read_dta("files/data.dta")   # Stata

The rio package provides a single import() function that detects the file format automatically — the simplest approach when you work with multiple file types:

Code
library(rio)
data <- rio::import("files/data.xlsx")
data <- rio::import("files/data.csv")
data <- rio::import("files/data.sav")
data <- rio::import("files/data.dta")
Important

All import commands use relative file paths that only work when the .Rproj file has been opened first. A cannot open file error almost always means this step was skipped.
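
If you want to try an import without depending on the workshop files, the sketch below writes a tiny CSV to a temporary location and reads it back. The data values are invented; the column names wt and ht simply mirror those used in the Loops section.

```r
# Create a small made-up dataset and save it as a temporary CSV file
tmp <- tempfile(fileext = ".csv")
sample_data <- data.frame(id = 1:3,
                          wt = c(60, 72.5, 55),     # weight in kg (invented)
                          ht = c(165, 180, 158))    # height in cm (invented)
write.csv(sample_data, tmp, row.names = FALSE)

# Import it again and inspect the result
imported <- read.csv(tmp)
str(imported)    # Column names and types
head(imported)   # First few rows

# Clean up the temporary file
removed <- file.remove(tmp)
```

Running str() and head() straight after any import is a useful habit: it confirms that the columns arrived with the names and types you expected.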

Loops

A for loop repeats a block of code a fixed number of times, once for each value in a sequence. In HTA modelling, loops are used to run the model across time cycles and across thousands of probabilistic sensitivity analysis (PSA) iterations.

Code
library(rio)
data <- rio::import("files/data.csv")
library(dplyr)
data <- data |> mutate(bmi = round(wt / ((ht / 100)^2), 1))

for (i in 1:nrow(data)) {
  bmi <- round(data$wt[i] / ((data$ht[i] / 100)^2), 2)
  cat("BMI of Record No", i, "is", bmi, "\n")
}
BMI of Record No 1 is 21.63 
BMI of Record No 2 is 20.62 
BMI of Record No 3 is 23.7 
BMI of Record No 4 is 15.62 
BMI of Record No 5 is NA 
BMI of Record No 6 is 18.76 
BMI of Record No 7 is 14.2 
BMI of Record No 8 is 27.02 
BMI of Record No 9 is 19.72 
BMI of Record No 10 is 21.22 
BMI of Record No 11 is 22.52 
BMI of Record No 12 is 24.08 
BMI of Record No 13 is 15.01 
BMI of Record No 14 is 16.43 
BMI of Record No 15 is 21.66 
BMI of Record No 16 is 15.87 
BMI of Record No 17 is NA 
BMI of Record No 18 is 17.63 
BMI of Record No 19 is 15.78 
BMI of Record No 20 is 17.21 
BMI of Record No 21 is 16.44 
BMI of Record No 22 is 20.54 
BMI of Record No 23 is 18.32 
BMI of Record No 24 is 21.71 
BMI of Record No 25 is 13.78 
BMI of Record No 26 is 20.72 
BMI of Record No 27 is 29.63 
BMI of Record No 28 is 16.34 
BMI of Record No 29 is 20.77 
BMI of Record No 30 is 17.79 
BMI of Record No 31 is 18.06 
BMI of Record No 32 is 20.62 
BMI of Record No 33 is 22.58 
BMI of Record No 34 is 19.58 
BMI of Record No 35 is 26.12 
BMI of Record No 36 is 18.31 
BMI of Record No 37 is 29.66 
BMI of Record No 38 is 18.63 
BMI of Record No 39 is 22.35 
BMI of Record No 40 is 20.96 
BMI of Record No 41 is 17.44 
BMI of Record No 42 is 14.61 
BMI of Record No 43 is 18.25 
BMI of Record No 44 is 19.17 
BMI of Record No 45 is 18.45 
BMI of Record No 46 is 23.62 
BMI of Record No 47 is 29.97 
BMI of Record No 48 is 17.41 
BMI of Record No 49 is 18.63 
BMI of Record No 50 is 24.44 
BMI of Record No 51 is 17.74 
BMI of Record No 52 is 16.76 
BMI of Record No 53 is 19.69 
BMI of Record No 54 is 16.57 
BMI of Record No 55 is 15.15 
BMI of Record No 56 is 27.96 
BMI of Record No 57 is NA 
BMI of Record No 58 is 18.25 
BMI of Record No 59 is 15.58 
BMI of Record No 60 is 18.13 
BMI of Record No 61 is 15.95 
BMI of Record No 62 is 14.18 
BMI of Record No 63 is 19.47 
BMI of Record No 64 is 24.08 
BMI of Record No 65 is 18.62 
BMI of Record No 66 is 18.07 
BMI of Record No 67 is 16.94 
BMI of Record No 68 is 18.01 
BMI of Record No 69 is 21.54 
BMI of Record No 70 is 17.84 
BMI of Record No 71 is 18.78 
BMI of Record No 72 is 27.89 
BMI of Record No 73 is NA 
BMI of Record No 74 is 16.23 
BMI of Record No 75 is 20.93 
BMI of Record No 76 is 18.83 
BMI of Record No 77 is 16.45 
BMI of Record No 78 is 21.38 
BMI of Record No 79 is 26.78 
BMI of Record No 80 is 15.64 
BMI of Record No 81 is 15.88 
BMI of Record No 82 is 16.44 
BMI of Record No 83 is 21.5 
BMI of Record No 84 is 23.77 
BMI of Record No 85 is 26.23 
BMI of Record No 86 is 22.52 
BMI of Record No 87 is 15.85 
BMI of Record No 88 is 17.37 
BMI of Record No 89 is 28.52 
BMI of Record No 90 is 19.71 
BMI of Record No 91 is 31.39 
BMI of Record No 92 is 15.45 
BMI of Record No 93 is 12.11 
BMI of Record No 94 is 21.8 
BMI of Record No 95 is 24.86 
BMI of Record No 96 is 18.99 
BMI of Record No 97 is 24.38 
BMI of Record No 98 is 15.84 
BMI of Record No 99 is NA 
BMI of Record No 100 is 16.69 
BMI of Record No 101 is 20.01 
BMI of Record No 102 is 23.77 
BMI of Record No 103 is 18.7 
BMI of Record No 104 is 23.35 
BMI of Record No 105 is 18.8 
BMI of Record No 106 is 28.97 
BMI of Record No 107 is 26.91 
BMI of Record No 108 is 15.45 
BMI of Record No 109 is 20.78 
BMI of Record No 110 is 24.98 
BMI of Record No 111 is 17.23 
BMI of Record No 112 is 15.52 
BMI of Record No 113 is 26.79 
BMI of Record No 114 is 21.61 
BMI of Record No 115 is 20.35 
BMI of Record No 116 is 18.09 
BMI of Record No 117 is 21.1 
BMI of Record No 118 is 19.5 
BMI of Record No 119 is 15.04 
BMI of Record No 120 is 17.1 
BMI of Record No 121 is 17 
BMI of Record No 122 is 18.55 
BMI of Record No 123 is 21.34 
BMI of Record No 124 is 21.33 
BMI of Record No 125 is 17.21 
BMI of Record No 126 is 16.98 
BMI of Record No 127 is 16.36 
BMI of Record No 128 is NA 
BMI of Record No 129 is 27.63 
BMI of Record No 130 is 24.23 
BMI of Record No 131 is 17.74 
BMI of Record No 132 is 21.28 
BMI of Record No 133 is 18.87 
BMI of Record No 134 is 19.58 
BMI of Record No 135 is 28.63 
BMI of Record No 136 is 23.96 
BMI of Record No 137 is 28.41 
BMI of Record No 138 is 23.86 
BMI of Record No 139 is 27.67 
BMI of Record No 140 is 16.28 
BMI of Record No 141 is 21.83 
BMI of Record No 142 is 15.47 
BMI of Record No 143 is 26.42 
BMI of Record No 144 is 22.58 
BMI of Record No 145 is 16.15 
BMI of Record No 146 is 20.54 
BMI of Record No 147 is 25.49 
BMI of Record No 148 is 16.52 
BMI of Record No 149 is 22.95 
BMI of Record No 150 is 18.08 

When you see a for loop in the workshop code, you know: the model is repeating a calculation. You do not need to understand every line — just recognise that something is being done repeatedly, and that the values at the top of the loop control what is being repeated and how many times.
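
A shorter loop makes the "values at the top control the repetition" idea easier to see. The sketch below adds up a yearly cost over several years while discounting each year; the cost, discount rate, time horizon, and the choice to leave the first year undiscounted are all assumptions made for this illustration.

```r
annual_cost   <- 1000   # hypothetical cost per year
discount_rate <- 0.03   # hypothetical annual discount rate
n_years       <- 5      # controls how many times the loop runs

total <- 0
for (year in 1:n_years) {
  # Year 1 is left undiscounted in this sketch (exponent 0)
  discounted <- annual_cost / (1 + discount_rate)^(year - 1)
  total <- total + discounted
  cat("Year", year, "discounted cost:", round(discounted, 2), "\n")
}
cat("Total discounted cost:", round(total, 2), "\n")
```

Changing n_years changes how many times the body runs; changing discount_rate changes what each repetition calculates.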

Conditional Logic

An if/else block runs different code depending on whether a condition is TRUE or FALSE. You will see this used in cost-effectiveness decisions — is the ICER below the willingness-to-pay threshold? — and in model checks.

Code
if (condition) {
  # code to run if condition is TRUE
} else {
  # code to run if condition is FALSE
}

A simple example:

Code
value <- 5000

if (value > 200) {
  cat("High")
} else {
  cat("Low")
}
High
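
An if/else block tests a single condition. When the same test needs to be applied to every element of a vector, base R provides ifelse(). The sketch below applies the High/Low idea to the costs vector from the Vectors section; the 4000 cut-off is arbitrary.

```r
costs <- c(treatment_A = 5000, treatment_B = 3000, treatment_C = 7000)

# ifelse(test, value_if_TRUE, value_if_FALSE) works element by element
label <- ifelse(costs > 4000, "High", "Low")
print(label)
```

This labels treatment_A and treatment_C as "High" and treatment_B as "Low".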

Graphs

ggplot2 is the standard package for data visualisation in R. It builds plots in layers — you start with the data, map variables to axes, then add geometric elements. The plot below uses the dataset imported in the Loops section.

Code
library(ggplot2)

ggplot(data, aes(x = ht, y = wt, colour = bmi)) +
  geom_point(size = 2) +
  labs(x = "Height", y = "Weight", title = "Height vs Weight") +
  theme_minimal()

Figure: Height vs Weight coloured by BMI

When you see ggplot in the workshop code, you know: a chart is being created. The aes() part maps variables to visual properties, the geom_ function determines the chart type, and labs() sets the labels. Changing the data or the variables inside aes() changes the plot.

Three Things to Remember

Almost nothing from this session needs to be memorised. Instead, keep these three things in mind throughout the workshop:

Note 1. Values are stored with <-

Change them by editing the number on the right. Every exercise in this workshop is essentially this action.

Note 2. Code runs top to bottom

Run chunks in order. If you see an error about an object not being found, scroll up and run the chunks above first.

Note 3. Read the error message

It usually tells you exactly what went wrong — a missing package, a typo in a variable name, a file not found. Read it before asking for help.

Everything else you can look up as needed — or ask an AI tool, as we will discuss in Session 7.

Try It Yourself

Before moving to Session 3, modify the values below and re-run the chunk. Change the costs and quality-adjusted life years (QALYs) to see how the ICER and the cost-effectiveness decision change.

Code
# YOUR TASK: Change these values and re-run
cost_A <- 3000   # Cost of intervention A in ₹
cost_B <- 1500   # Cost of intervention B in ₹
qaly_A <- 0.82   # QALYs gained with A
qaly_B <- 0.75   # QALYs gained with B

# ── Do not modify below this line ─────────────────────────────────────────────

icer <- (cost_A - cost_B) / (qaly_A - qaly_B)
cat("ICER = ₹", round(icer, 0), "per QALY gained\n")
ICER = ₹ 21429 per QALY gained
Code
wtp <- 100000
if (icer < wtp) {
  cat("→ Cost-effective at WTP = ₹", format(wtp, big.mark = ",", scientific = FALSE), "\n")
} else {
  cat("→ NOT cost-effective at WTP = ₹", format(wtp, big.mark = ",", scientific = FALSE), "\n")
}
→ Cost-effective at WTP = ₹ 100,000 

If you ran this and changed the result — you are ready for the rest of the workshop.
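
If you want to compare several scenarios without editing the values each time, the same calculation can be wrapped in a function. This is an optional sketch: the function name calc_icer and its argument names are invented here and are not part of the workshop code.

```r
# Hypothetical helper: ICER of a new option compared with an old one
calc_icer <- function(cost_new, cost_old, qaly_new, qaly_old) {
  (cost_new - cost_old) / (qaly_new - qaly_old)
}

# Base case, using the values from the exercise above
base_case <- calc_icer(cost_new = 3000, cost_old = 1500,
                       qaly_new = 0.82, qaly_old = 0.75)
round(base_case)   # 21429

# Scenario: intervention A costs more, everything else unchanged
scenario <- calc_icer(cost_new = 4000, cost_old = 1500,
                      qaly_new = 0.82, qaly_old = 0.75)
round(scenario)    # 35714
```

The sketch does not handle the case where both options give the same QALYs, in which the division is by zero.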


Quick Reference

Concept            Syntax                   Example
Assign a value     name <- value            cost <- 5000
Named vector       c(name = value)          c(A = 5000, B = 3000)
Index a vector     vec[n]                   vec[5]
Matrix multiply    %*%                      trace %*% tm
Access a column    df$col                   df$age
Function syntax    fn(arg = val)            mean(x, na.rm = TRUE)
Pipe               |>                       df |> select(age)
If / else          if (cond) {} else {}     if (icer < wtp) {...}
Get help           ?function                ?mean
Comment            #                        # this is a note

Session 2 · R for HTA (Basics) · RRC-HTA, AIIMS Bhopal