Processes raw STEPS survey data: renames columns, coerces types, derives standard indicators, handles missing values, and applies plausibility checks.

clean_steps_data(
  data,
  cols,
  age_min = 18,
  age_max = 69,
  bp_sbp_threshold = 140,
  bp_dbp_threshold = 90,
  bmi_overweight = 25,
  bmi_obese = 30,
  glucose_threshold = 7,
  glucose_impaired_threshold = 6.1,
  chol_threshold = 5
)

Arguments

data

A data frame (typically from import_steps_data()).

cols

A named list of column names, as returned by detect_steps_columns().

age_min

Minimum age in years for inclusion, inclusive (default 18).

age_max

Maximum age in years for inclusion, inclusive (default 69).

bp_sbp_threshold

Systolic blood pressure (SBP) threshold for raised blood pressure, in mmHg (default 140; Mongolia uses 130).

bp_dbp_threshold

Diastolic blood pressure (DBP) threshold for raised blood pressure, in mmHg (default 90; Mongolia uses 80).

bmi_overweight

BMI threshold for overweight, in kg/m² (default 25.0).

bmi_obese

BMI threshold for obesity, in kg/m² (default 30.0).

glucose_threshold

Fasting glucose threshold for raised glucose / diabetes in mmol/L (default 7.0).

glucose_impaired_threshold

Fasting glucose threshold for impaired fasting glucose in mmol/L (default 6.1).

chol_threshold

Total cholesterol threshold for raised cholesterol in mmol/L (default 5.0).
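
As a rough illustration of how the blood pressure and glucose thresholds might be applied, the base R sketch below classifies a few made-up measurements. It is not the package's implementation: the helpers classify_bp() and classify_glucose() are hypothetical, and two rules are assumptions rather than documented behaviour: that a value at or above a threshold counts as raised, and that either SBP or DBP alone is sufficient for raised blood pressure.

```r
# Hypothetical helpers for illustration only; not part of the package.
# Assumption: values at or above a threshold count as raised, and either
# SBP or DBP reaching its threshold is enough to flag raised blood pressure.
classify_bp <- function(sbp, dbp,
                        bp_sbp_threshold = 140,
                        bp_dbp_threshold = 90) {
  sbp >= bp_sbp_threshold | dbp >= bp_dbp_threshold
}

classify_glucose <- function(glucose,
                             glucose_threshold = 7,
                             glucose_impaired_threshold = 6.1) {
  cut(
    glucose,
    breaks = c(-Inf, glucose_impaired_threshold, glucose_threshold, Inf),
    labels = c("Normal", "Impaired", "Raised"),
    right = FALSE  # intervals are [lower, upper), so 7.0 falls in "Raised"
  )
}

# Made-up readings in mmHg
sbp <- c(118, 134, 152)
dbp <- c(76, 84, 88)

# Default thresholds (140/90) versus the Mongolia thresholds (130/80)
raised_default  <- classify_bp(sbp, dbp)
raised_mongolia <- classify_bp(sbp, dbp,
                               bp_sbp_threshold = 130,
                               bp_dbp_threshold = 80)

# Made-up fasting glucose values in mmol/L
glucose_cat <- classify_glucose(c(5.2, 6.1, 7.0))
```

With these toy values, the second respondent (134/84 mmHg) is flagged only under the lower Mongolia thresholds, which is the kind of difference the threshold arguments are meant to expose.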

Value

A data frame with standardised and derived variables, ready for survey design setup.

Details

The function performs the following transformations:

  • Renames columns to standard names (age, sex, wt_final, etc.)

  • Converts numeric values stored as strings to numeric type

  • Keeps only records with age in the closed interval [age_min, age_max]

  • Creates WHO standard age groups (18-24, 25-34, etc.)

  • Harmonises sex coding to Male/Female

  • Derives body mass index (BMI) and BMI categories

  • Averages the last two of three blood pressure readings

  • Recodes yes/no variables to logical

  • Creates derived risk indicators (raised BP, diabetes, etc.)

  • Applies plausibility checks to measurements

  • Drops records with missing age or sex
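
Several of the steps above can be sketched in base R on a toy data frame. This is a minimal sketch under stated assumptions, not the function's actual code: the data are made up, the column names other than age and sex (height, weight, sbp1 to sbp3, smoker) are hypothetical, height is assumed to be in cm and weight in kg, and the set of accepted sex codes is a guess. Age grouping and plausibility checks are left out because the exact bands and limits are not given here.

```r
# Toy data with made-up values. Only `age` and `sex` are standard names
# taken from the list above; all other column names are hypothetical.
raw <- data.frame(
  age    = c("17", "24", "45", "70", NA),
  sex    = c("1", "male", "F", "2", "Female"),
  height = c(170, 165, 180, 160, 175),  # cm (assumed unit)
  weight = c(60, 70, 95, 55, 80),       # kg (assumed unit)
  sbp1   = c(120, 150, 138, 130, 125),
  sbp2   = c(118, 142, 136, 128, 124),
  sbp3   = c(116, 138, 134, 126, 122),
  smoker = c("yes", "no", "Yes", "no", "no"),
  stringsAsFactors = FALSE
)

age_min <- 18
age_max <- 69

clean <- raw

# Convert numeric strings to numeric
clean$age <- as.numeric(clean$age)

# Harmonise sex coding to Male/Female (the accepted codes are an assumption)
sex_chr <- tolower(clean$sex)
clean$sex <- ifelse(sex_chr %in% c("1", "m", "male"), "Male",
              ifelse(sex_chr %in% c("2", "f", "female"), "Female",
                     NA_character_))

# Drop records with missing age or sex, then restrict to [age_min, age_max]
keep <- !is.na(clean$age) & !is.na(clean$sex) &
  clean$age >= age_min & clean$age <= age_max
clean <- clean[keep, ]

# Derive BMI as weight (kg) divided by height (m) squared
clean$bmi <- clean$weight / (clean$height / 100)^2

# Average the last two of three systolic readings
clean$sbp_mean <- rowMeans(clean[, c("sbp2", "sbp3")])

# Recode a yes/no variable to logical
clean$smoker <- tolower(clean$smoker) == "yes"
```

In this toy example only the respondents aged 24 and 45 remain: the 17-year-old and the 70-year-old fall outside the age range, and the record with missing age is dropped.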