Types of Variables Revisited for Statistics
Core summary
Before running any test, you must know each variable's type. Whether data are categorical or numerical decides which average, which graph, and which statistical test is valid.
Detailed explanation
In Level 1 you met variables; here we look at them through a statistical lens, because the type of a variable is the single most important fact determining how you analyze it. Choose the wrong test for your data type and the result is meaningless, no matter how powerful the software.
There are two big families. Categorical (qualitative) variables sort people into groups. They come in two flavors: nominal, where categories have no order (blood type, gender, nationality), and ordinal, where categories have a meaningful order but unequal or unknown gaps (cancer stage I to IV, pain rated mild/moderate/severe, Likert agreement scales). Numerical (quantitative) variables are counted or measured numbers and also come in two flavors: discrete, which take only whole counts (number of seizures, number of admissions), and continuous, which can take any value on a scale (weight, blood pressure, hemoglobin, time to an event). A special and very common case is the binary variable, which has exactly two categories, such as alive/dead or cured/not cured.
Why does this matter so much? Because each type carries different information and obeys different rules. You can calculate a mean age (continuous) but not a mean blood type (nominal); 'the average blood type is AB-and-a-half' is nonsense. For categorical data you report counts and percentages; for continuous data you report a center (mean or median) and a spread (standard deviation or interquartile range). The type also dictates the graph: bar and pie charts for categorical data, histograms and box plots for continuous data. Most importantly, it dictates the test: comparing proportions uses a chi-squared test, while comparing means uses a t-test or ANOVA.
A practical habit: the first thing to do with any dataset is classify every variable. Make a list: this column is nominal, that one is continuous, this outcome is binary. This single step prevents the most common beginner error, which is running a test designed for one data type on a variable of another type.
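The classification habit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient records, the column names, and the `summarize` helper are all invented for this example, and the type labels are assigned by hand, exactly as you would do when making your list.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical mini-dataset; every value here is invented for illustration.
patients = [
    {"blood_type": "A", "stage": "II", "admissions": 1, "hemoglobin": 13.2, "alive": True},
    {"blood_type": "O", "stage": "I", "admissions": 0, "hemoglobin": 14.1, "alive": True},
    {"blood_type": "A", "stage": "III", "admissions": 3, "hemoglobin": 11.8, "alive": False},
    {"blood_type": "B", "stage": "II", "admissions": 2, "hemoglobin": 12.5, "alive": True},
    {"blood_type": "O", "stage": "IV", "admissions": 1, "hemoglobin": 10.9, "alive": False},
]

# Step 1: classify every variable by hand before any analysis.
variable_types = {
    "blood_type": "nominal",
    "stage": "ordinal",
    "admissions": "discrete",
    "hemoglobin": "continuous",
    "alive": "binary",
}

CATEGORICAL = {"nominal", "ordinal", "binary"}


def summarize(rows, column, var_type):
    """Return a summary that is valid for the declared variable type."""
    values = [row[column] for row in rows]
    if var_type in CATEGORICAL:
        # Categorical data: counts and percentages, never a mean.
        n = len(values)
        return {cat: (count, round(100 * count / n, 1))
                for cat, count in Counter(values).items()}
    # Numerical data: a center and a spread.
    return {
        "mean": round(mean(values), 2),
        "median": median(values),
        "sd": round(stdev(values), 2),
    }


# Step 2: let the declared type decide the summary.
summaries = {col: summarize(patients, col, vtype)
             for col, vtype in variable_types.items()}

for column, summary in summaries.items():
    print(column, variable_types[column], summary)
```

Because the type is declared first and the summary is chosen from it, asking for a "mean blood type" is simply not possible in this sketch.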
Sometimes you deliberately change a variable's type, for example converting continuous age into categories (under 65 versus 65 and over). This is called categorization; it makes results easier to communicate but throws away information, so do it thoughtfully, not by default.
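A minimal sketch of categorization, assuming an invented list of ages and a hypothetical helper named `age_group`. It uses the cutoff of 65 from the text and shows the information that is lost: very different ages end up with the same label.

```python
from collections import Counter

# Invented ages, for illustration only.
ages = [34, 58, 64, 65, 71, 80]


def age_group(age, cutoff=65):
    """Turn a continuous age into a binary category at the given cutoff."""
    return "65 and over" if age >= cutoff else "under 65"


groups = [age_group(age) for age in ages]
group_counts = Counter(groups)

print(group_counts)

# Information loss: 65 and 80 are now indistinguishable,
# while 64 and 65 (one year apart) land in different groups.
print(age_group(65) == age_group(80))
print(age_group(64) == age_group(65))
```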
Clinical example
A resident records tumor stage (I to IV) and tries to report the 'mean stage' as 2.7. Stage is ordinal: the gaps between stages are not equal, so a mean is misleading. The correct summary is the number of patients in each stage, with percentages.
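To make this concrete, here is a small sketch with ten invented patients whose stages happen to average 2.7 when coded 1 to 4. The data, the numeric coding, and the variable names are assumptions for illustration; the point is only to contrast the misleading mean with the frequency table.

```python
from statistics import mean

# Invented sample of ten patients, for illustration only.
stages = ["I"] * 1 + ["II"] * 3 + ["III"] * 4 + ["IV"] * 2

# What the resident did: code stages as 1-4 and average them.
# This treats the gaps between stages as equal, which they are not.
numeric_code = {"I": 1, "II": 2, "III": 3, "IV": 4}
misleading_mean = mean(numeric_code[s] for s in stages)

# The appropriate summary: count and percentage per stage, in stage order.
stage_order = ["I", "II", "III", "IV"]
n = len(stages)
stage_table = [(s, stages.count(s), round(100 * stages.count(s) / n, 1))
               for s in stage_order]

print("Misleading 'mean stage':", misleading_mean)
for stage, count, percent in stage_table:
    print(f"Stage {stage}: {count} patients ({percent}%)")
```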
Research example
In a diabetes study, HbA1c (continuous) is compared between two groups with a t-test, while the proportion reaching target HbA1c below 7% (binary) is compared with a chi-squared test. Same underlying data, but the variable type chosen for each question determines the test.
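The same idea as a sketch in plain Python, using invented HbA1c values. Two choices here are assumptions of this example, not something the study description specifies: the t statistic is computed in Welch's unequal-variance form, and the chi-squared statistic is computed without a continuity correction. The sample is far too small for real inference; only the mechanics are shown.

```python
from math import erfc, sqrt
from statistics import mean, variance

# Invented HbA1c values (%), for illustration only.
group_a = [6.5, 6.8, 7.2, 6.9, 7.4, 6.6]
group_b = [7.1, 7.8, 7.5, 8.0, 6.9, 7.6]


def welch_t(a, b):
    """t statistic for a difference in means (Welch's form)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Question 1: continuous outcome, so compare means.
t_stat = welch_t(group_a, group_b)

# Question 2: binary outcome (target reached: HbA1c below 7%),
# so compare proportions using a 2x2 table of counts.
table = [[sum(v < 7.0 for v in g), sum(v >= 7.0 for v in g)]
         for g in (group_a, group_b)]
chi2 = chi_squared_2x2(table)

# With 1 degree of freedom, the chi-squared p-value equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print("t statistic:", round(t_stat, 3))
print("2x2 table [reached, not reached] per group:", table)
print("chi-squared:", round(chi2, 3), "p:", round(p_value, 3))
```

Note that the 2x2 table is built from the very same measurements as the t statistic; only the variable type changed, and with it the test.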
Knowledge check
Q1. A study records patients' ABO blood type. Which summary is appropriate?
Q2. Cancer stage I, II, III, IV is what type of variable?
Q3. Why classify every variable's type before analysis?