Psychometrics Study Guide
📖 Core Concepts
Psychometrics – scientific study of how to measure unobservable (latent) psychological attributes such as intelligence, personality, and attitudes.
Latent construct – a trait that cannot be directly observed; inferred from test scores or item responses.
Classical Test Theory (CTT) – observed score = true score + error. The true score is the average of an infinite number of repeat administrations.
Item Response Theory (IRT) – models the probability that a person with a given trait level will answer an item correctly (or endorse it). Provides trait estimates that are sample‑independent.
Rasch Model – a one‑parameter IRT model that requires items to meet strict measurement criteria (e.g., specific objectivity).
Reliability – consistency of scores across occasions, forms, or item halves.
Validity – evidence that test scores represent the intended construct and can be used for the intended purpose.
---
📌 Must Remember
Observed score formula: $X = T + E$ (where $X$ = observed, $T$ = true, $E$ = error).
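The CTT definition of the true score can be demonstrated in a few lines: simulate many repeat administrations as $X = T + E$ and watch the mean converge on $T$. A minimal sketch with made-up numbers (true score 100, error SD 5 are illustrative assumptions, not standard values):

```python
import random
import statistics

random.seed(0)

# CTT: each administration yields X = T + E, where E is random error (mean 0).
true_score = 100.0          # hypothetical true score
error_sd = 5.0              # hypothetical error spread
observed = [true_score + random.gauss(0, error_sd) for _ in range(10_000)]

# The mean of many repeated administrations approaches the true score.
mean_observed = statistics.mean(observed)
print(round(mean_observed, 1))  # ≈ 100
```

This is exactly the "true score as the center of a cloud" intuition: any single $X$ is off by $E$, but the long-run average recovers $T$.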
Spearman–Brown prediction: $\rho_{\text{new}} = \frac{k\,\rho_{\text{half}}}{1+(k-1)\rho_{\text{half}}}$ (adjusts split‑half reliability to full‑test length).
Cronbach’s α: $\alpha = \frac{k}{k-1}\left(1-\frac{\sum \sigma_i^{2}}{\sigma_X^{2}}\right)$; α ≥ .70 is generally acceptable.
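The α formula translates directly into code: sum the item variances, divide by the total-score variance, and apply the $k/(k-1)$ correction. A minimal sketch using made-up item scores (3 items, 5 respondents; the data are purely illustrative):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]      # total score per person
    item_var = sum(pvariance(col) for col in items)        # sum of item variances
    total_var = pvariance(totals)                          # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

# toy data: 3 items, 5 respondents (hypothetical scores)
items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]
print(round(cronbach_alpha(items), 3))  # → 0.936
```

Population variances (`pvariance`) are used throughout; sample variances work too, as long as the same estimator appears in numerator and denominator.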
Test‑retest & equivalent‑forms reliability – both reported as Pearson $r$ between two administrations.
Validity types:
Content – items cover the domain.
Criterion‑related – predicts an external outcome (concurrent vs. predictive).
Construct – patterns of relationships conform to theory.
Reliability ⟹ Validity? – Reliability is necessary but not sufficient for validity.
---
🔄 Key Processes
Developing a CTT‑based test
Define construct → write items → pilot test → compute item‑total correlations → calculate reliability (α, split‑half, test‑retest) → revise items.
Running a factor analysis
Compute correlation matrix → extract factors (e.g., principal axis) → retain factors with eigenvalue > 1 → rotate (varimax/oblimin) → interpret loadings.
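The retention step above is where the eigenvalue > 1 heuristic and parallel analysis diverge. A minimal sketch of parallel analysis on simulated data (one latent factor driving six items; all sample sizes, loadings, and replication counts are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 200 respondents, 6 items driven by a single latent factor plus noise
n, k = 200, 6
latent = rng.normal(size=(n, 1))
data = latent @ np.ones((1, k)) + rng.normal(size=(n, k))

# observed eigenvalues of the item correlation matrix, largest first
obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(data.T)))[::-1]

# parallel analysis: average eigenvalues from random (uncorrelated) data
rand_eigs = []
for _ in range(100):
    noise = rng.normal(size=(n, k))
    rand_eigs.append(np.sort(np.linalg.eigvalsh(np.corrcoef(noise.T)))[::-1])
rand_mean = np.mean(rand_eigs, axis=0)

# retain only factors whose observed eigenvalue beats the random benchmark
n_factors = int(np.sum(obs_eig > rand_mean))
print(n_factors)  # recovers the single simulated factor
```

Note that the random benchmark for the first eigenvalue is above 1, which is why parallel analysis over-extracts less often than the plain eigenvalue > 1 rule.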
Estimating IRT parameters (e.g., 2‑PL model)
Choose model (difficulty $b$, discrimination $a$) → fit model to response data (MLE or Bayesian) → evaluate item fit → obtain person trait $\theta$ with SE.
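The heart of the 2‑PL model is its item response function: a logistic curve in $\theta$ with location $b$ and slope governed by $a$. A minimal sketch (parameter values are arbitrary examples):

```python
import math

def p_correct(theta, a, b):
    """2-PL IRT: probability of a correct response for trait level theta,
    given item discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# at theta == b the probability is exactly .5, regardless of a
print(p_correct(theta=0.0, a=1.5, b=0.0))           # → 0.5

# higher discrimination -> steeper curve around the difficulty point
print(round(p_correct(1.0, a=0.5, b=0.0), 3))       # → 0.622 (flat item)
print(round(p_correct(1.0, a=2.0, b=0.0), 3))       # → 0.881 (steep item)
```

This also makes the "item doors" intuition concrete: each item's door swings open around $\theta = b$, and $a$ sets how abruptly it does so. Fixing $a$ to a common value for all items reduces this to the Rasch model.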
Applying the Spearman–Brown formula
Obtain split‑half reliability $r_{hh}$ → decide new test length factor $k$ → plug into formula for projected full‑test reliability.
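The steps above reduce to a one-line function; the example reliability of .60 is an arbitrary illustration:

```python
def spearman_brown(r_half, k):
    """Project reliability when test length is multiplied by k
    (k = 2 converts a split-half estimate to full-test length)."""
    return (k * r_half) / (1 + (k - 1) * r_half)

# split-half reliability of .60, projected to full (double) length:
print(round(spearman_brown(0.60, 2), 3))  # → 0.75

# tripling test length instead:
print(round(spearman_brown(0.60, 3), 3))  # → 0.818
```

Note the diminishing returns: each additional block of items buys less reliability than the last.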
---
🔍 Key Comparisons
CTT vs. IRT – CTT scores are test‑sample dependent; IRT scores are (theoretically) invariant across samples.
Rasch vs. 2‑PL IRT – Rasch fixes discrimination ($a$) across items; 2‑PL allows $a$ to vary, giving more flexibility but fewer measurement guarantees.
Test‑retest vs. Equivalent‑forms reliability – Test‑retest measures stability over time; equivalent‑forms measures consistency across parallel test versions.
Concurrent vs. Predictive validity – Concurrent: criterion measured at same time; Predictive: criterion measured later.
---
⚠️ Common Misunderstandings
“High reliability = high validity.” Reliability is required but does not ensure the test measures the right construct.
“Cronbach’s α = internal consistency only.” α is actually the average of all possible split‑half reliabilities; it also reflects test length.
“Factor analysis tells you the ‘true’ number of factors.” The eigenvalue > 1 rule is a heuristic; parallel analysis or model‑fit indices are often more accurate.
“IRT eliminates measurement error.” IRT provides standard errors for each trait estimate, but error still exists.
---
🧠 Mental Models / Intuition
True score as the “center” of a cloud – imagine each person’s repeated scores forming a cloud; the true score is the cloud’s center, error spreads the points around it.
IRT curves as “item doors” – each item’s probability curve is a door that opens (high probability) once the trait level crosses its difficulty threshold.
Factor analysis as “dimensional lenses” – the correlation matrix is a blurry image; extracting factors focuses the lens on underlying dimensions.
---
🚩 Exceptions & Edge Cases
Low‑variance samples – reliability estimates (e.g., Cronbach’s α) can be artificially deflated when the sample shows little score spread.
Multidimensional items – standard unidimensional IRT assumptions break down; consider multidimensional IRT or bifactor models.
Small‑sample factor analysis – the eigenvalue > 1 rule may over‑extract; use parallel analysis or the minimum‑average‑partial (MAP) test.
---
📍 When to Use Which
Choose CTT when you need quick, sample‑specific reliability/validity evidence and have limited sample size.
Choose IRT (or Rasch) when you require item‑level information, adaptive testing, or scores that are comparable across groups.
Use Rasch for high‑stakes measurement where strict invariance (specific objectivity) is required.
Apply factor analysis when you want to uncover latent dimensions underlying a set of items.
Use bifactor analysis if you suspect a strong general factor plus distinct specific factors (e.g., intelligence batteries).
---
👀 Patterns to Recognize
Item‑total correlation < .30 → item likely low quality; consider removal.
Flat IRT item characteristic curves → low discrimination; may not contribute much information.
Large residuals after factor extraction → possible multidimensionality or poorly fitting items.
Alpha increases markedly when an item is deleted → that item is reducing internal consistency.
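The item‑total and alpha‑if‑deleted patterns above can be computed together. A minimal sketch on made-up data in which one item is deliberately weak (all scores are hypothetical):

```python
import math
from statistics import pvariance, mean

def alpha(items):
    """Cronbach's alpha for a list of item-score columns."""
    k = len(items)
    totals = [sum(t) for t in zip(*items)]
    return (k / (k - 1)) * (1 - sum(pvariance(c) for c in items) / pvariance(totals))

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

# toy data: items A and B cohere; item C is only weakly related
items = [
    [2, 4, 3, 5, 1],   # A
    [3, 4, 3, 5, 2],   # B
    [4, 3, 2, 4, 3],   # C
]

print(round(alpha(items), 3))  # → 0.774 (full scale)
for i, item in enumerate(items):
    rest = [c for j, c in enumerate(items) if j != i]
    rest_total = [sum(t) for t in zip(*rest)]
    r_it = pearson(item, rest_total)   # corrected item-total correlation
    print(i, round(r_it, 2), round(alpha(rest), 3))
```

Item C shows both red flags at once: its corrected item‑total correlation (0.29) falls below the .30 rule of thumb, and deleting it raises α from 0.774 to 0.959.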
---
🗂️ Exam Traps
“Alpha > .90 is always better.” – Very high α may indicate redundant items, not necessarily better measurement.
Confusing reliability with validity – Remember the directionality: reliability → necessary → validity, not the reverse.
Choosing eigenvalue > 1 as the sole factor‑number rule – Many exams test knowledge of alternative methods (scree plot, parallel analysis).
Assuming the Rasch model is always preferable – It is stricter; if data violate Rasch assumptions, a 2‑PL or 3‑PL IRT model may fit better.
Treating test‑retest correlation of .70 as “good enough.” – Context matters; high‑stakes decisions often demand > .90 stability.