Introduction to Psychological Testing
Understand the purpose and categories of psychological testing, the key reliability and validity concepts, and the ethical practices required for responsible use.
Summary
Introduction to Psychological Testing
What Is Psychological Testing?
Psychological testing is the systematic use of standardized instruments to measure and evaluate aspects of a person's mental life. Rather than relying solely on intuition or informal observation, psychological tests provide a structured, objective way to assess abilities, personality traits, attitudes, and psychological problems.
Think of psychological testing as a measurement tool, much like a thermometer measures temperature. Just as a thermometer produces consistent, comparable readings, psychological tests are designed to produce scores that can be meaningfully compared across individuals and interpreted using established guidelines.
The key advantage of standardized testing is that it allows us to gather information that would otherwise be difficult to compare. For example, if one teacher describes a student as "smart" and another as "struggling," we don't know if they're using the same standards. A standardized IQ test, by contrast, provides a numerical score that has the same meaning regardless of who administers it.
Standardization and Scoring
Standardization is what makes psychological tests reliable and fair. When a test is standardized, every person who takes it receives the same items, presented in the same format, under the same conditions. This consistency is crucial because it ensures that differences in scores reflect true differences in what we're trying to measure, not differences in how the test was administered.
How Scores Work
Test scores can be expressed in three main ways:
Raw Scores are the initial tallies—for example, the number of items answered correctly. However, raw scores alone are hard to interpret. If someone gets 65 items correct on a 100-item test, we don't immediately know if that's good or poor without knowing more.
Percentiles tell us what percentage of people in a reference group scored at or below a particular score. A percentile rank of 75 means the person scored as well as or better than 75% of the comparison group. This makes scores easy to understand.
Standardized Scores (such as z-scores or T-scores) convert raw scores into a common scale with a known mean and standard deviation. This allows us to compare scores across different tests.
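The three score types can be sketched in a few lines of Python. This is a minimal illustration, not a real scoring procedure, and the norm-group scores below are made-up values:

```python
from statistics import mean, pstdev

def z_score(raw, norm_scores):
    """Standardize a raw score against the norm group (mean 0, SD 1)."""
    return (raw - mean(norm_scores)) / pstdev(norm_scores)

def t_score(raw, norm_scores):
    """T-scores rescale z-scores onto a scale with mean 50 and SD 10."""
    return 50 + 10 * z_score(raw, norm_scores)

def percentile_rank(raw, norm_scores):
    """Percentage of the norm group scoring at or below the raw score."""
    return 100 * sum(s <= raw for s in norm_scores) / len(norm_scores)

# Hypothetical norm group of ten raw scores on a 100-item test
norms = [52, 55, 58, 60, 62, 65, 67, 70, 73, 78]

z = z_score(65, norms)           # about 0.13: just above the norm mean
t = t_score(65, norms)           # about 51.3 on the T-score scale
pr = percentile_rank(65, norms)  # 60.0: at or above 60% of the norm group
```

Note how the same raw score of 65 only becomes interpretable once a norm group is attached — the point made above about 65 correct on a 100-item test.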
The Role of Norms
Norms are the benchmarks derived from large, representative samples that allow us to interpret individual scores. When a test is developed, it's administered to hundreds or thousands of people to establish what "typical" performance looks like. These norms become the reference point for understanding any new test-taker's performance.
Here's why this matters: a child scoring 95 on an achievement test means something different depending on the norm group. If the norm is based on children in the same grade and age, the score tells us how this child compares to appropriate peers. Using inappropriate norms—such as comparing a third-grader to high school students—leads to misinterpretation and unfair judgments.
Effective norms should account for relevant demographic characteristics such as age, grade level, gender, and cultural background. A test normed only on middle-class, English-speaking populations may not fairly represent a student from a different cultural or linguistic background.
Criteria and cut-off scores are different from norms. While norms describe how people typically perform, criteria establish the performance level needed to meet a standard or diagnostic threshold—for example, the raw score that indicates clinically significant depression on a mental health screening test.
Where Psychological Tests Are Used
In educational settings, tests serve multiple purposes. They identify academic strengths and weaknesses to guide instructional planning, help place students in appropriate programs, and assess whether students have mastered specific content. In clinical and health-care settings, tests help diagnose mental health disorders, guide treatment recommendations, and monitor whether therapy or intervention is working over time.
A crucial principle across all settings is that test results should never stand alone. Scores are most useful when combined with observations, interviews, work samples, and other sources of information. A single test score is one data point, not a definitive conclusion about a person.
Categories of Psychological Tests
Understanding different test types helps you recognize their specific purposes and appropriate uses.
Ability and Achievement Tests
Ability tests estimate how well someone can solve problems or learn new material. They measure cognitive potential—what a person is capable of doing. Intelligence quotient (IQ) tests fall into this category; they assess general cognitive ability, reasoning, and problem-solving skills.
Achievement tests measure mastery of specific academic content that has already been taught. They ask "what has this person learned?" rather than "what is this person capable of learning?"
The SAT (its letters no longer stand for anything officially) is a well-known example that bridges both categories. It evaluates college readiness and academic aptitude by assessing skills that predict success in college-level coursework. Scores from ability and achievement tests are commonly used for educational placement, college admissions, and research.
Personality Inventories
Personality inventories assess stable patterns in how people think, feel, and behave. These are self-report instruments where people typically respond to statements on Likert-type rating scales (like "Strongly Disagree" to "Strongly Agree").
The Minnesota Multiphasic Personality Inventory, in its revised forms (MMPI-2 and MMPI-3), is widely used in clinical settings to evaluate psychopathology and personality dimensions. It contains hundreds of items and produces a detailed profile of different aspects of personality and psychological functioning.
The Big Five questionnaire takes a different approach, measuring five broad personality traits that research has consistently identified across cultures:
Openness to experience (curiosity, preference for variety)
Conscientiousness (organization, responsibility, self-discipline)
Extraversion (sociability, assertiveness, energy)
Agreeableness (compassion, cooperation, trust)
Neuroticism (tendency toward negative emotions, worry, stress reactivity)
Personality inventories provide profiles that can inform therapy approaches, career counseling decisions, and research on personality-related outcomes.
Neuropsychological Batteries
Neuropsychological batteries comprehensively evaluate cognitive functions such as memory, attention, executive control (planning and decision-making), and language. Rather than a single test, they combine multiple subtests into a battery that paints a detailed picture of cognitive strengths and weaknesses.
These batteries are especially valuable after brain injury, stroke, or neurological illness when clinicians need to identify specific areas of cognitive impairment. Tests might include verbal learning tasks (memorizing word lists), visual-spatial puzzles, and reaction-time measures. The results guide rehabilitation efforts by showing exactly which functions need targeted intervention.
Diagnostic and Symptom Scales
These instruments measure the severity of specific mental health symptoms. The Beck Depression Inventory (BDI), for example, asks about symptoms of depression and their intensity. A person rates how much they agree with statements like "I feel sad" or "I have lost interest in my hobbies."
Symptom scales are especially useful for monitoring treatment progress—if someone scores high on depression at the start of therapy and lower after several weeks, we have objective evidence that treatment is working. Scores can also indicate whether a symptom level meets the diagnostic criteria for a disorder, helping clinicians make diagnostic decisions.
These scales are typically self-report questionnaires with fixed response options, making them practical and easy to administer repeatedly.
Test Formats and Item Types
Psychological tests use different formats depending on what they're trying to measure.
Multiple-Choice Items
Multiple-choice items present a stem (the question) and several response options, with only one correct answer. They are efficient for assessing knowledge, reasoning, and problem-solving because:
Scoring is objective and can be automated
They allow comprehensive coverage of content in a reasonable time
The incorrect options (called distractors) can be strategically designed to identify different levels of understanding
A well-written multiple-choice test has high reliability when items clearly distinguish between different levels of understanding and when items evenly represent the construct being measured.
True-False Items
True-false statements are straightforward to score but vulnerable to guessing: a test-taker has a 50% chance of answering correctly by chance alone. They work best when the statements cover the construct evenly and the answer key is balanced—keying far more items "true" than "false" invites response bias.
Rating Scales
Rating scales provide nuanced information about the intensity or frequency of thoughts, feelings, and behaviors. For instance, rather than asking "Do you feel anxious?" (yes/no), a rating scale might ask "How much anxiety do you experience? (1 = None, 5 = Extreme)."
Reliability improves when tests use multiple items tapping the same underlying construct. If three different items all measure anxiety and they correlate highly with each other, we're confident we're measuring anxiety consistently.
Performance Tasks
Performance tasks require people to actually complete a physical or mental activity—not just answer questions about it. Examples include copying a geometric design, arranging blocks into a pattern, or solving a puzzle within a time limit. Scores reflect accuracy, speed, or quality of performance.
Performance tasks are essential for measuring abilities that cannot be captured by written items. Standardization for performance tasks includes providing identical materials, instructions, and timing for each person tested, ensuring fair comparison.
Understanding Constructs
Every psychological test is built around a specific psychological construct—the underlying quality or ability the test aims to measure. Examples include intelligence, extraversion, depression, or anxiety.
Before developing a test, psychologists must clearly define the construct in theoretical terms. What exactly is intelligence? Is it general reasoning ability, specific skills, the capacity to adapt to new situations, or all of these? These theoretical questions shape which items are included.
Once items are written to represent various facets of the construct, test developers must quantify responses—convert qualitative answers into numerical scores. This process allows us to measure and compare the construct across individuals.
Clarity about what construct a test measures is essential because it determines whether test scores are valid and how they should be interpreted. A test claiming to measure intelligence but actually measuring reading ability would be misleading.
Evaluating Test Quality: Reliability
What Is Reliability?
Reliability refers to the consistency of test scores across repeated administrations or parallel forms. A reliable test yields similar results when conditions are comparable. If you take a reliable test twice under similar conditions, your scores should be roughly the same.
Think of reliability like a scale in a doctor's office. If the scale is reliable, it shows approximately the same weight each time you step on it. If it fluctuates wildly despite no real weight change, it's unreliable.
High reliability is critical because it:
Reduces measurement error (the random inaccuracies in any measurement)
Increases confidence in scores
Is a prerequisite for establishing validity—you cannot have a valid test that is unreliable
Reliability is expressed as a coefficient ranging from 0 (no consistency) to 1 (perfect consistency). Coefficients above 0.70 are generally considered acceptable for most psychological measures.
Test-Retest Reliability
Test-retest reliability assesses whether scores remain stable when the same test is administered to the same people on two separate occasions. The correlation between the two sets of scores indicates reliability.
The challenge with this approach is timing. If the interval between testings is too short, people may remember their previous answers, artificially inflating the correlation. If the interval is too long, genuine changes in the person (or in the trait being measured) may occur, lowering the correlation. For relatively stable traits like personality, an interval of several weeks to months is appropriate.
Test-retest reliability is especially important for traits that are theoretically expected to be stable—personality traits, general intelligence, and aptitudes. For constructs that change frequently, like anxiety or mood, test-retest reliability is less relevant.
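The test-retest coefficient is simply the Pearson correlation between the two administrations. A minimal sketch, using made-up scores for six people tested several weeks apart:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical IQ-style scores: first and second administration
time1 = [98, 105, 110, 92, 120, 101]
time2 = [101, 103, 112, 95, 118, 99]

r = pearson_r(time1, time2)  # about 0.97, well above the 0.70 benchmark
```

A coefficient this high indicates that people largely kept their rank order across administrations, which is what stability means here.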
Internal Consistency Reliability
Internal consistency examines how well all items on a test measure the same underlying construct. If a test is supposed to measure a single construct, the items should correlate with each other—answering one item a certain way should predict how you answer related items.
The most common statistic is Cronbach's alpha, which is a function of the average inter-item correlation and the number of items. A high alpha (generally above 0.80) suggests items are homogeneous—they're all pointing toward the same construct. However, an alpha that is too high (above 0.95) may indicate redundant items that essentially ask the same question in different words, which wastes space on the test.
Split-half reliability provides another internal consistency estimate by dividing the test into two halves and correlating scores on each half. If the two halves measure the same construct, scores should correlate highly.
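Cronbach's alpha can be computed directly from an items-by-respondents score matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). A sketch with made-up 1–5 ratings on four items from five respondents:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding every respondent's score."""
    k = len(item_scores)
    # Each respondent's total score across all items
    totals = [sum(person) for person in zip(*item_scores)]
    item_var = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

# Hypothetical ratings on four anxiety items (rows) from five respondents (columns)
items = [
    [2, 4, 3, 5, 1],
    [3, 3, 2, 5, 2],
    [2, 5, 4, 4, 1],
    [3, 4, 2, 5, 3],
]

alpha = cronbach_alpha(items)  # about 0.88: above 0.80, comfortably below 0.95
```

With only four items this is purely illustrative; real internal-consistency estimates use full-length scales and much larger samples.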
Parallel-Forms Reliability
Sometimes two versions of a test are created to measure the same construct with different items. For instance, two versions of a college entrance test might include different math problems but measure the same mathematical reasoning ability.
Parallel-forms reliability involves correlating scores from both versions. This method reduces practice effects (performance improving simply because you've taken the test before) because test-takers encounter entirely different items.
The challenge is that creating truly parallel forms requires careful matching on item difficulty and content—you need to ensure that Form A and Form B are equally difficult and equally representative of the construct.
Inter-Rater Reliability
Some tests require subjective judgments by a scorer or observer—think of performance assessments where an examiner rates the quality of a performance, or clinical interviews where a clinician makes judgments about a patient's presentation. Inter-rater reliability assesses the agreement between different observers or scorers.
Common statistics include Cohen's kappa (for categorical judgments) and intraclass correlation coefficients (for continuous scores). High inter-rater reliability ensures scores reflect the examinee's actual behavior or performance, not the rater's personal biases or interpretation style.
Inter-rater reliability can be improved by providing raters with clear scoring rubrics and training that emphasizes consistency in applying standards.
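Cohen's kappa corrects raw percent agreement for the agreement two raters would reach by chance alone, given how often each rater uses each category. A sketch with made-up severity judgments from two clinicians on ten cases:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical judgments."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten patients by two clinicians
ratings_a = ["mild", "mild", "severe", "mild", "severe",
             "mild", "mild", "severe", "mild", "severe"]
ratings_b = ["mild", "severe", "severe", "mild", "severe",
             "mild", "mild", "mild", "mild", "severe"]

kappa = cohens_kappa(ratings_a, ratings_b)  # about 0.58
```

Here the raters agree on 8 of 10 cases (80%), but because both use "mild" often, chance alone would produce 52% agreement—so kappa is considerably lower than raw agreement.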
Evaluating Test Quality: Validity
What Is Validity?
Validity refers to the extent to which a test actually measures what it claims to measure and how useful the test scores are for their intended purpose. A critical insight: validity is not a single property but rather an accumulation of evidence from multiple sources.
An important distinction: A test can be reliable without being valid, but a valid test must also be reliable. For example, a scale that consistently reads 5 pounds too heavy is reliable (consistent) but not valid (not measuring true weight). Conversely, if a test is unreliable—producing wildly inconsistent scores—it cannot be valid.
Validity judgments are made in context. A test might be valid for predicting college success but not for diagnosing depression. The same test can be appropriate for one purpose but not another.
Content Validity
Content validity evaluates whether test items adequately represent the entire domain of the construct being measured. For an achievement test covering the American Civil War, do the items cover all major topics (causes, key battles, outcomes, famous figures) or only some?
Establishing content validity involves expert review: panels of subject-matter experts examine items to ensure:
Relevant topics are covered
Irrelevant material is omitted
Items accurately reflect the curriculum or domain
A content validity index quantifies expert agreement numerically. High content validity is essential for achievement tests that must accurately reflect curriculum standards. Without it, test scores don't meaningfully represent what students have learned.
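One common way to compute a content validity index is to take, for each item, the proportion of experts rating it "relevant" (3 or 4 on a 4-point relevance scale), then average across items for a scale-level value. A minimal sketch under that assumption, with hypothetical expert ratings:

```python
def item_cvi(expert_ratings, threshold=3):
    """Item-level CVI: share of experts rating the item at or above threshold."""
    return sum(r >= threshold for r in expert_ratings) / len(expert_ratings)

def scale_cvi(all_items):
    """Scale-level CVI: mean of the item-level CVIs."""
    return sum(item_cvi(item) for item in all_items) / len(all_items)

# Five experts rate three items on a 1-4 relevance scale (made-up data)
ratings = [
    [4, 4, 3, 4, 3],  # item 1: all five experts agree  -> I-CVI 1.0
    [4, 3, 2, 4, 3],  # item 2: four of five            -> I-CVI 0.8
    [2, 3, 4, 3, 2],  # item 3: three of five           -> I-CVI 0.6, revise
]

s_cvi = scale_cvi(ratings)  # 0.8 overall
```

Items with low I-CVI values (like item 3 here) are flagged for revision or removal before the test is finalized.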
Construct Validity
Construct validity addresses the core question: Does this test truly measure the theoretical construct it claims to measure?
Evidence for construct validity comes from multiple sources:
Factor analysis is a statistical technique that groups related items together, revealing the underlying structure of a test. If a personality test claims to measure extraversion, factor analysis should show that relevant items (outgoing, sociable, assertive) correlate highly with each other and separate from items measuring different constructs.
Convergent validity shows strong relationships between the test and other established measures of the same construct. If your new depression scale correlates highly with the Beck Depression Inventory (a well-established measure), that's evidence you're measuring depression.
Discriminant validity demonstrates that the test has weak relationships with measures of different constructs. A depression scale should not correlate highly with intelligence or reading ability, because depression is distinct from those constructs.
As psychological theories evolve and research accumulates, construct validity evidence is continuously refined and updated.
Criterion-Related Validity
Criterion-related validity examines how well test scores predict or relate to some external criterion—a real-world outcome we care about.
Predictive Validity
Predictive validity evaluates how well test scores forecast future performance or behavior. College entrance exams illustrate this: they predict first-year academic performance. To establish predictive validity, researchers conduct longitudinal studies, administering the test, then measuring the criterion (like college GPA) months or years later. The correlation between test scores and the future criterion indicates predictive validity.
Valid predictive models support important decisions about selection and placement. Low predictive validity suggests the test doesn't capture the key factors that drive the future outcome, limiting its usefulness.
Concurrent Validity
Concurrent validity assesses the relationship between test scores and a criterion measured at approximately the same time. For instance, a new depression screening scale might be administered alongside a clinical interview (the established criterion) on the same day. High correlation indicates the new test measures depression as effectively as the established method.
Concurrent validity is useful when immediate verification is needed and is often used in early test development phases. However, concurrent validity doesn't guarantee predictive validity—just because two measures correlate now doesn't mean one will predict future outcomes.
Ethical Practice in Psychological Testing
Informed Consent
Ethical testing practice begins with informed consent. Before administering any test, administrators must obtain voluntary, informed consent from the test-taker (or parents/guardians for minors).
Consent documents should explain:
The purpose of testing
What procedures will be used
Any potential risks or benefits
How confidentiality will be protected
The person's right to ask questions and decline participation without penalty
Special considerations apply to vulnerable populations. Minors cannot provide legal consent; parents or guardians must. Individuals with diminished decision-making capacity may require additional safeguards. Non-English speakers need materials in their language.
Documentation of informed consent is required for both ethical integrity and legal compliance.
Confidentiality
Test results are private information that must be protected from unauthorized disclosure. Only individuals with a legitimate professional need to know should access scores. This typically includes the person tested, parents (for minors), and the professionals directly involved in assessment and decision-making.
Protecting confidentiality requires practical steps:
Secure data storage using passwords, encryption, or locked filing systems
Removing identifiers or anonymizing data when reporting results to groups
Being cautious when discussing results in shared spaces
Following legal regulations like HIPAA (in medical settings) or FERPA (in educational settings)
Breaches of confidentiality cause real harm—students feel violated, patients distrust providers, and professional relationships suffer. Breaches also violate ethical codes and may have legal consequences.
Cultural and Linguistic Appropriateness
Tests must be appropriate for the test-taker's cultural background and language proficiency. A test normed only on English speakers may be unfair to bilingual students. A test developed in one culture may contain items reflecting different values, experiences, or knowledge in another culture.
When tests are translated, they require linguistic validation (ensuring the translation accurately conveys meaning) and cultural adaptation (adjusting content to be culturally relevant and fair). Simply translating words is insufficient.
Using culturally biased instruments can lead to inaccurate scores and unfair decisions—students may perform poorly not because they lack the ability or skill, but because the test is culturally unfamiliar. Test administrators should be trained to recognize and mitigate cultural bias.
Appropriate norms are also essential to cultural fairness: norms should be derived from populations that resemble the test-taker's demographic group. Using an inappropriate reference group leads to misinterpretation.
Clear Communication of Results
Test results must be presented in language the examinee and relevant stakeholders can understand. Technical jargon should be minimized, and scores should be interpreted in context, emphasizing what they indicate and what they do not.
Visual aids such as graphs or profile charts can enhance understanding. A feedback session should allow the test-taker to ask questions and discuss implications. Rather than simply stating a score ("You scored a 105 on the IQ test"), interpretation provides context ("Your score suggests strong overall intellectual ability, with particular strengths in reasoning").
Reports must avoid deterministic language. Never imply that test scores permanently determine a person's future or define their worth. Scores are one piece of information, not a final verdict.
Using Scores Appropriately
Test scores are not definitive judgments of a person's worth, potential, or character. They should be integrated with observations, interviews, work samples, and other relevant information. Decision-making must account for the test's limitations and confidence intervals—the range within which the true score likely falls.
Overreliance on a single test can lead to misdiagnosis or inappropriate educational placement. A student might score low on a timed achievement test due to test anxiety, processing speed issues, or language barriers—not low ability. A single score cannot capture the full picture.
Ethical practice requires ongoing evaluation of whether a test remains appropriate and continues to serve its intended purpose. Even well-validated tests should be periodically reexamined to ensure they remain fair and effective.
Flashcards
What is the systematic use of standardized instruments to assess aspects of a person’s mental life called?
Psychological testing
What four specific areas do psychological tests typically evaluate?
Abilities
Personality traits
Attitudes
Psychological problems
What is the primary benefit of using standardized instruments in testing compared to other methods of gathering information?
It provides a structured way to compare information that would otherwise be difficult to compare.
In the context of testing, what does it mean for an instrument to be standardized?
All test takers receive the same items under the same conditions.
What is the goal of ensuring that all test takers receive the same set of items under identical conditions?
To ensure differences in scores reflect true differences in the construct being measured.
From what kind of samples are test norms derived to allow for comparison across individuals?
Large, representative samples
What specific demographic factors are commonly used to create norms to improve comparison accuracy?
Age, grade, gender, and culture
What do criteria or cut‑off scores indicate in psychological testing?
Levels of performance or severity
What are the two primary purposes of psychological tests in clinical settings?
Diagnosing mental‑health disorders
Guiding treatment decisions
What is the benefit of performing repeated testing on a clinical patient?
To monitor progress over time
What is the primary purpose of an ability test?
To estimate how well someone can solve problems or learn new material.
What do achievement tests specifically measure?
Mastery of specific academic content
What stable patterns do personality inventories aim to assess?
Thinking, feeling, and behaving
Which specific personality inventory is used to evaluate psychopathology and personality dimensions?
The Minnesota Multiphasic Personality Inventory (MMPI)
Which five major personality traits are measured by the Big Five questionnaire?
Openness
Conscientiousness
Extraversion
Agreeableness
Neuroticism
What type of rating scale is usually used to record responses on personality inventories?
Likert‑type rating scales
When are neuropsychological batteries most commonly administered?
After brain injury or neurological illness
What do diagnostic scales specifically measure in a mental health context?
The severity of specific symptoms
How do computer‑adaptive tests determine the difficulty of the next item presented?
Based on the respondent’s previous answers.
What are the incorrect response options in a multiple-choice item called?
Distractors
What is the defining requirement for a respondent during a performance task?
Completing a physical or mental activity within a set time.
On what three factors are scores for performance tasks based?
Accuracy
Speed
Quality of performance
What term refers to the theoretical idea (e.g., intelligence or anxiety) that a test is built to measure?
Psychological construct
What is the purpose of quantification in the context of psychological constructs?
To convert qualitative responses into numerical scores.
How is reliability defined in psychological testing?
Consistency of test scores across repeated administrations or parallel forms.
What is the numerical range for reliability coefficients?
0 (no consistency) to 1 (perfect consistency)
What is the relationship between reliability and validity?
Reliability is a prerequisite for establishing validity.
How is test‑retest reliability assessed?
Administering the same test to the same group on two different occasions.
What is generally considered an acceptable test‑retest coefficient for psychological measures?
Above .70
What does internal consistency measure regarding test items?
How well items on a test measure the same underlying construct.
What is the most common statistic used to measure internal consistency?
Cronbach’s alpha
What does a Cronbach’s alpha above .95 typically suggest about a test?
The items may be redundant.
What method involves splitting a test in two and correlating the scores to estimate internal consistency?
Split‑half reliability
What does parallel‑forms reliability involve?
Two different versions of a test that measure the same construct.
What is the primary advantage of using parallel forms over test-retest methods?
It reduces practice effects.
What does inter‑rater reliability assess?
The agreement between different observers or scorers.
How is validity defined in the context of psychological tests?
The extent to which a test measures what it claims to measure.
What are the three main domains of validity evidence?
Content
Construct
Criterion‑related
What does content validity evaluate?
How well test items represent the entire domain of the construct.
What is the difference between convergent and discriminant validity?
Convergent validity shows strong relationships with similar tests, while discriminant validity shows weak relationships with different constructs.
What is the difference between predictive and concurrent validity?
Predictive validity forecasts future performance, while concurrent validity relates to a criterion measured at the same time.
What five elements should be included in an informed consent document for testing?
Purpose of the test
Procedures
Risks
Benefits
Confidentiality
Why is it important to use culturally and linguistically appropriate tests?
To avoid inaccurate scores and unfair decisions resulting from cultural bias.
What should test reports avoid using to ensure scores are not seen as definitive judgments?
Deterministic language
Quiz
Introduction to Psychological Testing Quiz Question 1: In a classroom setting, psychological tests are primarily used to:
- Identify academic strengths and weaknesses for instructional planning (correct)
- Replace teachers in delivering curriculum
- Diagnose personality disorders
- Determine students' future career choices without other data
Introduction to Psychological Testing Quiz Question 2: Which factor improves the accuracy of normative comparisons?
- Age, grade, gender, and cultural norms (correct)
- Randomly assigning norms from any population
- Using only the test taker’s self‑report
- Ignoring demographic information
Introduction to Psychological Testing Quiz Question 3: Ability tests are designed to estimate a person’s capacity to:
- Solve problems or learn new material (correct)
- Recall specific factual information only
- Demonstrate motor skills exclusively
- Express emotions through art
Introduction to Psychological Testing Quiz Question 4: Neuropsychological batteries are most commonly used after:
- Brain injury or neurological illness (correct)
- Graduation from college
- Receiving a physical fitness test
- Participating in a musical performance
Introduction to Psychological Testing Quiz Question 5: What is a defining feature of multiple‑choice items?
- A stem with several response options, only one correct (correct)
- Only true‑false responses
- Open‑ended essay responses
- Rating scales from “strongly disagree” to “strongly agree”
Introduction to Psychological Testing Quiz Question 6: True‑false statements are particularly vulnerable to which issue?
- Guessing (correct)
- Time‑consuming administration
- Complex scoring algorithms
- Need for extensive written justification
Introduction to Psychological Testing Quiz Question 7: Performance tasks assess abilities that are best measured by:
- Completing a physical or mental activity within a set time (correct)
- Selecting the best answer from multiple options
- Writing an essay on a topic
- Choosing true or false for statements
Introduction to Psychological Testing Quiz Question 8: Before developing test items, a construct must be:
- Clearly defined in theoretical terms (correct)
- Randomly selected from everyday language
- Assumed to be universally understood without definition
- Based solely on cultural myths
Introduction to Psychological Testing Quiz Question 9: Test‑retest reliability is assessed by:
- Correlating scores from two administrations of the same test (correct)
- Comparing scores from two different tests measuring different constructs
- Analyzing the internal consistency of items
- Evaluating rater agreement
Introduction to Psychological Testing Quiz Question 10: Which statistic is most commonly used to assess internal consistency?
- Cronbach’s alpha (correct)
- Correlation coefficient between two forms
- Kappa statistic
- Factor loading
Introduction to Psychological Testing Quiz Question 11: Parallel‑forms reliability helps reduce what effect?
- Practice effects (correct)
- Memory decay
- Response bias
- Social desirability
Introduction to Psychological Testing Quiz Question 12: Validity is best described as the extent to which a test:
- Measures what it claims to measure (correct)
- Produces consistent scores over time
- Has items that are all identical
- Is administered quickly
Introduction to Psychological Testing Quiz Question 13: Predictive validity is demonstrated when a test:
- Accurately forecasts future performance (correct)
- Matches current criterion measures
- Shows high internal consistency
- Is based on expert opinion alone
Introduction to Psychological Testing Quiz Question 14: Which questionnaire assesses the five major personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism?
- The Big Five questionnaire (correct)
- Minnesota Multiphasic Personality Inventory
- Beck Depression Inventory
- Rorschach Inkblot Test
Introduction to Psychological Testing Quiz Question 15: One benefit of high reliability in a psychological test is that it:
- Reduces measurement error and increases confidence in scores (correct)
- Eliminates the need for validity evidence
- Guarantees perfect predictive power
- Allows the test to be administered without standardized procedures
Introduction to Psychological Testing Quiz Question 16: Before administering a psychological test, what must test administrators obtain?
- Voluntary informed consent (correct)
- A notarized legal affidavit
- A signed employment contract
- A letter of recommendation
Introduction to Psychological Testing Quiz Question 17: Which statistic is most commonly used to quantify inter‑rater reliability?
- Cohen’s kappa (correct)
- Pearson’s r
- Cronbach’s alpha
- Spearman’s rho
Key Concepts
Testing Fundamentals
Psychological testing
Standardization
Norms
Reliability
Validity
Assessment Tools
Personality inventory
Neuropsychological battery
Diagnostic and symptom scale
Computer‑adaptive testing
Ethical Considerations
Informed consent (psychological testing)
Definitions
Psychological testing
The systematic use of standardized instruments to assess mental abilities, traits, attitudes, or problems.
Standardization
The process of administering identical test items under uniform conditions to ensure score differences reflect true construct differences.
Norms
Representative reference data derived from large samples that allow individual test scores to be compared across populations.
Reliability
The consistency of test scores across administrations, forms, or raters, indicating the degree of measurement error.
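The internal-consistency statistic named in the quiz, Cronbach's alpha, can be computed directly from a score matrix. A minimal sketch using only the standard library; the respondent data are made up for illustration:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-x-items score matrix.

    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # columns = items
    item_vars = sum(variance(col) for col in items)
    totals = [sum(row) for row in scores]   # each respondent's total score
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical toy data: 5 respondents answering 4 Likert-type items
data = [
    [4, 4, 5, 4],
    [2, 3, 2, 3],
    [5, 4, 5, 5],
    [3, 3, 3, 2],
    [4, 5, 4, 4],
]
print(round(cronbach_alpha(data), 3))  # → 0.922
```

Values near 1.0 indicate that the items behave consistently with one another; with real data, alphas above roughly 0.70 to 0.80 are conventionally treated as acceptable.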
Validity
The extent to which a test measures what it claims to measure and yields useful, accurate interpretations for its intended purpose.
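Predictive validity, one form of validity evidence covered in the quiz, is typically quantified as the correlation between test scores and a later criterion. A minimal sketch with made-up numbers (the admissions-test scores and GPA values are purely illustrative):

```python
def pearson_r(xs, ys):
    """Pearson correlation between two paired score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: admissions-test scores and later first-year GPA
test_scores = [520, 580, 610, 640, 700]
gpa = [2.6, 3.0, 2.9, 3.4, 3.7]
print(round(pearson_r(test_scores, gpa), 2))  # → 0.96
```

A strong positive correlation like this would be evidence that the test forecasts future performance, which is exactly what predictive validity claims.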
Personality inventory
A questionnaire that assesses stable patterns of thinking, feeling, and behaving, often using Likert‑type scales.
Neuropsychological battery
A collection of tests evaluating cognitive functions such as memory, attention, and executive control, typically after brain injury or illness.
Diagnostic and symptom scale
A self‑report instrument that quantifies the severity of specific mental‑health symptoms and monitors treatment progress.
Computer‑adaptive testing
A testing format that dynamically adjusts item difficulty based on the respondent’s previous answers to improve precision.
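The adaptive principle can be illustrated with a deliberately simplified loop: pick the unanswered item closest to the current ability estimate, then nudge the estimate up or down depending on the answer. Real computer-adaptive tests use item response theory rather than this binary-search-style update; all function names and difficulty values below are illustrative:

```python
def next_item(items, ability):
    """Pick the unanswered item whose difficulty is closest to the ability estimate."""
    return min(items, key=lambda d: abs(d - ability))

def run_adaptive(answers, difficulties, ability=0.0, step=1.0):
    """Simplified adaptive loop: raise the ability estimate after a correct
    answer, lower it after an incorrect one, halving the step each round."""
    remaining = list(difficulties)
    for correct in answers:
        item = next_item(remaining, ability)
        remaining.remove(item)
        ability += step if correct else -step
        step /= 2
    return ability

# Hypothetical item pool (difficulties on an arbitrary scale) and a
# respondent who answers correct, correct, incorrect, correct
print(run_adaptive([True, True, False, True], [-2, -1, 0, 1, 2]))  # → 1.375
```

Because each item is chosen near the respondent's estimated level, a short adaptive test can locate ability with fewer items than a fixed-form test of the same length.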
Informed consent (psychological testing)
The ethical requirement that test administrators obtain voluntary, knowledgeable agreement from participants before testing.