Item Response Theory (IRT), also known as latent trait theory, refers to a family of mathematical models that attempt to explain the relationship between latent traits (unobservable characteristics or attributes) and their manifestations (i.e., observed outcomes, responses, or performance). They establish a link between the properties of items on an instrument, the individuals responding to those items, and the underlying trait being measured. IRT assumes that the latent construct (e.g., stress, knowledge, attitudes) and the items of a measure are organized along an unobservable continuum. Hence, its main purpose is to determine the individual's position on that continuum.
Classical test theory
Classical test theory (CTT) [Spearman, 1904; Novick, 1966] focuses on the same goal; prior to the conceptualization of IRT, it was (and still is) used to predict an individual's latent trait based on an observed total score on an instrument. In CTT, the true score predicts the level of the latent variable: the observed score is the sum of the true score and error. The error is assumed to be normally distributed with a mean of 0 and an SD of 1.
Item response theory vs. classical test theory
IRT relies on the following assumptions:
1) Monotonicity - The higher the trait level, the greater the probability of a correct response.
2) Unidimensionality - The model assumes that a single dominant latent trait is being measured, and that this trait drives the observed responses to every item in the measure.
3) Local independence - Responses to the individual items of a test are mutually independent given a certain level of ability.
4) Invariance - The item parameters can be estimated from any position on the item response curve. Accordingly, the parameters of an item can be estimated from any group of subjects who have answered the item.
If the assumptions are correct, the differences in observing correct answers between respondents are due to the variation in their latent trait.
IRT models predict respondents' answers to an instrument's items based on their position on the latent trait continuum and the properties of the items, also known as parameters. The item response function characterizes this association. The underlying assumption is that every answer to an item on an instrument provides some indication of the individual's level of the latent trait or ability. The person's ability (θ), in simple terms, determines the probability of endorsing the correct answer to that item: the higher the person's ability, the higher the chance of a correct response. This relationship can be represented graphically and is known as the item characteristic curve (ICC). As shown in the figure, the curve is S-shaped (sigmoid/ogive). Moreover, the probability of endorsing a correct answer increases monotonically with the respondent's ability. It should be noted that ability (θ) theoretically ranges from -∞ to +∞, but in applications it typically falls between -3 and +3.
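The S-shaped, monotonically increasing curve described above can be sketched with the logistic function. This is a minimal illustration (the function name and the sample abilities are ours, not from the text):

```python
import math

def icc(theta, b=0.0):
    """Item characteristic curve for a hypothetical item of difficulty b:
    probability of a correct response at ability theta (1-PL form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# The curve is S-shaped and rises monotonically with ability:
print(round(icc(-3.0), 3), round(icc(0.0), 3), round(icc(3.0), 3))  # 0.047 0.5 0.953
```

Note how the practical range -3 to +3 already covers nearly the full 0-to-1 span of the probability.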
As people's abilities change, their position in the continuum of the latent construct changes and is determined by the sample of respondents and the item parameters. An item must be sensitive enough to evaluate the respondents within the suggested non-observable continuum.
Item difficulty (bi) is the parameter that determines how the item behaves along the ability scale. It is defined as the ability level at which 50% of respondents endorse the correct answer. On an item characteristic curve, items that are difficult to endorse are shifted to the right of the scale, indicating the higher ability of the respondents who endorse them correctly, while easier items are shifted to the left of the ability scale.
Item discrimination (ai) determines the rate at which the probability of endorsing a correct answer changes across ability levels. This parameter is essential for differentiating between individuals who have similar levels of the latent construct of interest. The ultimate purpose of developing a precise measure is to include items with high discrimination, so that individuals can be mapped along the continuum of the latent trait. On the other hand, researchers should be cautious about items with negative discrimination, since the probability of endorsing the correct answer should not decrease as the respondent's ability increases; such items should be revised. Theoretically, item discrimination ranges from -∞ to +∞, but it usually does not exceed 2; realistically, therefore, it lies between (0, 2).
Guessing (ci) is the third parameter, which accounts for guessing on an item. It bounds the probability of endorsing the correct answer from below as ability approaches -∞.
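How the three item parameters jointly shape the response probability can be sketched with the standard three-parameter logistic form (the function name and all numeric values here are illustrative):

```python
import math

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """Three-parameter logistic sketch: a = discrimination, b = difficulty,
    c = guessing (lower asymptote of the curve)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Difficulty b shifts the curve along the ability scale (with c == 0,
# the probability at theta == b is exactly 0.5):
print(round(p_correct(1.0, b=1.0), 2))                 # 0.5
# Higher discrimination a makes the curve steeper around b:
print(p_correct(0.5, a=2.0) > p_correct(0.5, a=1.0))   # True
# Guessing c sets the floor as ability heads toward -infinity:
print(round(p_correct(-30.0, c=0.2), 2))               # 0.2
```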
Population invariance
Simply put, the item parameters behave similarly across different populations. This is not the case when measurement follows CTT. Since the unit of analysis in IRT is the item, the item location (difficulty) can be standardized (linearly transformed) across populations and thus easily compared. An important caveat: even after the linear transformation, the parameter estimates derived from two samples will not be identical. Invariance, as the name suggests, refers to population invariance and therefore applies only to the item population parameters.
IRT models
Unidimensional models
Unidimensional models measure a single dominant latent trait.
Dichotomous IRT models are used when the responses to the items of a measure are dichotomous (i.e., 0 or 1).
The 1-parameter logistic model
This model is the simplest form of the IRT models. It consists of one parameter describing the latent trait (ability, θ) of the person responding to the items, and one item parameter (difficulty, b). The following equation represents its mathematical form:

P(θ) = e^(θ - b) / (1 + e^(θ - b))
This is the item response function for the 1-parameter logistic model; it predicts the probability of a correct answer given the respondent's ability and the item's difficulty. In the 1-PL model, the discrimination parameter is fixed for all items, and accordingly all the item characteristic curves corresponding to the different items of the measure run parallel along the ability scale. The figure shows 5 items; the rightmost is the most difficult and would likely be endorsed correctly only by those with a higher ability level.
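The parallel-curves property can be checked numerically: with a common discrimination, the easiest item remains the most probable at every ability level. A minimal sketch (names and difficulty values are illustrative):

```python
import math

def p_1pl(theta, b):
    """1-PL response probability for an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Five items of increasing difficulty: because the discrimination is common
# to all items, the ICCs are parallel and never cross, so the ordering of
# the items is the same at every ability level.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
for theta in (-1.5, 0.0, 1.5):
    probs = [p_1pl(theta, b) for b in difficulties]
    assert probs == sorted(probs, reverse=True)  # easiest item always most probable
```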
The test characteristic curve is the sum of the probabilities of endorsing the correct answer across all items of the measure; it therefore estimates the expected test score. In the figure, the red line shows this combined curve for all 5 items (black).
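Summing the per-item probabilities to obtain the expected test score can be sketched as follows (the difficulty values are illustrative):

```python
import math

def p_1pl(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, difficulties):
    """Test characteristic curve: summing the per-item probabilities of a
    correct answer yields the expected test score at ability theta."""
    return sum(p_1pl(theta, b) for b in difficulties)

# With 5 items whose difficulties are symmetric around zero, an average
# respondent (theta = 0) is expected to score half of the 5-point maximum:
print(round(expected_score(0.0, [-2.0, -1.0, 0.0, 1.0, 2.0]), 3))  # 2.5
```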
The item information function
The item information function shows the amount of information each item provides. It is calculated as the product of the probability of a correct answer and the probability of an incorrect answer:

I(θ) = P(θ) × (1 - P(θ))
Note that the amount of information at a given ability level is the reciprocal of the variance of the ability estimate at that level. Therefore, the greater the amount of information an item provides, the greater the precision of the measurement. When item information is plotted against ability, the resulting graph shows how much information the item provides and where. Items that measure more precisely provide more information and plot taller and narrower than their less informative counterparts. The apex of the curve corresponds to the value of bi, the ability at which the probability of a correct answer is 50%. The maximum amount of information is obtained when the probabilities of a correct and an incorrect answer are equal, i.e., 50%. Items are most informative for respondents near their difficulty on the latent continuum, and especially for those who have a 50% chance of answering in either direction.
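The peak of the information curve at theta == b can be checked numerically. A minimal 1-PL sketch (names and values are illustrative):

```python
import math

def p_1pl(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_info(theta, b):
    """1-PL item information: P(correct) * P(incorrect)."""
    p = p_1pl(theta, b)
    return p * (1.0 - p)

# Information is maximal where ability equals the item difficulty (P = 0.5):
b = 1.0
print(round(item_info(b, b), 3))                 # 0.25, the 1-PL maximum
print(item_info(b, b) > item_info(b - 1.0, b))   # True: less info away from b
```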
The assumption of local independence states that item responses should be independent of one another and related only through ability. This allows us to estimate the likelihood function of an individual's response pattern on the administered measure by multiplying the item response probabilities. The maximum likelihood estimate of ability is then computed through an iterative process. The maximum likelihood estimate simply gives us the most plausible ability, and with it the expected score, for each person.
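Multiplying the item response probabilities and maximizing over θ can be sketched with a crude grid search, standing in for the iterative procedure the text mentions (all names, the response pattern, and the difficulty values are illustrative):

```python
import math

def p_1pl(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses, difficulties):
    """Under local independence, the likelihood of a response pattern is the
    product of the item probabilities (summed here on the log scale)."""
    total = 0.0
    for u, b in zip(responses, difficulties):
        p = p_1pl(theta, b)
        total += math.log(p if u == 1 else 1.0 - p)
    return total

def mle_theta(responses, difficulties):
    """Crude grid search over theta in [-3, 3], a stand-in for the usual
    iterative maximum likelihood estimation."""
    grid = [i / 100.0 for i in range(-300, 301)]
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))

# A respondent who answers the three easiest of five items correctly lands
# somewhat above the middle of the ability scale:
theta_hat = mle_theta([1, 1, 1, 0, 0], [-2.0, -1.0, 0.0, 1.0, 2.0])
```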
The Rasch model vs. 1-parameter logistic models
The 2-parameter logistic model
In the 2-PL model, the discrimination parameter is allowed to vary between items. Consequently, the ICCs of different items can cross and have different slopes. The steeper the slope, the higher the item's discrimination, as the item can detect subtle differences in respondents' ability.
The item information function
As with the 1-PL model, information is calculated as the product of the probabilities of a correct and an incorrect answer; however, the product is now multiplied by the square of the discrimination parameter. The implication is that the larger the discrimination parameter, the more information the item provides. Since discrimination may vary between items, the item information functions can also look different from item to item.
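The effect of squaring the discrimination parameter can be checked numerically (an illustrative sketch; the function name and values are ours):

```python
import math

def info_2pl(theta, a, b):
    """2-PL item information: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Doubling the discrimination quadruples the information at the item's
# difficulty, where P = 0.5 regardless of a:
print(round(info_2pl(0.0, 1.0, 0.0), 3))  # 0.25
print(round(info_2pl(0.0, 2.0, 0.0), 3))  # 1.0
```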
In the 2-PL model, the assumption of local independence still holds, and the maximum likelihood estimate of ability is used in the same way. Although the item response probabilities are still combined into the likelihood of each response pattern, each response is now weighted by the item's discrimination parameter. The likelihood functions can therefore differ from one another and peak at different levels of θ.
The 3-parameter logistic model
The model predicts the probability of a correct answer in the same way as the 1-PL and 2-PL models, but it is constrained by a third parameter called the guessing parameter (also known as the pseudo-chance parameter), which bounds the probability of endorsing the correct answer from below as the respondent's ability approaches -∞. When respondents answer an item by guessing, the amount of information provided by that item decreases, and its item information function peaks at a lower level than those of other items. In addition, difficulty is no longer located at the point of 50% probability; items answered by guessing indicate that the respondent's ability is lower than the item's difficulty.
One way of selecting the appropriate model is to evaluate relative model fit using information criteria. The AIC estimates of the candidate models are compared, and the model with the lower AIC is selected. Alternatively, we can use the chi-square (deviance) statistic by measuring the change in -2 log-likelihood between the models. Since this difference follows a chi-square distribution, we can test whether the two models differ statistically from one another.
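The comparison can be sketched with assumed, purely illustrative fitted log-likelihoods for a 1-PL and a 2-PL fit (the numbers, parameter counts, and function names below are hypothetical, not from the text):

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * loglik

def deviance_change(loglik_simple, loglik_complex):
    """Change in -2*logL between nested models; compared against a chi-square
    distribution with df = number of extra parameters in the complex model."""
    return 2 * (loglik_complex - loglik_simple)

# Hypothetical fitted log-likelihoods for a 5-item test:
ll_1pl, k_1pl = -1204.3, 6    # 5 difficulties + 1 common discrimination
ll_2pl, k_2pl = -1188.9, 10   # 5 difficulties + 5 discriminations
print(round(aic(ll_1pl, k_1pl), 1))             # 2420.6
print(round(aic(ll_2pl, k_2pl), 1))             # 2397.8 -> lower, 2-PL preferred
print(round(deviance_change(ll_1pl, ll_2pl), 1))  # 30.8, tested on 4 df
```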
Other IRT models
These include models that handle polytomous data, e.g., the graded response model and the partial credit model, which predict the expected score for each response category. Others, such as the nominal response model, predict the expected scores of people answering items with unordered response categories (e.g., yes, no, maybe). In this brief summary, we have focused on unidimensional IRT models, which address the measurement of a single latent trait; these models are not suitable for measuring more than one latent construct or trait. In the latter case, the use of multidimensional IRT models is recommended. For more information on these models, see the resource list below.
IRT models can be applied successfully in many settings that use assessments (education, psychology, health research, etc.). They can also be used to design and refine scales/measures by including items with high discrimination, which increase the precision of the measurement tool and reduce the burden of answering long questionnaires. Since the unit of analysis in IRT is the item, IRT models can be used to compare items from different measures, provided they measure the same latent construct. In addition, they can be used in differential item functioning analyses to assess why items that have been calibrated and tested still behave differently across groups. This can prompt research into the causes of the differing responses, which can then be investigated and associated with group characteristics. Finally, they can be used in computerized adaptive testing.
Textbooks & Chapters
Hambleton, R.K. & Swaminathan, H. (1985). Principles and Applications of Item Response Theory. Boston, MA: Kluwer-Nijhoff Publishing. Available here and here
Embretson, Susan E., and Steven P. Reise. Item Response Theory. Psychology Press, 2013. Available here
Van der Linden, W.J. & Hambleton, R.K. (Ed.). (1997). Handbook of Modern Item Response Theory. New York, NY: Springer. Available here
These three books (Principles and Applications of Item Response Theory, Item Response Theory, and Handbook of Modern Item Response Theory) convey the basic principles of IRT models to the reader. However, they do not cover the latest developments or IRT software packages.
DeMars C. Item Response Theory. Cary, NC, USA: Oxford University Press, USA; 2010. Available here and here
In 138 pages, DeMars has succeeded in creating a concise yet extremely informative resource that demystifies the toughest IRT concepts. It is an introductory text covering the assumptions, parameters, and requirements of IRT; it then explains how to describe results in reports and how researchers should consider the context of test administration, the respondent population, and the effective use of scores.
de Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. Reference and Research Book News, 24(2). Available here
The Theory and Practice of Item Response Theory is an applied, practice-oriented book. It provides a thorough explanation of both unidimensional and multidimensional IRT models, highlighting the conceptual evolution and assumptions of each model. The underlying principles of each model are then demonstrated with illustrative examples.
Li, Y., & Baron, J. (2012). Behavioral Research Data Analysis with R. New York, NY: Springer. (Chapter 8)
The book was written with behavioral science practitioners in mind; it helps the reader navigate statistical methods in R. Chapter 8 focuses on item response theory and offers a wealth of annotated examples.
A Visual Guide to Item Response Theory by Ivailo Partchev, Friedrich Schiller University Jena (2004)
As the name suggests, the guide provides a visual presentation of the basic concepts of IRT. Java applets are interspersed throughout the text and make these basic concepts easy to grasp as they are explained. An excellent resource; I would recommend reading it a couple of times and practicing with the applets!
Baker, Frank (2001). The basics of item response theory. ERIC Clearinghouse on Assessment and Evaluation, University of Maryland, College Park, MD
A unique book that focuses on giving the reader the joy of learning the fundamentals of IRT without delving into mathematical complexities.
Thissen, D., & Wainer, H. (Eds.). (2001). Test scoring. Mahwah, New Jersey: Lawrence Erlbaum. Available here and here
Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum. Available here
Baker, F. B. & Kim, S. H. (2004). Item Response Theory: Parameter estimation techniques. New York, NY: Marcel Dekker. Available here and here
Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233-245
Lord, F. M. (1986). Maximum likelihood and Bayesian parameter estimation in item response theory. Journal of Educational Measurement, 23(2), 157-162
Stone, C. A. (1992). Recovery of marginal maximum likelihood estimates in the two-parameter logistic response model: An evaluation of MULTILOG. Applied Psychological Measurement, 16(1), 1-16
Green, D. R., Yen, W. M., & Burket, G. R. (1989). Experiences in the application of item response theory in test construction. Applied Measurement in Education, 2(4), 297-312
Da Rocha, N. S., Chachamovich, E., de Almeida Fleck, M. P., & Tennant, A. An introduction to Rasch analysis for psychiatric practice and research. (1879-1379)
Cook, K. F., O'Malley, K. J., & Roddey, T. S. (2005). Dynamic assessment of health outcomes: Time to let the CAT out of the bag? Health Services Research, 40(5 Pt 2), 1694-1711
Edwards, M. C. (2009). An introduction to item response theory using the Need for Cognition Scale. Social and Personality Psychology Compass, 3(4), 507-529
Choi, S. W., & Swartz, R. J. (2009). Comparison of CAT item selection criteria for polytomous items. (0146-6216 (Print))
Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1-25
The main goal of the article is to introduce the ltm package in R, which is useful for fitting IRT models. The ltm package handles both dichotomous and polytomous data. The paper provides illustrations with real-world data samples from the Law School Admission Test (LSAT) and from the environmental section of the 1990 British Social Attitudes Survey.
For the full list, please click the link below: http://www.umass.edu/remp/software/CEA-652.ZH-IRTSoftware.pdf
YouTube tutorials (very useful and informative)
Courses offered at the Mailman School of Public Health
P8417 - Selected Measurement Issues
P8158 - Modeling Latent Variables and Structural Equations for Health Sciences
Upcoming online courses and workshops
Past courses and materials
ICPSR Summer Workshop, July 9-13, 2012. Dr. Jonathan Templin (Associate Professor, Department of Psychology and Research in Education, University of Kansas)
This link offers workshops and materials led by Dr. Jonathan Templin (Associate Professor, Department of Psychology and Research in Education, University of Kansas) for the years 2007, 2011, and 2012.