
Age-Period-Cohort Analysis








This page briefly describes age-period-cohort analysis and provides an annotated list of resources.


Age, period, and cohort effects

Age-period-cohort (APC) analysis plays an important role in understanding time-varying elements in epidemiology. In particular, APC analysis distinguishes three types of time-varying phenomena: age effects, period effects, and cohort effects. (1)
Age effects are variations associated with the biological and social processes of individual aging. (2) They include physiological changes and the accumulation of social experience that come with aging and that are unrelated to the time period or birth cohort to which a person belongs. In epidemiological studies, age effects usually appear as different incidence rates across age groups.
Period effects result from external factors that affect all age groups equally at a particular calendar time. They can arise from a range of environmental, social, and economic factors, e.g. war, famine, or economic crisis. Methodological changes in outcome definitions, classifications, or the method of data collection can also produce period effects in the data. (3)
Cohort effects are variations resulting from the unique experience or exposure of a group of subjects (a cohort) as they move through time. The most commonly defined group in epidemiology is the birth cohort, based on year of birth, and the cohort effect is then described as the difference in the risk of a health outcome by year of birth. A cohort effect occurs when the distribution of disease arising from an exposure affects age groups differently. In epidemiology, a cohort effect is conceptualized as an interaction or effect modification: a period effect that is experienced differently by different age groups because of age-specific exposure or susceptibility to that event or cause. (4)
Identification problem in APC: APC analysis aims to describe and estimate the independent influence of age, period, and cohort on the health outcome under study. The various strategies in use try to partition the variance into the unique components attributable to age, period, and cohort effects (4). However, there is a major obstacle to estimating independent age, period, and cohort effects by modeling the data, known as the identification problem in APC. It arises from the exact linear relationship between age, period, and cohort: period - age = cohort; i.e., the cohort (year of birth) can be determined from the calendar year and age (5). The presence of perfectly collinear predictors (age, period, and cohort) in a regression model produces a singular, non-identifiable design matrix, from which it is statistically impossible to obtain unique estimates of the three effects. (5)
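The identification problem can be made concrete with a small Python sketch (Python is used here for illustration only; it is not the software used on this page). The function names and the slope values are hypothetical. The sketch shows that two different sets of linear age, period, and cohort slopes give exactly the same fitted values once cohort = period - age, which is why the data cannot tell them apart.

```python
# Sketch of the APC identification problem (all coefficients are made up).

def linear_predictor(age, period, coefs):
    """Return a*age + p*period + c*cohort, with cohort = period - age."""
    a, p, c = coefs
    cohort = period - age
    return a * age + p * period + c * cohort


def shift(coefs, k):
    """Move an arbitrary amount k between the three linear trends."""
    a, p, c = coefs
    return (a + k, p - k, c + k)


# Every age-by-period cell of a table with 5-year groups.
cells = [(age, period)
         for age in range(10, 85, 5)
         for period in range(1910, 2005, 5)]

original = (2, -1, 3)          # hypothetical slopes for age, period, cohort
shifted = shift(original, 7)   # a different set of slopes

# Both sets of slopes produce identical predictions in every cell.
same_fit = all(
    linear_predictor(age, period, original) == linear_predictor(age, period, shifted)
    for age, period in cells
)
print(same_fit)
```

Any value of k works, so there are infinitely many equally good solutions unless an extra constraint is imposed, which is what the methods listed in the next section do in different ways.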

Traditional solutions to the APC identification problem

Constrained coefficients GLIM estimator (CGLIM)
Proxy variables approach
Nonlinear parametric (algebraic) transformation approach
Intrinsic estimator method
Median polish analysis
The epidemiological definition of a cohort effect as an interaction between age and period is the basis for median polish analysis. It isolates the non-additive component of the age and period effects and partitions this non-linear variance into a cohort effect and random error (4). In other words, this approach evaluates the interaction between age and period beyond what would be expected from their additive influences.

Guide to Estimating APC Models (Based on Yang and Land) (5):

  1. Descriptive analysis of the data through graphical representation is the first step of an APC analysis. It helps in the qualitative assessment of patterns of temporal variation.

  2. Rule out that the data can be explained by a single-factor or two-factor model of age, period, and cohort. Goodness-of-fit statistics are often used to compare reduced log-linear models: three single-factor models, one each for age, period, and cohort effects; and three two-factor models, one for each possible pair of effects, namely the AP, AC, and PC models. All of these models are then compared with a full APC model in which all three factors are controlled simultaneously. Because likelihood ratio tests tend to favor models with a larger number of parameters, the two most commonly used penalized-likelihood selection criteria, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are used to evaluate the models. Both adjust the model deviance for the number of parameters in the model.

  3. If these analyses show that not all three of the A, P, and C dimensions are operative, the analysis can be concluded with a reduced model that leaves out the non-operative dimension, and there is no identification problem.

  4. However, if these analyses indicate that all three dimensions are at work, use one of the specific methods of APC analysis.
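The model comparison in step 2 can be sketched in Python using the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L. The log-likelihoods, parameter counts, and number of observations below are made-up values for illustration only, and the helper names are hypothetical; they are not results from the example on this page.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: model name -> (log-likelihood, number of parameters).
fits = {
    "A": (-520.0, 15),
    "AP": (-480.0, 33),
    "AC": (-450.0, 47),
    "APC": (-445.0, 64),
}
n_obs = 285  # hypothetical: cells in a complete 15 x 19 age-by-period table


def best_model(fits, criterion):
    """Return the model name with the smallest value of the criterion."""
    return min(fits, key=lambda name: criterion(*fits[name]))


best_by_aic = best_model(fits, aic)
best_by_bic = best_model(fits, lambda ll, k: bic(ll, k, n_obs))
print(best_by_aic, best_by_bic)
```

With these made-up numbers the two criteria disagree, because BIC penalizes each extra parameter more heavily than AIC; in practice both are usually reported side by side.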

Median Polish Analysis Practical Example (3)

Table 1 (3) illustrates the identification problem, in which the three components (age, period, and cohort) are perfectly correlated. To identify the cohorts we only need to know the period and the age group: we subtract the lower bound of the age group from the lower and upper bounds of the period (e.g., we denote the cohort interval as 1940-1944). (9) The colored diagonal cells show the rate for each cohort as it ages. Contingency tables cannot estimate mutually exclusive cohort risks because the cohorts overlap. This convention can misclassify some individuals, but the main purpose of an age-period-cohort analysis is to estimate general trends in cohort-specific form rather than to quantify a true causal risk precisely. The overlapping cohorts are a reminder not to overinterpret the estimates. We are also limited by sparse data. For example, we have only one data point for the youngest cohort (those aged 10-14 in the period 2000-2004). From this table we can produce an initial graphical representation with a line chart in Microsoft Excel.
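The cohort labelling convention can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes 5-year age groups (10-14 to 80-84) and 5-year periods (1910-1914 to 2000-2004), as in the R syntax further down this page. It also shows why the convention can misclassify individuals: the birth years that are actually possible in a cell span a wider range than the cohort label.

```python
def cohort_interval(period_start, age_start, width=5):
    """Cohort label for one cell: period bounds minus the lower age bound."""
    start = period_start - age_start
    return (start, start + width - 1)


def possible_birth_years(period_start, age_start, width=5):
    """Earliest and latest birth year actually possible for people in one cell."""
    earliest = period_start - (age_start + width - 1) - 1
    latest = (period_start + width - 1) - age_start
    return (earliest, latest)


age_starts = range(10, 85, 5)          # 10-14, ..., 80-84
period_starts = range(1910, 2005, 5)   # 1910-1914, ..., 2000-2004

# Distinct cohort labels across the whole age-by-period table.
cohorts = sorted({cohort_interval(p, a) for p in period_starts for a in age_starts})

print(len(cohorts), cohorts[0], cohorts[-1])
print(cohort_interval(1950, 10), possible_birth_years(1950, 10))
```

For the cell aged 10-14 in 1950-1954 the label is 1940-1944, while the people in that cell may have been born anywhere from 1935 to 1944, so neighbouring cohorts overlap.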


The two charts were created using the line charts in Microsoft Excel. To display both graphs, we simply rearranged the data using the Switch Row / Column function. These two graphs allow us to look for any pattern in the data. The limitation is that each result can be a mixture of two or more effects.

Median polish removes the additive effects of age and period by iteratively subtracting the median value of each row and column. (6) The first step is to calculate the median of each row (Table 2).

The next step is to subtract the row median from each value in the row. For example, in row one we subtract the row median, 0.790, from the first cell, 0.610, which gives 0.610 - 0.790 = -0.18. In the second row (ages 15-19) the same procedure gives 6.330 - 5.770 = 0.56, and so on for every cell in the table. This produces a table of new values (Table 3).

The next step is to calculate the column medians of the new values and subtract each column median from every cell in its column, for example -0.18 - 19.08 = -19.26. Once this table has been created, we calculate the row medians again (third iteration). Repeating these steps eventually produces row and column medians of zero. For this example, 6 iterations were required to produce zero row and column medians (Table 4).

Table 4 shows the residual values after 5 iterations. These residuals represent what is left of the rates once the additive effects of age and period have been removed. Note that the age groups 75-79 and 80-84 have missing values for the periods between 1910 and 1939. Replacing the missing values with zero rates would bias the calculated residuals. The complete procedure was carried out in Microsoft Excel. To check that the residuals were correct, we created a new table by subtracting each residual from the corresponding original value in Table 1 and used the result to draw a line chart. This chart lets us check the validity of the residuals: we expect the lines to be perfectly parallel. Because we subtract the residuals, which represent the cohort effects, from the original values, what remains is the age and period effects free of cohort effects (Figures 3 and 4).
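The iteration described above, and the parallel-lines check, can be sketched in plain Python. This is a simplified illustration and not the Excel workbook or R's medpolish: it returns only the residuals, the function name is hypothetical, and the 3 x 3 table is made up (additive age and period effects plus one extra bump of 5 in a single cell).

```python
from statistics import median


def median_polish(table, max_iter=10, eps=0.01):
    """Iteratively subtract row and column medians; None marks a missing cell."""
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])

    def med(values):
        present = [v for v in values if v is not None]
        return median(present) if present else 0.0

    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep out the row medians.
        for i in range(n_rows):
            m = med(resid[i])
            resid[i] = [None if v is None else v - m for v in resid[i]]
        # Sweep out the column medians.
        for j in range(n_cols):
            m = med([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                if resid[i][j] is not None:
                    resid[i][j] -= m
        # Stop when the sum of absolute residuals no longer changes.
        new_sum = sum(abs(v) for row in resid for v in row if v is not None)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return resid


# Made-up table: rows are age groups, columns are periods.
age_effect = [0, 4, 9]
period_effect = [0, 1, 3]
table = [[10 + a + p for p in period_effect] for a in age_effect]
table[2][0] += 5  # one non-additive bump, standing in for a cohort effect

resid = median_polish(table)

# Validity check: original minus residual should give parallel lines,
# i.e. the gap between two age rows is the same in every period.
fitted = [[y - r for y, r in zip(row, res)] for row, res in zip(table, resid)]
gaps = [fitted[1][j] - fitted[0][j] for j in range(3)]
print(resid)
print(gaps)
```

In this toy table the only non-zero residual is the bump that was added by hand, and the gap between the first two age rows of the fitted table is constant across periods, which is the parallel-lines pattern the check looks for.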


The median polish procedure is available in R, a freely available software environment (8). See the following syntax:

# Read the age-by-period table of rates (the file has no header row)
mpdata <- read.csv("C:/Users/mydocs/suicidemp.csv", header=FALSE, stringsAsFactors=FALSE)
rownames(mpdata) <- c("10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84")
colnames(mpdata) <- c("1910-1914", "1915-1919", "1920-1924", "1925-1929", "1930-1934", "1935-1939", "1940-1944", "1945-1949", "1950-1954", "1955-1959", "1960-1964", "1965-1969", "1970-1974", "1975-1979", "1980-1984", "1985-1989", "1990-1994", "1995-1999", "2000-2004")
# Median polish, skipping missing cells
med.p <- medpolish(mpdata, na.rm = TRUE)

Median polish results can be obtained without any transformation of the rates, but log-transforming the rates before the median polish procedure evaluates the interaction on the multiplicative scale (a log-additive effect). We repeated our median polish procedure using the logarithmic transformation of the suicide rates. To generate log-transformed residuals of the original table in R, we created a new function that replaces the rates with log-transformed rates (note the log(x) call in the first line of the function body):

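medpolish2 <- function(x, eps = 0.01, maxiter = 10L, trace.iter = TRUE, na.rm = FALSE) {
    z <- as.matrix(log(x))  # log-transform the rates before polishing
    nr <- nrow(z)
    nc <- ncol(z)
    t <- 0
    r <- numeric(nr)
    c <- numeric(nc)
    oldsum <- 0
    for (iter in 1L:maxiter) {
        rdelta <- apply(z, 1L, median, na.rm = na.rm)  # row medians
        z <- z - matrix(rdelta, nrow = nr, ncol = nc)
        r <- r + rdelta
        delta <- median(c, na.rm = na.rm)
        c <- c - delta
        t <- t + delta
        cdelta <- apply(z, 2L, median, na.rm = na.rm)  # column medians
        z <- z - matrix(cdelta, nrow = nr, ncol = nc, byrow = TRUE)
        c <- c + cdelta
        delta <- median(r, na.rm = na.rm)
        r <- r - delta
        t <- t + delta
        newsum <- sum(abs(z), na.rm = na.rm)
        converged <- newsum == 0 || abs(newsum - oldsum) < eps * newsum
        if (converged) break
        oldsum <- newsum
        if (trace.iter) cat(iter, ": ", newsum, "\n", sep = "")
    }
    if (converged) {
        if (trace.iter) cat("Final: ", newsum, "\n", sep = "")
    } else warning(sprintf(ngettext(maxiter, "medpolish() did not converge in %d iteration", "medpolish() did not converge in %d iterations"), maxiter), domain = NA)
    names(r) <- rownames(z)
    names(c) <- colnames(z)
    ans <- list(overall = t, row = r, col = c, residuals = z, name = deparse(substitute(x)))
    class(ans) <- "medpolish"
    ans
}

med.p2 <- medpolish2(mpdata, na.rm = TRUE)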
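Why the log transformation puts the interaction on the multiplicative scale can be seen in a small Python sketch. The rates are made up and the helper name is hypothetical. When each rate is exactly an age factor times a period factor, the table is not additive on the rate scale, but it is additive after taking logs, so no interaction remains to be attributed to cohort.

```python
import math


def interaction_contrast(table):
    """2 x 2 interaction contrast: zero when row and column effects are additive."""
    return table[0][0] - table[0][1] - table[1][0] + table[1][1]


# Made-up rates with a purely multiplicative structure:
# rate = age factor * period factor.
age_factor = [2.0, 8.0]
period_factor = [1.0, 4.0]
rates = [[a * p for p in period_factor] for a in age_factor]
log_rates = [[math.log(v) for v in row] for row in rates]

raw_contrast = interaction_contrast(rates)       # non-zero on the rate scale
log_contrast = interaction_contrast(log_rates)   # zero up to rounding on the log scale
print(raw_contrast, log_contrast)
```

Residuals from a median polish of log rates therefore measure departures from a multiplicative age-by-period structure rather than from an additive one.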

The data are saved as a comma-separated file (.csv), a simple format that R can read. Note that the median polish command is called with the option for missing data activated (na.rm = TRUE); otherwise the procedure reports an error. The residuals produced with Excel and with R are the same.
We reshaped the data by cohort and plotted the residuals against the cohort category (see the next table).

We calculated the mean residual for each cohort and then used these log-transformed residuals to plot a chart by cohort. This chart helps to evaluate the distribution of the residuals: any marked deviation from zero indicates a strong cohort effect for that cohort (see the next graph).
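The reshaping by cohort and the per-cohort mean can be sketched in Python as follows. The function name is hypothetical and the residuals are made-up numbers; cohorts are keyed by the first birth year of the cohort interval (period start minus age start), and missing cells (None) are skipped.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_cohort(resid, age_starts, period_starts):
    """Group residuals along the cohort diagonals and average each group."""
    groups = defaultdict(list)
    for i, age in enumerate(age_starts):
        for j, period in enumerate(period_starts):
            if resid[i][j] is not None:
                groups[period - age].append(resid[i][j])
    return {cohort: mean(values) for cohort, values in sorted(groups.items())}


# Made-up residuals for 3 age groups (rows) x 3 periods (columns).
resid = [
    [0.25, -0.5, 0.0],
    [0.0, 0.75, None],
    [-0.25, 0.0, 0.5],
]
means = mean_residual_by_cohort(resid, [10, 15, 20], [1950, 1955, 1960])
print(means)
```

Cohorts at the corners of the table contribute only one cell each, so their means rest on very little data, as noted earlier for the youngest cohort.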

Stata code for plotting the residuals:

* Graph of median polish residuals of the rates, book example (log scale)
rename var2 var1
rename var16 var15
reshape long var, i(cohort) j(number)
label define cohort 1 "1830-1834" 2 "1835-1839" … 32 "1985-1989" 33 "1990-1994"
rename var rest
twoway (scatter rest cohort, msize(vsmall)) (connected mean cohort, msize(vsmall) msymbol(triangle) lwidth(thin) lpattern(solid)), ytitle(Median Polish Residuals) yscale(range(-2 2)) ylabel(#7) xtitle(Cohort) xlabel(#33, labels labsize(small) angle(vertical) labgap(minuscule) valuelabel) title(, size(medsmall) ring(0)) legend(size(small))

These residuals let us assess the magnitude of the cohort effect through a linear regression of the residuals on cohort. We chose the 1910-1914 cohort as the reference. Consistent with Figure 6, the cohorts born after 1950 appear to have had a statistically significantly higher risk of suicide than the 1910-1914 cohort. The regression coefficients are on the logarithmic scale; to estimate rate ratios we applied the exponential function to each coefficient [exp(x)].

Stata code for the regression of the suicide rate residuals:

* Use cohort 17 (1910-1914) as the omitted reference category
char cohort[omit] 17
xi: regress rest i.cohort
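The step from regression coefficients to rate ratios can be sketched in Python. The sketch relies on a standard property of least squares: when the only predictors are indicator variables for cohort, each coefficient equals that cohort's mean residual minus the reference cohort's mean residual. The function name is hypothetical, the residuals are made up, and standard errors and significance tests are not covered.

```python
import math
from statistics import mean


def cohort_rate_ratios(residuals_by_cohort, reference):
    """Exponentiate (cohort mean - reference mean) of log-scale residuals."""
    ref_mean = mean(residuals_by_cohort[reference])
    return {
        cohort: math.exp(mean(values) - ref_mean)
        for cohort, values in residuals_by_cohort.items()
    }


# Made-up log-scale residuals, keyed by cohort label.
residuals_by_cohort = {
    "1910-1914": [0.0, 0.1, -0.1],
    "1950-1954": [0.3, 0.5],
}
ratios = cohort_rate_ratios(residuals_by_cohort, reference="1910-1914")
print(ratios)
```

The reference cohort always has a rate ratio of 1, and a ratio above 1 for another cohort means its rates are higher than the age and period pattern alone would predict, relative to the reference.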

  1. Yang Y, Schulhofer-Wohl S, Fu WJ, Land KC. The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How to Use It. American Journal of Sociology 2008; 113 (6): 1697-736.

  2. Reither EN, Hauser RM, Yang Y. Do birth cohorts matter? Age-period-cohort analyses of the obesity epidemic in the United States. Social Science & Medicine 2009; 69 (10): 1439-48.

  3. Keyes KM, Li G. Age-Period-Cohort Modeling. Injury Research: Springer, 2012: 409-26.

  4. Keyes KM, Utz RL, Robinson W, Li G. What is a cohort effect? Comparison of three statistical methods for modeling cohort effects in obesity prevalence in the United States, 1971-2006. Soc Sci Med 2010; 70 (7): 1100-8.

  5. Yang Y, Land KC. Age-Period-Cohort Analysis: New Models, Methods, and Empirical Applications. CRC Press, 2013.

  6. Mason KO, et al. Some methodological issues in cohort analysis of archival data. American Sociological Review (1973): 242-258.

  7. O'Brien RM. Age Period Cohort Characteristic Models. Social Science Research 2000; 29: 123-139.

  8. http://www.r-project.org/

  9. Keyes KM, Li G. A multiphase method for estimating cohort effects in age-period contingency table data. Ann Epidemiol 2010; 20: 779-785.


  • Yang, Yang, and Kenneth C. Land. Age-Period-Cohort Analysis: New Models, Methods, and Empirical Applications. CRC Press, 2013.

  • Keyes, Katherine M., and Guohua Li. Age-Period-Cohort Modeling. Injury Research. Springer US, 2012. 409-426.

  • Glenn, Norval D., ed. Cohort Analysis. Vol. 5. Sage, 2005.

  • Hobcraft, John, Jane Menken, and Samuel Preston. Age, Period, and Cohort Effects in Demography: A Review. Springer New York, 1985.

Methodological articles

  • Ryder, Norman B. The Cohort as a Concept in the Study of Social Change. American Sociological Review (1965): 843-861.

  • Mason, Karen Oppenheim, et al. Some methodological issues in cohort analysis of archival data. American Sociological Review (1973): 242-258.

  • Mason, William M., and Stephen E. Fienberg. Cohort Analysis in Social Research: Beyond the Identification Problem. (1985)

  • Yang, Yang, et al. The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How to Use It. American Journal of Sociology 113.6 (2008): 1697-1736.

  • Keyes, Katherine M., et al. What is a cohort effect? Comparison of three statistical methods for modeling cohort effects in obesity prevalence in the United States, 1971-2006. Social Science & Medicine 70.7 (2010): 1100-1108.

  • Keyes, K. & Li, G. Age-Period-Cohort Modeling. In Li, G. & Baker, S. (Eds.), Injury Research: Theories, Methods, and Approaches. Springer, Chapter 22, pages 409-426. New York, 2012.

Application article

  • Keyes, Katherine M., et al. Age, period, and cohort effects in psychological distress in the United States and Canada. American Journal of Epidemiology (2014): kwu029.
