Latent class analysis: a method for capturing heterogeneity.

Author: Rosato, Nancy Scotto

Social work researchers often use variable-centered approaches such as regression and factor analysis. However, these methods do not capture important aspects of relationships that are often embedded in the heterogeneity of samples. Latent class analysis (LCA) is one of several person-centered approaches that can capture heterogeneity within and between groups. This method is illustrated in the present study, in which LCA is used to explicate differences in symptomatology in a nonclinical, nationally representative sample of youths. Data (N = 14,738) from the National Longitudinal Study of Adolescent Health were analyzed using externalizing and internalizing behavioral constructs and then validated against a number of sociodemographic characteristics and behavior outcomes typically associated with type and severity of symptomatology. Findings revealed important differences within the externalizing symptomatology construct and class differences across racial and ethnic groups, gender, age categories, and several behavior outcomes. Research and clinical implications of modeling heterogeneity using a person-centered approach are discussed.

KEY WORDS: Add Health; latent class analysis; mixture modeling; person-centered analysis


Attention to the variability of human experience is fundamental to social work research and practice. Issues such as differences in prevalence, treatment effects, coping strategies, and normal within-group variations permeate both practice and research agendas. In addition, social work is often concerned with racial and ethnic differences, sociodemographic characteristics, and other variables that may influence or modify focal study relationships (Kataoka, Zhang, & Wells, 2002). Thus, capturing identifiable differences in subpopulations is an important area of social work inquiry.

Traditionally, much research, including protocols and evidence-based practice, has been based on variable-oriented methods that capture information about relationships between the variables of interest for the overall sample. In contrast, person-oriented methods capture information at the personal level, enabling researchers to distinguish patterns of characteristics in subgroups (Nurius & Macy, 2008). Person-oriented methods, such as latent class analysis (LCA), enable the researcher to identify important intraindividual and interindividual differences and thus model distinct configurations of heterogeneity within a given sample. Although traditional variable-level studies contain valuable information, they have also been criticized because they obscure diversity and foster the misleading and over-generalized conclusion that study findings represent the overall sample (von Eye & Bergman, 2003). A comment by Bogat, Levendosky, and von Eye (2005) illustrates this obfuscation: "[R]esearchers often write about these analyses 'as if' they say something about individuals, but they are really statements about variables" (p. 50).

The importance of significant heterogeneity within subsets of populations has been noted within the larger social sciences (Costello, Mostillo, Erkanli, Keeler, & Angold, 2003). Inadequate attention to the heterogeneity inherent in the complexity of human social activity, such as variations in symptom manifestations, and reliance on categorical assessments that reach a diagnosis by dichotomizing symptomatology as either present or absent (Krueger & Piasecki, 2002) have left a number of important phenomena largely unexplored.

LCA comes under the rubric of structural equation modeling and is a type of person-centered analysis that uses finite mixture modeling to empirically determine whether interrelationships exist among observed variables that explain the underlying (that is, latent) phenomena (McCutcheon, 1987). Latent variables are statistically inferred from the direct measures, as in factor analysis. LCA classifies or clusters observed occurrences and the patterns among them. This clustering or classifying of groups is based on response patterns. The specific goal of LCA is to identify the smallest number of latent classes that describe the associations among a set of observed indicators using their posterior probabilities (Clogg, 1995).
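The model behind this description can be written compactly. The following is a sketch under stated assumptions rather than the notation of the cited sources: it assumes J binary observed indicators, K latent classes, and conditional independence of the indicators within a class. The symbols \pi_c (the prevalence of class c) and \rho_{jc} (the probability of endorsing indicator j given membership in class c) are introduced here for illustration only.

```latex
% Marginal probability of respondent i's response pattern y_i = (y_{i1}, ..., y_{iJ})
P(\mathbf{y}_i) = \sum_{c=1}^{K} \pi_c \prod_{j=1}^{J}
    \rho_{jc}^{\,y_{ij}} \, (1 - \rho_{jc})^{1 - y_{ij}},
\qquad \sum_{c=1}^{K} \pi_c = 1

% Posterior probability that respondent i belongs to class c (Bayes' rule)
P(c \mid \mathbf{y}_i) =
  \frac{\pi_c \prod_{j=1}^{J} \rho_{jc}^{\,y_{ij}} (1 - \rho_{jc})^{1 - y_{ij}}}
       {\sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} \rho_{jk}^{\,y_{ij}} (1 - \rho_{jk})^{1 - y_{ij}}}
```

Under these assumptions, respondents with similar response patterns receive similar posterior probabilities, which is the sense in which the classification is based on response patterns.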

Finite mixture modeling represents the unobserved heterogeneity as a latent variable that has a direct physical interpretation. The sample is treated as a mixture of subsamples: the distribution is heterogeneous across the sample but homogeneous within each subsample. Given the sample heterogeneity, the variables of interest have a different probability distribution within each subgroup. Computations are executed using maximum likelihood estimation, which is considered one of the most robust methods of estimation. For a given distributional form, maximum likelihood estimation selects the parameter values that make the observed data "most likely" and therefore best describe the data (Fisher, 1922). For example, in our study, the finite mixture model has a distribution of diagnostic criteria (that is, externalizing and internalizing behaviors), and we estimate the probability distribution of behaviors within these internalizing and externalizing categories. Estimates of class probabilities are given for each individual. A categorical latent variable is used to represent classes, each of which corresponds to a subpopulation that has its own set of parameters. Classes are added until the model fits the data. In sum, LCA uses maximum likelihood estimation and, through a series of iterations, computes probabilistic models to determine subgroups. This allows for the identification of groups and explication of the items that make the groups distinctive and that show the prevalence and size of the subgroups (for a full explanation, see Bollen & Long, 1993; McCutcheon, 1987; Muthen & Muthen, 2000).
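The iterative estimation described above can be illustrated with a minimal sketch. It is not the software or procedure used in the present study. It assumes binary indicators, uses the expectation-maximization (EM) algorithm as one common way to obtain maximum likelihood estimates for mixture models, and compares models with different numbers of classes by the Bayesian information criterion (BIC), which is an illustrative choice of fit criterion. The function names `fit_lca` and `bic`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np

def fit_lca(Y, n_classes, n_iter=200, tol=1e-8, seed=0):
    """EM sketch for a latent class model with binary (0/1) indicators.

    Returns class prevalences pi (K,), item probabilities rho (K, J),
    posterior class probabilities (n, K), and the log-likelihood
    evaluated in the last E-step.
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    n, J = Y.shape
    pi = np.full(n_classes, 1.0 / n_classes)            # equal starting prevalences
    rho = rng.uniform(0.25, 0.75, size=(n_classes, J))  # random starting item probabilities
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log P(y_i, class c) for every respondent and class
        log_joint = (np.log(pi)
                     + Y @ np.log(rho).T
                     + (1.0 - Y) @ np.log(1.0 - rho).T)
        log_marg = np.logaddexp.reduce(log_joint, axis=1)  # log P(y_i)
        post = np.exp(log_joint - log_marg[:, None])       # P(class c | y_i)
        ll = log_marg.sum()
        # M-step: update parameters from posterior-weighted counts
        weight = post.sum(axis=0) + 1e-12                  # guard against empty classes
        pi = weight / weight.sum()
        rho = np.clip((post.T @ Y) / weight[:, None], 1e-6, 1 - 1e-6)
        if ll - prev_ll < tol:                             # stop when improvement is negligible
            break
        prev_ll = ll
    return pi, rho, post, ll

def bic(ll, n_classes, n, J):
    """BIC with (K - 1) free prevalences and K * J free item probabilities."""
    n_params = (n_classes - 1) + n_classes * J
    return -2.0 * ll + n_params * np.log(n)

# Hypothetical data: two simulated classes (about 30% and 70%) and six indicators
rng = np.random.default_rng(42)
true_class = rng.random(2000) < 0.3
true_rho = np.where(true_class[:, None], 0.9, 0.1)
Y = (rng.random((2000, 6)) < true_rho).astype(int)

# Fit models with one, two, and three classes and compare their fit
results = {k: fit_lca(Y, k) for k in (1, 2, 3)}
bics = {k: bic(results[k][3], k, *Y.shape) for k in results}
for k in sorted(bics):
    print(f"{k} classes: log-likelihood = {results[k][3]:.1f}, BIC = {bics[k]:.1f}")
```

In this sketch, each row of the returned posterior matrix gives one individual's estimated class probabilities, and the estimated prevalences indicate the relative size of each subgroup. Because EM can converge to a local maximum, applied analyses typically rerun the estimation from several random starting values.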

In contrast, traditional variable-oriented methods operate by partitioning variance between the dependent variable and changes in the independent variables. Findings of variable-oriented methods characterize the overall sample, whereas person-centered methods model distinct configurations of heterogeneity within a sample (Nurius & Macy, 2008).

LCA has been used to develop typologies. For example, researchers have used LCA to examine whether depressive symptoms and anxiety symptoms could be categorically classified within certain populations (Kreuter, Yah, & Tourangeau, 2008; Sullivan, Kessler, & Kendler, 1998; van Lang, Ferdinand, Ormel, & Verhulst, 2006). LCA also extends to outcome research. It can be used to investigate subtypes, such as those that might exist within a group of individuals diagnosed with a depressive disorder. Differences in these subtypes might affect treatment outcome. Therefore, a social work researcher who is attempting to assess the effect of a particular treatment on individuals with depression would be able to examine the heterogeneity that might play an important role. Such within-group differences (for example, being more susceptible to depression because of family background) may bear significantly on treatment choices and on the identification of successful treatment outcomes.

The present study used LCA to derive latent classes for both internalizing and externalizing constructs with the goal of capturing within- and between-group differences. Although internalizing and externalizing behaviors are often assessed in clinical settings, we could find no previous studies that have examined the heterogeneity of externalizing and internalizing constructs using LCA.

The specific aim of this study was to examine significant variation in patterns of responses to these internalizing and externalizing subscales. Research has shown that late adolescence is a time of heightened risk of problem behaviors (Johnson, O'Malley, & Bachman, 1998). We used a person-centered approach and a...
