Techniques such as cluster analysis are used to identify groups a posteriori based on a suite of correlated variables (i.e., multivariate data). This type of analysis is sometimes followed by an assessment of how well observations were classified into the identified groups, and how many were misclassified. This assessment is known as a discriminant analysis (DA) (aka canonical analysis or supervised classification).

Williams (1983) notes that DA can be conducted for two reasons:
Descriptive – to separate groups optimally on the basis of the measurement values.
Predictive – to predict the group to which an observation belongs, based on its measurement values.

In these notes, I demonstrate linear and distance-based DA techniques. (2016) describe how these techniques can be used in biology (e.g., morphometric analysis for species identification), ecology (e.g., niche separation by sympatric species), and medicine (e.g., diagnosis of diseases).

Linear Discriminant Analysis (LDA) Theory

Linear discriminant analysis (LDA) is the most common method of DA. Rather than relating individual variables to group identity, LDA identifies the combination of variables that best discriminates (distinguishes) sample units from different groups. Combinations of variables can be more sensitive than individual variables. In the figure below (Figure 11.11 from Legendre & Legendre 2012), analyses that focus just on x1 or just on x2 cannot distinguish these groups (compare the histograms shown along the x1 and x2 axes). However, when x1 and x2 are combined into a new composite variable z, that variable clearly distinguishes the two groups (see histogram along the z axis).

LDA is an eigenanalysis technique (refer back to the chapter about matrix algebra if that term doesn’t sound familiar; we’ll discuss eigenanalysis in more detail when we focus on PCA) that also incorporates information about group identity. Conceptually, it can be thought of as a MANOVA in reverse. Given the similarity between LDA and MANOVA, it is perhaps unsurprising that LDA has the same assumptions with respect to multivariate normality, etc. It can be applied to data that satisfy these assumptions; environmental and similar types of data might meet these assumptions, but community-level data do not. It is important to keep in mind that it identifies linear combinations of variables.

For more information on LDA, see chapter 8 in Manly & Navarro Alberto (2017) and section 11.3 in Legendre & Legendre (2012).

LDA has been used for various ecological applications, as described by McCune & Grace (2002, p. Williams (1983) provides citations of some older ecological examples of LDA. (2009) provide an ecological example of how LDA might be applied. In their study, they measured a number of traits of the roots of eight grasses and then used LDA to classify observations to species on the basis of these traits. They also used cluster analysis and PCA to help understand the correlations between these traits and relationships among the species.

The most common reason to use LDA is to assess how well observations from different groups can be differentiated from one another on the basis of their measured responses. For example, the dataset below involves a set of measurements on a number of Darlingtonia plants. How well do these variables help us identify the site at which they were collected?

Here’s another potential application: suppose a field assistant measured another Darlingtonia plant but forgot to record the site they were working at. Their datasheet contains all of the measured variables except the site ID, so this plant isn’t included in our dataset. Can we use the relationships among the measured variables to predict which site the plant came from?

Example Dataset: Darlingtonia californica

We’ll illustrate LDA using detailed morphological measurements from 87 Darlingtonia californica (cobra lily, California pitcherplant) plants at four sites in southern Oregon. These data are from Table 12.1 of Gotelli & Ellison (2004), and are the same ones that we use to illustrate Principal Component Analysis (PCA).

Line drawing of Darlingtonia californica (California pitcherplant). In this plant, leaves are modified into tubes that can hold liquid and trap insects.
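The idea from these notes that a combination of variables can discriminate groups better than any single variable can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an analysis of the Darlingtonia data: it simulates two hypothetical groups measured on two correlated variables (x1 and x2), computes the two-group Fisher linear discriminant (the two-group special case of LDA) by hand, and compares classification accuracy for x1 alone, x2 alone, and the composite variable z. The function names (`simulate`, `fisher_direction`, `accuracy`), the simulated means and standard deviations, and the midpoint classification rule (which assumes equal group sizes and equal priors) are all assumptions made for this example.

```python
import random


def simulate(n_per_group=100, seed=1):
    """Simulate two hypothetical groups measured on two correlated variables."""
    rng = random.Random(seed)
    rows = []
    for group, shift in ((0, 0.5), (1, -0.5)):
        for _ in range(n_per_group):
            t = rng.gauss(0.0, 2.0)  # shared component: makes x1 and x2 correlated
            x1 = t + shift + rng.gauss(0.0, 0.2)
            x2 = t - shift + rng.gauss(0.0, 0.2)
            rows.append((x1, x2, group))
    return rows


def fisher_direction(rows):
    """Return weights w = S^-1 (m1 - m0), with S the pooled within-group covariance."""
    groups = {0: [], 1: []}
    for x1, x2, g in rows:
        groups[g].append((x1, x2))
    means = {}
    for g, pts in groups.items():
        means[g] = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
    # Pooled within-group sums of squares and cross-products
    s11 = s12 = s22 = 0.0
    for g, pts in groups.items():
        m1, m2 = means[g]
        for x1, x2 in pts:
            s11 += (x1 - m1) ** 2
            s12 += (x1 - m1) * (x2 - m2)
            s22 += (x2 - m2) ** 2
    df = len(rows) - 2  # total observations minus number of groups
    s11, s12, s22 = s11 / df, s12 / df, s22 / df
    det = s11 * s22 - s12 ** 2
    d1 = means[1][0] - means[0][0]
    d2 = means[1][1] - means[0][1]
    # Inverse of [[s11, s12], [s12, s22]] is (1/det) * [[s22, -s12], [-s12, s11]]
    w1 = (s22 * d1 - s12 * d2) / det
    w2 = (-s12 * d1 + s11 * d2) / det
    return w1, w2


def accuracy(scores, labels):
    """Classify each score by the midpoint between the two group means."""
    g0 = [s for s, g in zip(scores, labels) if g == 0]
    g1 = [s for s, g in zip(scores, labels) if g == 1]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    cut = (m0 + m1) / 2
    # A score is assigned to group 1 if it falls on group 1's side of the cut
    predicted = [int((s > cut) == (m1 > m0)) for s in scores]
    return sum(p == g for p, g in zip(predicted, labels)) / len(labels)


rows = simulate()
labels = [g for _, _, g in rows]
w = fisher_direction(rows)
z = [w[0] * x1 + w[1] * x2 for x1, x2, _ in rows]  # composite variable

acc_x1 = accuracy([r[0] for r in rows], labels)
acc_x2 = accuracy([r[1] for r in rows], labels)
acc_z = accuracy(z, labels)

print(f"weights for (x1, x2): ({w[0]:.2f}, {w[1]:.2f})")
print(f"accuracy using x1 only: {acc_x1:.2f}")
print(f"accuracy using x2 only: {acc_x2:.2f}")
print(f"accuracy using composite z: {acc_z:.2f}")
```

Because accuracy here is computed on the same observations used to estimate the weights, it is optimistic; a held-out or cross-validated estimate would be a fairer measure of predictive performance. With more than two groups or more than two variables, the same idea requires a full eigenanalysis, for which an established LDA implementation is the better choice.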