Kernel Fisher discriminant analysis
In statistics, kernel Fisher discriminant analysis (KFD),[1] also known as generalized discriminant analysis[2] and kernel discriminant analysis,[3] is a kernelized version of linear discriminant analysis (LDA). It is named after Ronald Fisher. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be learned.
Linear discriminant analysis
Intuitively, the idea of LDA is to find a projection where class separation is maximized. Given two sets of labeled data, $\mathbf{C}_1$ and $\mathbf{C}_2$, define the class means $\mathbf{m}_1$ and $\mathbf{m}_2$ to be

$$\mathbf{m}_i = \frac{1}{l_i}\sum_{n=1}^{l_i}\mathbf{x}_n^i,$$

where $l_i$ is the number of examples of class $\mathbf{C}_i$. The goal of linear discriminant analysis is to give a large separation of the class means while also keeping the in-class variance small.[4] This is formulated as maximizing, with respect to $\mathbf{w}$, the ratio

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w}},$$

where $\mathbf{S}_B$ is the between-class covariance matrix and $\mathbf{S}_W$ is the total within-class covariance matrix:

$$\begin{align}
\mathbf{S}_B & = (\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^{\text{T}} \\
\mathbf{S}_W & = \sum_{i=1,2}\sum_{n=1}^{l_i}(\mathbf{x}_n^i-\mathbf{m}_i)(\mathbf{x}_n^i-\mathbf{m}_i)^{\text{T}}.
\end{align}$$
Differentiating $J(\mathbf{w})$ with respect to $\mathbf{w}$, setting equal to zero, and rearranging gives

$$(\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w})\mathbf{S}_W\mathbf{w} = (\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w})\mathbf{S}_B\mathbf{w}.$$
Since we only care about the direction of $\mathbf{w}$ and $\mathbf{S}_B\mathbf{w}$ has the same direction as $(\mathbf{m}_2-\mathbf{m}_1)$, $\mathbf{S}_B\mathbf{w}$ can be replaced by $(\mathbf{m}_2-\mathbf{m}_1)$ and we can drop the scalars $(\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w})$ and $(\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w})$ to give

$$\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2-\mathbf{m}_1).$$
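As a concrete illustration, below is a minimal sketch of two-class LDA in Python with NumPy, assuming each class is given as an array with one example per row; the function name lda_direction and the synthetic data are illustrative, not from any particular library.

```python
import numpy as np

# A minimal two-class LDA sketch following the formulas above.
# Assumes X1 and X2 are (l_i x d) arrays, one example per row.
def lda_direction(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)      # class means
    # Within-class covariance S_W, summed over both classes.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # w is proportional to S_W^{-1} (m2 - m1); solve rather than invert.
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

# Example: two Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
X2 = rng.normal(loc=[3.0, 1.0], size=(50, 2))
print(lda_direction(X1, X2))
```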
Kernel trick with LDA
To extend LDA to non-linear mappings, the data, given as the $l$ points $\mathbf{x}_i$, can be mapped to a new feature space, $F$, via some function $\phi$. In this new feature space, the function that needs to be maximized is[1]

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}},$$

where

$$\begin{align}
\mathbf{S}_B^{\phi} & = (\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})^{\text{T}} \\
\mathbf{S}_W^{\phi} & = \sum_{i=1,2}\sum_{n=1}^{l_i}(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})^{\text{T}},
\end{align}$$

and

$$\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{l_i}\phi(\mathbf{x}_j^i).$$

Further, note that $\mathbf{w}\in F$. Explicitly computing the mappings $\phi(\mathbf{x}_i)$ and then performing LDA can be computationally expensive, and in many cases intractable. For example, $F$ may be infinite-dimensional. Thus, rather than explicitly mapping the data to $F$, the data can be implicitly embedded by rewriting the algorithm in terms of dot products and using the kernel trick, in which the dot product in the new feature space is replaced by a kernel function, $k(\mathbf{x},\mathbf{y}) = \phi(\mathbf{x})\cdot\phi(\mathbf{y})$.
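For example, a Gaussian (RBF) kernel is one common choice. The following minimal sketch computes the matrix of pairwise kernel values between two sets of points; the kernel choice and the width parameter sigma are assumptions, since KFD itself does not prescribe a kernel.

```python
import numpy as np

# One common kernel choice: the Gaussian (RBF) kernel
# k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of X and rows of Y.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))
```

Any positive-semidefinite kernel can be substituted here; for the Gaussian kernel the corresponding feature space $F$ is infinite-dimensional, which is exactly the case where the implicit embedding is needed.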
LDA can be reformulated in terms of dot products by first noting that $\mathbf{w}$ will have an expansion of the form[5]

$$\mathbf{w} = \sum_{i=1}^l\alpha_i\phi(\mathbf{x}_i).$$
Then note that

$$\mathbf{w}^{\text{T}}\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{l}\sum_{k=1}^{l_i}\alpha_j k(\mathbf{x}_j,\mathbf{x}_k^i) = \mathbf{\alpha}^{\text{T}}\mathbf{M}_i,$$

where $(\mathbf{M}_i)_j = \frac{1}{l_i}\sum_{k=1}^{l_i}k(\mathbf{x}_j,\mathbf{x}_k^i)$.
The numerator of $J(\mathbf{w})$ can then be written as

$$\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w} = \mathbf{w}^{\text{T}}(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})^{\text{T}}\mathbf{w} = \mathbf{\alpha}^{\text{T}}\mathbf{M}\mathbf{\alpha},$$

where $\mathbf{M} = (\mathbf{M}_2-\mathbf{M}_1)(\mathbf{M}_2-\mathbf{M}_1)^{\text{T}}$. Similarly, the denominator can be written as

$$\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w} = \mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha},$$

where

$$\mathbf{N} = \sum_{j=1,2}\mathbf{K}_j(\mathbf{I}-\mathbf{1}_{l_j})\mathbf{K}_j^{\text{T}},$$

with the $n^{\text{th}}, m^{\text{th}}$ component of $\mathbf{K}_j$ defined as $k(\mathbf{x}_n,\mathbf{x}_m^j)$, $\mathbf{I}$ the identity matrix, and $\mathbf{1}_{l_j}$ the matrix with all entries equal to $1/l_j$
. This identity can be derived by starting out with the expression for $\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}$ and using the expansion of $\mathbf{w}$ and the definitions of $\mathbf{S}_W^{\phi}$ and $\mathbf{m}_i^{\phi}$:

$$\begin{align}
\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w} & = \left(\sum_{i=1}^l\alpha_i\phi^{\text{T}}(\mathbf{x}_i)\right)\left(\sum_{j=1,2}\sum_{n=1}^{l_j}(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})^{\text{T}}\right)\left(\sum_{k=1}^l\alpha_k\phi(\mathbf{x}_k)\right) \\
& = \sum_{j=1,2}\sum_{i=1}^l\sum_{n=1}^{l_j}\sum_{k=1}^l\alpha_i\phi^{\text{T}}(\mathbf{x}_i)(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})^{\text{T}}\alpha_k\phi(\mathbf{x}_k) \\
& = \sum_{j=1,2}\sum_{i=1}^l\sum_{n=1}^{l_j}\sum_{k=1}^l\left(\alpha_i k(\mathbf{x}_i,\mathbf{x}_n^j)-\frac{1}{l_j}\sum_{p=1}^{l_j}\alpha_i k(\mathbf{x}_i,\mathbf{x}_p^j)\right)\left(\alpha_k k(\mathbf{x}_k,\mathbf{x}_n^j)-\frac{1}{l_j}\sum_{q=1}^{l_j}\alpha_k k(\mathbf{x}_k,\mathbf{x}_q^j)\right) \\
& = \sum_{j=1,2}\sum_{i=1}^l\sum_{n=1}^{l_j}\sum_{k=1}^l\left(\alpha_i\alpha_k k(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_n^j) - \frac{2\alpha_i\alpha_k}{l_j}\sum_{p=1}^{l_j}k(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_p^j) + \frac{\alpha_i\alpha_k}{l_j^2}\sum_{p=1}^{l_j}\sum_{q=1}^{l_j}k(\mathbf{x}_i,\mathbf{x}_p^j)k(\mathbf{x}_k,\mathbf{x}_q^j)\right) \\
& = \sum_{j=1,2}\sum_{i=1}^l\sum_{n=1}^{l_j}\sum_{k=1}^l\left(\alpha_i\alpha_k k(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_n^j) - \frac{\alpha_i\alpha_k}{l_j}\sum_{p=1}^{l_j}k(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_p^j)\right) \\
& = \sum_{j=1,2}\left(\mathbf{\alpha}^{\text{T}}\mathbf{K}_j\mathbf{K}_j^{\text{T}}\mathbf{\alpha} - \mathbf{\alpha}^{\text{T}}\mathbf{K}_j\mathbf{1}_{l_j}\mathbf{K}_j^{\text{T}}\mathbf{\alpha}\right) \\
& = \mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha}.
\end{align}$$
With these equations for the numerator and denominator of $J(\mathbf{w})$, the equation for $J$ can be rewritten as

$$J(\mathbf{\alpha}) = \frac{\mathbf{\alpha}^{\text{T}}\mathbf{M}\mathbf{\alpha}}{\mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha}}.$$

Then, differentiating and setting equal to zero gives

$$(\mathbf{\alpha}^{\text{T}}\mathbf{M}\mathbf{\alpha})\mathbf{N}\mathbf{\alpha} = (\mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha})\mathbf{M}\mathbf{\alpha}.$$
Since only the direction of $\mathbf{w}$, and hence the direction of $\mathbf{\alpha}$, matters, the above can be solved for $\mathbf{\alpha}$ as

$$\mathbf{\alpha} = \mathbf{N}^{-1}(\mathbf{M}_2-\mathbf{M}_1).$$

Note that in practice, $\mathbf{N}$ is usually singular and so a multiple of the identity is added to it:[1]

$$\mathbf{N}_{\epsilon} = \mathbf{N}+\epsilon\mathbf{I}.$$

Given the solution for $\mathbf{\alpha}$, the projection of a new data point is given by[1]

$$y(\mathbf{x}) = (\mathbf{w}\cdot\phi(\mathbf{x})) = \sum_{i=1}^l\alpha_i k(\mathbf{x}_i,\mathbf{x}).$$
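Putting these formulas together, a two-class KFD implementation might look like the following minimal sketch. It assumes a kernel function of the form sketched above (such as rbf_kernel), and the name kfd_two_class is illustrative.

```python
import numpy as np

# A minimal two-class KFD sketch following the formulas above. `kernel`
# maps two arrays of points to their matrix of pairwise kernel values
# (e.g. the rbf_kernel sketched earlier); eps regularizes N.
def kfd_two_class(X1, X2, kernel, eps=1e-3):
    X = np.vstack([X1, X2])                    # all l training points
    l = len(X)
    M_list, N = [], eps * np.eye(l)            # start from N_eps = eps * I
    for Xj in (X1, X2):
        lj = len(Xj)
        Kj = kernel(X, Xj)                     # (l x l_j) matrix K_j
        M_list.append(Kj.mean(axis=1))         # (M_j)_i = (1/l_j) sum_k k(x_i, x_k^j)
        One = np.full((lj, lj), 1.0 / lj)      # matrix 1_{l_j}
        N += Kj @ (np.eye(lj) - One) @ Kj.T
    # alpha = N_eps^{-1} (M_2 - M_1); solve rather than invert.
    alpha = np.linalg.solve(N, M_list[1] - M_list[0])
    # Projection of new points: y(x) = sum_i alpha_i k(x_i, x).
    return lambda Xnew: kernel(Xnew, X) @ alpha

# Usage (with X1, X2 and rbf_kernel as in the earlier sketches):
# project = kfd_two_class(X1, X2, rbf_kernel)
# y_test = project(X_test)
```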
Multi-class KFD
The extension to cases where there are more than two classes is relatively straightforward.[2][6][7] Let $c$ be the number of classes. Then multi-class KFD involves projecting the data into a $(c-1)$-dimensional space using $c-1$ discriminant functions

$$y_i = \mathbf{w}_i^{\text{T}}\phi(\mathbf{x}) \qquad i = 1,\ldots,c-1.$$
This can be written in matrix notation

$$\mathbf{y} = \mathbf{W}^{\text{T}}\phi(\mathbf{x}),$$

where the $\mathbf{w}_i$ are the columns of $\mathbf{W}$.[6] Further, the between-class covariance matrix is now

$$\mathbf{S}_B^{\phi} = \sum_{i=1}^c l_i(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi})(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi})^{\text{T}},$$

where $\mathbf{m}^{\phi}$ is the mean of all the data in the new feature space. The within-class covariance matrix is

$$\mathbf{S}_W^{\phi} = \sum_{i=1}^c \sum_{n=1}^{l_i}(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})^{\text{T}}.$$
The solution is now obtained by maximizing

$$J(\mathbf{W}) = \frac{\left|\mathbf{W}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{W}\right|}{\left|\mathbf{W}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{W}\right|}.$$

The kernel trick can again be used, and the goal of multi-class KFD becomes[7]

$$\mathbf{A}^{*} = \underset{\mathbf{A}}{\operatorname{argmax}} \frac{\left|\mathbf{A}^{\text{T}}\mathbf{M}\mathbf{A}\right|}{\left|\mathbf{A}^{\text{T}}\mathbf{N}\mathbf{A}\right|},$$

where $\mathbf{A} = [\mathbf{\alpha}_1,\ldots,\mathbf{\alpha}_{c-1}]$ and

$$\begin{align}
\mathbf{M} & = \sum_{j=1}^c l_j(\mathbf{M}_j-\mathbf{M}_{*})(\mathbf{M}_j-\mathbf{M}_{*})^{\text{T}} \\
\mathbf{N} & = \sum_{j=1}^c\mathbf{K}_j(\mathbf{I}-\mathbf{1}_{l_j})\mathbf{K}_j^{\text{T}}.
\end{align}$$
The $\mathbf{M}_j$ are defined as in the above section and $\mathbf{M}_{*}$ is defined as

$$(\mathbf{M}_{*})_j = \frac{1}{l}\sum_{k=1}^{l}k(\mathbf{x}_j,\mathbf{x}_k).$$

$\mathbf{A}^{*}$ can then be computed by finding the $(c-1)$ leading eigenvectors of $\mathbf{N}^{-1}\mathbf{M}$.[7] Furthermore, the projection of a new input, $\mathbf{x}_t$, is given by[7]

$$\mathbf{y}(\mathbf{x}_t) = \left(\mathbf{A}^{*}\right)^{\text{T}}\mathbf{K}_t,$$

where the $i^{\text{th}}$ component of $\mathbf{K}_t$ is given by $k(\mathbf{x}_i,\mathbf{x}_t)$.
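A minimal sketch of the multi-class case, assuming the same kind of kernel function as in the sketches above; the $(c-1)$ leading eigenvectors of $\mathbf{N}^{-1}\mathbf{M}$ are obtained here by solving the equivalent symmetric generalized eigenproblem $\mathbf{M}\mathbf{\alpha} = \lambda\mathbf{N}\mathbf{\alpha}$, and $\mathbf{N}$ is regularized as in the two-class case.

```python
import numpy as np
from scipy.linalg import eigh

# A minimal multi-class KFD sketch. `classes` is a list of (l_i x d)
# arrays, one per class; `kernel` is as in the earlier sketches.
def kfd_multi_class(classes, kernel, eps=1e-3):
    X = np.vstack(classes)                     # all l training points
    l, c = len(X), len(classes)
    M_star = kernel(X, X).mean(axis=1)         # (M_*)_j = (1/l) sum_k k(x_j, x_k)
    M = np.zeros((l, l))
    N = eps * np.eye(l)                        # regularized as in the two-class case
    for Xj in classes:
        lj = len(Xj)
        Kj = kernel(X, Xj)                     # (l x l_j) matrix K_j
        d = Kj.mean(axis=1) - M_star           # M_j - M_*
        M += lj * np.outer(d, d)
        N += Kj @ (np.eye(lj) - np.full((lj, lj), 1.0 / lj)) @ Kj.T
    # The c-1 leading eigenvectors of N^{-1} M, via the symmetric
    # generalized eigenproblem M a = lambda N a (eigenvalues ascending).
    _, vecs = eigh(M, N)
    A = vecs[:, :-c:-1]                        # last c-1 columns, largest first
    # Projection of new inputs: y(x_t) = (A*)^T K_t, one row per input.
    return lambda Xnew: kernel(Xnew, X) @ A
```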
Classification using KFD
In both two-class and multi-class KFD, the class label of a new input can be assigned as[7]

$$f(\mathbf{x}) = \arg\min_j D(\mathbf{y}(\mathbf{x}),\bar{\mathbf{y}}_j),$$

where $\bar{\mathbf{y}}_j$ is the projected mean for class $j$ and $D(\cdot,\cdot)$ is a distance function.
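A minimal sketch of this rule, taking Euclidean distance as one common choice for $D$ and assuming `project` is a projection function returned by one of the KFD sketches above:

```python
import numpy as np

# Nearest-projected-mean classification. `class_data` is a list of
# training arrays, one per class; Euclidean distance plays the role
# of the distance function D.
def kfd_classify(project, class_data, Xnew):
    # Projected mean of each class (the reshape handles both the 1-D
    # two-class projection and the (c-1)-dimensional multi-class one).
    means = np.stack([project(Xj).reshape(len(Xj), -1).mean(axis=0)
                      for Xj in class_data])
    Y = project(Xnew).reshape(len(Xnew), -1)
    dists = np.linalg.norm(Y[:, None, :] - means[None, :, :], axis=-1)
    return dists.argmin(axis=1)                # class index for each new point
```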
Applications
Kernel discriminant analysis has been used in a variety of applications. These include:
- Face recognition[3][8][9] and detection[10][11]
- Hand-written digit recognition[1][12]
- Palmprint recognition[13]
- Classification of malignant and benign clustered microcalcifications[14]
- Seed classification[2]
References
External links
- Kernel Discriminant Analysis in C# - C# code to perform KFD.
- Matlab Toolbox for Dimensionality Reduction - Includes a method for performing KFD.
- Handwriting Recognition using Kernel Discriminant Analysis - C# code that demonstrates handwritten digit recognition using KFD.