Matrix Diagonalisation in Python

Working with matrices often involves finding their inverses and determinants, multiplying them, and exponentiating them. All of these operations, particularly exponentiation (raising a matrix to the nth power), are far less computationally expensive on a diagonal matrix.

Suppose we want to multiply two matrices A and B:

$$
\definecolor{blue}{RGB}{39,111,191}
\definecolor{red}{RGB}{242,54,2}
\begin{bmatrix}
\color{blue}a_1 & \color{blue}b_1 \\
\color{blue}a_2 & \color{blue}b_2 \\
\end{bmatrix} \begin{bmatrix}
\color{red}x_1 & \color{red}y_1 \\
\color{red}x_2 & \color{red}y_2 \\
\end{bmatrix} = \begin{bmatrix}
\color{blue}a_1 \color{red}x_1 + \color{blue}b_1 \color{red}x_2 &
\color{blue}a_1 \color{red}y_1 + \color{blue}b_1 \color{red}y_2 \\
\color{blue}a_2 \color{red}x_1 + \color{blue}b_2 \color{red}x_2 &
\color{blue}a_2 \color{red}y_1 + \color{blue}b_2 \color{red}y_2 \\
\end{bmatrix}
$$

Now imagine calculating the 100th power of a 1000×1000 matrix. Lots of math. Multiplying diagonal matrices is a lot easier. Just multiply the diagonals:

$$AB = \begin{bmatrix}
\color{blue}a_1 & 0 \\
0 & \color{blue}b_2 \\
\end{bmatrix} \begin{bmatrix}
\color{red}x_1 & 0 \\
0 & \color{red}y_2 \\
\end{bmatrix} = \begin{bmatrix}
\color{blue}a_1 \color{red}x_1 & 0 \\
0 & \color{blue}b_2 \color{red}y_2 \\
\end{bmatrix}
$$

Calculating the nth power (exponentiating a matrix):

$$A^n = \begin{bmatrix}
a & 0 & 0 \\
0 & b & 0 \\
0 & 0 & c \\
\end{bmatrix}^n = \begin{bmatrix}
a^n & 0 & 0 \\
0 & b^n & 0 \\
0 & 0 & c^n \\
\end{bmatrix}
$$
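Both facts are easy to verify in numpy (a minimal sketch with arbitrary values):

import numpy as np

# Multiplying two diagonal matrices multiplies their diagonals...
A = np.diag([2.0, 3.0, 4.0])
B = np.diag([5.0, 6.0, 7.0])
print(np.allclose(A @ B, np.diag(np.diag(A) * np.diag(B))))  # True

# ...and raising a diagonal matrix to a power just raises
# each diagonal entry to that power.
print(np.allclose(np.linalg.matrix_power(A, 5), np.diag(np.diag(A) ** 5)))  # True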

Converting a matrix into diagonal form can be done with the help of eigendecomposition. While many sources explain the math, they rarely explain where diagonalising matrices is actually useful.

Where Would We Use Diagonal Matrices?

Markov chains. Weather forecasting, modelling traffic flow, studying disease transmission and other applications rely on Markov chains, where a system is described in terms of states and a state transition matrix. To find the state of the system after n steps, we need to raise the transition matrix to the nth power.
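For example, here is a toy two-state weather model (the transition probabilities are made up purely for illustration):

import numpy as np

# Hypothetical transition matrix: rows are today's weather,
# columns are tomorrow's (sunny, rainy).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

today = np.array([1.0, 0.0])   # it is sunny today

# Probability of each state a week from now
print(today @ np.linalg.matrix_power(P, 7))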

Google PageRank. In a nutshell, this is an algorithm built on a massive matrix whose elements represent, in the simplest form, the hyperlinks connecting web pages. The importance (rank) of a page is determined by the probability of an Internet user landing on that page. This is another example of a Markov chain.
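A minimal sketch of the idea, using a tiny made-up web of three pages (real PageRank adds a damping factor and other refinements):

import numpy as np

# Column-stochastic link matrix: column j holds the probabilities
# of following a link from page j to each other page.
L = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

rank = np.full(3, 1/3)   # start with equal ranks
for _ in range(100):     # power iteration towards the steady state
    rank = L @ rank

print(rank)              # the pages' final ranks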

Principal Component Analysis. Imagine you’re studying the genetic makeup of people who are afflicted with acquired metastructural pediculosis and those who are immune to it. You have a sample of 500 genes. PCA can help you pick a combination of genes that “explains” the difference between these two groups and narrow your search down considerably. PCA is also used for speech recognition, document classification, image compression and other applications.
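Here is a bare-bones sketch of PCA via eigendecomposition, with random data standing in for the gene measurements:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))         # 100 samples, 5 made-up features
X = X - X.mean(axis=0)                # centre the data

C = np.cov(X, rowvar=False)           # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # symmetric matrix, so use eigh

# Project onto the two eigenvectors with the largest eigenvalues
top2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]
X_reduced = X @ top2
print(X_reduced.shape)                # (100, 2)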

Spectral Clustering. This technique uses eigendecomposition and a diagonal matrix to embed data in fewer dimensions before clustering it there. It can be used to analyse social networks, classify documents, predict the functions of proteins to aid drug discovery, detect anomalies such as credit card fraud, identify groups of genes, or even recommend music based on its pitch or rhythm.
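A rough sketch of the dimensionality-reduction step, on a tiny made-up graph (a real pipeline would then run k-means on the embedding):

import numpy as np

# Hypothetical weighted adjacency matrix: nodes 0-1 and 2-3 form
# two tightly connected pairs joined by one weak edge.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

D = np.diag(W.sum(axis=1))  # diagonal degree matrix
L = D - W                   # graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)

# The eigenvectors of the smallest eigenvalues embed the nodes in
# fewer dimensions; the second one already separates the two pairs.
print(eigvecs[:, 1])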

Differential Equations. Certain differential equations lend themselves well to being written in matrix form. Diagonalisation can then be used to solve them, which comes up in analysing electrical circuits, chemical kinetics, and even the Maxwell and Schrödinger equations.
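For instance, here is a sketch of solving a linear system x′(t) = Ax(t) via diagonalisation (the matrix and initial condition are arbitrary illustrations):

import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])  # system x'(t) = A x(t)
x0 = np.array([1.0, 0.0])     # initial condition x(0)

lambdas, S = np.linalg.eig(A)

def x(t):
    # x(t) = S exp(Dt) S^-1 x0; exponentiating the diagonal
    # matrix Dt is just an elementwise exp of the eigenvalues.
    return S @ np.diag(np.exp(lambdas * t)) @ np.linalg.inv(S) @ x0

print(x(1.0))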

Diagonalising Matrices in Python

Skipping the theorems and the proofs, the basic idea of diagonalising a matrix A is to find an invertible matrix S and a diagonal matrix D such that:

$$A = S \times D \times S^{-1}$$

It turns out that, as long as our matrix A is diagonalisable, S consists of the eigenvectors of A and D of its eigenvalues.

Let’s start with a simple 2×2 matrix:

$$A = \begin{bmatrix}
3 & 4 \\
0 & 1 \\
\end{bmatrix}
$$

We can diagonalise the matrix by finding the eigenvectors and the corresponding eigenvalues:

import numpy as np

A = np.array([[3, 4], [0, 1]])
lambdas, S = np.linalg.eig(A)  # eigenvalues and matrix of eigenvectors
D = np.diag(lambdas)           # diagonal matrix of the eigenvalues

print(S, np.linalg.inv(S), D)

Our matrix A diagonalises as follows:

$$A = S \times D \times S^{-1}$$

$$\begin{bmatrix} 3 & 4 \\ 0 & 1 \\ \end{bmatrix} =
\begin{bmatrix} 1.00 & -0.89 \\ 0.00 & 0.45 \\ \end{bmatrix}
\begin{bmatrix} 3 & 0 \\ 0 & 1 \\ \end{bmatrix}
\begin{bmatrix} 1.00 & 2.00 \\ 0.00 & 2.24 \\ \end{bmatrix}$$

The columns of S are the eigenvectors of the matrix A and the values in D are the corresponding eigenvalues. This makes exponentiation easy, because the inner S⁻¹S pairs cancel:

$$A^n = (S \times D \times S^{-1})^n = S \times D^n \times S^{-1}$$

So exponentiating a matrix is as simple as exponentiating the eigenvalues in D and then reconstructing A as shown above.

Here’s an example in Python:

import numpy as np

def mpower(A, n):
    lambdas, S = np.linalg.eig(A)    # diagonalise: A = S D S^-1
    D = np.diag(lambdas ** n)        # exponentiate the eigenvalues
    return S @ D @ np.linalg.inv(S)  # reconstruct A^n

A = np.array([[1, 2], [3, 4]])
A1 = np.linalg.matrix_power(A, 2)  # use numpy ..
A2 = mpower(A, 2)                  # .. vs diagonalisation

print("A1:\n{0}\nA2:\n{1}".format(A1, A2))

Copy and run this code to see that the two methods of exponentiating a matrix produce matching results (identical up to floating-point rounding).

So Did That Make Exponentiation Faster?

I used the code above to see exactly how much matrix diagonalisation speeds things up. The code exponentiated matrices of random floating-point numbers using NumPy’s built-in linalg.matrix_power() function and timed it against exponentiation via diagonalisation. As it turns out (at least in my case), diagonalisation lost to the built-in function every single time. In fact, the larger the matrix, the worse the diagonalisation method performed.
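For reference, here is a sketch of the kind of timing comparison I ran (it reuses mpower from above; the matrix size and repeat count are arbitrary, and exact numbers will vary by machine):

import numpy as np
import timeit

A = np.random.rand(500, 500)   # a random floating-point matrix

t1 = timeit.timeit(lambda: np.linalg.matrix_power(A, 100), number=10)
t2 = timeit.timeit(lambda: mpower(A, 100), number=10)

print("matrix_power: {0:.3f}s, diagonalisation: {1:.3f}s".format(t1, t2))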

DISCLAIMER: I’m a complete beginner in linear algebra, so take these results with a grain of salt; most likely I missed something, or perhaps the built-in function simply uses a smarter method (NumPy’s matrix_power computes positive integer powers by repeated matrix squarings, which needs only on the order of log n multiplications).

