Numerical Linear Algebra#

Fundamental Roles#

Linear algebra is a foundation of modern mathematics. It underpins multidimensional calculus, multivariate statistics, differential geometry, functional analysis, control theory, and more, and it enables systematic and predictable problem-solving. It is fair to say that linear mathematics is largely well understood. When faced with nonlinear problems, it is therefore common to transform them into linear approximations for analysis and solution.

Numerical linear algebra extends this power to computational physics. It enables the solution of partial differential equations (PDEs), optimization problems, eigenvalue analyses, and more. By bridging theory and computation, numerical linear algebra helps solve some of the most challenging problems in physics and engineering.

Motivations from Physics#

Normal Modes: Normal modes describe vibration patterns in systems near equilibrium. Finding them is a generalized eigenvalue problem. Applications include identifying resonance frequencies in materials, analyzing sound in acoustics, and studying plasma waves. Linear algebra's role in solving eigenvalue problems is central to understanding these phenomena.

Quantum Mechanics: Quantum mechanics is linear, governed by the Schrödinger equation. This linearity allows quantum systems to be expressed in terms of vectors and matrices, enabling numerical approaches. Applications include calculating electronic band structures, solving molecular orbital equations, and diagonalizing Hamiltonians in spin systems. Efficient algorithms like Lanczos and Davidson, which extract eigenvalues and eigenvectors from large matrices, are essential for studying quantum systems.

Discretized PDEs: PDEs, such as the Poisson and heat equations, describe continuous physical systems. Discretization transforms them into large (sparse) linear systems, enabling efficient iterative solvers like Conjugate Gradient (CG) and GMRES. These methods apply to problems in electrostatics, heat conduction, fluid flow, and structural mechanics. Even nonlinear PDEs often require solving linear subproblems iteratively, highlighting the importance of linear algebra in computational physics.

Linearization of Nonlinear Problems: Nonlinear problems, like turbulence or combustion, are approximated through linearization. Applications include solving non-linear PDEs, perturbation theory, and computational fluid dynamics. Nonlinear problems often reduce to sequences of linear solves, highlighting the foundational role of linear algebra.

Motivations from Computation#

Large-Scale Data: Physics experiments and machine learning (ML) tasks both generate large datasets, such as images, sensor readings, or high-dimensional feature vectors. Matrix decompositions—including QR, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA)—are critical tools for analyzing and compressing these datasets. PCA, in particular, is widely used for dimensionality reduction, feature extraction, noise reduction, and visualization in high-dimensional spaces. Similarly, in physics, these techniques are applied to identify dominant modes in fluid flow or material vibrations. This shared reliance on linear algebra underscores its cross-disciplinary importance in processing complex data.

Neural Networks: Neural networks rely heavily on linear algebra for their core operations. Forward and backpropagation, essential processes in deep learning, primarily involve matrix multiplication between layer weight matrices and input or activation vectors. Training large neural networks involves repeated and massive matrix-vector and matrix-matrix operations. Efficient linear algebra routines are therefore vital to the success of modern ML systems.

Hardware Accelerators: Efficient linear algebra implementations are critical for leveraging modern hardware, such as GPUs and TPUs, which are optimized for matrix operations. Vectorized computations not only accelerate ML tasks but also ensure scalability for large datasets and complex models.

Note

If Everything Were Linear, We Wouldn’t Need Computers.

A popular statement suggests that in a purely linear world, everything would be easy to solve analytically. However, this is an oversimplification. Even perfectly linear problems can pose significant computational challenges due to two key factors.

  • First, high dimensionality makes solving linear systems computationally intensive. For instance, systems with millions of unknowns arise in large PDE grids or massive machine learning models. Processing such large-scale data requires significant computational power, regardless of linearity.

  • Second, real-world computations face constraints from finite precision. Hardware limitations, such as floating-point arithmetic, introduce numerical stability and conditioning challenges, even in linear systems. Addressing these issues requires robust algorithms to ensure accurate and efficient solutions.

Importance of Numerical Linear Algebra#

Floating-Point Arithmetic and Rounding Errors: As covered in the last chapter, numerical analysis addresses the limitations of finite precision in computational hardware. Small rounding errors, particularly in ill-conditioned systems, can be amplified significantly, leading to unreliable results. Techniques like pivoting in Gaussian elimination or employing stable iterative algorithms are crucial for ensuring numerical stability.

Condition Numbers and Sensitivity: Ill-conditioned matrices amplify small input perturbations, making solutions highly sensitive to errors. This issue is prevalent in physics problems, such as nearly degenerate modes, and in machine learning tasks, like poorly scaled data that slows network training. Understanding and managing condition numbers is essential for robust computations.

Solver Complexity and Algorithm Selection: The choice of solver depends on the problem scale and structure. Direct solvers, such as Gaussian elimination or LU decomposition, scale as \(\mathcal{O}(n^3)\) for dense systems, which can be infeasible for large \(n\). Iterative methods, like Jacobi, Conjugate Gradient, and GMRES, exploit matrix sparsity to handle massive systems efficiently. Preconditioning further improves convergence, benefiting both physics simulations and ML optimizations.

Leveraging Established Theory: Linear algebra is a deeply studied field, and once a problem is expressed as a matrix, powerful theoretical and numerical tools can be applied. Techniques like eigen-decomposition and SVD provide consistent and reliable approaches across domains, benefiting applications from quantum mechanics to deep learning. This universality makes numerical linear algebra indispensable in modern computational science.

Condition Numbers & Floating-Point Issues#

Accurate numerical solutions in computational physics depend heavily on floating-point arithmetic details and the condition number of the problem at hand. Small floating-point errors can be amplified dramatically if the system is ill-conditioned. In this set of notes, we’ll explore the nature of rounding errors, how to spot ill-conditioned problems, and why it all matters.

Machine Epsilon and Rounding Errors#

Recalling from the last lecture…

  1. Finite Precision

    • On a real computer, numbers are stored with a finite number of bits (in Python, float64 has 53 bits of precision in the significand).

    • The machine epsilon is the gap between 1.0 and the next larger floating-point number (\(\approx 2.22 \times 10^{-16}\) for double precision).

  2. Rounding & Representation

    • Because of finite precision, most real numbers can’t be represented exactly.

    • Operations like addition and multiplication are computed and then rounded back into the floating-point format, so each operation can introduce a small rounding error.

  3. Catastrophic Cancellation

    • If two nearly equal numbers are subtracted, the leading digits can cancel out, leaving the result with fewer significant bits of accuracy.

    • Example: (1e16 + 1) - 1e16 might yield 0.0 in floating-point, even though the exact answer is 1.0.
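A quick numerical check of these points (np.finfo gives machine epsilon; the specific example values are only illustrative):

import numpy as np

# Machine epsilon for double precision (~2.22e-16)
eps = np.finfo(np.float64).eps
print("Machine epsilon:", eps)

# Adding half of eps to 1.0 rounds back to 1.0: the spacing of floats near 1.0 is eps
print("1.0 + eps/2 == 1.0 ?", 1.0 + eps / 2 == 1.0)

# Cancellation after rounding: the exact answer is 1.0, not 0.0
print("(1e16 + 1) - 1e16 =", (1e16 + 1) - 1e16)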

Condition Number#

Intuitive Definition:

  • Sensitivity to Input Changes

    • A condition number measures how much the solution to a problem can change relative to small changes in the input data.

    • In the context of solving \(A\mathbf{x} = \mathbf{b}\), if the matrix \(A\) is ill-conditioned, small perturbations in \(\mathbf{b}\) or rounding errors in \(A\) can lead to large changes in \(\mathbf{x}\).

  • Mathematical Insight

    • For a non-singular matrix \(A\), the condition number (in the 2-norm) is

      (13)#\[\begin{align} \kappa(A) = \|A\|_2\,\|A^{-1}\|_2, \end{align}\]

      which also equals the ratio of the largest singular value to the smallest singular value of \(A\).

    • A large \(\kappa(A)\) (e.g., \(10^6\) or \(10^9\)) indicates potential numerical instability.
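This identity is easy to verify numerically; here is a small sketch using a diagonal matrix whose singular values are known by construction:

import numpy as np

# A diagonal matrix has singular values equal to the absolute diagonal entries
A = np.diag([1.0, 1e-6])

sigma = np.linalg.svd(A, compute_uv=False)              # singular values, largest first
print("sigma_max / sigma_min:", sigma[0] / sigma[-1])   # 1e6
print("np.linalg.cond(A):    ", np.linalg.cond(A))      # 2-norm condition number, also 1e6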

Consider the following simple 2×2 example:

(14)#\[\begin{align} A = \begin{pmatrix} 1 & 1\\ 1 & 1.0001 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 2\\ 2.0001 \end{pmatrix}. \end{align}\]

In exact arithmetic: if we solve \(A\mathbf{x} = \mathbf{b}\) symbolically, we get a unique solution. However, suppose we change \(\mathbf{b}\) slightly to \(\mathbf{b}' = (2, 2.0000)^T\). This tiny change produces a surprisingly large change in \(\mathbf{x}\).

The following Python code illustrates this.

import numpy as np

A = np.array([[1, 1],
              [1, 1.0001]])
b = np.array([2,
              2.0001])

# Solve the original system
x = np.linalg.solve(A, b)
print("Solution x for original b:", x)

# Slightly perturb b
bp = np.array([2, 2.0000])
xp = np.linalg.solve(A, bp)
print("Solution x for perturbed b:", xp)

# Check the condition number of A
cond_A = np.linalg.cond(A)
print("Condition number of A:", cond_A)
Solution x for original b: [1. 1.]
Solution x for perturbed b: [2. 0.]
Condition number of A: 40002.00007491187

The results show:

  • The original and perturbed solutions differ dramatically, \((1, 1)\) versus \((2, 0)\), despite only a 0.0001 change in the second component of \(\mathbf{b}\).

  • The condition number cond_A is large (about \(4 \times 10^4\)), indicating that \(A\) is ill-conditioned.

Why Does This Matter?#

Direct solvers (e.g., Gaussian Elimination) can become numerically unreliable if the system is ill-conditioned, because rounding errors introduced during elimination are easily amplified. Pivoting helps, but it cannot fully compensate for a severely ill-conditioned system.

Iterative methods (e.g., Conjugate Gradient, GMRES) also degrade under ill-conditioning, resulting in slow convergence or the need for heavy preconditioning. Even then, convergence may require many iterations or yield poor accuracy.

Matrix decompositions (e.g., eigenvalue, SVD) are similarly affected. Small errors in a nearly singular matrix can lead to large deviations in the computed eigenvalues or singular values.

In practice, ill-conditioned matrices can undermine results in physics simulations (e.g., discretized PDEs, linearized nonlinear models) and degrade performance in data science tasks such as large-scale regression or neural network training.

Direct Solvers#

Direct methods are often the first approach taught for solving linear systems \(A\mathbf{x} = \mathbf{b}\). They involve algebraic factorizations that can be computed in a fixed number of steps (roughly \(\mathcal{O}(n^3)\)) for an \(n \times n\) matrix.

Gaussian Elimination#

Gaussian Elimination transforms the system \(A \mathbf{x} = \mathbf{b}\) into an equivalent upper-triangular form \(U \mathbf{x} = \mathbf{c}\) via systematic row operations. Once in upper-triangular form, one can perform back-substitution to solve for \(\mathbf{x}\).

  1. Row Operations

    • Subtract a multiple of one row from another to eliminate entries below the main diagonal.

    • Aim to create zeros in column \(j\) below row \(j\).

  2. Partial Pivoting

    • When a pivot (diagonal) element is small (or zero), swap the current row with a row below that has a larger pivot element in the same column.

    • This step mitigates numerical instability by reducing the chance that small pivots lead to large rounding errors in subsequent operations.

  3. Result

    • After eliminating all sub-diagonal entries, the matrix is in upper-triangular form \(U\).

    • Solve \(U\mathbf{x} = \mathbf{c}\) via back-substitution.

Here is an example:

Each step below shows the system of equations, the row operations applied, and the resulting augmented matrix:

(15)#\[\begin{alignat}{4} 2x &{}+{}& y &{}-{}& z &{}={}& 8 & \\ -3x &{}-{}& y &{}+{}& 2z &{}={}& -11 & \\ -2x &{}+{}& y &{}+{}& 2z &{}={}& -3 & \end{alignat}\]
(16)#\[\begin{align} \left[\begin{array}{rrr|r} 2 & 1 & -1 & 8 \\ -3 & -1 & 2 & -11 \\ -2 & 1 & 2 & -3 \end{array}\right] \nonumber \end{align}\]
(17)#\[\begin{alignat}{4} 2x &{}+{}& y &{}-{}& z &{}={}& 8 & \\ & & \tfrac12 y &{}+{}& \tfrac12 z &{}={}& 1 & \\ & & 2y &{}+{}& z &{}={}& 5 & \end{alignat}\]
(18)#\[\begin{align} L_2 + \tfrac32 L_1 &\to L_2 \\ L_3 + L_1 &\to L_3 \end{align}\]
(19)#\[\begin{align} \left[\begin{array}{rrr|r} 2 & 1 & -1 & 8 \\ 0 & \frac12 & \frac12 & 1 \\ 0 & 2 & 1 & 5 \end{array}\right] \end{align}\]
(20)#\[\begin{alignat}{4} 2x &{}+{}& y &{}-{}& z &{}={}& 8 & \\ & & \tfrac12 y &{}+{}& \tfrac12 z &{}={}& 1 & \\ & & & & -z &{}={}& 1 & \end{alignat}\]
(21)#\[\begin{align} L_3 - 4 L_2 \to L_3 \end{align}\]
(22)#\[\begin{align} \left[\begin{array}{rrr|r} 2 & 1 & -1 & 8 \\ 0 & \frac12 & \frac12 & 1 \\ 0 & 0 & -1 & 1 \end{array}\right] \end{align}\]

The matrix is now in echelon form (also called triangular form). The remaining steps, shown in the same format (system of equations, row operations, augmented matrix), carry out the back-substitution:

(23)#\[\begin{alignat}{4} 2x &{}+{}& y & & &{}={} 7 & \\ & & \tfrac12 y & & &{}={} \tfrac32 & \\ & & &{}-{}& z &{}={} 1 & \end{alignat}\]
(24)#\[\begin{align} L_1 - L_3 &\to L_1\\ L_2 + \tfrac12 L_3 &\to L_2 \end{align}\]
(25)#\[\begin{align} \left[\begin{array}{rrr|r} 2 & 1 & 0 & 7 \\ 0 & \frac12 & 0 & \frac32 \\ 0 & 0 & -1 & 1 \end{array}\right] \end{align}\]
(26)#\[\begin{alignat}{4} 2x &{}+{}& y &\quad& &{}={}& 7 & \\ & & y &\quad& &{}={}& 3 & \\ & & &\quad& z &{}={}& -1 & \end{alignat}\]
(27)#\[\begin{align} 2 L_2 &\to L_2 \\ -L_3 &\to L_3 \end{align}\]
(28)#\[\begin{align} \left[\begin{array}{rrr|r} 2 & 1 & 0 & 7 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & -1 \end{array}\right] \end{align}\]
(29)#\[\begin{alignat}{4} x &\quad& &\quad& &{}={}& 2 & \\ &\quad& y &\quad& &{}={}& 3 & \\ &\quad& &\quad& z &{}={}& -1 & \end{alignat}\]
(30)#\[\begin{align} L_1 - L_2 &\to L_1 \\ \tfrac12 L_1 &\to L_1 \end{align}\]
(31)#\[\begin{align} \left[\begin{array}{rrr|r} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & -1 \end{array}\right] \end{align}\]

Below is a simple example comparing naive Gaussian Elimination (no pivoting) with partial pivoting on a small system. While this code is for illustrative purposes (and not as robust as numpy.linalg.solve()), it shows the basic process.

import numpy as np

def gaussian_elimination_naive(A, b):
    """
    Perform naive (no pivoting) Gaussian elimination to solve A x = b.
    Returns the solution vector x.
    """
    A = A.astype(float)  # Ensure floating-point
    b = b.astype(float)
    n = A.shape[0]

    # Forward elimination
    for k in range(n-1):
        for i in range(k+1, n):
            if A[k, k] == 0:
                raise ValueError("Zero pivot encountered (no pivoting).")
            factor = A[i, k] / A[k, k]
            for j in range(k, n):
                A[i, j] -= factor * A[k, j]
            b[i] -= factor * b[k]

    # Back-substitution
    x = np.zeros(n)
    for i in reversed(range(n)):
        s = b[i]
        for j in range(i+1, n):
            s -= A[i, j] * x[j]
        x[i] = s / A[i, i]
    return x
def gaussian_elimination_partial_pivot(A, b):
    """
    Gaussian elimination with partial pivoting to solve A x = b.
    Returns the solution vector x.
    """
    A = A.astype(float)
    b = b.astype(float)
    n = A.shape[0]

    # Forward elimination with partial pivoting
    for k in range(n-1):
        # Pivot selection: find max pivot in column k
        pivot_row = np.argmax(np.abs(A[k:, k])) + k
        if A[pivot_row, k] == 0:
            raise ValueError("Matrix is singular or pivot is zero.")
        # Swap rows if needed
        if pivot_row != k:
            A[[k, pivot_row]] = A[[pivot_row, k]]
            b[k], b[pivot_row] = b[pivot_row], b[k]

        for i in range(k+1, n):
            factor = A[i, k] / A[k, k]
            A[i, k:] = A[i, k:] - factor * A[k, k:]
            b[i] = b[i] - factor * b[k]

    # Back-substitution
    x = np.zeros(n)
    for i in reversed(range(n)):
        s = b[i]
        for j in range(i+1, n):
            s -= A[i, j] * x[j]
        x[i] = s / A[i, i]
    return x
A = np.random.random((3, 3))
b = np.random.random((3))

print("Naive Gaussian Elimination:")
x_naive = gaussian_elimination_naive(A.copy(), b.copy())
print("Solution:", x_naive, end="\n\n")

print("Gaussian Elimination with Partial Pivoting:")
x_pivot = gaussian_elimination_partial_pivot(A.copy(), b.copy())
print("Solution:", x_pivot, end="\n\n")

# Compare with numpy's solve
x_np = np.linalg.solve(A.copy(), b.copy())
print("Numpy's solve:")
print("Solution:", x_np)
Naive Gaussian Elimination:
Solution: [-0.1468276   0.63110022  0.95447362]

Gaussian Elimination with Partial Pivoting:
Solution: [-0.1468276   0.63110022  0.95447362]

Numpy's solve:
Solution: [-0.1468276   0.63110022  0.95447362]

\(LU\) Decomposition#

\(LU\) Decomposition is a systematic way to express \(A\) as \(A = P L U\), where:

  • \(P\) is a permutation matrix

  • \(L\) is lower triangular (with 1s on the diagonal following the standard convention).

  • \(U\) is upper triangular.

Once \(A\) is factored as \(P L U\), solving \(A\mathbf{x} = \mathbf{b}\) becomes:

  1. \(L \mathbf{y} = P^T \mathbf{b}\) (forward substitution)

  2. \(U \mathbf{x} = \mathbf{y}\) (back-substitution)

Gaussian elimination essentially constructs the \(L\) and \(U\) matrices behind the scenes:

  • The multipliers used in the row operations become the entries of \(L\).

  • The final upper-triangular form is \(U\).

import scipy.linalg as la

# Perform LU decomposition with pivoting
P, L, U = la.lu(A)  # P is the permutation matrix

print("Permutation matrix P:")
print(P)
print("Lower triangular matrix L:")
print(L)
print("Upper triangular matrix U:")
print(U, end="\n\n")

# Forward substitution for L y = Pt b
y = la.solve_triangular(L, np.dot(P.T, b), lower=True, unit_diagonal=True)
# Back substitution for U x = y
x_LU = la.solve_triangular(U, y)

print("Solution using LU decomposition:", x_LU)
print("Check with np.linalg.solve(A, b):", np.linalg.solve(A, b))
Permutation matrix P:
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
Lower triangular matrix L:
[[1.         0.         0.        ]
 [0.47730276 1.         0.        ]
 [0.79877989 0.35756653 1.        ]]
Upper triangular matrix U:
[[0.96745561 0.72212169 0.16153305]
 [0.         0.41597775 0.28557393]
 [0.         0.         0.40593175]]

Solution using LU decomposition: [-0.1468276   0.63110022  0.95447362]
Check with np.linalg.solve(A, b): [-0.1468276   0.63110022  0.95447362]

Matrix Inverse#

Although the matrix inverse \(A^{-1}\) is a central theoretical concept, explicitly forming \(A^{-1}\) just to solve \(A \mathbf{x} = \mathbf{b}\) is almost always unnecessary and can degrade numerical stability. Instead, direct approaches (like Gaussian Elimination or LU factorization) find \(\mathbf{x}\) with fewer operations and less error accumulation. Both methods cost about \(\mathcal{O}(n^3)\), but computing the entire inverse introduces extra steps and can magnify floating-point errors.

In rare cases, you might actually need \(A^{-1}\). For example, when:

  • Multiple Right-Hand Sides: If you must solve \(A \mathbf{x} = \mathbf{b}_i\) for many different \(\mathbf{b}_i\), you might form or approximate \(A^{-1}\) for convenience (though reusing a single factorization, or solving for all right-hand sides at once, is often just as efficient; see the comparison after the example below).

  • Inverse-Related Operations: Some advanced algorithms (e.g., computing discrete Green’s functions or certain control theory design methods) explicitly require elements of the inverse.

Below is a demonstration in Python, showing how to compute the inverse using np.linalg.inv()—though in practice, it’s generally safer and more efficient to solve individual systems via a factorization method.

# Simple 2x2 matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]], dtype=float)

# Compute inverse using NumPy
A_inv = np.linalg.inv(A)
print("Inverse of A:\n", A_inv)

# Verify A_inv is indeed the inverse
I_check = A @ A_inv
print("\nCheck A * A_inv = I ?\n", I_check)
Inverse of A:
 [[ 0.66666667 -0.33333333]
 [-0.33333333  0.66666667]]

Check A * A_inv = I ?
 [[1. 0.]
 [0. 1.]]
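For the multiple right-hand-sides case mentioned above, the precomputed inverse can be applied to all \(\mathbf{b}_i\) at once; np.linalg.solve also accepts the right-hand sides stacked as columns, which avoids forming \(A^{-1}\) entirely. A small sketch (the matrix B of right-hand sides is an arbitrary illustrative choice):

# Several right-hand sides stacked as the columns of B
B = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, 1.0]])

# Option 1: apply the precomputed inverse from the previous cell
X_inv = A_inv @ B

# Option 2: solve all systems at once without forming the inverse
X_solve = np.linalg.solve(A, B)

print("Max difference between the two approaches:",
      np.max(np.abs(X_inv - X_solve)))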

Iterative Solvers for Large/Sparse Systems#

  • Large-Scale Problems: PDE discretizations in 2D/3D can easily lead to systems with millions of unknowns, making \(\mathcal{O}(n^3)\) direct approaches infeasible.

  • Sparse Matrices: Many physical systems produce matrices with a significant number of zero entries. Iterative methods typically access only nonzero elements each iteration, saving time and memory.

  • Scalability: On parallel architectures (clusters, GPUs), iterative methods often scale better than direct factorizations.

Jacobi Iteration#

  1. Idea: Rewrite \(A\mathbf{x} = \mathbf{b}\) as \(\mathbf{x} = D^{-1}(\mathbf{b} - R\mathbf{x})\), where \(D\) is the diagonal part of \(A\) and \(R = A - D\) is the remainder.

  2. Update Rule:

    (32)#\[\begin{align} x_i^{(k+1)} = \frac{1}{a_{ii}} \Big(b_i - \sum_{j \neq i} a_{ij} x_j^{(k)}\Big). \end{align}\]
  3. Pros/Cons:

    • Easy to implement; each iteration updates \(\mathbf{x}\) using only values from the previous iteration.

    • Convergence is slow (or fails) unless \(A\) has favorable structure, e.g., strict diagonal dominance.

def jacobi_iteration(A, b, max_iter=1000, tol=1e-8):
    """
    Solve A x = b using Jacobi iteration.
    A is assumed to be square with non-zero diagonal.
    """
    n = A.shape[0]
    x = np.zeros(n)
    
    for k in range(max_iter):
        x_old = np.copy(x)
        for i in range(n):
            # Sum over off-diagonal terms
            s = 0.0
            for j in range(n):
                if j != i:
                    s += A[i,j] * x_old[j]
            x[i] = (b[i] - s) / A[i, i]
        
        # Check for convergence
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            print(f"Jacobi converged in {k+1} iterations.")
            return x
    
    print("Jacobi did not fully converge within max_iter.")
    return x
# Example system (small, but let's pretend it's "sparse")
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]], dtype=float)
b = np.array([6.0, 6.0, 6.0], dtype=float)

x_jacobi = jacobi_iteration(A, b)
print("Jacobi solution:", x_jacobi)
print("Direct solve comparison:", np.linalg.solve(A, b))
Jacobi converged in 20 iterations.
Jacobi solution: [2.14285714 2.57142857 2.14285714]
Direct solve comparison: [2.14285714 2.57142857 2.14285714]

Gauss-Seidel#

Gauss-Seidel iteration is closely related to Jacobi but typically converges faster for many problems, particularly if the system matrix is strictly diagonally dominant.

  1. Idea:

    • Instead of using only the old iteration values (from step \(k\)), Gauss-Seidel uses the most recent updates within the same iteration.

    • This often provides faster convergence because you incorporate newly computed values immediately rather than waiting for the next iteration.

  2. Update Rule:

    (33)#\[\begin{align} x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Big). \end{align}\]

    Note that \(\mathbf{x}^{(k+1)}\) is already partially updated (for \(j < i\)).

  3. Convergence Properties:

    • Gauss-Seidel can be shown to converge for strictly diagonally dominant matrices, and often outperforms Jacobi in practice.

    • Still relatively slow for large, ill-conditioned systems.
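Below is a minimal Gauss-Seidel sketch that mirrors the Jacobi implementation above; it reuses the same 3×3 test system and is for illustration rather than a robust solver:

import numpy as np

def gauss_seidel_iteration(A, b, max_iter=1000, tol=1e-8):
    """
    Solve A x = b using Gauss-Seidel iteration.
    A is assumed to be square with non-zero diagonal.
    """
    n = A.shape[0]
    x = np.zeros(n)

    for k in range(max_iter):
        x_old = np.copy(x)
        for i in range(n):
            # Newly updated values x[:i] are used immediately; old values for j > i
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x_old[i+1:]
            x[i] = (b[i] - s) / A[i, i]

        # Check for convergence
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            print(f"Gauss-Seidel converged in {k+1} iterations.")
            return x

    print("Gauss-Seidel did not fully converge within max_iter.")
    return x

# Same test system as in the Jacobi example
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]], dtype=float)
b = np.array([6.0, 6.0, 6.0], dtype=float)

x_gs = gauss_seidel_iteration(A, b)
print("Gauss-Seidel solution:", x_gs)
print("Direct solve comparison:", np.linalg.solve(A, b))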

Eigenvalue Problems#

Eigenvalue problems lie at the heart of many physics applications, including normal mode analysis, quantum Hamiltonians, and stability analyses.

  1. Normal Modes

    • Vibrational analyses of structures or molecules reduce to \(K \mathbf{x} = \omega^2 M \mathbf{x}\). The solutions are eigenpairs \((\omega^2, \mathbf{x})\).

    • Each eigenvector \(\mathbf{x}\) describes a mode shape; the eigenvalue \(\omega^2\) is the squared frequency.

  2. Quantum Hamiltonians

    • The Schrödinger equation in matrix form \(\hat{H} |E\rangle = E |E\rangle\) yields eigenvalues \(E\) (energy levels) and eigenstates \(|E\rangle\).

    • Collecting all eigenvectors (as columns) in a matrix diagonalizes \(\hat{H}\) if it is Hermitian.

  3. Stability & Modal Analysis

    • A linearized system near equilibrium produces a Jacobian \(J\). Eigenvalues of \(J\) show growth/decay rates of perturbations.

    • If the real parts of all eigenvalues are negative, the system is stable; if any eigenvalue has a positive real part, it is unstable.
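As a concrete sketch, consider the Jacobian of a damped harmonic oscillator, \(\dot{x} = v\), \(\dot{v} = -\omega^2 x - \gamma v\) (an illustrative choice of system); with positive damping every eigenvalue has a negative real part, so the equilibrium is stable:

import numpy as np

omega, gamma = 2.0, 0.5                     # natural frequency and damping
J = np.array([[0.0,         1.0],
              [-omega**2,  -gamma]])        # Jacobian of the linearized system

eigvals = np.linalg.eigvals(J)
print("Eigenvalues of J:", eigvals)
print("Stable (all real parts negative)?", np.all(eigvals.real < 0))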

The Eigenvalue Problem#

Given an \(n \times n\) matrix \(A\), the eigenvalue problem seeks scalars \(\lambda\) (eigenvalues) and nonzero vectors \(\mathbf{v}\) (eigenvectors) such that:

(34)#\[\begin{align} A \mathbf{v} = \lambda \mathbf{v}. \end{align}\]
  • Eigenvalues can be real or complex.

  • Eigenvectors identify the directions in which \(A\) acts as a simple scale transformation (\(\lambda\)).

Power Method for the Dominant Eigenpair#

The Power Method is a straightforward iterative technique for finding the eigenpair associated with the largest-magnitude eigenvalue:

  1. Algorithm Sketch

    • Begin with an initial vector \(\mathbf{x}^{(0)}\).

    • Apply \(A\) repeatedly and normalize:

      (35)#\[\begin{align} \mathbf{x}^{(k+1)} = \frac{A \mathbf{x}^{(k)}}{\|A \mathbf{x}^{(k)}\|}. \end{align}\]
    • Converges to the eigenvector associated with the largest-magnitude eigenvalue, provided that eigenvalue is strictly dominant (and the starting vector has a component along its eigenvector).

  2. Limitations

    • Only computes one eigenvalue/eigenvector.

    • Convergence can be slow if other eigenvalues have magnitude close to the dominant one.

def power_method(A, max_iter=1000, tol=1e-8):
    """
    Returns (lambda_dom, v_dom), the eigenvalue of largest magnitude and its eigenvector.
    """
    n = A.shape[0]
    v = np.ones(n)
    v /= np.linalg.norm(v)
    
    lam_old = 0.0
    for _ in range(max_iter):
        w = A @ v # matrix multiplication
        lam = np.linalg.norm(w)
        v = w / lam
        if abs(lam - lam_old) < tol:
            return lam, v
        lam_old = lam
    return lam, v
# Test the Power Method
A_test = np.array([[2, 1],
                   [1, 3]], dtype=float)
lam_dom, v_dom = power_method(A_test)
print("Dominant eigenvalue (Power Method):", lam_dom)
print("Corresponding eigenvector:", v_dom, end="\n\n")

# Compare with NumPy
vals, vecs = np.linalg.eig(A_test)
idx = np.argmax(np.abs(vals))
print("Numpy's dominant eigenvalue:", vals[idx])
print("Numpy's eigenvector:", vecs[:, idx])
Dominant eigenvalue (Power Method): 3.6180339883736066
Corresponding eigenvector: [0.52573618 0.85064767]

Numpy's dominant eigenvalue: 3.618033988749895
Numpy's eigenvector: [-0.52573111 -0.85065081]

QR Algorithm for Dense Matrices#

For dense matrices of moderate size, the \(QR\) algorithm (or related methods) is the workhorse for computing all eigenvalues (and optionally eigenvectors).

  1. Idea:

    • Repeatedly factor \(A\) as \(QR\) (\(Q\) orthonormal, \(R\) upper triangular).

    • Set \(A \leftarrow RQ\).

    • After enough iterations, \(A\) becomes near-upper-triangular, revealing the eigenvalues on its diagonal.

  2. Practical Approach

    • In Python, use numpy.linalg.eig or numpy.linalg.eigh (for Hermitian matrices).

    • These functions wrap optimized LAPACK routines implementing \(QR\) or related transformations (e.g., divide-and-conquer, multiple relatively robust representations (MRRR)).

A = np.array([[4,  1,  2],
              [1,  3,  1],
              [2,  1,  5]], dtype=float)

vals, vecs = np.linalg.eig(A)
print("Eigenvalues:", vals)
print("Eigenvectors (columns):\n", vecs)
Eigenvalues: [7.04891734 2.30797853 2.64310413]
Eigenvectors (columns):
 [[ 0.59100905  0.73697623 -0.32798528]
 [ 0.32798528 -0.59100905 -0.73697623]
 [ 0.73697623 -0.32798528  0.59100905]]
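To see the bare idea in action, here is a minimal unshifted QR iteration; it is for illustration only, and production code should rely on the LAPACK-backed routines above:

import numpy as np

def qr_iteration(A, max_iter=500, tol=1e-10):
    """
    Unshifted QR iteration: repeatedly factor A = QR and set A <- RQ.
    For suitable matrices (e.g., symmetric with distinct eigenvalue magnitudes),
    the iterates approach a triangular matrix whose diagonal holds the eigenvalues.
    """
    Ak = A.astype(float)
    for _ in range(max_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
        # Stop once the sub-diagonal entries are negligible
        if np.linalg.norm(np.tril(Ak, -1)) < tol:
            break
    return np.diag(Ak)

A_sym = np.array([[4, 1, 2],
                  [1, 3, 1],
                  [2, 1, 5]], dtype=float)

print("QR-iteration eigenvalues:", np.sort(qr_iteration(A_sym)))
print("numpy.linalg.eigvals:    ", np.sort(np.linalg.eigvals(A_sym)))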

Future Topics#

Singular Value Decomposition (SVD)#

The singular value decomposition (SVD) factorizes a matrix \(A\) into \(U \Sigma V^T\), where \(U\) and \(V\) have orthonormal columns and \(\Sigma\) is a diagonal matrix of singular values. This decomposition describes how \(A\) stretches vectors in orthogonal directions and provides the best low-rank approximation of \(A\). In practice, the SVD is immensely valuable for data compression and noise reduction, especially in problems where \(A\) may be ill-conditioned. By leveraging its numerical stability, the SVD is frequently used to compute pseudoinverses in cases where direct inversion is error-prone.
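As a small illustration, np.linalg.svd returns the factors directly, and truncating to the largest singular value gives the best rank-1 approximation in the Frobenius sense (the matrix below is an arbitrary example):

import numpy as np

A = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("Singular values:", s)

# Best rank-1 approximation: keep only the largest singular value
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print("Rank-1 approximation:\n", A1)

# The Frobenius error of the truncation equals the discarded singular value s[1]
print("Error:", np.linalg.norm(A - A1), "vs s[1]:", s[1])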

Principal Component Analysis (PCA)#

Principal Component Analysis (PCA) is a data-driven method that identifies the directions of maximum variance in high-dimensional data, typically by applying the SVD to a mean-centered data matrix. In physics and related fields, PCA helps extract features and reduce noise in large experimental or simulation datasets. By finding a small number of orthogonal “principal components”, researchers can focus on dominant modes or patterns in the data, often leading to simpler models and clearer insights.
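A minimal PCA-via-SVD sketch on synthetic two-dimensional data (the random data and the mixing matrix are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data: 500 samples, 2 features
data = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                             [1.0, 0.5]])

X = data - data.mean(axis=0)             # mean-center the data
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rows of Vt are the principal directions; singular values give the variances
explained_variance = s**2 / (X.shape[0] - 1)
print("Principal directions (rows of Vt):\n", Vt)
print("Explained variance per component:", explained_variance)

# Project onto the first principal component for dimensionality reduction
X_reduced = X @ Vt[0]
print("Reduced data shape:", X_reduced.shape)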

These topics will be explored in more detail later, when we delve into data modeling and machine learning. Both SVD and PCA are foundational tools for dimensionality reduction, feature engineering, and tackling ill-conditioned or noisy data challenges.