Efficient Implementation of Matrix Chain Multiplication: From Triple Loops to Vectorization and Parenthesis Recovery
Matrix Chain Multiplication is a classical dynamic programming problem that aims to determine the optimal parenthesization of a matrix product sequence such that the total number of scalar multiplications is minimized. This article begins with the mathematical formulation and naive triple-loop implementation, then introduces a NumPy-based vectorized approach, and finally extends the implementation with parenthesis reconstruction support.
Given a sequence of matrices
Define
With base condition:
The final result is ( m[0][n-1] ), the minimum cost to multiply the entire matrix chain.
import numpy as np
def matrix_chain_order_naive(p):
n = len(p) - 1
m = np.zeros((n, n))
for l in range(2, n + 1):
for i in range(n - l + 1):
j = i + l - 1
m[i][j] = np.inf
for k in range(i, j):
cost = m[i][k] + m[k+1][j] + p[i] * p[k+1] * p[j+1]
if cost < m[i][j]:
m[i][j] = cost
return mWe can vectorize the inner k loop by precomputing all possible values and using np.min to select the best cost:
def matrix_chain_order_vectorized(p):
p = np.array(p) # Convert to numpy array for vectorized indexing
n = len(p) - 1
m = np.full((n, n), np.inf)
np.fill_diagonal(m, 0)
for l in range(2, n + 1):
for i in range(n - l + 1):
j = i + l - 1
k_vals = np.arange(i, j)
left = m[i, k_vals]
right = m[k_vals + 1, j]
mult_cost = p[i] * p[k_vals + 1] * p[j + 1]
total_cost = left + right + mult_cost
m[i, j] = np.min(total_cost)
return mAn important question arises: Why doesn't the naive version need to convert p to a numpy array, while the vectorized version requires p = np.array(p)?
# In the naive version:
cost = m[i][k] + m[k+1][j] + p[i] * p[k+1] * p[j+1]
# ^^^^ ^^^^^^ ^^^^^^
# All are single integer indicesp[i],p[k+1],p[j+1]are all scalar indexing operations- Each index is a single integer:
p[0],p[1],p[2], etc. - Python lists natively support scalar indexing - no conversion needed
# In the vectorized version:
k_vals = np.arange(i, j) # e.g., [0, 1, 2]
mult_cost = p[i] * p[k_vals + 1] * p[j + 1]
# ^^^^^^^^^^^
# This is an array: [1, 2, 3]k_vals + 1produces a numpy array like[1, 2, 3]p[k_vals + 1]is array indexing - using an array as the index- This means: "get
p[1],p[2],p[3]all at once" - Python lists don't support array indexing - only numpy arrays do
import numpy as np
# Test data
p_list = [1, 2, 3, 4, 5]
p_array = np.array([1, 2, 3, 4, 5])
# Scalar indexing - works with both
print(p_list[2]) # ✓ Works: 3
print(p_array[2]) # ✓ Works: 3
# Array indexing - only works with numpy arrays
indices = np.array([1, 2, 3])
print(p_list[indices]) # ✗ TypeError: only integer scalar arrays can be converted to a scalar index
print(p_array[indices]) # ✓ Works: [2 3 4]The core idea of vectorization is to replace loops with array operations:
# Instead of this loop:
for k in range(i, j):
result.append(p[k + 1])
# We do this array operation:
k_vals = np.arange(i, j)
result = p[k_vals + 1] # Get all values at onceThis simultaneous access to multiple elements is what makes vectorization faster - but it requires numpy arrays that support array indexing.
To reconstruct the parenthesis structure, we store the optimal k that yields the minimum cost:
def matrix_chain_order_optimized(p):
p = np.array(p) # Convert to numpy array for vectorized indexing
n = len(p) - 1
m = np.full((n, n), np.inf)
s = np.full((n, n), -1, dtype=int)
np.fill_diagonal(m, 0)
for l in range(2, n + 1):
for i in range(n - l + 1):
j = i + l - 1
k_vals = np.arange(i, j)
left = m[i, k_vals]
right = m[k_vals + 1, j]
mult_cost = p[i] * p[k_vals + 1] * p[j + 1]
total_cost = left + right + mult_cost
best_k_idx = np.argmin(total_cost)
m[i, j] = total_cost[best_k_idx]
s[i, j] = k_vals[best_k_idx]
return m, sdef build_optimal_parens(s, i, j):
if i == j:
return f"A{i+1}"
else:
k = s[i][j]
left = build_optimal_parens(s, i, k)
right = build_optimal_parens(s, k + 1, j)
return f"({left} x {right})"p = [30, 35, 15, 5, 10, 20, 25]
m, s = matrix_chain_order_optimized(p)
print("Minimum multiplication cost:", m[0, len(p)-2])
print("Optimal parenthesization:", build_optimal_parens(s, 0, len(p)-2))| Feature | Triple Loop | Vectorized (k loop) |
|---|---|---|
| Implementation Complexity | Simple and direct | Slightly more complex |
| Inner Loop Performance | Slow in Python | Faster with NumPy vector ops |
| Extendibility | Easy to add path tracking | Achievable via separate matrix |
| Parallelization Potential | Limited | High (e.g., JAX, Numba) |
| Array Conversion Required | No (scalar indexing) | Yes (array indexing) |
We demonstrated a complete optimization path for Matrix Chain Multiplication: from mathematical modeling to efficient vectorized implementation with path reconstruction. By identifying the structure of the recurrence relation, we directly constructed vector operations, avoiding inefficient Python-level loops.
A key insight is understanding the difference between scalar and array indexing - naive implementations using scalar indexing can work with Python lists, while vectorized implementations require numpy arrays to support simultaneous access to multiple elements. This method is widely applicable to other DP problems with similar repetitive structure.