Efficient Implementation of Matrix Chain Multiplication: From Triple Loops to Vectorization and Parenthesis Recovery

Abstract

Matrix Chain Multiplication is a classical dynamic programming problem that aims to determine the optimal parenthesization of a matrix product sequence such that the total number of scalar multiplications is minimized. This article begins with the mathematical formulation and naive triple-loop implementation, then introduces a NumPy-based vectorized approach, and finally extends the implementation with parenthesis reconstruction support.

1. Problem Description and Mathematical Model

Given a sequence of matrices $A_1, A_2, \dots, A_n$, where matrix $A_i$ has dimensions $p_{i-1} imes p_i$, the goal is to determine the optimal order of multiplication using the associativity of matrix multiplication.

Define $m[i][j]$ as the minimum number of scalar multiplications required to compute the product $A_i \cdot A_{i+1} \cdots A_j$. The recurrence relation is:

$$m[i][j] = \min_{i \leq k < j} \left\{ m[i][k] + m[k+1][j] + p_i \cdot p_{k+1} \cdot p_{j+1} \right\}$$

With base condition:

$$m[i][i] = 0, \quad orall i$$

The final result is ( m[0][n-1] ), the minimum cost to multiply the entire matrix chain.

2. Naive Triple Loop Implementation

import numpy as np

def matrix_chain_order_naive(p):
    n = len(p) - 1
    m = np.zeros((n, n))
    
    for l in range(2, n + 1):
        for i in range(n - l + 1):
            j = i + l - 1
            m[i][j] = np.inf
            for k in range(i, j):
                cost = m[i][k] + m[k+1][j] + p[i] * p[k+1] * p[j+1]
                if cost < m[i][j]:
                    m[i][j] = cost
    return m

3. Vectorized Optimization

We can vectorize the inner k loop by precomputing all possible values and using np.min to select the best cost:

def matrix_chain_order_vectorized(p):
    p = np.array(p)  # Convert to numpy array for vectorized indexing
    n = len(p) - 1
    m = np.full((n, n), np.inf)
    np.fill_diagonal(m, 0)

    for l in range(2, n + 1):
        for i in range(n - l + 1):
            j = i + l - 1
            k_vals = np.arange(i, j)
            left = m[i, k_vals]
            right = m[k_vals + 1, j]
            mult_cost = p[i] * p[k_vals + 1] * p[j + 1]
            total_cost = left + right + mult_cost
            m[i, j] = np.min(total_cost)
    
    return m

4. Key Implementation Detail: Scalar vs Array Indexing

An important question arises: Why doesn't the naive version need to convert p to a numpy array, while the vectorized version requires p = np.array(p)?

The Answer: Different Indexing Patterns

Naive Version Uses Scalar Indexing

# In the naive version:
cost = m[i][k] + m[k+1][j] + p[i] * p[k+1] * p[j+1]
#                              ^^^^   ^^^^^^   ^^^^^^
#                              All are single integer indices

p[i], p[k+1], p[j+1] are all scalar indexing operations
Each index is a single integer: p[0], p[1], p[2], etc.
Python lists natively support scalar indexing - no conversion needed

Vectorized Version Uses Array Indexing

# In the vectorized version:
k_vals = np.arange(i, j)  # e.g., [0, 1, 2]
mult_cost = p[i] * p[k_vals + 1] * p[j + 1]
#                    ^^^^^^^^^^^
#                    This is an array: [1, 2, 3]

k_vals + 1 produces a numpy array like [1, 2, 3]
p[k_vals + 1] is array indexing - using an array as the index
This means: "get p[1], p[2], p[3] all at once"
Python lists don't support array indexing - only numpy arrays do

Demonstration with Examples

import numpy as np

# Test data
p_list = [1, 2, 3, 4, 5]
p_array = np.array([1, 2, 3, 4, 5])

# Scalar indexing - works with both
print(p_list[2])   # ✓ Works: 3
print(p_array[2])  # ✓ Works: 3

# Array indexing - only works with numpy arrays
indices = np.array([1, 2, 3])
print(p_list[indices])   # ✗ TypeError: only integer scalar arrays can be converted to a scalar index
print(p_array[indices])  # ✓ Works: [2 3 4]

Why This Matters for Vectorization

The core idea of vectorization is to replace loops with array operations:

# Instead of this loop:
for k in range(i, j):
    result.append(p[k + 1])

# We do this array operation:
k_vals = np.arange(i, j)
result = p[k_vals + 1]  # Get all values at once

This simultaneous access to multiple elements is what makes vectorization faster - but it requires numpy arrays that support array indexing.

5. Further Optimization with Path Recovery

To reconstruct the parenthesis structure, we store the optimal k that yields the minimum cost:

def matrix_chain_order_optimized(p):
    p = np.array(p)  # Convert to numpy array for vectorized indexing
    n = len(p) - 1
    m = np.full((n, n), np.inf)
    s = np.full((n, n), -1, dtype=int)
    np.fill_diagonal(m, 0)

    for l in range(2, n + 1):
        for i in range(n - l + 1):
            j = i + l - 1
            k_vals = np.arange(i, j)
            left = m[i, k_vals]
            right = m[k_vals + 1, j]
            mult_cost = p[i] * p[k_vals + 1] * p[j + 1]
            total_cost = left + right + mult_cost
            best_k_idx = np.argmin(total_cost)
            m[i, j] = total_cost[best_k_idx]
            s[i, j] = k_vals[best_k_idx]
    
    return m, s

Function to build parenthesis structure:

def build_optimal_parens(s, i, j):
    if i == j:
        return f"A{i+1}"
    else:
        k = s[i][j]
        left = build_optimal_parens(s, i, k)
        right = build_optimal_parens(s, k + 1, j)
        return f"({left} x {right})"

Example:

p = [30, 35, 15, 5, 10, 20, 25]
m, s = matrix_chain_order_optimized(p)

print("Minimum multiplication cost:", m[0, len(p)-2])
print("Optimal parenthesization:", build_optimal_parens(s, 0, len(p)-2))

6. Structural and Performance Comparison

Feature	Triple Loop	Vectorized (k loop)
Implementation Complexity	Simple and direct	Slightly more complex
Inner Loop Performance	Slow in Python	Faster with NumPy vector ops
Extendibility	Easy to add path tracking	Achievable via separate matrix
Parallelization Potential	Limited	High (e.g., JAX, Numba)
Array Conversion Required	No (scalar indexing)	Yes (array indexing)

Conclusion

We demonstrated a complete optimization path for Matrix Chain Multiplication: from mathematical modeling to efficient vectorized implementation with path reconstruction. By identifying the structure of the recurrence relation, we directly constructed vector operations, avoiding inefficient Python-level loops.

A key insight is understanding the difference between scalar and array indexing - naive implementations using scalar indexing can work with Python lists, while vectorized implementations require numpy arrays to support simultaneous access to multiple elements. This method is widely applicable to other DP problems with similar repetitive structure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient Implementation of Matrix Chain Multiplication: From Triple Loops to Vectorization and Parenthesis Recovery

Abstract

1. Problem Description and Mathematical Model

2. Naive Triple Loop Implementation

3. Vectorized Optimization

4. Key Implementation Detail: Scalar vs Array Indexing

The Answer: Different Indexing Patterns

Naive Version Uses Scalar Indexing

Vectorized Version Uses Array Indexing

Demonstration with Examples

Why This Matters for Vectorization

5. Further Optimization with Path Recovery

Function to build parenthesis structure:

Example:

6. Structural and Performance Comparison

Conclusion

FilesExpand file tree

2.Vectorize_triple_loops.md

Latest commit

History

2.Vectorize_triple_loops.md

File metadata and controls

Efficient Implementation of Matrix Chain Multiplication: From Triple Loops to Vectorization and Parenthesis Recovery

Abstract

1. Problem Description and Mathematical Model

2. Naive Triple Loop Implementation

3. Vectorized Optimization

4. Key Implementation Detail: Scalar vs Array Indexing

The Answer: Different Indexing Patterns

Naive Version Uses Scalar Indexing

Vectorized Version Uses Array Indexing

Demonstration with Examples

Why This Matters for Vectorization

5. Further Optimization with Path Recovery

Function to build parenthesis structure:

Example:

6. Structural and Performance Comparison

Conclusion