Fastest way to construct sparse block matrix in python

134 Views Asked by At

I want to construct a matrix of shape (N,2N) in Python.

I can construct the matrix as follows

import numpy as np 

N = 10 # 10,100,1000, whatever

some_vector = np.random.uniform(size=N) 

some_matrix = np.zeros((N, 2*N))

for i in range(N):
    some_matrix[i, 2*i] = 1
    some_matrix[i, 2*i + 1] = some_vector[i]

So the result is a matrix of mostly zeros, but where on row i, the 2*i and 2*i + 1 columns are populated.

Is there a faster way to construct the matrix without the loop? Feels like there should be some broadcasting operation...

Edit: There have been some very fast, great answers! I am going to extend the question a touch to my actual use case.

Now suppose some_vector has a shape (N,T). I want to construct a matrix of shape (N,2*N,T) analogously to the previous case. The naive approach is:

N = 10  # 10,100,1000, whatever
T = 500 # or whatever

some_vector = np.random.uniform(size=(N,T)) 

some_matrix = np.zeros((N, 2*N,T))

for i in range(N):
    for t in range(T):
        some_matrix[i, 2*i,t] = 1
        some_matrix[i, 2*i + 1,t] = some_vector[i,t]

Can we extend the previous answers to this new case?

3

There are 3 best solutions below

0
Brian61354270 On BEST ANSWER

Variant 1

You can replace the loop with a broadcasting assignment, which interleaves the columns of an identity matrix with the columns of a diagonal matrix:

a = np.eye(N)
b = np.diag(some_vector)

c = np.empty((N, 2*N))
c[:, 0::2] = a
c[:, 1::2] = b

This is concise, but not optimal. This requires allocating some unnecessary intermediate matrices (a and b), which becomes increasing expensive as N grows larges. This also involves performing random-access assignments to every position in c, instead of just the non-zero entries.

This may be faster or slower than the implementation in the question for different values of N.

Variant 2

Similar idea Variant 1, but only performs a single allocation and avoids unnecessary assignments to zero entries.

We create a single vector of size 2*N**2, which essentially represents the rows of some_matrix concatenated with one another. The non-zero positions are populated using broadcasting assignments. We then create an (N, 2*N) view into this vector using np.ndarray.reshape.

some_matrix = np.zeros(2 * n**2)

step = 2 * (n + 1)

some_matrix[::step] = 1
some_matrix[1::step] = some_vector
some_matrix = some_matrix.reshape(n, 2 * n)

Performance Comparison

For relatively small N (N=100):

  • Variant 1 is about 40% faster than the Python loop (12 us vs 21 us)
  • Variant 2 is about 80% faster than the Python loop (4 us vs 21 us)

For large N (N=10_000):

  • Variant 1 is slower than the Python loop.
  • Variant 2 is about 20% faster than the Python loop (10 us vs 13 us)

Setup:

import numpy as np


def original(n):
    some_vector = np.random.uniform(size=n)
    some_matrix = np.zeros((n, 2 * n))

    for i in range(n):
        some_matrix[i, 2 * i] = 1
        some_matrix[i, 2 * i + 1] = some_vector[i]


def variant_1(n):
    some_vector = np.random.uniform(size=n)

    a = np.eye(n)
    b = np.diag(some_vector)
    
    c = np.empty((n, 2*n))
    c[:, 0::2] = a
    c[:, 1::2] = b


def variant_2(n):
    some_vector = np.random.uniform(size=n)
    
    some_matrix = np.zeros(2 * n**2)
    step = 2 * (n + 1)
    some_matrix[::step] = 1
    some_matrix[1::step] = some_vector
    some_matrix = some_matrix.reshape(n, 2*n)

Timing at N=100:

%timeit -n1000  original(100)
21.1 µs ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit -n1000  variant_1(100)
12.2 µs ± 431  ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit -n1000  variant_2(100)
4.37 µs ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Timing at N=1_000

%timeit -n100  original(1_000)
631 µs ± 98.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n100  variant_1(1_000)
5.24 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n100  variant_2(1_000)
408 µs ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Timing at N=10_000

%timeit -n100  original(10_000)
12.6 ms ± 225 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n100  variant_2(10_000)
10.1 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Tested using Python 3.10.12 and Numpy v1.26.0.

0
Andrej Kesely On

Another approach:

some_matrix = np.zeros(N * 2 * N)

a = np.arange(0, N * 2 * N, 2 * N) + np.arange(0, N) * 2

some_matrix[a] = 1
some_matrix[a + 1] = some_vector

print(some_matrix.reshape((N, 2 * N)))

Benchmark (including versions):

import numpy as np
import pandas as pd
import perfplot
from matplotlib import pyplot as plt
from numba import njit, prange

plt.rcParams["figure.autolayout"] = True
np.random.seed(0)


def get_matrix(some_vector):
    N = len(some_vector)
    some_matrix = np.zeros((N, 2 * N))

    for i in range(N):
        some_matrix[i, 2 * i] = 1
        some_matrix[i, 2 * i + 1] = some_vector[i]
    return some_matrix


def get_matrix_brian_1(some_vector):
    N = len(some_vector)

    a = np.eye(N)
    b = np.diag(some_vector)

    c = np.empty((N, 2 * N))
    c[:, 0::2] = a
    c[:, 1::2] = b
    return c


def get_matrix_brian_2(some_vector):
    n = len(some_vector)

    some_matrix = np.zeros(2 * n**2)

    step = 2 * (n + 1)

    some_matrix[::step] = 1
    some_matrix[1::step] = some_vector
    return some_matrix.reshape(n, 2 * n)


def get_matrix_andrej(some_vector):
    N = len(some_vector)
    some_matrix = np.zeros(N * 2 * N)

    a = np.arange(0, N * 2 * N, 2 * N) + np.arange(0, 2 * N, 2)

    some_matrix[a] = 1
    some_matrix[a + 1] = some_vector

    return some_matrix.reshape((N, 2 * N))


@njit
def get_matrix_andrej_numba(some_vector):
    N = len(some_vector)
    some_matrix = np.zeros((N, 2 * N))

    for i in range(N):
        some_matrix[i, 2 * i] = 1
        some_matrix[i, 2 * i + 1] = some_vector[i]
    return some_matrix


@njit(parallel=True)
def get_matrix_andrej_numba_parallel(some_vector):
    N = len(some_vector)
    some_matrix = np.zeros((N, 2 * N))

    for i in prange(N):
        some_matrix[i, 2 * i] = 1
        some_matrix[i, 2 * i + 1] = some_vector[i]
    return some_matrix


# compile numba functions
get_matrix_andrej_numba(np.array([1, 2, 3]))
get_matrix_andrej_numba_parallel(np.array([1, 2, 3]))


perfplot.show(
    setup=lambda n: np.random.uniform(size=n),
    kernels=[
        get_matrix,
        get_matrix_andrej,
        get_matrix_brian_1,
        get_matrix_brian_2,
        get_matrix_andrej_numba,
        get_matrix_andrej_numba_parallel,
    ],
    labels=[
        "get_matrix",
        "get_matrix_andrej",
        "get_matrix_brian_1",
        "get_matrix_brian_2",
        "get_matrix_andrej_numba",
        "get_matrix_andrej_numba_parallel",
    ],
    n_range=[10, 50, 100, 250, 500, 1000, 2_500, 5_000],
    xlabel="N",
    logx=True,
    logy=True,
)

Creates this graph: enter image description here

0
Nick ODell On

Given that most elements of this array are zero, you could save a lot of CPU and memory if you only stored the non-zero elements.

You could do this with either SciPy or Sparse. (SciPy has the advantage of supporting more types of operations. Sparse has the advantage of supporting 3D arrays.)

2D example:

import scipy.sparse


def get_matrix_njo(some_vector):
    N = len(some_vector)
    data = np.empty(N * 2)
    data[0::2] = 1
    data[1::2] = some_vector
    cols = np.arange(N * 2, dtype='int32')
    rows = cols // 2
    some_matrix = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(N, 2*N))
    return some_matrix

Whether this is faster or slower depends on whether you have to convert the sparse result of this to a dense matrix.

If you do convert to dense, then it is somewhat slower than the other two answers.

If you can keep it as a sparse matrix, it is much is faster than the other answers for large matrices, as it runs in O(N) time and space, where the other answers need O(N2) time and space.

Next, I looked at how to extend this to a 3D array. SciPy does not support 3D arrays, but another package, Sparse, does.

3D example:

from sparse import COO

def make_matrix_3d_njo(some_vector):
    N, T = some_vector.shape
    number_non_zero_entries = 2 * N * T
    data = np.zeros((N, 2 * T))
    coords = np.zeros((3, 2 * N, T), dtype='int32')
    c1 = np.arange(2 * N, dtype='int32')
    c0 = c1 // 2
    c2 = np.arange(T, dtype='int32')
    coords[0] = c0[:, None]
    coords[1] = c1[:, None]
    coords[2] = c2[None, :]
    coords = coords.reshape(3, number_non_zero_entries)
    data[:, :T] = 1
    data[:, T:] = some_vector
    data = data.flatten()
    return COO(coords, data, shape=(N, 2 * N, T))