How to speed up slow matrix multiplication in SymPy?
I was writing a tool to solve specific recurrence equations with SymPy and found that one of the steps, a matrix multiplication, was extremely time-consuming. For example, if I try the following in the IPython console,
In [1]: from sympy import *
In [2]: A = Matrix(500, 500, lambda i, j: 2 + abs(i - j) if i - j in [-1, 0, 1] else 0)
In [3]: A[0:7, 0:7]
Out[3]:
Matrix([
[2, 3, 0, 0, 0, 0, 0],
[3, 2, 3, 0, 0, 0, 0],
[0, 3, 2, 3, 0, 0, 0],
[0, 0, 3, 2, 3, 0, 0],
[0, 0, 0, 3, 2, 3, 0],
[0, 0, 0, 0, 3, 2, 3],
[0, 0, 0, 0, 0, 3, 2]])
In [4]: A * A
...
I never waited for it to finish.
I appreciate that symbolic manipulation is significantly slower than numerical evaluation, but this seems pretty extreme: Octave and other linear algebra packages finish this calculation in a split second.
Does anyone have experience using SymPy's matrix classes at ~1000 rows and columns with Rational entries?
Arbitrary precision is mandatory; speed is nice to have
There are a couple of Python classes built for exactly this purpose that may interest you for arbitrary-precision calculations:

decimal.Decimal()

fractions.Fraction( numerator = 0, denominator = 1 )
These are required for exact computation, whether in large-scale astronomy, DEP modeling, or other fields where precision must not degrade over long-running time / event / recursion computation strategies and similar threats to standard number representations.
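A quick illustration of why these two classes matter for exactness, as a minimal sketch using only the standard library:

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# Binary floats accumulate representation error: 0.1 has no exact binary form.
assert 0.1 + 0.1 + 0.1 != 0.3

# Fraction keeps an exact numerator/denominator pair, so arithmetic is lossless.
assert Fraction(1, 10) * 3 == Fraction(3, 10)

# Decimal is exact for decimal literals, and the working precision is tunable.
getcontext().prec = 50
assert Decimal("0.1") + Decimal("0.1") + Decimal("0.1") == Decimal("0.3")
```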
Personally, I have worked with decimal.Decimal() (up to 5,000,000 digits in the randomness analysis of a crypto sequence), but I had not focused on getting them inside numpy.
numpy can wrap these types in its matrix data structures (as an object dtype holding decimal.Decimal instances, and the like); however, testing is required to check its vectorized operations and overall speed once these fine Python instances are involved.
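A minimal sketch of the wrapping mentioned above. NumPy stores such instances in an object-dtype array and dispatches each element operation back to the Python class, which is exactly why the speed tests below matter:

```python
import numpy as np
from fractions import Fraction

# Object-dtype array: NumPy holds references to the Python Fraction instances.
M = np.array([[Fraction(1, 3), Fraction(1, 2)],
              [Fraction(2, 3), Fraction(1, 4)]], dtype=object)

# Each product and sum is evaluated by Fraction's own __mul__ / __add__,
# so the result stays exact -- at Python-object speed, not BLAS speed.
P = M.dot(M)
print(P[0, 0])  # (1/3)*(1/3) + (1/2)*(2/3) = 4/9
```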
Performance
As an initial observation of numpy speed on a small ("dense" sounds funny at this scale) 2x2 matrix of decimal.Decimals:
>>> aClk.start();m*m;aClk.stop()
array([[Decimal('0.01524157875323883675019051999'),
Decimal('5.502209507697009702374335655')],
[Decimal('1.524157875323883675019051999'),
Decimal('11.94939027587381419881628113')]], dtype=object)
5732L # 5.7 msec_______________________________________________________________
While the same multiplication with dtype = numpy.float96 took
>>> aClk.start();f*f;aClk.stop()
array([[ 0.042788046, 0.74206772],
[ 0.10081096, 0.46544855]], dtype=float96)
2979L # 2.9 msec_______________________________________________________________
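The `aClk` object in these snippets is the answerer's own stopwatch and is never defined; a minimal stand-in with the same start/stop interface, built on `time.perf_counter`, might look like:

```python
import time

class Clk:
    """Minimal stopwatch mimicking the aClk.start()/aClk.stop() usage above."""
    def start(self):
        self._t0 = time.perf_counter()
    def stop(self):
        # Return elapsed microseconds, matching the integer readouts shown.
        return int((time.perf_counter() - self._t0) * 1_000_000)

aClk = Clk()
aClk.start()
sum(i * i for i in range(100_000))  # some work to time
elapsed_us = aClk.stop()
```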
For a 500 x 500 fully populated dtype = fractions.Fraction matrix:
>>> aClk.start();M*M;aClk.stop()
array([[Fraction(9, 64), Fraction(1, 4), Fraction(64, 25), ...,
Fraction(64, 81), Fraction(16, 81), Fraction(36, 1)],
...,
[Fraction(1, 1), Fraction(9, 4), Fraction(4, 1), ...,
Fraction(1, 4), Fraction(25, 36), Fraction(1, 1)]], dtype=object)
2692088L # 2.7 sec_<<<_Fraction_______________________________vs. 19 msec float96
For a 500 x 500 fully populated dense dtype = decimal.Decimal matrix:
>>> aClk.start();D*D;aClk.stop()
array([[Decimal('0.140625'), Decimal('0.25'), Decimal('2.56'), ...,
Decimal('0.7901234567901234567901234568'),
Decimal('0.1975308641975308641975308642'), Decimal('36')],
[Decimal('3.24'), Decimal('0.25'), Decimal('0.25'), ...,
Decimal('0.02040816326530612244897959185'), Decimal('0.04'),
Decimal('0.1111111111111111111111111111')],
[Decimal('0.1111111111111111111111111111'), Decimal('0.25'),
Decimal('2.25'), ..., Decimal('0.5102040816326530612244897959'),
Decimal('0.25'), Decimal('0.0625')],
...,
[Decimal('0'), Decimal('5.444444444444444444444444443'),
Decimal('16'), ..., Decimal('25'), Decimal('0.81'), Decimal('0.04')],
[Decimal('1'), Decimal('7.111111111111111111111111113'),
Decimal('1'), ..., Decimal('0'), Decimal('81'), Decimal('2.25')],
[Decimal('1'), Decimal('2.25'), Decimal('4'), ..., Decimal('0.25'),
Decimal('0.6944444444444444444444444444'), Decimal('1')]], dtype=object)
4789338L # 4.8 sec_<<<_Decimal_______________________________vs. 19 msec float96
2692088L # 2.7 sec_<<<_Fraction______________________________vs. 19 msec float96
Will a 1000x1000 tridiagonal matrix multiply in under 50 ms?
Since there are only about 3000 nonzero (sparse) elements, as opposed to the 250,000 cells of the fully populated 500x500 matrices tested above, there is a huge performance win in using these Python classes for arbitrary-precision calculations. Fraction has more headroom, since the engine can exploit the numerator / denominator design for MUL / DIV operations, compared with Decimal; however, exact performance envelopes must be tested in vivo, on the real-life cases where your computational approach is used.
SymPy syntax for SparseMatrix and 1000x1000 tridiagonal tests
The real 1000x1000 test with SparseMatrix, as suggested by @tmyklebu, took a little longer to set up due to installation issues, but may give you some insight into a real-world implementation:
>>> F = sympy.SparseMatrix( 1000, 1000, { (0,0): 1 } )  # will hold fractions.Fraction entries
>>> D = sympy.SparseMatrix( 1000, 1000, { (0,0): 1 } )  # will hold decimal.Decimal entries
>>> for i in range( 1000 ):                          # GEN so that F & D hold
...     for j in range( 1000 ):                      #     the SAME values,
...         if i - j in [-1, 0, 1]:                  #     but DIFF representations
...             num = int( 100 * numpy.random.random() )
...             den = int( 100 * numpy.random.random() ) + 1   # + 1 to avoid DIV!0
...             F[i,j] = fractions.Fraction( numerator = num, denominator = den )
...             D[i,j] = decimal.Decimal( str( num ) ) / decimal.Decimal( str( den ) )
# called in zigzag ( F*F / D*D / F*F / D*D / ... ) order to avoid memory-access cache artifacts
>>> aClk.start();VOID=F*F;aClk.stop()
770353L # notice the 1st eval took about 3x longer
205585L # notice the 2nd+
205364L # 0.205 sec_<<<_Fraction()____________________________vs. 0.331 sec Decimal()
>>> aClk.start();VOID=D*D;aClk.stop()
383137L # 0.383 sec_<<<_Decimal()____________________________vs. 0.770 sec 1st Fraction()
390164L # 0.390 sec_<<<_Decimal()____________________________vs. 0.205 sec 2nd Fraction()
331291L # 0.331 sec_<<<_Decimal()____________________________vs. 0.205 sec 3rd Fraction()
>>> F[0:4,0:4]
Matrix([
[ 1/52, 6/23, 0, 0],
[42/29, 29/12, 1, 0],
[ 0, 57/88, 39/62, 13/57],
[ 0, 0, 34/83, 26/95]])
>>> D[0:4,0:4]
Matrix([
[0.0192307692307692, 0.260869565217391, 0, 0],
[ 1.44827586206897, 2.41666666666667, 1.0, 0],
[ 0, 0.647727272727273, 0.629032258064516, 0.228070175438596],
[ 0, 0, 0.409638554216867, 0.273684210526316]])
Why not use sparse matrices instead of dense ones? The matrices that arise when solving (linear) recurrences are usually sparse; one common formulation gives you a companion matrix with ones on the first superdiagonal and zeros everywhere else except the bottom row, where the recurrence coefficients go.
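As a concrete instance of the structure described: for the recurrence a(n) = 3*a(n-1) + 2*a(n-2) (my example, not from the answer), the companion matrix has a one on the superdiagonal and the coefficients in the bottom row, and SymPy's SparseMatrix stores only those nonzeros:

```python
from sympy import SparseMatrix

# Companion matrix for a(n) = 3*a(n-1) + 2*a(n-2): one on the superdiagonal,
# recurrence coefficients in the bottom row, zeros everywhere else.
C = SparseMatrix(2, 2, {(0, 1): 1, (1, 0): 2, (1, 1): 3})

# State vector [a(n-1), a(n)], starting from a(0) = 0, a(1) = 1.
state = SparseMatrix(2, 1, {(1, 0): 1})
for _ in range(3):          # advance the recurrence three steps
    state = C * state
print(state[1])             # a(4): 3*a(3) + 2*a(2) = 3*11 + 2*3 = 39
```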
I have not used sympy for matrix operations, but I can reproduce the slowness you encountered with this code. It seems that matrix operations in sympy are not that fast.
I would recommend using numpy, which has great matrix operations and is very fast. Here is a port of your code to numpy that does the multiplication in less than 1 second on my laptop:
In [1]: import numpy as np
In [2]: A = np.matrix([[2 + abs(i - j) if i - j in [-1, 0, 1] else 0 for i in range(0, 500)] for j in range(0, 500)])
In [3]: A[0:7,0:7]
Out[3]:
matrix([[2, 3, 0, 0, 0, 0, 0],
[3, 2, 3, 0, 0, 0, 0],
[0, 3, 2, 3, 0, 0, 0],
[0, 0, 3, 2, 3, 0, 0],
[0, 0, 0, 3, 2, 3, 0],
[0, 0, 0, 0, 3, 2, 3],
[0, 0, 0, 0, 0, 3, 2]])
In [4]: A * A
Out[4]:
matrix([[13, 12, 9, ..., 0, 0, 0],
[12, 22, 12, ..., 0, 0, 0],
[ 9, 12, 22, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., 22, 12, 9],
[ 0, 0, 0, ..., 12, 22, 12],
[ 0, 0, 0, ..., 9, 12, 13]])
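Since the matrix in question is tridiagonal, a sparse representation can be faster still. A sketch with scipy.sparse (my addition, not part of the original answer; exact arithmetic is lost here, as with the dense numpy version):

```python
from scipy.sparse import diags

n = 500
# Same tridiagonal matrix: 2 on the main diagonal, 3 on the first off-diagonals;
# scalar diagonals are broadcast to the full length by diags().
A = diags([3, 2, 3], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# Sparse product: the result has only ~5 nonzero diagonals, so this is
# far less work than the dense 500x500 multiplication.
P = A @ A
print(P[0, 0], P[0, 1])  # 13 and 12, matching the dense output above
```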