97. SciPy Scientific Computing Basics#
97.1. Introduction#
SciPy is an open-source library for mathematics, science, and engineering, integrating modules such as statistics, optimization, linear algebra, Fourier transforms, signal and image processing, and ODE solvers. It is one of the important tools for scientific computing using Python. This course will introduce you to the basic usage of SciPy.
97.2. Key Points#
- Constants module 
- Linear algebra 
- Interpolation functions 
- Image processing 
- Optimization methods 
- Signal processing 
- Statistical functions 
- Sparse matrices 
 
                  SciPy generally represents an open-source software ecosystem for mathematical, scientific, and engineering development using Python, which includes a series of core libraries and tools such as NumPy, Pandas, Matplotlib, SymPy, IPython, etc. These tools also belong to the sponsored projects of NumFOCUS.
Of course, the SciPy mentioned in this chapter specifically refers to the SciPy core library, which is built on top of NumPy and inherits NumPy’s efficient computing capabilities for multi-dimensional data. Therefore, before learning the content of this section, it is recommended that you first study the NumPy Basic Course on Numerical Computing.
First, we import SciPy and check the version number:
import scipy
scipy.__version__
                        '1.13.1'
                        In a sense, SciPy can be regarded as an extension and supplement to NumPy. If you read the official SciPy documentation, you will also find that it mentions a lot of NumPy usage. Next, we will try to understand and master the unique content in the SciPy library and learn the commonly used modules and classes.
97.3. Constants Module#
                    For the convenience of scientific computing, SciPy provides
                    a module called
                    scipy.constants, which contains common physical and mathematical constants
                    and units. You can view these constants and units through
                    the link given above. Here we give a few examples.
                  
For example, the mathematical constants of pi and the golden ratio.
from scipy import constants
constants.pi
                        3.141592653589793
                        constants.golden
                        1.618033988749895
                        Constants such as the speed of light in a vacuum and Planck’s constant, which are often used in physics.
constants.c, constants.speed_of_light  # 两种写法
                        (299792458.0, 299792458.0)
                        constants.h, constants.Planck
                        (6.62607015e-34, 6.62607015e-34)
                        Regarding these constants, you don’t need to memorize each one. In fact, their names are logical and regular, mostly abbreviations of common English letters. When actually using them, just refer to the official documentation.
97.4. Linear Algebra#
                    Linear algebra should be one of the most frequently involved
                    computational methods in scientific computing. SciPy
                    provides detailed and comprehensive linear algebra
                    computational functions. These functions are basically
                    placed under the module
                    scipy.linalg. Among them, they are roughly divided into several
                    categories: basic solution methods, eigenvalue problems,
                    matrix decompositions, matrix functions, matrix equation
                    solving, special matrix construction, etc. Next, we will try
                    to use the relevant linear algebra solution functions in
                    SciPy.
                  
                    For example, to find the inverse of a given matrix, we can
                    use the
                    scipy.linalg.inv
                    function. What is generally passed into the function is of
                    the type of the array
                    np.array
                    or matrix
                    np.matrix
                    given by NumPy.
                  
import numpy as np
from scipy import linalg
linalg.inv(np.matrix([[1, 2], [3, 4]]))
                        array([[-2. ,  1. ],
       [ 1.5, -0.5]])
                        
                    Singular value decomposition should be a pain point for
                    everyone in the process of learning linear algebra. Using
                    the
                    scipy.linalg.svd
                    function provided by SciPy can very conveniently complete
                    this process. For example, we perform singular value
                    decomposition on a
                    \(5 \times 4\)
                    random matrix.
                  
U, s, Vh = linalg.svd(np.random.randn(5, 4))
U, s, Vh
                        (array([[-0.75119934,  0.39250178, -0.07910727,  0.47990549, -0.21230796],
        [-0.14271361, -0.60602855, -0.76526919,  0.1603031 ,  0.03206556],
        [ 0.42156309,  0.46901068, -0.48956675, -0.06906547, -0.59822061],
        [-0.41860256,  0.2416375 , -0.27300769, -0.8046263 ,  0.21077602],
        [ 0.24977758,  0.44756233, -0.30642504,  0.30298534,  0.7426996 ]]),
 array([3.35911649, 2.61360688, 1.30618213, 0.29283192]),
 array([[-0.20372706, -0.39790835,  0.04096205,  0.89358063],
        [ 0.3559683 ,  0.82078498, -0.00236764,  0.4467583 ],
        [ 0.77481996, -0.33587023,  0.53557332,  0.00253815],
        [ 0.48107784, -0.23488751, -0.84349139,  0.04375207]]))
                        
                    Finally, the unitary matrices
                    U
                    and
                    Vh, and the singular values
                    s
                    are returned.
                  
                    In addition,
                    scipy.linalg
                    also includes functions such as the least squares solution
                    function
                    scipy.linalg.lstsq. Now try to use it to complete a least squares solution
                    process.
                  
First, we give the \(x\) and \(y\) values of the samples. Then assume that they conform to the distribution of \(y = ax^2 + b\).
x = np.array([1, 2.5, 3.5, 4, 5, 7, 8.5])
y = np.array([0.3, 1.1, 1.5, 2.0, 3.2, 6.6, 8.6])
                        Next, we calculate \(x^2\) and add the intercept term coefficient 1.
M = x[:, np.newaxis]**[0, 2]
M
                        array([[ 1.  ,  1.  ],
       [ 1.  ,  6.25],
       [ 1.  , 12.25],
       [ 1.  , 16.  ],
       [ 1.  , 25.  ],
       [ 1.  , 49.  ],
       [ 1.  , 72.25]])
                        
                    Then use
                    linalg.lstsq
                    to perform the least squares calculation, and the first set
                    of parameters returned is the fitting coefficient.
                  
p = linalg.lstsq(M, y)[0]
p
                        array([0.20925829, 0.12013861])
                        We can plot a graph to check whether the parameters obtained by the least squares method are reasonable, and draw the sample and fitting curve graphs.
from matplotlib import pyplot as plt
%matplotlib inline
plt.scatter(x, y)
xx = np.linspace(0, 10, 100)
yy = p[0] + p[1]*xx**2
plt.plot(xx, yy)
                        
                    The module
                    scipy.linalg
                    (https://docs.scipy.org/doc/scipy/reference/linalg.html) contains many methods related to linear algebra
                    operations. I hope everyone can learn about them by reading
                    the official documentation.
                  
97.5. Interpolation Function#
                    Interpolation is a process or method in the field of
                    numerical analysis to deduce new data points within a range
                    through known and discrete data points. The module
                    scipy.interpolate
                    (https://docs.scipy.org/doc/scipy/reference/interpolate.html) provided by SciPy contains a large number of mathematical
                    interpolation methods, covering a very comprehensive range.
                  
Next, we introduce the process of performing linear interpolation using SciPy, which is also the simplest one among the interpolation methods. First, we give a set of values for \(x\) and \(y\).
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
plt.scatter(x, y)
                        Next, we want to insert another value between the two points above. How can we best reflect the distribution trend of the data? At this time, the method of linear interpolation can be used.
from scipy import interpolate
xx = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5])  # 两点之间的点的 x 坐标
f = interpolate.interp1d(x, y)  # 使用原样本点建立插值函数
yy = f(xx)  # 映射到新样本点
plt.scatter(x, y)
plt.scatter(xx, yy, marker='*')
                        It can be seen that the asterisk point is the point for our interpolation, while the dot points are the original sample points, and the interpolated point can accurately conform to the trend of the known discrete data points.
There are a very rich variety of mathematical interpolation methods. For details, please read the various methods listed in the official SciPy documentation. In fact, if you have studied The Basics of Pandas Data Processing, then you should know that mathematical interpolation methods can be used when Pandas fills in missing values, and the interpolation methods called by Pandas come from SciPy.
97.6. Image Processing#
                    Interestingly, SciPy integrates a large number of functions
                    and methods for image processing. Of course, a color image
                    is composed of RGB channels, which is actually a
                    multi-dimensional array. Therefore, the module for image
                    processing in SciPy,
                    scipy.ndimage, is actually a process for processing multi-dimensional
                    arrays, and you can complete a series of operations such as
                    convolution, filtering, and transformation.
                  
                     
                  
                    Before formally understanding the
                    scipy.ndimage
                    module, we first import an example image of a raccoon using
                    the
                    face
                    method in the
                    scipy.misc
                    module.
                    scipy.misc
                    is a miscellaneous module that contains some methods that
                    cannot be accurately classified.
                  
from scipy import misc
face = misc.face()
face
                        array([[[121, 112, 131],
        [138, 129, 148],
        [153, 144, 165],
        ...,
        [119, 126,  74],
        [131, 136,  82],
        [139, 144,  90]],
       [[ 89,  82, 100],
        [110, 103, 121],
        [130, 122, 143],
        ...,
        [118, 125,  71],
        [134, 141,  87],
        [146, 153,  99]],
       [[ 73,  66,  84],
        [ 94,  87, 105],
        [115, 108, 126],
        ...,
        [117, 126,  71],
        [133, 142,  87],
        [144, 153,  98]],
       ...,
       [[ 87, 106,  76],
        [ 94, 110,  81],
        [107, 124,  92],
        ...,
        [120, 158,  97],
        [119, 157,  96],
        [119, 158,  95]],
       [[ 85, 101,  72],
        [ 95, 111,  82],
        [112, 127,  96],
        ...,
        [121, 157,  96],
        [120, 156,  94],
        [120, 156,  94]],
       [[ 85, 101,  74],
        [ 97, 113,  84],
        [111, 126,  97],
        ...,
        [120, 156,  95],
        [119, 155,  93],
        [118, 154,  92]]], dtype=uint8)
                        
                    By default,
                    face
                    is an RGB array of the image, and we can visualize and
                    restore it.
                  
                    Next, we try some image processing methods in
                    scipy.ndimage. For example, we perform Gaussian blur on the image.
                  
from scipy import ndimage
plt.imshow(ndimage.gaussian_filter(face, sigma=5))
                        Perform a rotation transformation on the image.
                    Or perform a convolution operation on the image. First,
                    randomly define a convolution kernel
                    k, and then perform the convolution.
                  
k = np.random.randn(2, 2, 3)
plt.imshow(ndimage.convolve(face, k))
                        For more operations on multi-dimensional image arrays, please practice referring to the official documentation.
97.7. Optimization Methods#
Optimization is a branch of applied mathematics. Generally, we need to minimize (or maximize) an objective function, and the feasible solution found is also called the optimal solution. The idea of optimization is very common in the field of engineering applications. For example, machine learning algorithms all have an objective function, which we also call the loss function. And the process of finding the minimum value of the loss function is the optimization process. We may use optimization methods such as the least squares method, gradient descent method, Newton’s method, etc. to complete it.
                    The
                    scipy.optimize
                    module provided by SciPy contains a large number of
                    pre-written optimization methods. For example, the least
                    squares function
                    scipy.linalg.lstsq
                    we used above also has a very similar function
                    scipy.optimize.least_squares
                    in the
                    scipy.optimize
                    module. This function can solve non-linear least squares
                    problems.
                  
                    The most commonly used function in the
                    scipy.optimize
                    module is
                    scipy.optimize.minimize, because a large number of optimization methods can be
                    selected just by specifying parameters. For example,
                    scipy.optimize.minimize(method="BFGS")
                    for the quasi-Newton method.
                  
                    Next, we use the same data as in the demonstration of
                    scipy.linalg.lstsq
                    above and use the least squares method of
                    scipy.linalg.lstsq
                    to handle the least squares calculation process. It will be
                    a bit more troublesome here. First, define the fitting
                    function
                    func
                    and the residual function
                    err_func. In fact, we need to solve the minimum value of the
                    residual function.
                  
def func(p, x):
    w0, w1 = p
    f = w0 + w1*x*x
    return f
def err_func(p, x, y):
    ret = func(p, x) - y
    return ret
                        from scipy.optimize import leastsq
p_init = np.random.randn(2)  # 生成 2 个随机数
x = np.array([1, 2.5, 3.5, 4, 5, 7, 8.5])
y = np.array([0.3, 1.1, 1.5, 2.0, 3.2, 6.6, 8.6])
# 使用 Scipy 提供的最小二乘法函数得到最佳拟合参数
parameters = leastsq(err_func, p_init, args=(x, y))
parameters[0]
                        array([0.20925826, 0.12013861])
                        
                    Without accident, the results obtained here are exactly the
                    same as those obtained by
                    scipy.linalg.lstsq
                    above.
                  
97.8. Signal Processing#
Signal Processing (English: Signal processing) refers to the process of processing signal representation, transformation, operation, etc. It is widely used in disciplines such as computer science, pharmaceutical analysis, and electronics. For decades, signal processing has played a key role in a wide range of fields such as voice and data communication, biomedical engineering, acoustics, sonar, radar, seismology, petroleum exploration, instrumentation, robotics, consumer electronics, and many others.
                    The related methods for signal processing in SciPy are in
                    the
                    scipy.signal
                    module, which is further divided into 13 subcategories such
                    as convolution, B-splines, filtering, window functions, peak
                    finding, spectral analysis, etc., with a total of more than
                    a hundred different functions and methods. Therefore, signal
                    processing is one of the very important modules in SciPy.
                  
First, we try to generate some regular waveform diagrams, such as sawtooth waves, square waves, etc.
from scipy import signal
t = np.linspace(-1, 1, 100)
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(t, signal.gausspulse(t, fc=5, bw=0.5))
axes[0].set_title("gausspulse")
t *= 5*np.pi
axes[1].plot(t, signal.sawtooth(t))
axes[1].set_title("chirp")
axes[2].plot(t, signal.square(t))
axes[2].set_title("gausspulse")
                        Next, let’s try to apply several filtering functions. First, generate a set of waveform diagrams containing noise.
def f(t): return np.sin(np.pi*t) + 0.1*np.cos(7*np.pi*t+0.3) + \
    0.2 * np.cos(24*np.pi*t) + 0.3*np.cos(12*np.pi*t+0.5)
t = np.linspace(0, 4, 400)
plt.plot(t, f(t))
                        
                    Then, we use the median filtering function
                    scipy.signal.medfilt
                    and the Wiener filtering function
                    scipy.signal.wiener
                    in SciPy.
                  
plt.plot(t, f(t))
plt.plot(t, signal.medfilt(f(t), kernel_size=55), linewidth=2, label="medfilt")
plt.plot(t, signal.wiener(f(t), mysize=55), linewidth=2, label="wiener")
plt.legend()
                        It can be seen from the above figure that the filtering effect is very obvious.
                    Regarding the usage of more functions in the
                    scipy.signal
                    module, a great deal of accumulation of professional
                    knowledge about signal processing is required. We will not
                    explain it here. You can read the official documentation
                    according to your own needs to learn.
                  
97.9. Statistical Functions#
                    Statistical theory has wide applications, especially in the
                    interdisciplinary fields formed with computer science and
                    other fields, providing strong theoretical support for data
                    analysis, machine learning, etc. SciPy naturally has
                    relevant functions for statistical analysis, which are
                    concentrated in the
                    scipy.stats
                    module.
                  
                    The
                    scipy.stats
                    module contains a large number of probability distribution
                    functions, mainly including continuous distributions,
                    discrete distributions, and multivariate distributions. In
                    addition, there are also multiple subcategories such as
                    summary statistics, frequency statistics, transformations,
                    and tests. It basically covers all aspects of statistical
                    applications.
                  
                    Next, we will take the relatively representative continuous
                    random variable function of the normal distribution,
                    scipy.stats.norm, as an example for introduction. We will try to randomly
                    draw 1000 normal distribution samples using the
                    .rvs
                    method and plot a bar chart.
                  
from scipy.stats import norm
plt.hist(norm.rvs(size=1000))
                        
                    In addition to
                    norm.rvs
                    which returns random variables,
                    norm.pdf
                    returns the probability density function,
                    norm.cdf
                    returns the cumulative distribution function,
                    norm.sf
                    returns the survival function,
                    norm.ppf
                    returns the quantile function,
                    norm.isf
                    returns the inverse survival function,
                    norm.stats
                    returns the mean, variance, (Fisher) skewness, (Fisher)
                    kurtosis, and
                    norm.moment
                    returns the non-central moments of the distribution. For
                    these mentioned methods, they are common for most SciPy
                    continuous variable distribution functions (Cauchy
                    distribution, Poisson distribution, etc.).
                  
For example, a normal distribution curve can be plotted based on the probability density function.
x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 1000)
plt.plot(x, norm.pdf(x))
                        
                    In addition, the
                    scipy.stats
                    module also has many useful methods, such as returning a
                    summary of the data.
                  
from scipy.stats import describe
describe(x)
                        DescribeResult(nobs=1000, minmax=(-2.3263478740408408, 2.3263478740408408), mean=0.0, variance=1.809385737250562, skewness=9.356107558947013e-17, kurtosis=-1.2000024000024)
                        
                    describe
                    can return information such as the maximum value, minimum
                    value, mean, variance, etc. of a given array. The
                    scipy.stats
                    module involves relatively strong professionalism and has a
                    relatively high application threshold. You need to be
                    familiar with statistical theories to understand the
                    interpretations of the functions therein.
                  
97.10. Sparse Matrix#
In numerical analysis, a matrix with most of its elements being zero is called a sparse matrix. Conversely, if most of the elements are non-zero, then the matrix is dense. Large sparse matrices often appear when solving linear models in the fields of science and engineering. However, computers often encounter many troubles when performing sparse matrix operations. Due to its own sparse characteristics, memory cost of sparse matrices can be greatly saved through compression. More importantly, due to their large size, standard algorithms often cannot operate on these sparse matrices.
                    Therefore, the
                    scipy.sparse
                    module in SciPy provides storage methods for sparse
                    matrices, and
                    scipy.sparse.linalg
                    contains processing methods specifically for sparse linear
                    algebra. In addition,
                    scipy.sparse.csgraph
                    also contains some theories of topological graphs of sparse
                    matrices.
                  
                    There are roughly seven types of sparse matrix storage
                    methods in
                    scipy.sparse, namely:
                    csc_matrix
                    (compressed sparse column matrix),
                    csr_matrix
                    (compressed sparse row matrix),
                    bsr_matrix
                    (block sparse row matrix),
                    lil_matrix
                    (row-based linked list sparse matrix),
                    dok_matrix
                    (dictionary-based sparse matrix),
                    coo_matrix
                    (sparse matrix in coordinate format), and
                    dia_matrix
                    (diagonal sparse matrix).
                  
                    Next, using a simple NumPy array as an example, convert it
                    to sparse matrix storage. First is the
                    csr_matrix
                    (compressed sparse row matrix).
                  
from scipy.sparse import csr_matrix
array = np.array([[2, 0, 0, 3, 0, 0], [1, 0, 1, 0, 0, 2], [0, 0, 1, 2, 0, 0]])
csr = csr_matrix(array)
print(csr)
                          (0, 0)	2
  (0, 3)	3
  (1, 0)	1
  (1, 2)	1
  (1, 5)	2
  (2, 2)	1
  (2, 3)	2
                        
                    As can be seen, when storing a sparse matrix, the positions
                    of non-zero elements in the matrix are actually represented
                    in the form of coordinates. Next, try to store the same
                    array using the
                    csc_matrix
                    (compressed sparse column matrix).
                  
from scipy.sparse import csc_matrix
csc = csc_matrix(array)
print(csc)
                          (0, 0)	2
  (1, 0)	1
  (1, 2)	1
  (2, 2)	1
  (0, 3)	3
  (2, 3)	2
  (1, 5)	2
                        
                    In fact, by comparison, it can be seen that the storage
                    orders of the two are different, one is row order and the
                    other is column order. The sparse matrix representation can
                    be converted to a dense matrix representation through
                    todense().
                  
csc.todense()
                        matrix([[2, 0, 0, 3, 0, 0],
        [1, 0, 1, 0, 0, 2],
        [0, 0, 1, 2, 0, 0]])
                        Next, we compare the memory sizes consumed when storing sparse matrices and dense matrices. First, we generate a \(1000 \times 1000\) random matrix, and then replace the values less than 1 with 0 to artificially create a sparse matrix.
from scipy.stats import uniform
data = uniform.rvs(size=1000000, loc=0, scale=2).reshape(1000, 1000)
data[data < 1] = 0
data
                        array([[0.        , 1.71868732, 1.58826023, ..., 1.92015453, 1.95429687,
        0.        ],
       [1.98569154, 0.        , 1.85929614, ..., 0.        , 1.17994845,
        1.3283427 ],
       [1.78262258, 1.13269682, 0.        , ..., 0.        , 0.        ,
        1.97596699],
       ...,
       [0.        , 0.        , 0.        , ..., 1.53329049, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 1.73593538, 1.44965897,
        1.16125691],
       [1.94369028, 1.77482989, 1.35591931, ..., 0.        , 0.        ,
        0.        ]])
                        Next, we print the storage size consumed by this dense matrix in memory, with the unit of MB.
data.nbytes/(1024**2)
                        7.62939453125
                        Next, we store it as a sparse matrix and view the size in the same unit.
data_csr = csr_matrix(data)
data_csr.data.size/(1024**2)
                        0.4768686294555664
                        Without accident, the memory space consumed by storing in the sparse matrix format should be much less than that in the dense matrix storage method.
97.11. Summary#
In this chapter, we introduced a series of methods for scientific computing using SciPy. These include: constant modules, linear algebra, interpolation functions, image processing, optimization methods, signal processing, statistical functions, sparse matrices, etc. However, for this series of modules in SciPy, we only completed basic applications and exercises. Subsequently, you need to conduct in-depth learning and understanding of specific content based on your actual situation and in combination with the introduction in the official documentation.
 
            










