Introduction to NumPy Library in Python

Numerical data is the most common kind of data in machine learning, and large volumes of it are usually stored in array data structures. NumPy is a popular Python library that data scientists and analysts use to work with arrays efficiently.
In this article, we will learn about this package, starting with installation, and then get familiar with some essential functions that are used frequently while building machine learning projects.

Key takeaways from this blog

  • What is NumPy?
  • Difference between Python lists and NumPy arrays
  • Creating NumPy arrays and the essential functions used for it
  • Shape and reshaping of NumPy arrays
  • Squeezing and expanding NumPy arrays
  • Slicing and indexing of NumPy arrays
  • Concatenating and stacking NumPy arrays
  • Broadcasting of NumPy arrays
  • Mathematical operations and sorting on NumPy arrays

Let us start by learning more about NumPy.

What is NumPy?

NumPy is an open-source library that adds support for large, multidimensional arrays and provides high-level mathematical functions that operate on them efficiently. Travis Oliphant created it in 2005. Many popular Python libraries, such as Pandas, Matplotlib, Seaborn, and scikit-learn, are built on top of NumPy. To know NumPy in more detail, let's first understand arrays.

What are arrays, ndarray or N-dimensional arrays?

Arrays are collections of elements and can have one or more dimensions. A one-dimensional array is called a vector, and a two-dimensional array is called a matrix. Similarly, a three-dimensional array can be considered a tensor (a set of matrices). NumPy arrays are called ndarrays, or N-dimensional arrays.

Vector, Matrix and 3D Matrix example

When ideally can we start using NumPy?

NumPy is an excellent choice to learn after gaining confidence in Python basics. After this, to advance our career in data science, we should learn SciPy and Pandas. In short, our learning path should be Python basics, then NumPy, then SciPy or Pandas.

We might wonder why NumPy is needed when Python lists already exist. So, let's look at the difference between them first.

Python Lists vs. NumPy Arrays

Python lists act as arrays that can store elements of different types. Everything in Python is an object, so a list actually stores pointers to objects that live elsewhere in memory, and each object carries its own type information.

Lists are excellent because they let us work with different data types in a single data structure. But that flexibility comes at the price of memory and computing efficiency, especially when all elements have the same data type.

A NumPy array solves this issue by storing elements of a single type in a compact block of memory, which saves memory, especially when the array has many elements. NumPy also supports element-wise operations directly on the array, which a list cannot do without an explicit loop. It is also faster for mathematical operations, roughly 14x faster than plain Python by some estimates, although the exact figure depends on the operation and the array size.
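
To make the difference concrete, here is a minimal sketch that applies the same operators to a list and to an array and checks how many bytes the array uses. The variable names are ours, chosen only for illustration, and the dtype is fixed to int64 so the byte counts are the same on every platform.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3], dtype=np.int64)

# On lists, + concatenates and * repeats the list.
list_sum = py_list + py_list        # [1, 2, 3, 1, 2, 3]
list_twice = py_list * 2            # [1, 2, 3, 1, 2, 3]

# On arrays, the same operators act element by element.
arr_sum = np_arr + np_arr           # array([2, 4, 6])
arr_twice = np_arr * 2              # array([2, 4, 6])

# Element-wise work on a list needs an explicit loop or comprehension.
list_doubled = [x * 2 for x in py_list]   # [2, 4, 6]

# Every element of the array has the same fixed size in bytes.
print(np_arr.itemsize, np_arr.nbytes)     # 8 24
```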

Installation and Import of NumPy library in Python

One can find detailed instructions for installing NumPy on all operating systems in our make your system machine learning-enabled blog. To install NumPy from PyPI using pip, we can use the commands below:

Python2 on terminal → pip install numpy
Python3 on terminal → pip3 install numpy
Jupyter notebook python2 → !pip install numpy

Once installed, we can import this library and use it in our code. For example:

import numpy as np

We have imported NumPy and shortened its name to np. So in the following sections, we will write np instead of the full name numpy. As discussed, NumPy has significant advantages when used for mathematical operations. So let's start with creating NumPy arrays first.

Creating NumPy Array

Basic ndarray

np.array() can be used to create a NumPy array. This function takes values in a list (or another sequence) and converts them into an ndarray. For example:

np.array([1,2,3])

#Output:
array([1, 2, 3])

We can specify the data type inside the np.array() function. Suppose we select the data type int, but the input list has float values; then, while creating the array, NumPy drops the fractional part of those values (truncation toward zero, which matches the floor for positive numbers), as shown in the example below.

np.array([1,2,3.7],dtype=int)

#Output:
array([1, 2, 3])
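
Integer conversion drops the fractional part, which is not the same as flooring once negative numbers are involved. The sketch below, with sample values we picked for illustration, compares it with np.floor() and np.rint() in case a different rounding rule is wanted.

```python
import numpy as np

values = [1.7, -1.7, 3.2, -3.2]

truncated = np.array(values, dtype=int)   # drops the fractional part
floored = np.floor(values).astype(int)    # rounds toward negative infinity
rounded = np.rint(values).astype(int)     # rounds to the nearest integer

print(truncated)  # [ 1 -1  3 -3]
print(floored)    # [ 1 -2  3 -4]
print(rounded)    # [ 2 -2  3 -3]
```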

We saw before that a NumPy array can be multidimensional; such an array is created by passing a list of lists to the function. In the example below, a 2 x 3 matrix has been made. The matrix shape is defined as N x M, where N is the number of rows and M is the number of columns.

np.array([[1,2,3],[4,5,6]])

#Output:
array([[1, 2, 3],
       [4, 5, 6]])

Array with a fixed number

We use the np.full() function to create an array filled with a fixed number. We have to provide the shape of the array and the value used to fill it. This can be observed in the example below.

np.full((2,2),5)

#Output:
array([[5, 5],
       [5, 5]])

We can use np.zeros() and pass the shape of the array as a tuple to get an array filled with zeros. For example:

np.zeros((2,2))

#Output:
array([[0., 0.],
       [0., 0.]])

We can use np.ones() and pass the shape of the array, as an integer for a 1-D array or as a tuple, to get an array with each element as 1, as shown below.

np.ones(4)
# Output:
array([1., 1., 1., 1.])
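
The outputs above contain floats (0. and 1.) because these functions default to a floating-point data type. As a small sketch, the dtype argument changes that, and np.ones_like() copies the shape and data type of an existing array. The variable names are illustrative.

```python
import numpy as np

zeros_float = np.zeros((2, 2))             # default dtype is float64
zeros_int = np.zeros((2, 2), dtype=int)    # integer zeros
ones_like_z = np.ones_like(zeros_int)      # same shape and dtype as zeros_int

print(zeros_float.dtype)   # float64
print(zeros_int.dtype)     # platform-dependent integer type, e.g. int64
print(ones_like_z)
```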

Random numbers in the array

In data science applications, we often need randomized initial values. np.random.rand() creates an array of random values. We pass the dimensions of the array as separate arguments, and all random values are in the range [0,1), with 0 included and 1 excluded:

np.random.rand(2,3)

#Output:
array([[0.76981844, 0.56005659, 0.61075499],
       [0.2434684 , 0.8560164 , 0.22834211]])
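
Random values differ on every run, which makes experiments hard to repeat. One common remedy, sketched here with NumPy's Generator interface rather than the np.random.rand() call used above, is to fix a seed. The seed value 42 is arbitrary.

```python
import numpy as np

seed = 42  # arbitrary value chosen for illustration

# Two generators built from the same seed produce the same sequence.
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

sample_a = rng_a.random((2, 3))   # uniform values in [0, 1)
sample_b = rng_b.random((2, 3))

print(sample_a.shape)                       # (2, 3)
print(np.array_equal(sample_a, sample_b))   # True
```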

Identity Matrix Creation

An identity matrix is a square matrix in which the diagonal elements are one and the rest are zero. Use the np.eye() method to create it. Since the number of rows and columns is the same, we pass a single number; np.eye(4) creates a 4 x 4 matrix as shown:

np.eye(4)
# Output:
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

We can move the diagonal upward or downward by specifying the value of k in np.eye(number of rows, k=value). If the value is positive, it moves upward; for a negative value, it moves downward. Please note that the matrices we get are not identity matrices. An example is shown below where the diagonal moves downward:

np.eye(4,k=-1)
# Output:
array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])
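
For completeness, here is a short sketch of the positive case, where the diagonal moves upward, along with a rectangular result obtained by passing both the number of rows and the number of columns.

```python
import numpy as np

upper = np.eye(3, k=1)   # diagonal shifted one position upward
print(upper)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]

rect = np.eye(2, 3)      # 2 rows, 3 columns, ones on the main diagonal
print(rect)
# [[1. 0. 0.]
#  [0. 1. 0.]]
```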

Array elements with a constant distance

Use the np.arange() method to get an evenly spaced array:

np.arange(4)
#Output:
array([0, 1, 2, 3])

Specify the start point, end point, and step size to get a custom array, as shown:

np.arange(10,30,5)
#Output:
array([10, 15, 20, 25])

In the above example, please note that the end point is not included in the array, which is why 30 does not appear in the output.

Picking elements from a constant distance in an nd-array.

If we want the end point too, use np.linspace(), but here specify the number of elements wanted in the array instead of step size as shown:

np.linspace(10,30,6)
# Output:
array([10., 14., 18., 22., 26., 30.])
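
To see how the two functions relate, the sketch below asks np.linspace() for the step it computed (retstep=True) and reproduces the same points with np.arange(). It also shows the endpoint=False option, which excludes the end point.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points.
points, step = np.linspace(10, 30, 6, retstep=True)
print(points)  # [10. 14. 18. 22. 26. 30.]
print(step)    # 4.0

# Same points with arange: the stop value must lie beyond the last point.
same = np.arange(10, 31, 4)
print(same)    # [10 14 18 22 26 30]

# endpoint=False leaves the end point out, as arange does.
no_end = np.linspace(10, 30, 5, endpoint=False)
print(no_end)  # [10. 14. 18. 22. 26.]
```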

We have learned to make a new array with the help of different methods. Now we will see how to know the shape of an already existing array.

Shape of ndarray

For an existing array, we often need to know its number of axes, its number of rows and columns, and its total size.

Let's create an array with the name np_array, which will be directly used for explaining different functions ahead, as shown:

np_array = np.array([[10,20,30],[40,50,60]])

ndarray shape description

Number of axes of the array

Use the ndim attribute to get the number of axes (also known as dimensions) of an array as shown:

np_array.ndim
# Output:
2

Here we get an output as 2 as the array has two axes.

The shape of the array

We use the shape attribute to get the shape of an array. The result is a tuple in which each entry gives the size of the corresponding axis. The output is (2, 3), corresponding to 2 rows and 3 columns.

np_array.shape
# Output:
(2, 3)

We can also get the size of the array, which is the total number of elements, that is, the product of the sizes of all axes. Use the size attribute for this. The output is 6 as axis1*axis2 = 2*3 = 6

np_array.size
# Output:
6
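
As a quick check of how these attributes relate, this sketch recomputes the size from the shape tuple. It also shows that Python's built-in len() reports only the size of the first axis, which is easy to confuse with size.

```python
import math
import numpy as np

np_array = np.array([[10, 20, 30], [40, 50, 60]])

total = math.prod(np_array.shape)   # 2 * 3
print(total == np_array.size)       # True
print(len(np_array))                # 2, the size of the first axis only
print(len(np_array.shape) == np_array.ndim)   # True
```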

Reshaping of ndarray

Here we reshape the array without changing its elements. Use the reshape() method as shown below. The input is the shape we want. Please note that the new shape must hold the same number of elements as the original array; otherwise, an error will occur.

a = np.array([10,20,30,40,50,60])
print(a)
# Output:
[10 20 30 40 50 60]

a.reshape(2,3)
# Output after reshaping:
array([[10, 20, 30],
       [40, 50, 60]])

a.reshape(2,5)
# Output: reshaping to 2 x 5 fails because 2*5 is not equal to 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot reshape array of size 6 into shape (2,5)

In the above example, we specified the whole shape. When we do not know one of the dimensions, we can pass -1 for that one axis, and NumPy infers it from the number of elements.

a.reshape(3,-1)
# Output:
array([[10, 20],
       [30, 40],
       [50, 60]])
       

a.reshape(-1,3)
# Output:
array([[10, 20, 30],
       [40, 50, 60]])
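
One detail that is easy to miss: reshape() returns a new array object and leaves the original shape untouched. A minimal sketch follows; for a freshly created array like this one the result is a view, so the two arrays share their data.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])
b = a.reshape(2, 3)

print(a.shape)   # (6,)  the original is unchanged
print(b.shape)   # (2, 3)

# b is a view of a here, so writing through b is visible in a.
b[0, 0] = -1
print(a)                        # [-1 20 30 40 50 60]
print(np.shares_memory(a, b))   # True
```

If an independent array is needed, calling .copy() on the reshaped result avoids this sharing.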

Transpose of a matrix

Transposing swaps the rows and columns of a matrix, so an N x M matrix becomes M x N. Use the transpose() method as shown below:

np_array.transpose()

# Output:
array([[10, 40],
       [20, 50],
       [30, 60]])
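
NumPy also offers the shorter .T attribute for the same operation. The sketch below checks that both give the same result and that element [i, j] of the original ends up at [j, i].

```python
import numpy as np

np_array = np.array([[10, 20, 30], [40, 50, 60]])

t = np_array.T   # shorthand for np_array.transpose()
print(t.shape)                                    # (3, 2)
print(np.array_equal(t, np_array.transpose()))    # True
print(t[2, 0] == np_array[0, 2])                  # True
```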

Flattening NumPy array

Flattening converts a multidimensional array into a 1-dimensional one. We can use flatten() or ravel().

array1 = np_array.flatten()
array2 = np_array.ravel()
print("array shape after flatten is:",array1.shape)
print("array shape after ravel is:",array2.shape)
print("array after flatten is:",array1)
print("array after ravel is:",array2)

#Output:
array shape after flatten is: (6,)
array shape after ravel is: (6,)
array after flatten is: [10 20 30 40 50 60]
array after ravel is: [10 20 30 40 50 60]

Flattening of an array

After seeing this, we might think that both functions are the same, but there is a fundamental difference in what they return. flatten() returns a deep copy, while ravel() returns a shallow copy (a view of the original array) whenever possible.
A deep copy is an entirely new ndarray with its own memory, so changes made to it are not reflected in the original array. A shallow copy refers to the original memory, which means that changes made to it are also reflected in the original array.

### Below, a change is made to the flatten() output
array1[1] = 0
print(np_array)
# Output:
[[10 20 30]
 [40 50 60]]

### Below, a change is made to the ravel() output
array2[1] = 0
print(np_array)
# Output:
[[10  0 30]
 [40 50 60]]
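
Whether a result shares memory with the original can be checked directly with np.shares_memory(). The sketch below also shows a case where ravel() has to fall back to a copy: the elements of a transposed array are not laid out row by row in memory. The variable names are illustrative.

```python
import numpy as np

np_array = np.array([[10, 20, 30], [40, 50, 60]])

flat_copy = np_array.flatten()
flat_view = np_array.ravel()

print(np.shares_memory(np_array, flat_copy))   # False
print(np.shares_memory(np_array, flat_view))   # True

# Raveling the transpose needs a new memory layout, so a copy is made.
flat_t = np_array.T.ravel()
print(np.shares_memory(np_array, flat_t))      # False
print(flat_t)                                  # [10 40 20 50 30 60]
```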

Expanding a NumPy array

Use the np.expand_dims() method to add a new axis of length 1 to an array. The inputs are the array and the position (axis) at which the new axis is inserted. Below, a 1-D array of five elements is expanded with axis=1, which turns it into a 5 x 1 column:

a = np.array([1,2,3,4,5])
np.expand_dims(a,axis=1)
# Output:
array([[1],
       [2],
       [3],
       [4],
       [5]])
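
The choice of axis decides where the new length-1 axis appears in the shape. The following sketch compares axis=0 and axis=1 on a small vector and shows the equivalent indexing form with np.newaxis.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5])

as_row = np.expand_dims(v, axis=0)   # new axis in front
as_col = np.expand_dims(v, axis=1)   # new axis at the end
as_col_idx = v[:, np.newaxis]        # indexing form of axis=1

print(as_row.shape)   # (1, 5)
print(as_col.shape)   # (5, 1)
print(np.array_equal(as_col, as_col_idx))   # True
```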

Squeezing a NumPy array

Use the np.squeeze() method to remove axes of length 1 from an array, which reduces its number of dimensions. The axis we choose must have a size of 1 in the shape tuple. If we select an axis whose size is not 1, an error will occur.

a = np.array([[[1,2,3],[4,5,6]]])
a.shape
# Output:
(1, 2, 3)

np.squeeze(a,axis=2)
# Output: We get the following error as corresponding value is 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 180, in squeeze
  File "/home/avisouser/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1545, in squeeze
    return squeeze(axis=axis)
ValueError: cannot select an axis to squeeze out which has size not equal to one

np.squeeze(a,axis=0)
#Output:
array([[1, 2, 3],
       [4, 5, 6]])
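
When no axis is given, np.squeeze() removes every axis of length 1 at once. A small sketch with an array whose shape was chosen only to have two such axes:

```python
import numpy as np

a = np.zeros((1, 2, 1, 3))

print(np.squeeze(a).shape)                # (2, 3)     all length-1 axes removed
print(np.squeeze(a, axis=0).shape)        # (2, 1, 3)  only the first axis removed
print(np.squeeze(a, axis=(0, 2)).shape)   # (2, 3)     both axes named explicitly
```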

We have now seen how to create an array and determine its shape. Next, we will see how to access specific elements of an array using slicing and indexing.

Slicing and Indexing of NumPy arrays

Slicing vectors or 1D arrays

Sometimes only a part of the complete array is needed. For that, we pass the start index, end index, and step size → [start index: end index: step size].
Please note that the end index is not included. The step size is the distance between the indices of two consecutive selected elements, so a step of 2 picks every second element.

np.array([1,2,3,4,5,6])
array([1, 2, 3, 4, 5, 6])

np.array([1,2,3,4,5,6])[1:5]
#Output when no step size is given:
array([2, 3, 4, 5])

np.array([1,2,3,4,5,6])[1:5:1]
#Output when step size is 1:
array([2, 3, 4, 5])

np.array([1,2,3,4,5,6])[1:5:2]
#Output when step size is 2:
array([2, 4])
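
Slices also accept negative indices and negative steps, and a basic slice is a view of the original data rather than a copy. The sketch below illustrates both points.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5, 6])

print(v[-2:])    # [5 6]          last two elements
print(v[::-1])   # [6 5 4 3 2 1]  reversed
print(v[::2])    # [1 3 5]        every second element

# A slice is a view: writing to it changes the original array.
part = v[1:4]
part[0] = 99
print(v)         # [ 1 99  3  4  5  6]

# Use .copy() when an independent array is needed.
safe = v[1:4].copy()
safe[0] = -1
print(v[1])      # 99, unchanged by the write to safe
```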

Slicing and indexing of matrices or 2D arrays

A 2D array has two axes, so we slice along both of them, separating the two slices with a comma. The same method works for arrays with more dimensions. Indexing uses 0-based indices, much like a list of lists, except that all indices go inside one pair of brackets.

###Indexing
np_array[0,0]
# Output: Here we get the element from first row and first column
10

np_array[0,2]
# Output: Here we get the element from first row and third column
30

np_array[1,2]
# Output: Here we get the element from second row and third column
60

###Slicing
np_array[:,1:2]
# Output: Here we only choose the second column
array([[20],
       [50]])
np_array[:1,:]
#Output: Here we only choose the first row
array([[10, 20, 30]])

np_array[:1,1:2]
# Output: Here we only choose the common values between first row and second column
array([[20]])
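
Notice that the slices above keep both axes, so even a single column comes back as a 2D array. Using an integer index in place of a slice drops that axis, as this short sketch shows.

```python
import numpy as np

np_array = np.array([[10, 20, 30], [40, 50, 60]])

col_1d = np_array[:, 1]      # integer index drops the column axis
col_2d = np_array[:, 1:2]    # slice keeps the column axis
first_row = np_array[0]      # same as np_array[0, :]

print(col_1d, col_1d.shape)   # [20 50] (2,)
print(col_2d.shape)           # (2, 1)
print(first_row)              # [10 20 30]
```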

Slicing and indexing of 3D matrices or 3D arrays

We first create a 3D matrix using np.array() method.

a = np.array([[[10,20],[30,40],[50,60]],         # first 2D array, a[0]
              [[70,80],[90,100],[110,120]],      # second 2D array, a[1]
              [[130,140],[150,160],[170,180]]])  # third 2D array, a[2]
print(a)

# Output:
[[[ 10  20]
  [ 30  40]
  [ 50  60]]

 [[ 70  80]
  [ 90 100]
  [110 120]]

 [[130 140]
  [150 160]
  [170 180]]]

Please note that the 3D matrix has an additional axis compared to the 2D matrix. We can also say that this additional axis determines the number of 2D matrices stacked on one another, as shown in the figure below.

Slicing and Indexing example

Above, we see a representation of a 3D matrix. As discussed in the section → Slicing and indexing of matrices or 2D arrays, we take slices of each axis to get our required elements.

a.shape
#Output:
(3, 3, 2)

## The shape (3, 3, 2) means 3 stacked 2D arrays, each with 3 rows and 2 columns.

###Indexing of array
a[0,0,1]
#Output: Here we get the element in the first 2D array, first row, second column
20

###Slicing of array
a[1:,0:2,0:2]
# Output: We select the first two rows of the second and third 2D arrays
array([[[ 70,  80],
        [ 90, 100]],

       [[130, 140],
        [150, 160]]])

Use the np.flip() method to flip the array horizontally or vertically, depending on the axis.

np_array
#Output:
array([[10, 20, 30],
       [40, 50, 60]])

np.flip(np_array,axis=0)
# Output:
array([[40, 50, 60],
       [10, 20, 30]])
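
The example above flips the rows (axis=0). For comparison, this sketch flips the columns with axis=1 and then flips along every axis by omitting the argument.

```python
import numpy as np

np_array = np.array([[10, 20, 30], [40, 50, 60]])

flipped_cols = np.flip(np_array, axis=1)   # reverse the order of the columns
flipped_all = np.flip(np_array)            # reverse along every axis

print(flipped_cols)
# [[30 20 10]
#  [60 50 40]]
print(flipped_all)
# [[60 50 40]
#  [30 20 10]]
```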

In the next section, we see how to stack two arrays and which conditions must hold when doing so.

Stacking and Concatenating NumPy arrays

Stacking and concatenating both combine two or more existing arrays into a new array. The difference is that concatenation joins arrays along an axis that already exists, whereas stacking functions such as np.vstack() and np.dstack() can add a new axis when the inputs need one. In both cases, the arrays must have the same size along every axis other than the one being joined; otherwise, an error will occur. Use the following functions:

  • np.vstack() → Here, two arrays are combined vertically, increasing the number of rows. This method comes under stacking.
  • np.hstack() → Here, two arrays are combined horizontally, increasing the number of columns. This method comes under stacking.
  • np.dstack() → Here, two arrays are combined along depth and increase the array's depth. This method comes under stacking.
  • np.concatenate() → Here, we combine two arrays along a particular axis. This method comes under concatenating.

a = np.array([1,2,3])
b = np.array([4,5,6])
a1 = np.array([[10,20],[30,40]])
b1 = np.array([[50,60],[70,80]])
np.vstack((a,b))

# Output:
array([[1, 2, 3],
       [4, 5, 6]])

np.hstack((a,b))
#Output:
array([1, 2, 3, 4, 5, 6])

np.dstack((a1,b1))
# Output:
array([[[10, 50],
        [20, 60]],

       [[30, 70],
        [40, 80]]])

np.concatenate((a,b),axis=0)
# Output: Here we concatenate along axis 0, the only axis of these 1-D arrays
array([1, 2, 3, 4, 5, 6])
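
The following sketch applies np.concatenate() to 2D arrays along both existing axes, uses np.stack() (a function not covered above) to join along a new axis, and shows the error raised when the other axes do not match. The mismatched array is made up for illustration.

```python
import numpy as np

a1 = np.array([[10, 20], [30, 40]])
b1 = np.array([[50, 60], [70, 80]])

rows = np.concatenate((a1, b1), axis=0)   # same result as np.vstack((a1, b1))
cols = np.concatenate((a1, b1), axis=1)   # same result as np.hstack((a1, b1))
stacked = np.stack((a1, b1), axis=0)      # joins along a new leading axis

print(rows.shape)      # (4, 2)
print(cols.shape)      # (2, 4)
print(stacked.shape)   # (2, 2, 2)

# Joining along axis 0 requires the same number of columns.
try:
    np.concatenate((a1, np.array([[1, 2, 3]])), axis=0)
except ValueError as err:
    print("cannot combine:", err)
```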

We have said that NumPy is handy for mathematical operations. These operations can also be performed between an ndarray and a scalar, or between two ndarrays of different shapes. Such operations are only possible if the smaller ndarray is stretched to match the larger one, and the internal mechanism that does this is known as broadcasting. The next section covers broadcasting in more detail.

Broadcasting in NumPy arrays

Broadcasting is NumPy's internal mechanism for operating on arrays of different shapes. It is what lets us multiply an ndarray by a scalar, and it also lets us operate on two ndarrays of different shapes by virtually stretching the smaller one. Shapes are compared axis by axis, starting from the last axis; two sizes are compatible when they are equal or when one of them is 1. Otherwise, we get an error.

a = np.arange(10,100,20)
b = np.array([[3],[3]])

a+b
#Output: Here we get the output when we try to add 2 different dimensional ndarrays.
array([[13, 33, 53, 73, 93],
       [13, 33, 53, 73, 93]])
       
a*2
# Output: Here we multiply by a scalar number for the whole matrix
array([ 20,  60, 100, 140, 180])

Here the scalar is conceptually stretched to match the shape of the ndarray so that element-wise multiplication is possible.
Without broadcasting, operations between two ndarrays would only be possible when they have the same shape.
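
To make the stretching visible, this sketch uses np.broadcast_to() to expand both operands to the common shape explicitly and confirms that the result matches the automatic one. It ends with a pair of shapes that cannot be broadcast; the array c is made up for illustration.

```python
import numpy as np

a = np.arange(10, 100, 20)      # shape (5,)
b = np.array([[3], [3]])        # shape (2, 1)

print((a + b).shape)            # (2, 5)

# What broadcasting does conceptually: stretch both operands to (2, 5).
a_stretched = np.broadcast_to(a, (2, 5))
b_stretched = np.broadcast_to(b, (2, 5))
print(np.array_equal(a + b, a_stretched + b_stretched))   # True

# Sizes 5 and 3 differ and neither is 1, so these shapes are incompatible.
c = np.array([1, 2, 3])         # shape (3,)
try:
    a + c
except ValueError as err:
    print("not broadcastable:", err)
```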

We have said several times that NumPy is useful for mathematical operations, but we have not yet seen which operations it offers. We look at these in the next section.

Mathematical Operations on NumPy array

Basic Arithmetic Operations

Basic arithmetic operations work as in standard mathematics and are applied element by element. These include addition, subtraction, division, etc.

a = np.arange(10,100,20)
a
print("sum output is:",a+2)
print("subtraction output is:",a-2)
print("division output is:",a/2)

#Output:
array([10, 30, 50, 70, 90])
sum output is: [12 32 52 72 92]
subtraction output is: [ 8 28 48 68 88]
division output is: [ 5. 15. 25. 35. 45.]

Mean → The mean can be found using the np.mean() method. For a vector, it is the sum of the elements divided by the length of the vector.
Median → We find the median using the np.median() method. The median is the value that separates the higher half from the lower half of the data, a population, or a probability distribution.
Standard deviation → We find the standard deviation using the function np.std().

np.mean(a)
50.0

np.median(a)
50.0

np.std(a)
28.284271247461902
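
As a sanity check, the sketch below recomputes the mean and the standard deviation by hand. By default np.std() divides by N (the population formula); passing ddof=1 divides by N - 1 instead, which gives the sample standard deviation.

```python
import numpy as np

a = np.arange(10, 100, 20)   # [10 30 50 70 90]

manual_mean = a.sum() / a.size
manual_std = np.sqrt(((a - manual_mean) ** 2).mean())   # divides by N
sample_std = np.std(a, ddof=1)                          # divides by N - 1

print(manual_mean)                        # 50.0
print(np.isclose(manual_std, np.std(a)))  # True
print(round(float(manual_std), 4))        # 28.2843
print(round(float(sample_std), 4))        # 31.6228
```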

Minimum → The minimum element in the array is found using the np.min() method. The index of the minimum element can be determined using the argmin() method.
Maximum → The maximum element in the array is found using the np.max() method. The index of the maximum element can be determined using the argmax() method.
Array Sum → Use the sum() method to find the sum of all elements.

np_array.sum()
# Output:
210

np.min(a,axis=0)
# Output:
10

np.max(a,axis=0)
# Output:
90
### Here a is 1-D, so axis=0 is its only axis and we get the min and max of the whole array
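
The axis argument matters more on a 2D array. This sketch reduces along each axis of a 2 x 3 array and uses argmin() and argmax(), which return a position in the flattened array unless an axis is given. np.unravel_index() converts that position back to a row and column.

```python
import numpy as np

np_array = np.array([[10, 20, 30], [40, 50, 60]])

print(np_array.min(axis=0))   # [10 20 30]  minimum of each column
print(np_array.max(axis=1))   # [30 60]     maximum of each row
print(np_array.sum(axis=0))   # [50 70 90]  sum of each column

flat_min = np.argmin(np_array)   # position in the flattened array
flat_max = np.argmax(np_array)
print(flat_min, flat_max)        # 0 5

row, col = np.unravel_index(flat_max, np_array.shape)
print(int(row), int(col))        # 1 2
```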

Sorting NumPy array

Often in data science problems, we need to sort elements. Depending on the implementation and the algorithm used, the time required for sorting can vary greatly. NumPy implements various algorithms, such as mergesort, quicksort, and Timsort, and the kind argument of np.sort() controls which kind of algorithm is used.

a = np.array([10,40,20,500])
np.sort(a, kind='mergesort')
# Output:
array([ 10,  20,  40, 500])
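
A few related operations come up often alongside np.sort(). The sketch below shows a descending sort, np.argsort() for the indices that would sort the array, and the in-place sort() method. Note that np.sort() returns a sorted copy and leaves its input unchanged.

```python
import numpy as np

a = np.array([10, 40, 20, 500])

sorted_copy = np.sort(a)        # new array, a is unchanged
descending = np.sort(a)[::-1]   # reverse the sorted copy
order = np.argsort(a)           # indices that would sort a

print(sorted_copy)   # [ 10  20  40 500]
print(descending)    # [500  40  20  10]
print(order)         # [0 2 1 3]
print(a)             # [ 10  40  20 500]

a.sort()             # sorts a itself and returns None
print(a)             # [ 10  20  40 500]
```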

Conclusion

We can say that NumPy is a boon for Python developers, helping us perform mathematical operations efficiently. As a quick summary, in this article, we discussed the basics of the NumPy library, starting with the installation and then performing various operations on ndarrays. To know more about this library, you can see the official documentation. We hope you enjoyed it.

Next Blog: Introduction to Pandas

Previous Blog: Introduction to OOPS in python

Enjoy Learning!

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.