NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions that operate on them. This tutorial covers the basics of NumPy and its core functionality, which make it a key tool for anyone working in Data Science.
1. Installing NumPy
If you don’t have NumPy installed yet, you can install it using the following pip command:
pip install numpy
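To confirm that the installation worked, a quick check along these lines should import the library and print its version. The exact version string depends on your environment, so no expected output is shown.

```python
import numpy as np

# Print the installed NumPy version; the value varies by environment.
print(np.__version__)
```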
2. Creating NumPy Arrays
Arrays are the central data structure in NumPy. For numerical computations, they are faster and more memory-efficient than Python lists. Here’s how you can create arrays:
- 1D Array:
    import numpy as np
    arr = np.array([1, 2, 3, 4, 5])
    print(arr)
    # Output: [1 2 3 4 5]
- 2D Array (Matrix):
    arr_2d = np.array([[1, 2], [3, 4], [5, 6]])
    print(arr_2d)
    # Output:
    # [[1 2]
    #  [3 4]
    #  [5 6]]
- 3D Array:
    arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
    print(arr_3d)
    # Output:
    # [[[1 2]
    #   [3 4]]
    #
    #  [[5 6]
    #   [7 8]]]
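The speed difference between lists and arrays mentioned above can be checked with a rough timing sketch like the one below. The helper names `list_sum_of_squares` and `array_sum_of_squares` and the problem size are illustrative choices, not part of NumPy. Exact timings vary by machine, so none are shown, though the array version is typically noticeably faster.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
# An explicit 64-bit dtype avoids overflow on platforms whose default integer is 32-bit.
np_arr = np.arange(n, dtype=np.int64)


def list_sum_of_squares():
    # Pure-Python loop over a list.
    return sum(x * x for x in py_list)


def array_sum_of_squares():
    # Vectorized computation on a NumPy array.
    return int(np.sum(np_arr * np_arr))


# Run each version 10 times and report the total elapsed time.
list_time = timeit.timeit(list_sum_of_squares, number=10)
array_time = timeit.timeit(array_sum_of_squares, number=10)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```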
3. NumPy Array Attributes
NumPy arrays have several useful attributes that allow us to quickly inspect the array’s properties:
- Shape: Returns the dimensions of the array (rows, columns, etc.).
    print(arr_2d.shape)
    # Output: (3, 2)
- Size: Returns the total number of elements in the array.
    print(arr_2d.size)
    # Output: 6
- Data Type: Returns the type of the array elements.
    print(arr_2d.dtype)
    # Output: int64 (depending on your system)
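Besides the three attributes above, a few related ones are often useful when inspecting an array. The sketch below sets the data type explicitly with the `dtype` argument, so the result does not depend on the system, and reads `ndim`, `itemsize`, and `nbytes`. The variable name `arr_f` is only illustrative.

```python
import numpy as np

arr_2d = np.array([[1, 2], [3, 4], [5, 6]])
print(arr_2d.ndim)      # 2 (number of dimensions)

# Request 32-bit floats explicitly instead of relying on the default type.
arr_f = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)
print(arr_f.dtype)      # float32
print(arr_f.itemsize)   # 4 (bytes per element)
print(arr_f.nbytes)     # 24 (6 elements * 4 bytes)
```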
4. Array Operations
NumPy performs mathematical operations on whole arrays efficiently, without explicit Python loops:
- Element-wise Addition:
    arr_1 = np.array([1, 2, 3])
    arr_2 = np.array([4, 5, 6])
    result = arr_1 + arr_2
    print(result)
    # Output: [5 7 9]
- Scalar Multiplication:
    arr = np.array([1, 2, 3])
    result = arr * 3
    print(result)
    # Output: [3 6 9]
- Element-wise Square Root:
    arr = np.array([4, 9, 16])
    result = np.sqrt(arr)
    print(result)
    # Output: [2. 3. 4.]
- Matrix Multiplication (Dot Product):
    matrix_1 = np.array([[1, 2], [3, 4]])
    matrix_2 = np.array([[5, 6], [7, 8]])
    result = np.dot(matrix_1, matrix_2)
    print(result)
    # Output:
    # [[19 22]
    #  [43 50]]
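A common source of confusion is that `*` between two 2D arrays multiplies element by element rather than performing matrix multiplication. The short sketch below contrasts the two. For 2D arrays the `@` operator gives the same result as `np.dot`.

```python
import numpy as np

matrix_1 = np.array([[1, 2], [3, 4]])
matrix_2 = np.array([[5, 6], [7, 8]])

# `*` multiplies matching elements.
elementwise = matrix_1 * matrix_2
print(elementwise)
# [[ 5 12]
#  [21 32]]

# `@` performs matrix multiplication, like np.dot for 2D arrays.
matmul = matrix_1 @ matrix_2
print(matmul)
# [[19 22]
#  [43 50]]
```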
5. NumPy Array Indexing & Slicing
NumPy allows for advanced array indexing and slicing. Here are a few examples:
- Accessing Elements in 1D Array:
    arr = np.array([1, 2, 3, 4, 5])
    print(arr[2])
    # Output: 3
- Slicing a 2D Array:
    arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(arr_2d[0:2, 1:3])
    # Output:
    # [[2 3]
    #  [5 6]]
- Boolean Indexing: You can also index arrays based on conditions.
    arr = np.array([1, 2, 3, 4, 5])
    print(arr[arr > 3])
    # Output: [4 5]
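The indexing forms above combine in a few other handy ways. The sketch below shows negative indices, a step in a slice, two conditions joined with `&` (each condition needs its own parentheses), and assignment through a boolean mask. The names `mask` and `clipped` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[-1])    # 5 (last element)
print(arr[::2])   # [1 3 5] (every second element)

# Combine conditions with & and wrap each one in parentheses.
mask = (arr > 1) & (arr < 5)
print(arr[mask])  # [2 3 4]

# Assign through a boolean mask; work on a copy to keep `arr` unchanged.
clipped = arr.copy()
clipped[clipped > 3] = 0
print(clipped)    # [1 2 3 0 0]
```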
6. Reshaping Arrays
Reshaping changes an array’s dimensions without changing its data; the new shape must contain the same total number of elements.
    arr = np.array([1, 2, 3, 4, 5, 6])
    reshaped_arr = arr.reshape(2, 3)
    print(reshaped_arr)
    # Output:
    # [[1 2 3]
    #  [4 5 6]]
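Two details of `reshape` are worth a quick sketch. One dimension can be given as `-1` so that NumPy infers it from the total size, and a shape whose element count does not match raises a `ValueError`. The variable name `inferred` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension: 6 elements / 3 rows = 2 columns.
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 2)

# 4 * 2 = 8 elements does not match the 6 available, so this fails.
try:
    arr.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```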
7. Common NumPy Functions
Here are some common NumPy functions that are widely used in Data Science:
- np.zeros(): Creates an array of zeros.
    zeros_arr = np.zeros((2, 3))
    print(zeros_arr)
    # Output:
    # [[0. 0. 0.]
    #  [0. 0. 0.]]
- np.ones(): Creates an array of ones.
    ones_arr = np.ones((2, 3))
    print(ones_arr)
    # Output:
    # [[1. 1. 1.]
    #  [1. 1. 1.]]
- np.random.rand(): Generates random numbers uniformly distributed over [0, 1).
    random_arr = np.random.rand(2, 3)
    print(random_arr)
    # Output: a 2x3 array of random values (different on each run)
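Because random output changes on every run, reproducible results need a fixed seed. One way to do this, sketched below, is NumPy's `np.random.default_rng` generator interface (available since NumPy 1.17). The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random((2, 3))  # uniform values in [0, 1)
sample_b = rng_b.random((2, 3))

print(sample_a.shape)                      # (2, 3)
print(np.array_equal(sample_a, sample_b))  # True
```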
Conclusion
NumPy is an essential library for anyone working with data in Python, especially in the field of Data Science. By understanding the basics of NumPy arrays, array operations, and matrix manipulation, you’ll be well-equipped to work with large datasets and perform complex mathematical operations efficiently.