NumPy Basics for Data Science

NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It supports large, multi-dimensional arrays and matrices and provides a collection of high-level mathematical functions that operate on them. This tutorial covers the basics of NumPy and its core functionality, which make it a key tool for anyone working in Data Science.

1. Installing NumPy

If you don’t have NumPy installed yet, you can install it using the following pip command:

pip install numpy

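To confirm that the installation worked, a quick check along these lines can help. It is a minimal sketch that only assumes NumPy can be imported in the current environment; the version string it prints depends on what pip installed.

```python
import numpy as np

# Print the installed NumPy version; the exact string varies by environment.
print(np.__version__)
```

If the import fails with ModuleNotFoundError, pip most likely installed the package into a different Python environment than the one running the script.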

2. Creating NumPy Arrays

NumPy arrays are the central data structure in NumPy. Compared with Python lists, they are faster and more memory-efficient for numerical computations because every element in an array has the same type. Here’s how you can create arrays:

  • 1D Array:
    import numpy as np
    
    arr = np.array([1, 2, 3, 4, 5])
    print(arr) # Output: [1 2 3 4 5]
    

  • 2D Array (Matrix):
    arr_2d = np.array([[1, 2], [3, 4], [5, 6]])
    print(arr_2d)
    # Output:
    # [[1 2]
    #  [3 4]
    #  [5 6]]
    

  • 3D Array:
    arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
    print(arr_3d)
    # Output:
    # [[[1 2]
    #   [3 4]]
    #
    #  [[5 6]
    #   [7 8]]]
    

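The difference between lists and arrays is easiest to see side by side. The sketch below is illustrative only; the variable names (py_list, np_arr, mixed) are made up for this example. It shows that "+" means different things for the two types and that an array stores a single element type.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# "+" concatenates Python lists but adds NumPy arrays element by element.
list_sum = py_list + py_list
arr_sum = np_arr + np_arr
print(list_sum)  # [1, 2, 3, 1, 2, 3]
print(arr_sum)   # [2 4 6]

# An array holds one element type, so mixing ints and floats
# stores every value as a float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```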

3. NumPy Array Attributes

NumPy arrays have several useful attributes that allow us to quickly inspect the array’s properties:

  • Shape: A tuple giving the length of the array along each dimension (rows, columns, and so on).
    print(arr_2d.shape) # Output: (3, 2)
    

  • Size: Returns the total number of elements in the array.
    print(arr_2d.size) # Output: 6
    

  • Data Type: Returns the type of the array elements.
    print(arr_2d.dtype) # Output: int64 (the default integer type depends on your platform)
    

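The same attributes apply to arrays of any dimensionality. As a small sketch, the example below inspects the 3D array from the array-creation section. The ndim attribute and the dtype argument to np.array are not covered above and are included here as additions.

```python
import numpy as np

# Same 3D array as in the array-creation section.
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(arr_3d.ndim)   # 3  (number of dimensions)
print(arr_3d.shape)  # (2, 2, 2)
print(arr_3d.size)   # 8  (2 * 2 * 2 elements)

# Requesting a dtype explicitly avoids relying on platform defaults.
arr_f32 = np.array([1, 2, 3], dtype=np.float32)
print(arr_f32.dtype)  # float32
```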

4. Array Operations

NumPy performs mathematical operations on whole arrays efficiently, without explicit Python loops:

  • Element-wise Addition:
    arr_1 = np.array([1, 2, 3])
    arr_2 = np.array([4, 5, 6])
    result = arr_1 + arr_2
    print(result) # Output: [5 7 9]
    

  • Scalar Multiplication:
    arr = np.array([1, 2, 3])
    result = arr * 3
    print(result) # Output: [3 6 9]
    

  • Element-wise Square Root:
    arr = np.array([4, 9, 16])
    result = np.sqrt(arr)
    print(result) # Output: [2. 3. 4.]
    

  • Matrix Multiplication (Dot Product):
    matrix_1 = np.array([[1, 2], [3, 4]])
    matrix_2 = np.array([[5, 6], [7, 8]])
    result = np.dot(matrix_1, matrix_2)
    print(result)
    # Output:
    # [[19 22]
    #  [43 50]]
    

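A common source of confusion is that "*" on two matrices is element-wise, not a matrix product. The sketch below reuses the two matrices from the dot-product example to contrast the two. The "@" operator is not mentioned above; for 2D arrays it is assumed here as an equivalent way to write np.dot.

```python
import numpy as np

matrix_1 = np.array([[1, 2], [3, 4]])
matrix_2 = np.array([[5, 6], [7, 8]])

# "*" multiplies matching entries; it is not matrix multiplication.
elementwise = matrix_1 * matrix_2
print(elementwise)
# [[ 5 12]
#  [21 32]]

# "@" performs matrix multiplication and matches np.dot for 2D arrays.
matmul = matrix_1 @ matrix_2
print(matmul)
# [[19 22]
#  [43 50]]
```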

5. NumPy Array Indexing & Slicing

NumPy supports indexing and slicing along each dimension of an array, as well as selecting elements by condition. Here are a few examples:

  • Accessing Elements in 1D Array:
    arr = np.array([1, 2, 3, 4, 5])
    print(arr[2]) # Output: 3
    

  • Slicing a 2D Array:
    arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(arr_2d[0:2, 1:3])
    # Output:
    # [[2 3]
    #  [5 6]]
    

  • Boolean Indexing: You can also index arrays based on conditions.
    arr = np.array([1, 2, 3, 4, 5])
    print(arr[arr > 3]) # Output: [4 5]
    

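One detail worth knowing when using these techniques is that a basic slice shares memory with the original array, while boolean indexing produces a new array. The sketch below illustrates that difference and also shows one way to combine two conditions; the variable names (view, selected) are chosen for this example only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# A basic slice is a view: writing through it changes the original array.
view = arr[1:4]
view[0] = 20
print(arr)  # [ 1 20  3  4  5]

# Boolean indexing returns a copy, so the original stays unchanged.
selected = arr[arr > 3]
selected[0] = -1
print(arr)  # [ 1 20  3  4  5]

# Combine conditions with & (and) or | (or); the parentheses are required.
print(arr[(arr > 2) & (arr < 10)])  # [3 4 5]
```

Calling .copy() on a slice is the usual way to get an independent array when the original must not change.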

6. Reshaping Arrays

NumPy can reshape an array, changing its dimensions without changing its data. The new shape must hold the same total number of elements as the original.

arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
# Output:
# [[1 2 3]
#  [4 5 6]]

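Two related behaviours are worth sketching, using the same six-element array as above: passing -1 lets NumPy work out one dimension on its own, and a shape whose element count does not match the array raises an error. The flatten() call is an addition not covered above.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the array's size.
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 2)

# 6 elements cannot fill a 4 x 2 array, so reshape raises ValueError.
try:
    arr.reshape(4, 2)
except ValueError as exc:
    print("reshape failed:", exc)

# flatten() goes the other way and returns a 1D copy.
flat = inferred.flatten()
print(flat)  # [1 2 3 4 5 6]
```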

7. Common NumPy Functions

Here are some common NumPy functions that are widely used in Data Science:

  • np.zeros(): Creates an array of zeros.
    zeros_arr = np.zeros((2, 3))
    print(zeros_arr)
    # Output:
    # [[0. 0. 0.]
    #  [0. 0. 0.]]
    

  • np.ones(): Creates an array of ones.
    ones_arr = np.ones((2, 3))
    print(ones_arr)
    # Output:
    # [[1. 1. 1.]
    #  [1. 1. 1.]]
    

  • np.random.rand(): Generates random numbers drawn uniformly from the interval [0, 1).
    random_arr = np.random.rand(2, 3)
    print(random_arr) # Output differs on every run
    

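Because random output changes from run to run, reproducible results need a fixed seed. The sketch below uses np.random.default_rng, NumPy's newer random number interface, which is not covered above; the seed value 42 is arbitrary and chosen only for illustration.

```python
import numpy as np

# default_rng is NumPy's newer random interface; the seed 42 is arbitrary.
rng = np.random.default_rng(42)
sample = rng.random((2, 3))  # uniform values in [0, 1), shape (2, 3)

# Re-creating the generator with the same seed reproduces the same values.
rng_again = np.random.default_rng(42)
sample_again = rng_again.random((2, 3))
print(np.array_equal(sample, sample_again))  # True
```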

Conclusion

NumPy is an essential library for anyone working with data in Python, especially in the field of Data Science. By understanding the basics of NumPy arrays, array operations, and matrix manipulation, you’ll be well-equipped to work with large datasets and perform complex mathematical operations efficiently.