Data Types for Visualization in Data Science

Data visualization relies on understanding different data types because the choice of visualization depends on the nature of the data. Data types determine how we analyze, interpret, and present data effectively. In this tutorial, we will explore various data types and their best-suited visualization techniques.

1. Types of Data in Data Science

Data in data science falls into two broad groups: qualitative (categorical) and quantitative (numerical).

1.1. Qualitative (Categorical) Data

Categorical data represents discrete groups or labels that do not have numerical meaning.

a) Nominal Data

  • Categories with no inherent order (e.g., colors, gender, countries).
  • Best Visualization Types:
    • Bar Chart
    • Pie Chart
    • Count Plot

    Example: Visualizing Gender Distribution with a Bar Chart

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample categorical data
data = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female']})

# Count plot: one bar per category, bar height = number of rows
sns.countplot(x='Gender', data=data)
plt.title("Gender Distribution")
plt.show()
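
The list above also names the pie chart, which the count plot example does not show. A minimal sketch, reusing the same sample data; the title text and the `autopct` label format are illustrative choices, not part of the original example:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample nominal data as in the count plot example
data = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female']})

# Count how often each category occurs
counts = data['Gender'].value_counts()

# Pie chart: each slice is one category's share of the total
fig, ax = plt.subplots()
ax.pie(counts.values, labels=counts.index.tolist(), autopct='%1.0f%%')
ax.set_title("Gender Share")
plt.show()
```

Pie charts tend to read well only with a handful of categories; with many categories, a bar chart is usually easier to compare.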

b) Ordinal Data

  • Categories with a meaningful order, where the gaps between levels are unknown or unequal (e.g., low/medium/high satisfaction levels).
  • Best Visualization Types:
    • Bar Chart (ordered)
    • Histogram (when the levels are encoded as integers)
    • Box Plot

    Example: Visualizing Education Level with a Bar Chart

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample ordinal data
education_levels = pd.DataFrame({'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor', 'Master']})

# Count plot with sorted order
order = ['High School', 'Bachelor', 'Master', 'PhD']
sns.countplot(x='Education', data=education_levels, order=order)
plt.title("Education Level Distribution")
plt.show()
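
Passing `order` to every plot call works, but the order can also be stored in the data itself. The following is a minimal sketch, assuming the same four education levels; it uses pandas' ordered categorical dtype so that counting, sorting, and comparisons all follow the level order instead of alphabetical order:

```python
import pandas as pd

# Same sample ordinal data as above
education = pd.Series(['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor', 'Master'])
order = ['High School', 'Bachelor', 'Master', 'PhD']

# Attach the level order to the column itself
education = education.astype(pd.CategoricalDtype(categories=order, ordered=True))

# sort_index() now follows the category order, not the alphabet
counts = education.value_counts().sort_index()
print(counts)

# Ordered categories also support comparisons between levels
at_least_master = (education >= 'Master').sum()
print(at_least_master)
```

With the order stored in the dtype, the sorted counts can be plotted directly (for example with `counts.plot(kind='bar')`) without repeating the level list.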

1.2. Quantitative (Numerical) Data

Numerical data represents measurable quantities and can be analyzed mathematically.

a) Discrete Data

  • Countable values (e.g., number of students, cars in a parking lot).
  • Best Visualization Types:
    • Bar Chart
    • Histogram
    • Dot Plot

    Example: Visualizing Number of Students in Different Classes

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample discrete data
data = pd.DataFrame({'Class': ['A', 'B', 'C', 'D'], 'Students': [30, 45, 25, 40]})

# Bar chart
sns.barplot(x='Class', y='Students', data=data)
plt.title("Number of Students in Each Class")
plt.show()

b) Continuous Data

  • Any value within a range (e.g., height, weight, temperature).
  • Best Visualization Types:
    • Histogram
    • Line Plot
    • Box Plot
    • Scatter Plot

    Example: Visualizing Height Distribution with a Histogram

import numpy as np
import matplotlib.pyplot as plt

# Generate random continuous data (seeded so the plot is reproducible)
np.random.seed(10)
heights = np.random.normal(loc=170, scale=10, size=1000)

# Histogram
plt.hist(heights, bins=20, edgecolor='black')
plt.title("Height Distribution")
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
plt.show()

2. Mixed Data Types in Visualization

Sometimes, we need to visualize relationships between different types of data.

2.1. Categorical vs. Numerical Data

Best Visualization Types:

  • Box Plot (to compare numerical distributions across categories)
  • Violin Plot (to see the density of numerical values across categories)

Example: Visualizing Salary Distribution by Job Title with a Box Plot

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Sample mixed data
data = pd.DataFrame({
    'Job': ['Engineer', 'Doctor', 'Teacher', 'Engineer', 'Doctor', 'Teacher'],
    'Salary': [70000, 120000, 50000, 75000, 130000, 52000]
})

# Box plot
sns.boxplot(x='Job', y='Salary', data=data)
plt.title("Salary Distribution by Job Title")
plt.show()
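
The violin plot listed above needs more than two points per category to show a meaningful density. The sketch below therefore uses synthetic salaries; the group means, the spread of 5000, and the sample size of 50 per job are illustrative assumptions, not real figures:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data: 50 salaries per job, drawn around made-up group means
rng = np.random.default_rng(0)
jobs = ['Engineer', 'Doctor', 'Teacher']
means = {'Engineer': 72000, 'Doctor': 125000, 'Teacher': 51000}  # illustrative only

data = pd.DataFrame({
    'Job': np.repeat(jobs, 50),
    'Salary': np.concatenate([rng.normal(means[job], 5000, 50) for job in jobs]),
})

# Violin plot: the width of each shape reflects the density of values
ax = sns.violinplot(x='Job', y='Salary', data=data)
ax.set_title("Salary Density by Job Title (synthetic data)")
plt.show()
```

A box plot summarizes each group with quartiles, while the violin also shows where values cluster within the range.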

2.2. Numerical vs. Numerical Data

Best Visualization Types:

  • Scatter Plot (for relationships between two continuous variables)
  • Line Chart (for trends over time)
  • Heatmap (for correlation between multiple numerical variables)

Example: Visualizing the Relationship Between Age and Salary with a Scatter Plot

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Generate sample data (Age and Salary are drawn independently, so expect no clear trend)
np.random.seed(10)
data = pd.DataFrame({
    'Age': np.random.randint(20, 60, 100),
    'Salary': np.random.randint(30000, 100000, 100)
})

# Scatter plot
sns.scatterplot(x='Age', y='Salary', data=data)
plt.title("Age vs. Salary Relationship")
plt.show()
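
The heatmap option listed above can be sketched in the same way. This example assumes three numerical columns; `Experience` is a hypothetical column added for illustration, and both it and `Salary` are constructed to depend on `Age` so that the correlation matrix has some structure to display:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data with built-in dependencies between the columns
rng = np.random.default_rng(10)
age = rng.integers(20, 60, 100)
experience = age - 20 + rng.integers(0, 3, 100)                 # hypothetical column
salary = 30000 + 1500 * experience + rng.normal(0, 5000, 100)   # made-up relationship

data = pd.DataFrame({'Age': age, 'Experience': experience, 'Salary': salary})

# Pairwise Pearson correlations between the numerical columns
corr = data.corr()

# Heatmap: color encodes the correlation, annot prints the value in each cell
ax = sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
ax.set_title("Correlation Between Numerical Variables (synthetic data)")
plt.show()
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale comparable across datasets.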

3. Summary of Data Types and Visualization Choices

Data Type                          Example                        Best Visualizations
---------------------------------  -----------------------------  ----------------------------------
Nominal (Categorical)              Gender, Country                Bar Chart, Pie Chart
Ordinal (Ordered Categorical)      Satisfaction Level, Education  Ordered Bar Chart, Box Plot
Discrete (Numerical Countable)     Number of Students, Cars       Bar Chart, Histogram
Continuous (Numerical Measurable)  Height, Temperature            Histogram, Line Plot, Scatter Plot
Categorical vs. Numerical          Job vs. Salary                 Box Plot, Violin Plot
Numerical vs. Numerical            Age vs. Salary                 Scatter Plot, Line Chart, Heatmap
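
The single-variable rows of this summary can be turned into a small lookup. The function below is a hypothetical helper, not part of pandas or seaborn; it is a rough sketch that guesses the data type from a column's pandas dtype, assuming integers are discrete counts, floats are continuous measurements, and plain strings are nominal labels:

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_float_dtype

def suggest_plots(series):
    """Hypothetical helper: suggest chart types for one column based on its dtype."""
    dtype = series.dtype
    if isinstance(dtype, pd.CategoricalDtype):
        if dtype.ordered:
            return ['Ordered Bar Chart', 'Box Plot']      # ordinal
        return ['Bar Chart', 'Pie Chart']                 # nominal
    if is_integer_dtype(dtype):
        return ['Bar Chart', 'Histogram']                 # discrete (assumed counts)
    if is_float_dtype(dtype):
        return ['Histogram', 'Line Plot', 'Scatter Plot'] # continuous
    # Anything else (e.g., plain strings) is treated as nominal
    return ['Bar Chart', 'Pie Chart']

# Usage on small sample columns
print(suggest_plots(pd.Series(['Male', 'Female'])))
print(suggest_plots(pd.Series([30, 45, 25, 40])))
print(suggest_plots(pd.Series([170.2, 165.8, 181.0])))
```

The dtype is only a heuristic: integer-coded IDs or rating scales, for instance, would need to be converted to a categorical dtype before the suggestion makes sense.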

Choosing the right visualization for different data types is essential for extracting meaningful insights. By understanding the structure of your data—whether categorical, numerical, or mixed—you can select the most appropriate charts and graphs to make your data more interpretable and impactful.