Data Visualization
Create charts and plots with Matplotlib and Seaborn: line, bar, scatter, and histogram plots, plus publication-quality figures.
Visualizing data reveals patterns that tables of numbers alone cannot. Matplotlib is the foundational plotting library; Seaborn builds on it with a higher-level interface for statistical plots.
Matplotlib Basics
# Install the libraries first (run this in a shell, not in Python):
# pip install matplotlib seaborn
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(8, 4))
plt.plot(x, y, label="sin(x)", color="blue", linewidth=2)
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine Wave")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("sine_wave.png", dpi=150)
plt.show()
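The same figure can also be built through Matplotlib's object-oriented interface, where every call goes to an explicit `Figure` or `Axes` object instead of the implicit "current" plot. The following is a minimal sketch of that style; the second curve, the colors, and the title are illustrative choices, not part of the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# plt.subplots() with no grid arguments returns one Figure and one Axes
fig, ax = plt.subplots(figsize=(8, 4))

# Each plot() call adds one line to the same Axes
ax.plot(x, np.sin(x), label="sin(x)", color="blue", linewidth=2)
ax.plot(x, np.cos(x), label="cos(x)", color="orange", linestyle="--")

# Axes methods use a set_ prefix where pyplot uses bare names
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and Cosine")
ax.legend()
ax.grid(True, alpha=0.3)
fig.tight_layout()
```

This style tends to scale better once a script holds more than one figure, because it is always clear which axes a call modifies.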
Common Chart Types
Bar Chart
categories = ["Python", "JavaScript", "Java", "Go", "Rust"]
values = [29.9, 19.2, 17.5, 11.5, 5.8]
plt.figure(figsize=(8, 5))
plt.bar(categories, values, color="steelblue")
plt.title("Language Popularity (%)")
plt.ylabel("Percentage")
plt.show()
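Bar charts are often easier to read when each bar carries its value. As a sketch, the same data can be drawn horizontally and annotated with `Axes.bar_label`, which is available in Matplotlib 3.4 and later; the horizontal layout and the label format are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

categories = ["Python", "JavaScript", "Java", "Go", "Rust"]
values = [29.9, 19.2, 17.5, 11.5, 5.8]

fig, ax = plt.subplots(figsize=(8, 5))

# barh draws from the bottom up, so reverse both lists to put the largest bar on top
bars = ax.barh(categories[::-1], values[::-1], color="steelblue")

# Write each value next to its bar; "%%" renders a literal percent sign
labels = ax.bar_label(bars, fmt="%.1f%%", padding=3)

ax.set_xlabel("Percentage")
ax.set_title("Language Popularity (%)")
fig.tight_layout()
```

Horizontal bars leave more room for long category names, which is why they are a common choice for ranked lists.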
Scatter Plot
np.random.seed(42)
x = np.random.randn(100)
y = 2 * x + np.random.randn(100) * 0.5
plt.figure(figsize=(6, 6))
plt.scatter(x, y, alpha=0.6, c=y, cmap="viridis")
plt.xlabel("Feature X")
plt.ylabel("Feature Y")
plt.colorbar(label="Value")
plt.show()
Histogram
data = np.random.randn(1000)
plt.figure(figsize=(8, 4))
plt.hist(data, bins=30, edgecolor="black", alpha=0.7)
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Distribution")
plt.show()
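To see what the histogram actually computes, it can help to look at the values `hist` returns. The sketch below assumes the same kind of normally distributed sample and compares the result with `np.histogram`; the generator seed is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # seeded so the sample is reproducible
data = rng.standard_normal(1000)

fig, ax = plt.subplots(figsize=(8, 4))

# hist returns the per-bin counts, the bin edges, and the drawn bar patches
counts, edges, patches = ax.hist(data, bins=30, edgecolor="black", alpha=0.7)

# np.histogram performs the same binning without drawing anything
np_counts, np_edges = np.histogram(data, bins=30)

print(len(counts), len(edges))  # 30 bins are delimited by 31 edges
print(int(counts.sum()))        # every sample falls into exactly one bin
```

Changing `bins` changes the edges and therefore the apparent shape of the distribution, so it is worth trying a few values before settling on one.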
Subplots
# Use an ordered grid for the line plots; x and y may still hold the scatter data from above
t = np.linspace(0, 10, 100)
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].plot(t, np.sin(t))
axes[0, 0].set_title("Sine")
axes[0, 1].plot(t, np.cos(t))
axes[0, 1].set_title("Cosine")
axes[1, 0].bar(["A", "B", "C"], [3, 7, 5])
axes[1, 0].set_title("Bar")
axes[1, 1].scatter(x[:50], y[:50])
axes[1, 1].set_title("Scatter")
plt.tight_layout()
plt.show()
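When every panel follows the same pattern, looping over `axes.flat` avoids repeating the row and column indices. This is a small sketch of that idea; the four functions and the shared x-axis are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Map each panel title to the function it should plot (illustrative choices)
funcs = {
    "Sine": np.sin,
    "Cosine": np.cos,
    "Square": np.square,
    "Square root": np.sqrt,
}

# sharex=True links the x-axis limits across all panels
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)

# axes is a 2x2 array; .flat iterates over it row by row
for ax, (name, func) in zip(axes.flat, funcs.items()):
    ax.plot(x, func(x))
    ax.set_title(name)

fig.tight_layout()
```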
Seaborn — Statistical Visualization
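import seaborn as sns
import pandas as pd
df = sns.load_dataset("tips")  # example dataset, downloaded on first use
# Each Seaborn call below draws on the current axes, so start a new figure for each plot
# Distribution
plt.figure()
sns.histplot(data=df, x="total_bill", kde=True)
# Relationships
plt.figure()
sns.scatterplot(data=df, x="total_bill", y="tip", hue="time")
# Categorical
plt.figure()
sns.boxplot(data=df, x="day", y="total_bill")
# Correlation heatmap (numeric columns only)
plt.figure()
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()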
Plotting from Pandas
# Assumes sales.csv has "date", "revenue", "category", and "amount" columns
df = pd.read_csv("sales.csv")
df["date"] = pd.to_datetime(df["date"])
df.set_index("date")["revenue"].plot(figsize=(10, 4), title="Monthly Revenue")
plt.ylabel("Revenue ($)")
plt.show()
df.groupby("category")["amount"].sum().plot(kind="bar")
plt.show()
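Without a `sales.csv` at hand, the same workflow can be tried on an in-memory table. The sketch below generates hypothetical daily revenue, aggregates it per month with `resample`, and plots the totals; all values and column contents here are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily sales for 90 days starting 2024-01-01
dates = pd.date_range("2024-01-01", periods=90, freq="D")
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "date": dates,
    "revenue": rng.uniform(100, 500, size=len(dates)),
})

# "MS" groups rows by calendar month, labelled with the month's first day
monthly = df.set_index("date")["revenue"].resample("MS").sum()

# Shorten the index to year-month strings so the bar labels stay readable
monthly.index = monthly.index.strftime("%Y-%m")

ax = monthly.plot(kind="bar", figsize=(8, 4), title="Monthly Revenue")
ax.set_ylabel("Revenue ($)")
plt.tight_layout()
```

The pandas `plot` method returns the Matplotlib `Axes`, so any further customization works exactly as in the earlier sections.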
Customizing for Publication
plt.rcParams.update({
"font.size": 12,
"axes.labelsize": 14,
"axes.titlesize": 16,
"figure.dpi": 150,
})
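# Ordered sample times, so the "o-" line connects the points from left to right
t = np.linspace(0, 10, 50)
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(t, np.sin(t), "o-", markersize=4)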
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.savefig("figure.pdf", bbox_inches="tight")
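`plt.rcParams.update` changes the defaults for every figure created afterwards. When the publication settings should apply to a single figure only, one option is `plt.rc_context`, which restores the previous values when the block exits. This is a sketch under that assumption; the sine data and the output file name are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10, 50)
default_font_size = plt.rcParams["font.size"]

# Settings inside the with-block are temporary
with plt.rc_context({"font.size": 12, "axes.labelsize": 14, "axes.titlesize": 16}):
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(t, np.sin(t), "o-", markersize=4)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")

    # Hide the top and right borders for a cleaner look
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)

    # PDF is a vector format, so the figure scales without pixelation
    fig.savefig("figure.pdf", bbox_inches="tight")

# Outside the block, the global defaults are back in place
print(plt.rcParams["font.size"] == default_font_size)
```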
Choosing the Right Chart
| Data Type | Chart |
|---|---|
| Trend over time | Line plot |
| Compare categories | Bar chart |
| Distribution | Histogram, box plot |
| Relationship between two variables | Scatter plot |
| Correlation matrix | Heatmap |
| Part of whole | Pie chart (use sparingly) |
Good visualizations communicate insights clearly. Start simple, label axes, and choose charts that match your data.