
Performance Optimization

Profile and optimize Python code with cProfile, line_profiler, caching, algorithmic improvements, and C extensions for bottlenecks.

Python prioritizes developer productivity over raw speed. When performance matters, measure first, then optimize the right bottlenecks.

Rule #1: Profile Before Optimizing

Never guess where your code is slow. Use profilers:

cProfile — Function-Level Profiling

import cProfile
import pstats

def slow_function():
    total = 0
    for i in range(1_000_000):
        total += i ** 2
    return total

cProfile.run('slow_function()', 'profile.stats')

stats = pstats.Stats('profile.stats')
stats.sort_stats('cumulative').print_stats(10)

Run from CLI:

  python -m cProfile -s cumulative your_script.py
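To profile a single call from inside a running program (for example, in a test or a notebook), you can also use the profiler as a context manager and capture its report as a string. This is a minimal sketch that assumes Python 3.8+, where `cProfile.Profile` gained context-manager support. `profile_report` is a hypothetical helper name:

```python
import cProfile
import io
import pstats


def slow_function():
    total = 0
    for i in range(1_000_000):
        total += i ** 2
    return total


def profile_report(func, limit=5):
    """Profile one call to func() and return (result, report_text)."""
    # The with-block enables profiling on entry and disables it on exit.
    with cProfile.Profile() as profiler:
        result = func()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_report(slow_function)
    print(report)
```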

timeit — Micro-Benchmarks

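import timeit

# Wrap both in sum() so each expression is fully evaluated. An unconsumed
# generator expression only measures creating the generator, not the work.
time_list = timeit.timeit(
    "sum([x**2 for x in range(1000)])",
    number=10000
)
time_gen = timeit.timeit(
    "sum(x**2 for x in range(1000))",
    number=10000
)
print(f"List comp: {time_list:.4f}s, Generator: {time_gen:.4f}s")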
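A single timeit run is noisy. The timeit documentation suggests repeating the measurement and reporting the minimum, because slower runs usually reflect interference from other processes rather than the code itself. Here is a small sketch of that approach. `best_time` is a hypothetical helper, and `timeit.repeat` also accepts a zero-argument callable in place of a source string:

```python
import timeit


def best_time(stmt, number=1_000, repeat=5):
    """Return the fastest per-execution time, in seconds, across several repeats."""
    # Each entry in `timings` is the total time for `number` executions.
    timings = timeit.repeat(stmt, number=number, repeat=repeat)
    # Take the minimum: slower runs usually reflect interference, not the code.
    return min(timings) / number


if __name__ == "__main__":
    list_version = best_time(lambda: sum([x * x for x in range(1000)]))
    gen_version = best_time(lambda: sum(x * x for x in range(1000)))
    print(f"list: {list_version * 1e6:.1f} us/call, generator: {gen_version * 1e6:.1f} us/call")
```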

Algorithmic Optimization

The biggest wins come from better algorithms, not faster loops:

# O(n²): slow for large inputs
def has_duplicate_slow(items):
    for i, a in enumerate(items):
        for b in items[i+1:]:
            if a == b:
                return True
    return False

# O(n): use a set
def has_duplicate_fast(items):
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
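The set-based version only works for hashable items. When items can be ordered but not hashed (lists, for example), sorting first gives an O(n log n) middle ground, because duplicates end up next to each other. This sketch uses illustrative function names:

```python
def has_duplicate_sorted(items):
    """O(n log n): for items that can be sorted but not hashed (e.g. lists)."""
    ordered = sorted(items)
    # After sorting, any duplicates sit next to each other.
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def has_duplicate_by_length(items):
    """O(n) for hashable items, but always builds the full set (no early exit)."""
    return len(set(items)) != len(items)


if __name__ == "__main__":
    print(has_duplicate_sorted([[3, 1], [2], [3, 1]]))
    print(has_duplicate_by_length([4, 8, 15, 16, 23, 42]))
```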

Built-in Optimizations

Use Built-in Functions and Libraries

Built-ins such as sum(), min(), max(), and sorted() are implemented in C, so their loops avoid per-iteration interpreter overhead:

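data = list(range(1_000_000))

# Slow: the loop body runs as interpreted bytecode on every iteration
total = 0
for x in data:
    total += x

# Fast: sum() iterates in C
total = sum(data)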

NumPy, pandas, and itertools do their heavy lifting in compiled C code, so use them for numerical and iteration-heavy work.
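Here is an itertools illustration. itertools is in the standard library, so nothing needs to be installed. A hand-written running-total loop and a list-flattening loop can each be replaced by a single call whose iteration happens in C. This sketch uses illustrative function names:

```python
from itertools import accumulate, chain


def running_totals_loop(values):
    totals = []
    current = 0
    for value in values:
        current += value
        totals.append(current)
    return totals


def running_totals_fast(values):
    # accumulate() yields partial sums; its loop is implemented in C.
    return list(accumulate(values))


def flatten_fast(nested):
    # chain.from_iterable() concatenates the inner iterables lazily.
    return list(chain.from_iterable(nested))


if __name__ == "__main__":
    data = list(range(10))
    assert running_totals_loop(data) == running_totals_fast(data)
    print(flatten_fast([[1, 2], [3], [4, 5]]))
```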

List Comprehensions vs Loops

List comprehensions are generally faster than equivalent for loops:

# Prefer
squares = [x**2 for x in range(1000)]

# Over
squares = []
for x in range(1000):
    squares.append(x**2)
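Part of the difference comes from how each version appends. The for-loop version looks up squares.append as an attribute on every iteration. In CPython, a comprehension appends through a dedicated bytecode instruction instead. Hoisting the method into a local variable recovers some of the gap. This sketch measures the three variants on your own interpreter, and results vary by Python version:

```python
import timeit


def with_loop(n):
    squares = []
    for x in range(n):
        squares.append(x ** 2)  # attribute lookup + method call each time
    return squares


def with_local_append(n):
    squares = []
    append = squares.append  # look the method up once, outside the loop
    for x in range(n):
        append(x ** 2)
    return squares


def with_comprehension(n):
    return [x ** 2 for x in range(n)]


if __name__ == "__main__":
    for func in (with_loop, with_local_append, with_comprehension):
        seconds = min(timeit.repeat(lambda: func(1000), number=1000, repeat=5))
        print(f"{func.__name__:20} {seconds:.4f}s")
```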

Generators for Large Data

Generators yield items one at a time, so memory use stays constant regardless of input size instead of growing with a fully built list:

def read_large_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
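Generator stages also compose into pipelines. Each stage pulls one item from the previous one, so the whole chain holds only one line in memory at a time. In this sketch, the stage names and the "ERROR" filter are hypothetical. The stages accept any iterable of strings, including an open file object:

```python
import sys


def strip_lines(lines):
    for line in lines:
        yield line.strip()


def non_empty(lines):
    for line in lines:
        if line:
            yield line


def count_errors(lines):
    # sum() drives the pipeline; no intermediate list is ever built.
    return sum(1 for line in non_empty(strip_lines(lines)) if "ERROR" in line)


if __name__ == "__main__":
    # getsizeof() measures only the container, not the objects it references.
    squares_list = [x * x for x in range(100_000)]
    squares_gen = (x * x for x in range(100_000))
    print(sys.getsizeof(squares_list), sys.getsizeof(squares_gen))
```

With a real file, pass the open file object, for example `with open("app.log") as f: count_errors(f)`, where app.log is a placeholder path.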

Caching with functools.lru_cache

Memoize expensive pure functions (the same arguments always give the same result, with no side effects):

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(100))  # instant
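Functions wrapped by lru_cache expose cache_info() and cache_clear(), which let you check that the cache is actually being hit. A bounded maxsize evicts the least recently used entries. Because the arguments form the cache key, they must be hashable. In this sketch, square is an illustrative function. On Python 3.9+, functools.cache is shorthand for lru_cache(maxsize=None):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


@lru_cache(maxsize=2)  # keep at most two results; least recently used is evicted
def square(n):
    return n * n


if __name__ == "__main__":
    fibonacci.cache_clear()
    fibonacci(30)
    print(fibonacci.cache_info())  # hits, misses, maxsize, currsize

    for n in (1, 2, 3):
        square(n)
    print(square.cache_info().currsize)  # capped at maxsize

    try:
        fibonacci([30])  # lists are unhashable, so they cannot be cache keys
    except TypeError as exc:
        print("TypeError:", exc)
```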

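__slots__ for Memory

Declaring __slots__ replaces each instance's __dict__ with fixed storage for the listed attributes. This reduces memory per instance when you create millions of objects. The trade-off is that instances can no longer gain attributes that are not listed:

class Point: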
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y
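This sketch lets you check the effect on your own interpreter. It uses tracemalloc from the standard library to compare a plain class with a slotted one. The class names and instance count are illustrative. The exact savings depend on the Python version, since recent CPython releases have reduced the overhead of ordinary instances:

```python
import tracemalloc


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def traced_bytes(cls, count=100_000):
    """Bytes still allocated after building `count` instances of cls."""
    tracemalloc.start()
    # Also counts the list and int objects, which are the same for both classes.
    instances = [cls(i, i) for i in range(count)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del instances
    return current


if __name__ == "__main__":
    print("plain:  ", traced_bytes(PlainPoint))
    print("slotted:", traced_bytes(SlottedPoint))

    point = SlottedPoint(1, 2)
    try:
        point.z = 3  # not declared in __slots__
    except AttributeError as exc:
        print("AttributeError:", exc)
```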

When to Reach for C/Rust Extensions

If profiling shows a specific hot loop that can’t be vectorized, these tools can compile it to native code:

Cython — compile Python-like code to C
PyO3 / maturin — write Rust extensions
Numba — JIT compile numerical functions

from numba import jit

@jit(nopython=True)
def fast_sum(arr):
    total = 0.0
    for x in arr:
        total += x
    return total
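Numba is an optional dependency, so a common pattern is to fall back to plain Python when it isn't installed. This sketch shows that pattern. njit is Numba's shorthand for jit(nopython=True), and the fallback decorator is a hypothetical stand-in that returns the function unchanged. Numba works best on NumPy arrays. With Numba, the first call also pays the compilation cost, so time later calls when benchmarking:

```python
try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: without Numba, run the function as plain Python.
    def njit(func):
        return func


@njit
def fast_sum(arr):
    total = 0.0
    for x in arr:
        total += x
    return total


if __name__ == "__main__":
    values = [float(i) for i in range(1_000)]
    fast_sum(values)  # with Numba, this first call triggers compilation
    print(fast_sum(values))
```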

Optimization Checklist

Measure with cProfile or py-spy
Fix algorithms — O(n²) → O(n log n) beats micro-optimizations
Use the right data structure — set/dict for lookups, deque for queues
Leverage libraries — NumPy, pandas, orjson
Cache repeated pure computations
Parallelize CPU-bound work with multiprocessing (in standard CPython builds, the GIL keeps threads from running Python bytecode in parallel)
Only then consider C extensions
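Here is an illustration of the data-structure item. list.pop(0) has to shift every remaining element, so draining a list from the front is O(n²) overall. collections.deque pops from either end in O(1). This sketch uses illustrative function names:

```python
import timeit
from collections import deque


def drain_list(items):
    queue = list(items)
    order = []
    while queue:
        order.append(queue.pop(0))  # O(n): shifts all remaining elements left
    return order


def drain_deque(items):
    queue = deque(items)
    order = []
    while queue:
        order.append(queue.popleft())  # O(1) at either end of a deque
    return order


if __name__ == "__main__":
    data = range(20_000)
    for func in (drain_list, drain_deque):
        seconds = timeit.timeit(lambda: func(data), number=3)
        print(f"{func.__name__:12} {seconds:.3f}s")
```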

Premature optimization wastes time. Profile-driven optimization delivers real results.
