Data Structures Deep Dive
Master Python built-in data structures — lists, tuples, dictionaries, sets — with indexing, slicing, methods, and performance characteristics.
Python’s four core collection types — list, tuple, dict, and set — appear in virtually every program. Understanding when and how to use each is essential.
Lists — Ordered, Mutable
fruits = ["apple", "banana", "cherry"]
fruits.append("date") # add to end
fruits.insert(1, "apricot") # insert at index
fruits.remove("banana") # remove by value
last = fruits.pop() # remove and return last item
fruits.sort() # sort in place
fruits.reverse()
Indexing and Slicing
nums = [10, 20, 30, 40, 50]
nums[0] # 10 (first)
nums[-1] # 50 (last)
nums[1:4] # [20, 30, 40] (index 1 up to, not including, 4)
nums[::2] # [10, 30, 50] (every second element)
nums[::-1] # [50, 40, 30, 20, 10] (reversed)
List Comprehensions
squares = [x**2 for x in range(10)]
evens = [x for x in range(20) if x % 2 == 0]
matrix = [[0] * 3 for _ in range(3)] # 3x3 grid
Tuples — Ordered, Immutable
Use tuples for fixed collections (coordinates, database rows, dict keys):
point = (3, 4)
x, y = point # unpacking
rgb = (255, 128, 0)
person = ("Alice", 30, "Engineer")
name, age, job = person
# Tuples as dict keys (lists cannot be keys)
locations = {(0, 0): "origin", (1, 2): "point A"}
Named Tuples
from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p.y) # 3 4
Dictionaries — Key-Value Pairs
user = {"name": "Alice", "age": 30, "city": "NYC"}
user["name"] # "Alice"
user.get("email", "N/A") # safe access with default
user["email"] = "[email protected]" # add/update
del user["city"]
for key, value in user.items():
print(f"{key}: {value}")
Useful Dict Methods and Patterns
keys = user.keys()
values = user.values()
# Merge dicts (Python 3.9+)
defaults = {"role": "user", "active": True}
profile = {**defaults, **user}
# Dict comprehension
word_lengths = {word: len(word) for word in ["hello", "world"]}
# defaultdict — auto-create missing keys
from collections import defaultdict
counts = defaultdict(int)
for word in ["a", "b", "a", "c"]:
counts[word] += 1
# {'a': 2, 'b': 1, 'c': 1}
Sets — Unique, Unordered
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b # union: {1, 2, 3, 4, 5, 6}
a & b # intersection: {3, 4}
a - b # difference: {1, 2}
a ^ b # symmetric difference: {1, 2, 5, 6}
tags = {"python", "web", "api"}
tags.add("django")
tags.discard("web") # no error if missing
Sets are ideal for membership tests — x in my_set is O(1) vs O(n) for lists.
Choosing the Right Structure
| Structure | Ordered | Mutable | Duplicates | Use Case |
|---|---|---|---|---|
list |
Yes | Yes | Yes | General sequences, stacks, queues |
tuple |
Yes | No | Yes | Fixed records, function returns |
dict |
Yes* | Yes | Keys unique | Lookups, mappings, JSON-like data |
set |
No | Yes | No | Unique items, set math |
*Dicts preserve insertion order since Python 3.7.
Nested Structures
Real data is often nested:
students = [
{"name": "Alice", "grades": [90, 85, 92]},
{"name": "Bob", "grades": [78, 88, 80]},
]
avg_alice = sum(students[0]["grades"]) / len(students[0]["grades"])
Copying — Shallow vs Deep
import copy
original = [[1, 2], [3, 4]]
shallow = original.copy() # or list(original)
deep = copy.deepcopy(original)
original[0].append(99)
print(shallow[0]) # [1, 2, 99] — inner lists shared
print(deep[0]) # [1, 2] — fully independent
Master these four structures and you’ll handle 90% of Python data manipulation with confidence.
collections Module — Specialized Structures
Beyond the four built-ins, the standard library adds powerful variants:
deque — Fast Queue Operations
from collections import deque
queue = deque(["task1", "task2"])
queue.append("task3") # add to right
first = queue.popleft() # O(1) remove from left — lists are O(n)
# Fixed-size sliding window
recent = deque(maxlen=3)
for item in [1, 2, 3, 4, 5]:
recent.append(item)
print(list(recent)) # [3, 4, 5]
Counter — Count Occurrences
from collections import Counter
words = ["apple", "banana", "apple", "cherry", "apple"]
counts = Counter(words)
print(counts.most_common(2)) # [('apple', 3), ('banana', 1)]
heapq — Priority Queues
import heapq
tasks = [(3, "low"), (1, "urgent"), (2, "medium")]
heapq.heapify(tasks)
print(heapq.heappop(tasks)) # (1, 'urgent') — smallest priority first
Time Complexity Cheat Sheet
| Operation | list | dict | set |
|---|---|---|---|
| Access by index/key | O(1) | O(1) avg | — |
Search (in) |
O(n) | O(1) avg | O(1) avg |
| Insert at end | O(1) amortized | O(1) avg | O(1) avg |
| Insert at front | O(n) | — | — |
| Delete by value | O(n) | O(1) avg | O(1) avg |
Choose set or dict when you need fast membership tests on large collections.
Sorting and Ordering
students = [
{"name": "Alice", "score": 92},
{"name": "Bob", "score": 85},
{"name": "Charlie", "score": 92},
]
# Sort by score descending, then name
sorted_students = sorted(students, key=lambda s: (-s["score"], s["name"]))
# Sort in place
students.sort(key=lambda s: s["name"])
Immutability Patterns
When you need an immutable sequence with list-like methods:
from types import MappingProxyType
config = {"debug": True, "host": "localhost"}
frozen = MappingProxyType(config)
# frozen["debug"] = False # TypeError — read-only view
Tuples remain the standard choice for immutable ordered data.