These are my notes from working through the basics of algorithms, written quickly so they'd actually stick. Nothing here is novel, it's just the handful of ideas that, once they clicked, made the rest easier. Here's what's covered.
- Binary Search
- Simple Sorting
- Quick Data Structures
- Recursion
- Using what we learnt in Quick Sort
- Hash Tables
Binary Search
Binary search finds an element in a sorted array. Its Big O is O(log n), and the intuition is a question: how many times can you cut the array in half before you're down to one element? That number is log n, and that's how many guesses you ever need.
Implementation
def binary_search(arr, target): high = len(arr) - 1 low = 0 while low <= high: mid = (high + low) // 2 if arr[mid] > target: high = mid - 1 elif arr[mid] < target: low = mid + 1 else: return True return False
Each comparison throws away half of what's left. That's the whole reason a phone book of a billion names only takes about 30 guesses.
Sorting
The fast everyday sort is Quicksort, but selection sort is the gentler place to start, so let's begin there.
Selection Sort
Selection sort works by scanning the array for the largest element, pulling it out, then scanning the rest for the next largest, and so on, until what's left is sorted.
It's slow because of that repeated scanning. You walk the whole array to find each element you place, so for n elements you do roughly n passes of n work, which is O(n^2).
Task: try implementing it yourself before reading on, it's a good way to feel why the nested scan hurts.
Quick Data Structures
Before the faster sort, a quick detour into arrays and linked lists, and when to reach for each.
Arrays are a contiguous block of elements. They're good for fast direct access by index and simple sequential storage where you read elements in order.
Use an array when you need:
- Fast direct access to elements by index
- Simple sequential storage where elements are read in order
- A fixed-size collection whose size you know upfront
Note: arrays are bad for frequent insertions and deletions in the middle, because shifting everything along is O(n).
Linked lists are elements joined by pointers. They're good for frequent insertions and deletions (O(1) once you're at the position) and for sizes that grow and shrink a lot.
Use a linked list when you need:
- Frequent insertions or deletions in the middle (
O(1)once you have the position) - A size that grows and shrinks often
- No wasted memory for empty slots
- A base for other structures like stacks or queues
# Array - good for a fixed collection with direct indexing scores = [85, 92, 78, 90, 88] third_score = scores[2] # fast direct access # Linked List - good for frequent insertions/deletions class Node: def __init__(self, data): self.data = data self.next = None
Recursion
One more idea before the fast sort, because the fast sort leans on it. Recursion solves a problem by breaking it into smaller versions of itself. It needs two things: a base case, the smallest version you can answer outright, and a recursive case, which shrinks the problem and calls itself again.
def countdown(i): print(i) if i <= 0: # base case return else: # recursive case countdown(i - 1)
This prints from
idown to0, then stops. The base case is the thing that stops it from going forever.
Using what we learnt in Quick Sort
Quicksort is fast, recursive, and has a neat trick to it.
The base case is an array of length 0 or 1, which is already sorted, so you just return it. (This is the bit that's easy to get backwards. The base case is len(arr) < 2, not greater. I flipped it the first time and spent a while confused.)
Past the base case, you pick a pivot, the simplest choice being the first element. Then you split the rest into everything less than the pivot and everything greater:
[less than] + pivot + [greater than]
That's sorted around the pivot, but the two halves aren't sorted yet. So you do the same thing to each half, and this is where recursion earns its place. Because each split roughly halves the work and you do n work at each level, the average cost is O(n log n).
def quicksort(arr): if len(arr) < 2: # base case: already sorted return arr pivot = arr[0] less = [i for i in arr[1:] if i <= pivot] greater = [i for i in arr[1:] if i > pivot] return quicksort(less) + [pivot] + quicksort(greater)
Hash Tables
Hash tables store key-value pairs. They're like arrays, except the keys aren't ordered, they're hashed to a location.
Say you have a list of stock prices and want one particular stock. Walking the list one by one is O(n). A hash table lets you store the symbol as the key and the price as the value, and look it up directly in O(1).
# Hash Table - good for fast lookups stocks = {} stocks['AAPL'] = 187.31 stocks['GOOGL'] = 1196.51 stocks['MSFT'] = 123.35 price = stocks['AAPL'] # fast lookup
Hash tables are great for fast lookups but they don't keep order. If you need the keys ordered, reach for an array or a linked list.
A rough cheat sheet for the structures here, average case:
| Data Structure | Search | Insert | Delete |
|---|---|---|---|
| Array | O(n) | O(n) | O(n) |
| Linked List | O(n) | O(1) | O(1) |
| Hash Table | O(1) | O(1) | O(1) |
That's the set I keep coming back to. None of it is complicated once the intuition lands, and writing it down like this is mostly how I made the intuition land.