Robust HTTP: Errors, Retries, and Exponential Backoff
Scrapers fail. The question is whether yours fails once and stops, or retries intelligently and finishes the job.
Here’s the pattern I reach for — a Session with a Retry adapter:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session():
    s = requests.Session()
    retry = Retry(
        total=5,              # up to 5 retries per request
        backoff_factor=1.5,   # exponential backoff between attempts
        status_forcelist=(429, 500, 502, 503, 504),   # retry only these statuses
        allowed_methods=frozenset(["GET", "HEAD"]),   # only retry idempotent reads
        respect_retry_after_header=True,              # honor the server's Retry-After
    )
    # Mounted for HTTPS only; add an "http://" mount too if you scrape plain-HTTP sites.
    s.mount("https://", HTTPAdapter(max_retries=retry))
    return s

session = build_session()
resp = session.get(url, timeout=10)  # always set a timeout; url defined elsewhere
backoff_factor=1.5 doubles the wait between attempts: urllib3 typically retries the first failure immediately, then sleeps roughly 3s, 6s, 12s, and 24s — enough for rate limits and transient 502s to clear without hammering the server.
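
For intuition, here's a quick sketch of that schedule. It mirrors urllib3's documented behavior (no sleep before the first retry, then backoff_factor * 2 ** (previous_retries) seconds); exact numbers can vary slightly between urllib3 versions, so treat it as approximate:

    # Sketch of the sleep schedule for backoff_factor=1.5 and total=5 retries.
    # Assumes urllib3's documented formula; capped in practice by backoff_max.
    backoff_factor = 1.5
    total = 5
    for previous_retries in range(total):
        sleep = 0 if previous_retries == 0 else backoff_factor * (2 ** previous_retries)
        print(f"retry {previous_retries + 1}: sleep ~{sleep:.1f}s")
    # -> 0.0s, 3.0s, 6.0s, 12.0s, 24.0s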
Don’t retry POST by default. It’s usually non-idempotent; retrying can create duplicate records server-side.
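
If you do decide to retry a POST, make it explicit and make sure the server can deduplicate. A minimal sketch, assuming the target API honors an idempotency key ("Idempotency-Key" is a common convention, not something every API supports):

    import time
    import uuid
    import requests

    def post_with_retry(session, url, payload, attempts=3):
        # Opt-in retry for a single POST. Only safe if the server deduplicates
        # on the idempotency key; the header name here is an assumption, check
        # what your target API actually supports.
        key = str(uuid.uuid4())
        for attempt in range(attempts):
            try:
                resp = session.post(url, json=payload,
                                    headers={"Idempotency-Key": key},
                                    timeout=10)
                if resp.status_code not in (429, 500, 502, 503, 504):
                    return resp
            except requests.RequestException:
                if attempt == attempts - 1:
                    raise
            if attempt < attempts - 1:
                time.sleep(1.5 * (2 ** attempt))  # same exponential shape as the GET path
        return resp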
And always log which retries fired. A scraper that quietly retries 3 times on every request is masking a bug — probably a bad selector or a real ban — and you want that visible.
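
Two low-effort ways to get that visibility, sketched below: turn on urllib3's debug logging, which emits a line each time a retry is counted, and inspect the retry history attached to the underlying urllib3 response (an empty history means the request succeeded on the first try). Reuses the session and url from above.

    import logging

    # Surface urllib3's own retry messages alongside your application logs.
    logging.basicConfig(level=logging.INFO)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)

    resp = session.get(url, timeout=10)

    # The Retry object that handled the request rides along on the raw urllib3
    # response; its history records the status or error behind each retry.
    retries = getattr(resp.raw, "retries", None)
    if retries is not None and retries.history:
        for h in retries.history:
            logging.warning("retried %s %s (status=%s, error=%s)",
                            h.method, h.url, h.status, h.error)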