Robust HTTP: Errors, Retries, and Exponential Backoff
Scrapers fail. The question is whether yours fails once and stops, or retries intelligently and finishes the job.
Here’s the pattern I reach for — a Session with a Retry adapter:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session():
    s = requests.Session()
    retry = Retry(
        total=5,              # up to 5 retries per request
        backoff_factor=1.5,   # exponential backoff between attempts
        status_forcelist=(429, 500, 502, 503, 504),   # retry only these statuses
        allowed_methods=frozenset(["GET", "HEAD"]),   # only retry idempotent reads
        respect_retry_after_header=True,              # honor the server's Retry-After
    )
    # Mounted for HTTPS only; add an "http://" mount too if you scrape plain-HTTP sites.
    s.mount("https://", HTTPAdapter(max_retries=retry))
    return s

session = build_session()
resp = session.get(url, timeout=10)  # always set a timeout; url defined elsewhere
backoff_factor=1.5 doubles the wait between attempts: urllib3 typically retries the first failure immediately, then sleeps roughly 3s, 6s, 12s, and 24s — enough for rate limits and transient 502s to clear without hammering the server.
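
For intuition, here's a quick sketch of that schedule. It mirrors urllib3's documented behavior (no sleep before the first retry, then backoff_factor * 2 ** (previous_retries) seconds); exact numbers can vary slightly between urllib3 versions, so treat it as approximate:

    # Sketch of the sleep schedule for backoff_factor=1.5 and total=5 retries.
    # Assumes urllib3's documented formula; capped in practice by backoff_max.
    backoff_factor = 1.5
    total = 5
    for previous_retries in range(total):
        sleep = 0 if previous_retries == 0 else backoff_factor * (2 ** previous_retries)
        print(f"retry {previous_retries + 1}: sleep ~{sleep:.1f}s")
    # -> 0.0s, 3.0s, 6.0s, 12.0s, 24.0s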
Don’t retry POST by default. It’s usually non-idempotent; retrying can create duplicate records server-side.
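
If you do decide to retry a POST, make it explicit and make sure the server can deduplicate. A minimal sketch, assuming the target API honors an idempotency key ("Idempotency-Key" is a common convention, not something every API supports):

    import time
    import uuid
    import requests

    def post_with_retry(session, url, payload, attempts=3):
        # Opt-in retry for a single POST. Only safe if the server deduplicates
        # on the idempotency key; the header name here is an assumption, check
        # what your target API actually supports.
        key = str(uuid.uuid4())
        for attempt in range(attempts):
            try:
                resp = session.post(url, json=payload,
                                    headers={"Idempotency-Key": key},
                                    timeout=10)
                if resp.status_code not in (429, 500, 502, 503, 504):
                    return resp
            except requests.RequestException:
                if attempt == attempts - 1:
                    raise
            if attempt < attempts - 1:
                time.sleep(1.5 * (2 ** attempt))  # same exponential shape as the GET path
        return resp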
And always log which retries fired. A scraper that quietly retries 3 times on every request is masking a bug — probably a bad selector or a real ban — and you want that visible.
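
Two low-effort ways to get that visibility, sketched below: turn on urllib3's debug logging, which emits a line each time a retry is counted, and inspect the retry history attached to the underlying urllib3 response (an empty history means the request succeeded on the first try). Reuses the session and url from above.

    import logging

    # Surface urllib3's own retry messages alongside your application logs.
    logging.basicConfig(level=logging.INFO)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)

    resp = session.get(url, timeout=10)

    # The Retry object that handled the request rides along on the raw urllib3
    # response; its history records the status or error behind each retry.
    retries = getattr(resp.raw, "retries", None)
    if retries is not None and retries.history:
        for h in retries.history:
            logging.warning("retried %s %s (status=%s, error=%s)",
                            h.method, h.url, h.status, h.error)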