Five Pagination Patterns and How to Scrape Them
Pagination looks trivial until you hit your fifth different implementation. Here are the patterns worth recognizing on sight.
1. Numbered pages in the URL — ?page=2, ?p=3. Easiest: loop until an empty result or a 404.
2. Offset + limit — ?offset=40&limit=20. Same idea, but stop when you get fewer items than limit.
3. Cursor-based — ?after=eyJpZCI6MTIzfQ. The response tells you the next cursor; stop when it’s absent or null.
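import requests

def iter_items(api, session=None):
    session = session or requests.Session()
    cursor = None
    while True:
        # Send "after" only once the server has handed us a cursor.
        params = {"limit": 100, **({"after": cursor} if cursor else {})}
        resp = session.get(api, params=params, timeout=30)
        resp.raise_for_status()  # fail loudly instead of parsing an error page
        data = resp.json()
        yield from data["items"]
        cursor = data.get("next_cursor")
        if not cursor:  # absent or null means this was the last page
            break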
4. “Load more” button — the button triggers an XHR. Open DevTools → Network, click the button, copy the request. You’ve just found the real API.
5. Infinite scroll — same as #4, but the XHR fires on scroll events. Same fix.
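Patterns 1 and 2 differ only in their stopping rule, which a short sketch makes concrete. The names here (`iter_numbered`, `iter_offset`, and the `fetch` callable) are illustrative assumptions, not part of any library: `fetch` stands in for whatever function sends the query parameters to your site and returns the list of items, or `None` on a 404. The in-memory `fake_*` functions exist only so the loops can be run without a network.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical fetcher type: takes query params and returns that page's
# items, or None when the server answers 404.
FetchPage = Callable[[Dict[str, Any]], Optional[List[Any]]]


def iter_numbered(fetch: FetchPage, start: int = 1) -> Iterator[Any]:
    """Pattern 1: bump ?page=N until a page is empty or missing."""
    page = start
    while True:
        items = fetch({"page": page})
        if not items:  # None (404) or [] (past the last page)
            return
        yield from items
        page += 1


def iter_offset(fetch: FetchPage, limit: int = 20) -> Iterator[Any]:
    """Pattern 2: advance ?offset by limit until a short page arrives."""
    offset = 0
    while True:
        items = fetch({"offset": offset, "limit": limit}) or []
        yield from items
        if len(items) < limit:  # short or empty page: nothing left
            return
        offset += limit


# In-memory stand-ins for a real endpoint, only to exercise the loops.
DATA = list(range(45))


def fake_pages(params: Dict[str, Any], size: int = 20) -> List[int]:
    start = (params["page"] - 1) * size
    return DATA[start:start + size]


def fake_offset(params: Dict[str, Any]) -> List[int]:
    return DATA[params["offset"]:params["offset"] + params["limit"]]
```

Against a real site, `fetch` would wrap an HTTP call and return the parsed items. Note that when the total is an exact multiple of `limit`, the offset loop makes one extra request that comes back empty; that is the price of not knowing the total in advance.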
Patterns 3–5 are almost always backed by a JSON API that’s simpler and more stable than the HTML. Ninety percent of the “hard” scraping problems I see are actually HTML-scraping problems that disappear once you notice the underlying API.
The tell: if the page shows a loading spinner before new results appear, there’s an XHR behind it. Always check.
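Once DevTools has shown you the request, replaying it is usually a matter of copying the URL, the query parameters, and a couple of headers. The sketch below uses only the standard library; the URL, the `page`/`per_page` parameter names, and the headers are placeholder assumptions, so substitute whatever the Network tab shows for your site.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical values: replace with what DevTools shows for your site.
API_URL = "https://example.com/api/items"
COPIED_HEADERS = {
    "Accept": "application/json",
    "X-Requested-With": "XMLHttpRequest",
}


def build_request(page: int, per_page: int = 20) -> Request:
    """Rebuild the request that the button or scroll handler sends."""
    query = urlencode({"page": page, "per_page": per_page})
    return Request(f"{API_URL}?{query}", headers=COPIED_HEADERS)


def fetch_json(request: Request, timeout: float = 30.0):
    """Send the request and decode the JSON body."""
    with urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes it easy to print `build_request(2).full_url` and compare it against the URL in the Network tab before you hit the server.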