Getting Started with requests and BeautifulSoup
The simplest Python scraping stack is still the best place to start: requests to fetch the page, BeautifulSoup to
pick the parts you want.
import requests
from bs4 import BeautifulSoup

# Fetch the page; the timeout prevents the call from hanging forever.
resp = requests.get("https://example.com", timeout=10)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page

# Parse with lxml; swap in "html.parser" if lxml isn't installed.
soup = BeautifulSoup(resp.text, "lxml")
title = soup.select_one("h1").get_text(strip=True)
print(title)
Three things to internalize on day one:
Always set a timeout. Without one, a slow server can hang your script indefinitely.
Always call raise_for_status(). Silent 404s and 500s will waste hours while you wonder why a selector returned nothing. (Both habits are sketched in the snippet after this list.)
Prefer select and select_one over find_all. CSS selectors read the same way the browser’s DevTools show them, which shortens the inner loop between “inspect element” and “working scraper” (see the selector examples below).
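To make the first two habits concrete, here is a minimal sketch of the same fetch wrapped in the exceptions requests can raise. The URL is a placeholder, and the handling is deliberately simple, with no retries:

import requests

url = "https://example.com"  # placeholder URL
try:
    resp = requests.get(url, timeout=10)  # timeout: don't hang on a slow server
    resp.raise_for_status()               # raise HTTPError on 4xx/5xx
except requests.exceptions.Timeout:
    print(f"timed out fetching {url}")
except requests.exceptions.HTTPError as err:
    print(f"bad status for {url}: {err.response.status_code}")
except requests.exceptions.RequestException as err:
    # Catch-all base class for connection errors, invalid URLs, etc.
    print(f"request failed for {url}: {err}")
else:
    print(f"fetched {len(resp.text)} characters")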
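And for the third habit, the same extraction written both ways, using hypothetical product-listing markup as the input:

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real page.
html = """
<div class="product"><h2 class="name">Widget</h2><span class="price">$9</span></div>
<div class="product"><h2 class="name">Gadget</h2><span class="price">$12</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all: works, but you translate the CSS mentally into nested calls.
names = [div.find("h2", class_="name").get_text(strip=True)
         for div in soup.find_all("div", class_="product")]

# select: the same query, written exactly as DevTools would show it.
names = [h2.get_text(strip=True) for h2 in soup.select("div.product h2.name")]
print(names)  # ['Widget', 'Gadget']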
Once this pattern is muscle memory, the rest of scraping becomes handling the exceptions to it: pages that need JavaScript, sites that rate-limit you, and data that’s spread across dozens of URLs.
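The last of those is the easiest to handle with this same stack: loop over the URLs and be polite about it. A rough sketch, where the URL pattern, the one-second delay, and the parse_page helper are all assumptions:

import time
import requests
from bs4 import BeautifulSoup

urls = [f"https://example.com/page/{n}" for n in range(1, 4)]  # hypothetical URL pattern

def parse_page(html):
    # Hypothetical extraction: grab the first h1 on each page.
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.select_one("h1")
    return h1.get_text(strip=True) if h1 else None

for url in urls:
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    print(url, "->", parse_page(resp.text))
    time.sleep(1)  # crude rate limiting: pause between requests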