BeautifulSoup Selectors: A Practical Deep Dive
Most BeautifulSoup scrapers use maybe 20% of what .select() supports. Here are the selectors that actually come up.
# Attribute that starts with, ends with, contains
soup.select('a[href^="/product/"]')
soup.select('img[src$=".jpg"]')
soup.select('div[class*="price"]')
# Nth-child and nth-of-type
soup.select('table tr:nth-of-type(n+2)') # skip header row
# Negation
soup.select('li:not(.sold-out)')
# Direct child vs. descendant
soup.select('article > p') # direct children only
soup.select('article p') # any descendant
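A quick sanity check of the child/descendant distinction on a throwaway document (the markup here is invented for illustration):

```python
from bs4 import BeautifulSoup

# Throwaway markup: one <p> is a direct child of <article>,
# the other is nested inside a <blockquote>.
html = """
<article>
  <p>direct</p>
  <blockquote><p>nested</p></blockquote>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

direct = [p.get_text() for p in soup.select("article > p")]
anywhere = [p.get_text() for p in soup.select("article p")]
print(direct)    # ['direct']
print(anywhere)  # ['direct', 'nested']
```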
The one I reach for most: [class*="..."] matching. Sites frequently hash class names in production builds (e.g. price_x7fK), and substring matching survives those rebuilds where an exact match doesn't.
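For instance (the class names here are invented; the point is the substring match):

```python
from bs4 import BeautifulSoup

# Invented build-hashed class names.
html = '<div class="price_x7fK">$19.99</div><div class="name_a9Qz">Widget</div>'
soup = BeautifulSoup(html, "html.parser")

exact = soup.select('div[class="price"]')    # breaks after a rebuild
fuzzy = soup.select('div[class*="price"]')   # survives the hash suffix
print(exact)                # []
print(fuzzy[0].get_text())  # $19.99
```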
What BeautifulSoup's select() doesn't cover: text matching. Standard CSS has no :contains(); BeautifulSoup had its own non-standard version, which Soup Sieve (the selector engine behind select() since Beautiful Soup 4.7) now spells :-soup-contains() — don't rely on it. :has(), by contrast, does work on 4.7+. If you need text predicates, or you're stuck on an older version, use lxml.html with XPath, or prefilter with select() and then filter with a plain Python condition:
labels = [el for el in soup.select('label') if 'Price' in el.get_text()]
That combination — CSS for structure, Python for text predicates — handles nearly every real page.
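Put together, on a made-up form snippet:

```python
from bs4 import BeautifulSoup

# Made-up form markup for illustration.
html = """
<form>
  <label for="qty">Quantity</label>
  <label for="price">Price (USD)</label>
  <label for="note">Gift note</label>
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# select() narrows by structure; the Python condition handles the text.
labels = [el for el in soup.select("form label") if "Price" in el.get_text()]
print([el["for"] for el in labels])  # ['price']
```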