BeautifulSoup Selectors: A Practical Deep Dive
Most BeautifulSoup scrapers use maybe 20% of what .select() supports. Here are the selectors that actually come up.
# Attribute that starts with, ends with, contains
soup.select('a[href^="/product/"]')
soup.select('img[src$=".jpg"]')
soup.select('div[class*="price"]')
# Nth-child and nth-of-type
soup.select('table tr:nth-of-type(n+2)') # skip header row
# Negation
soup.select('li:not(.sold-out)')
# Direct child vs. descendant
soup.select('article > p') # direct children only
soup.select('article p') # any descendant
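A quick sanity check of the child/descendant distinction on a throwaway document (the markup here is invented for illustration):

```python
from bs4 import BeautifulSoup

# Throwaway markup: one <p> is a direct child of <article>,
# the other is nested inside a <blockquote>.
html = """
<article>
  <p>direct</p>
  <blockquote><p>nested</p></blockquote>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

direct = [p.get_text() for p in soup.select("article > p")]
anywhere = [p.get_text() for p in soup.select("article p")]
print(direct)    # ['direct']
print(anywhere)  # ['direct', 'nested']
```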
The one I reach for most: [class*="..."] matching. Sites frequently hash class names in production builds (e.g. price_x7fK), and substring matching survives those rebuilds where an exact match doesn't.
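For instance (the class names here are invented; the point is the substring match):

```python
from bs4 import BeautifulSoup

# Invented build-hashed class names.
html = '<div class="price_x7fK">$19.99</div><div class="name_a9Qz">Widget</div>'
soup = BeautifulSoup(html, "html.parser")

exact = soup.select('div[class="price"]')    # breaks after a rebuild
fuzzy = soup.select('div[class*="price"]')   # survives the hash suffix
print(exact)                # []
print(fuzzy[0].get_text())  # $19.99
```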
What BeautifulSoup's select() doesn't cover: text matching. Standard CSS has no :contains(); BeautifulSoup had its own non-standard version, which Soup Sieve (the selector engine behind select() since Beautiful Soup 4.7) now spells :-soup-contains() — don't rely on it. :has(), by contrast, does work on 4.7+. If you need text predicates, or you're stuck on an older version, use lxml.html with XPath, or prefilter with select() and then filter with a plain Python condition:
labels = [el for el in soup.select('label') if 'Price' in el.get_text()]
That combination — CSS for structure, Python for text predicates — handles nearly every real page.
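Put together, on a made-up form snippet:

```python
from bs4 import BeautifulSoup

# Made-up form markup for illustration.
html = """
<form>
  <label for="qty">Quantity</label>
  <label for="price">Price (USD)</label>
  <label for="note">Gift note</label>
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# select() narrows by structure; the Python condition handles the text.
labels = [el for el in soup.select("form label") if "Price" in el.get_text()]
print([el["for"] for el in labels])  # ['price']
```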