Playwright for JavaScript-Rendered Pages

July 15, 2025 · 1 min read

If requests.get(url).text returns an empty shell with no data, the site renders in the browser. Playwright is the cleanest way to scrape it.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(url, wait_until="networkidle")

    # Wait for the actual thing you want, not the page
    page.wait_for_selector('[data-testid="price"]')

    price = page.locator('[data-testid="price"]').inner_text()
    browser.close()

Two rules that prevent most flaky Playwright scrapers:

Wait for the content, not the page. networkidle is a good coarse signal, but SPAs often fire stray analytics requests that keep it from ever settling. Waiting for the selector you actually care about is more reliable.

Prefer locator over query_selector. Locators auto-retry until the element is actionable, which eliminates a whole category of “element not found” flakes on lazy-loaded content.

Before reaching for Playwright, though, open DevTools → Network and filter to XHR/Fetch. If the page calls an internal JSON API, hitting that directly with requests is 50–100× faster than driving a browser. More on that in a future post.