Playwright Stealth and Single-Page Apps
Plain Playwright fails on sites with serious bot detection. The problem isn't Playwright itself; it's the tells that a default Chromium automation session leaves everywhere.
What bot-detection scripts actually check (most of these are easy to probe yourself; see the snippet after the list):
- `navigator.webdriver === true` (the giveaway)
- `navigator.plugins.length === 0`
- Missing or inconsistent `navigator.languages`
- Chrome-specific `window.chrome` object behavior
- WebGL vendor/renderer returning "Google Inc." in headless mode
- TLS and HTTP/2 fingerprints that don't match real Chrome
Patching these by hand is a lost cause. Use playwright-stealth (the Python port of puppeteer-extra-plugin-stealth):
```python
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def scrape(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # A realistic viewport, user agent, and locale; the defaults
        # scream "automation".
        context = await browser.new_context(
            viewport={"width": 1366, "height": 768},
            user_agent="Mozilla/5.0 ...",  # truncated here; use a full, current Chrome UA
            locale="en-US",
        )
        page = await context.new_page()
        await stealth_async(page)  # apply the stealth patches before navigating
        await page.goto(url, wait_until="domcontentloaded")
        await page.wait_for_selector('[data-testid="content"]')
        return await page.content()
```
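Calling it is ordinary asyncio; a minimal entry point with a placeholder URL:

```python
import asyncio

if __name__ == "__main__":
    html = asyncio.run(scrape("https://example.com/products"))  # placeholder URL
    print(f"fetched {len(html)} bytes of rendered HTML")
```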
For SPAs specifically: don’t rely on navigation events — they don’t fire on client-side routing. Wait for the DOM changes themselves:
```python
await page.wait_for_function(
    "document.querySelectorAll('.product-card').length > 0"
)
```
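If the list hydrates in stages (infinite scroll, late XHRs), a nonzero count can fire too early. One pattern, sketched here with the same hypothetical `.product-card` selector, is to poll until the count stops changing:

```python
import asyncio

async def wait_for_stable_count(page, selector: str, settle_ms: int = 500):
    """Poll until the number of matching nodes stops growing.

    A sketch for staged hydration; tune settle_ms to the site's
    rendering cadence and add an overall timeout in real use.
    """
    previous = -1
    while True:
        count = await page.eval_on_selector_all(selector, "els => els.length")
        if count == previous and count > 0:
            return count
        previous = count
        await asyncio.sleep(settle_ms / 1000)
```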
And when you can, intercept the XHR the SPA makes rather than scraping the DOM the SPA renders — same principle as hidden JSON APIs, but applied inside the browser context:
```python
items = []

async def handle_response(response):
    # The SPA fetches its data from its own API; capture the JSON
    # payload at the network layer instead of re-parsing the DOM.
    if "/api/products" in response.url:
        data = await response.json()
        items.extend(data["items"])

page.on("response", handle_response)  # register before navigating
await page.goto(url)
```
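When you know the one endpoint you care about, Playwright's `expect_response` context manager is tidier than a catch-all listener, since it awaits exactly that response. A sketch assuming the same hypothetical `/api/products` path:

```python
async with page.expect_response(
    lambda r: "/api/products" in r.url and r.ok
) as resp_info:
    await page.goto(url)

response = await resp_info.value
items = (await response.json())["items"]
```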
That hybrid approach — drive the browser for session/cookies/challenges, capture JSON at the network layer — is the most robust scraping pattern I know for modern SPAs.