<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Python on Web Scraping Python</title>
    <link>https://webscrapingpython.com/tags/python/</link>
    <description>Recent content in Python on Web Scraping Python</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 14 Apr 2026 10:00:00 +0200</lastBuildDate>
    <atom:link href="https://webscrapingpython.com/tags/python/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Playwright Stealth and Single-Page Apps</title>
      <link>https://webscrapingpython.com/posts/playwright-stealth-spa/</link>
      <pubDate>Tue, 14 Apr 2026 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/playwright-stealth-spa/</guid>
      <description>&lt;p&gt;Plain Playwright fails on sites with serious bot detection. The problem isn&amp;rsquo;t Playwright; it&amp;rsquo;s the tells that default&#xA;Chromium automation leaves everywhere.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deploying a Scraper: Cron, Docker, Lambda</title>
      <link>https://webscrapingpython.com/posts/deploying-scrapers/</link>
      <pubDate>Tue, 24 Mar 2026 10:00:00 +0100</pubDate>
      <guid>https://webscrapingpython.com/posts/deploying-scrapers/</guid>
      <description>&lt;p&gt;A scraper that works on your laptop is a prototype. Here&amp;rsquo;s how to get it running on a schedule without babysitting.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Async Scraping with httpx and asyncio</title>
      <link>https://webscrapingpython.com/posts/async-httpx-asyncio/</link>
      <pubDate>Tue, 03 Feb 2026 10:00:00 +0100</pubDate>
      <guid>https://webscrapingpython.com/posts/async-httpx-asyncio/</guid>
      <description>&lt;p&gt;For scraping, async can deliver a 10–50× speedup for very little extra code, because IO-bound workloads are the textbook use case. &lt;code&gt;httpx&lt;/code&gt; has&#xA;the same ergonomics as &lt;code&gt;requests&lt;/code&gt; with an async client built in.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Scrapy Pipelines and Middlewares Explained</title>
      <link>https://webscrapingpython.com/posts/scrapy-pipelines-middlewares/</link>
      <pubDate>Tue, 13 Jan 2026 10:00:00 +0100</pubDate>
      <guid>https://webscrapingpython.com/posts/scrapy-pipelines-middlewares/</guid>
      <description>&lt;p&gt;Scrapy&amp;rsquo;s two extension points look similar but do opposite things. Middleware sits between Scrapy and the network;&#xA;pipelines sit between the spider and storage.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Scrapy Basics: When to Upgrade from requests</title>
      <link>https://webscrapingpython.com/posts/scrapy-basics/</link>
      <pubDate>Tue, 30 Dec 2025 10:00:00 +0100</pubDate>
      <guid>https://webscrapingpython.com/posts/scrapy-basics/</guid>
      <description>&lt;p&gt;&lt;code&gt;requests&lt;/code&gt; + &lt;code&gt;BeautifulSoup&lt;/code&gt; is fine until you&amp;rsquo;re managing queues, retries, deduplication, concurrency, and pipeline&#xA;logic by hand. That&amp;rsquo;s when Scrapy starts earning its keep.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Storing Scraped Data: CSV, SQLite, Postgres</title>
      <link>https://webscrapingpython.com/posts/storing-scraped-data/</link>
      <pubDate>Tue, 09 Dec 2025 10:00:00 +0100</pubDate>
      <guid>https://webscrapingpython.com/posts/storing-scraped-data/</guid>
      <description>&lt;p&gt;The right storage for scraped data depends less on scale than on what you plan to do with it next.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rate Limiting and Being a Polite Scraper</title>
      <link>https://webscrapingpython.com/posts/rate-limiting-polite-scraper/</link>
      <pubDate>Tue, 18 Nov 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/rate-limiting-polite-scraper/</guid>
      <description>&lt;p&gt;A scraper that hammers a server at 500 requests/sec is a denial-of-service attack with extra steps. Pacing isn&amp;rsquo;t just&#xA;ethics — it&amp;rsquo;s self-interest. Gentle scrapers don&amp;rsquo;t get blocked.&lt;/p&gt;</description>
    </item>
    <item>
      <title>User-Agents and Browser Fingerprinting</title>
      <link>https://webscrapingpython.com/posts/user-agent-fingerprinting/</link>
      <pubDate>Tue, 21 Oct 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/user-agent-fingerprinting/</guid>
      <description>&lt;p&gt;The default &lt;code&gt;requests&lt;/code&gt; User-Agent (&lt;code&gt;python-requests/2.x&lt;/code&gt;) is the fastest possible way to get blocked. But modern&#xA;anti-bot stacks look at far more than one header.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Proxies and Rotating IPs: When You Actually Need Them</title>
      <link>https://webscrapingpython.com/posts/proxies-rotating-ips/</link>
      <pubDate>Tue, 30 Sep 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/proxies-rotating-ips/</guid>
      <description>&lt;p&gt;Most scraping tutorials reach for proxies on page one. In reality, you should reach for them last, after you&amp;rsquo;ve&#xA;verified that a single IP with a good &lt;code&gt;User-Agent&lt;/code&gt; and a sensible rate limit actually gets blocked.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Headers, Cookies, and Sessions in requests</title>
      <link>https://webscrapingpython.com/posts/headers-cookies-sessions/</link>
      <pubDate>Tue, 09 Sep 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/headers-cookies-sessions/</guid>
      <description>&lt;p&gt;The difference between a scraper that works once and one that works reliably is usually session management.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Find the Hidden JSON API Behind Any Site</title>
      <link>https://webscrapingpython.com/posts/find-hidden-json-apis/</link>
      <pubDate>Tue, 26 Aug 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/find-hidden-json-apis/</guid>
      <description>&lt;p&gt;Most modern sites that &lt;em&gt;look&lt;/em&gt; like HTML are secretly driven by JSON APIs. Finding that API turns a messy scraping job&#xA;into reading documentation you didn&amp;rsquo;t know existed.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Five Pagination Patterns and How to Scrape Them</title>
      <link>https://webscrapingpython.com/posts/pagination-patterns/</link>
      <pubDate>Tue, 05 Aug 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/pagination-patterns/</guid>
      <description>&lt;p&gt;Pagination looks trivial until you hit your fifth different implementation. Here are the patterns worth recognizing on&#xA;sight.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Playwright for JavaScript-Rendered Pages</title>
      <link>https://webscrapingpython.com/posts/playwright-javascript-rendered-sites/</link>
      <pubDate>Tue, 15 Jul 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/playwright-javascript-rendered-sites/</guid>
      <description>&lt;p&gt;If &lt;code&gt;requests.get(url).text&lt;/code&gt; returns an empty shell with no data, the site renders in the browser. Playwright is the&#xA;cleanest way to scrape it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Scraping with lxml and XPath</title>
      <link>https://webscrapingpython.com/posts/lxml-xpath-scraping/</link>
      <pubDate>Tue, 17 Jun 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/lxml-xpath-scraping/</guid>
      <description>&lt;p&gt;When CSS selectors run out of expressive power, XPath is the next step up. &lt;code&gt;lxml&lt;/code&gt; is also substantially faster than&#xA;BeautifulSoup on large pages.&lt;/p&gt;</description>
    </item>
    <item>
      <title>BeautifulSoup Selectors: A Practical Deep Dive</title>
      <link>https://webscrapingpython.com/posts/beautifulsoup-selectors-deep-dive/</link>
      <pubDate>Tue, 27 May 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/beautifulsoup-selectors-deep-dive/</guid>
      <description>&lt;p&gt;Most BeautifulSoup scrapers use maybe 20% of what &lt;code&gt;.select()&lt;/code&gt; supports. Here are the selectors that actually come up.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust HTTP: Errors, Retries, and Exponential Backoff</title>
      <link>https://webscrapingpython.com/posts/http-errors-retries-backoff/</link>
      <pubDate>Tue, 06 May 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/http-errors-retries-backoff/</guid>
      <description>&lt;p&gt;Scrapers fail. The question is whether yours fails once and stops, or retries intelligently and finishes the job.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Getting Started with requests and BeautifulSoup</title>
      <link>https://webscrapingpython.com/posts/getting-started-requests-beautifulsoup/</link>
      <pubDate>Tue, 22 Apr 2025 10:00:00 +0200</pubDate>
      <guid>https://webscrapingpython.com/posts/getting-started-requests-beautifulsoup/</guid>
      <description>&lt;p&gt;The simplest Python scraping stack is still the best place to start: &lt;code&gt;requests&lt;/code&gt; to fetch the page, &lt;code&gt;BeautifulSoup&lt;/code&gt; to&#xA;pick the parts you want.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
