Mobile proxies for Scrapy: middleware, retries, and rate limits

2026-02-18

A practical guide to using dedicated mobile proxies with Scrapy: downloader middleware, retries, limits, and throttling for reliable crawling.

Why Scrapy struggles without proxies

Scrapy is a popular Python framework for crawling and scraping. It is fast, asynchronous, and easy to scale. The downside is that high request volume from a single IP quickly triggers anti-bot controls: rate limiting, blocks, challenge pages, or silent “soft bans”.

Proxies are not a magic switch, but for real workflows (price and stock monitoring, catalog tracking, regional availability) they are often a core building block. Mobile proxies can be especially helpful when a site treats mobile networks as “normal users” and applies stricter rules to datacenter traffic.

What “dedicated mobile proxies” means in practice

  • Dedicated access (you are not sharing the same exit IP with other customers).
  • Mobile carrier reputation (4G/LTE/5G networks).
  • Rotation and/or sticky sessions (keep the same IP for a time window).
  • Location/operator options for regional content and testing.

Where proxies fit in Scrapy: downloader middleware

In Scrapy, request/response manipulation is typically implemented via downloader middleware. Proxies are applied by setting request.meta["proxy"] (for example http://user:pass@host:port). A custom proxy middleware lets you centralize logic: which proxy to use per request, how to rotate, and how to react to bans.
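As a minimal sketch, such a middleware can assign a proxy from a pool on each request. The pool URLs below are placeholders; substitute your provider's endpoints.

```python
import random

class MobileProxyMiddleware:
    """Assign a mobile proxy to each request via request.meta["proxy"].

    The proxy URLs are placeholders; Scrapy's built-in HttpProxyMiddleware
    picks up meta["proxy"] further down the middleware chain.
    """

    PROXY_POOL = [
        "http://user:pass@mobile-gw-1.example.com:8000",
        "http://user:pass@mobile-gw-2.example.com:8000",
    ]

    def process_request(self, request, spider):
        # Only assign if no earlier middleware already chose a proxy.
        request.meta.setdefault("proxy", random.choice(self.PROXY_POOL))
        return None  # continue processing the request normally
```

Enable it in settings.py via DOWNLOADER_MIDDLEWARES with a priority below 750, so it runs before the built-in HttpProxyMiddleware consumes meta["proxy"].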

What a Scrapy proxy middleware should handle

  • Proxy selection by domain, region, and request type (category vs product page vs API).
  • Authentication and session parameters.
  • Ban/limit detection (403/429/503, block pages, captchas).
  • Switching proxy before a retry (avoid repeating the same failure path).
  • Metrics and logs per proxy/region/domain.
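The ban-detection and switch-before-retry pieces can be sketched in one middleware: track bans per proxy, and on a ban code strip meta["proxy"] so the retry goes through a different route. Proxy URLs and ban codes here are illustrative assumptions.

```python
import random

class BanAwareProxyMiddleware:
    """Track per-proxy bans; strip the proxy on a ban so retries re-route.

    Sketch only: the pool and BAN_CODES are illustrative, not a fixed policy.
    """

    BAN_CODES = {403, 429, 503}

    def __init__(self, proxy_pool):
        self.proxy_pool = list(proxy_pool)
        self.ban_counts = {}  # proxy URL -> consecutive-ban counter

    def process_request(self, request, spider):
        if "proxy" not in request.meta:
            request.meta["proxy"] = self._pick()
        return None

    def process_response(self, request, response, spider):
        proxy = request.meta.get("proxy")
        if response.status in self.BAN_CODES and proxy:
            self.ban_counts[proxy] = self.ban_counts.get(proxy, 0) + 1
            # Drop the proxy so a retry of this request picks a fresh route.
            request.meta.pop("proxy", None)
        else:
            self.ban_counts.pop(proxy, None)  # a success resets the counter
        return response

    def _pick(self):
        # Prefer proxies with no recent bans; fall back to the whole pool.
        clean = [p for p in self.proxy_pool if self.ban_counts.get(p, 0) == 0]
        return random.choice(clean or self.proxy_pool)
```

In a real deployment you would also emit the per-proxy/per-domain metrics mentioned above, e.g. through Scrapy's stats collector.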

Retries: avoid turning errors into more blocking

Scrapy includes a built-in retry middleware for transient failures (timeouts, some 5xx codes, etc.). Naive retries can make blocking worse: you repeat the same request quickly and often via the same route.

Practical strategy: for 429 (rate limiting) apply exponential backoff and reduce concurrency. For 403 (anti-bot) switching the mobile IP and slowing down is often more effective than hammering retries.
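That strategy can be written down as a small status-to-action policy. This is an illustrative policy table, not Scrapy's built-in RetryMiddleware behavior; the delays and attempt cap are assumptions to tune.

```python
def retry_action(status, attempt, max_attempts=4):
    """Map a response status to (action, delay_seconds) for the next attempt."""
    if attempt >= max_attempts:
        return ("give_up", 0.0)
    if status == 429:
        # Exponential backoff: 2s, 4s, 8s, ... capped at two minutes.
        return ("retry_slower", min(120.0, 2.0 * (2 ** attempt)))
    if status == 403:
        # Anti-bot decision: rotate the mobile IP instead of hammering retries.
        return ("rotate_ip_and_retry", 30.0)
    if status in (503, 520, 521):
        return ("retry", 5.0)  # transient / CDN-side errors
    return ("keep", 0.0)
```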

Throttling: AutoThrottle, delays, and concurrency

  • DOWNLOAD_DELAY sets a minimum pause.
  • CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN control parallelism.
  • AutoThrottle adjusts delays based on latency and server load signals.
  • AUTOTHROTTLE_DEBUG helps you understand behavior during early runs.

With mobile proxies, “slower but stable” usually wins for long-running monitoring.
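A conservative settings.py starting point might look like the following; all values are illustrative and worth tuning per target site.

```python
# settings.py sketch: "slower but stable" defaults for long-running monitoring.

DOWNLOAD_DELAY = 1.0                     # minimum pause between requests
CONCURRENT_REQUESTS = 8                  # global parallelism
CONCURRENT_REQUESTS_PER_DOMAIN = 2       # keep per-site pressure low

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0    # aim for ~1 in-flight request per slot
AUTOTHROTTLE_DEBUG = True                # log throttle decisions during pilots

RETRY_ENABLED = True
RETRY_TIMES = 2
RETRY_HTTP_CODES = [429, 503, 520, 521]  # handle 403 in proxy logic instead
```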

Reading the site’s signals: limits, blocks, and soft bans

  • 429: slow down and back off.
  • 403: IP block, anti-bot decision, or request fingerprint issues.
  • 503/520/521: transient or CDN-related errors.
  • 200 with the wrong HTML: captcha or block page disguised as success.
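The "200 with the wrong HTML" case is worth catching explicitly before anything is stored. A minimal validator, with marker strings that are illustrative (derive real ones from the target site's actual block pages):

```python
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic", "robot check")

def looks_blocked(status: int, body_text: str) -> bool:
    """Treat hard ban codes and 200-with-a-block-page as blocked."""
    if status in (403, 429):
        return True
    lowered = body_text.lower()
    # A 200 whose body contains block-page markers is a disguised ban.
    return status == 200 and any(m in lowered for m in BLOCK_MARKERS)
```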

Request fingerprint: headers, cookies, and flow

Even with mobile IPs you can get blocked if the request fingerprint looks unnatural. Use realistic User-Agent values, keep core headers consistent, manage cookies intentionally (either stable sessions or clean stateless requests), and consider a more human-like flow for a subset of pages (e.g., category → product) when the site is sensitive.
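One way to keep headers consistent is to build them in a single helper. The User-Agent string below is an illustrative mobile Chrome value, not a recommendation; pin whatever you choose and keep it stable per session.

```python
# Illustrative mobile UA; keep it consistent with the mobile-network story.
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"
)

def mobile_headers(referer=None):
    """Return a consistent, mobile-looking header set for one request."""
    headers = {
        "User-Agent": MOBILE_UA,
        "Accept": "text/html,application/xhtml+xml,application/xml;"
                  "q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    if referer:
        # A Referer from the category page makes the flow look more natural.
        headers["Referer"] = referer
    return headers
```

Pass the result as the `headers` argument when constructing a Scrapy Request, and set the Referer when following the category → product flow.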

Case: regional price and stock monitoring for retailers

Goal: track price and availability for a list of SKUs across multiple retailers, where results differ by region because of warehouses, delivery zones, and local promotions.

  • Model routes as region + domain and keep a sticky session per route.
  • Crawl category/search pages with conservative parallelism; fetch product pages even more carefully.
  • When 403/429 spikes, slow down, rotate IPs, and defer problematic SKUs.
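The route model above can be sketched as a small registry keyed by (region, domain). Here `make_proxy_url` is an assumed callable returning a provider-specific sticky-session URL for a region (many providers encode a session id in the credentials, but the exact format varies by provider).

```python
class RouteSessions:
    """Keep one sticky mobile-proxy session per (region, domain) route."""

    def __init__(self, make_proxy_url):
        self._make = make_proxy_url   # assumed provider-specific factory
        self._routes = {}

    def proxy_for(self, region, domain):
        # Same route -> same sticky session, so regional content stays stable.
        key = (region, domain)
        if key not in self._routes:
            self._routes[key] = self._make(region)
        return self._routes[key]

    def reset(self, region, domain):
        # Call after a 403/429 spike to force a fresh IP on this route.
        self._routes.pop((region, domain), None)
```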

Production checklist

  • Run a pilot on 50–200 URLs and measure 403/429/captcha rates.
  • Define retry policy: which codes, how many attempts, what backoff.
  • Set per-domain limits (delay and concurrency).
  • Validate content to avoid storing junk data.
  • Monitor success rate, average latency, and failing regions/proxies.

Summary

Mobile proxies for Scrapy can improve access to regional content and reduce blocking compared to datacenter traffic. The best results come from a combined approach: a solid proxy middleware, controlled retries, proper throttling, and response validation. For retailer monitoring, this translates into fewer bans and cleaner, more dependable data.