
Crawlee proxies: mobile proxy, IP rotation, and geo scraping

2026-03-05

A practical guide to configuring dedicated mobile proxies in Crawlee (Node/Python), validating IP/geo, avoiding common mistakes, and scraping region‑based prices from JS catalogs.

Modern JS-heavy websites (catalogs, marketplaces, configurators) rarely give up their data to a single plain request. Prices and availability can change by region, data is loaded by scripts, and anti-bot systems may throttle or soft-block repeated traffic. In these scenarios, Crawlee (a crawler framework) plus properly managed proxies, especially mobile proxies, can make your data collection far more stable.

Proxies for Crawlee: what you actually configure

In day-to-day work, proxies for Crawlee means two layers: (1) selection logic — which proxy IP to use for a request/session (rotation vs “sticky” sessions), and (2) where the proxy is injected — into the browser launch/profile (Playwright) or an HTTP client. In Crawlee this is typically handled via ProxyConfiguration and, when needed, SessionPool, which helps you avoid retrying through proxies that are already blocked.
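To make the selection layer concrete, here is a minimal, stdlib-only Python sketch of the behavior described above: round-robin rotation when no session is given, a sticky mapping when a session id is. This is illustrative only, not Crawlee's actual implementation; the class and method names are invented for the example.

```python
import itertools

class ProxyPicker:
    """Sketch of Crawlee-style proxy selection (illustrative, not
    Crawlee's source): round-robin by default, sticky per session id."""

    def __init__(self, proxy_urls):
        self._cycle = itertools.cycle(proxy_urls)
        self._by_session = {}

    def new_url(self, session_id=None):
        if session_id is None:
            # No session: rotate on every call.
            return next(self._cycle)
        if session_id not in self._by_session:
            # First request for this session: pin the next proxy to it.
            self._by_session[session_id] = next(self._cycle)
        # Sticky: the same session always gets the same proxy.
        return self._by_session[session_id]
```

In this model, `new_url('region-de')` keeps returning the proxy first assigned to that session, which is what makes "one region = one sessionId" batching work.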

Why Crawlee is a good fit for production crawlers

  • Reliability: queues, retries, concurrency controls, structured error handling.
  • Browser crawling: Playwright integration for JS rendering.
  • Sessions + proxy assignment: built-in rotation and session “stickiness”.

When mobile proxies are worth it

Mobile proxies use IP addresses from mobile carriers and commonly sit behind carrier-grade NAT (CGNAT), where many subscribers share public IPs and addresses can change as the network reallocates them. That can make mobile IPs look more “natural” to some anti-bot systems, and rotation may happen more frequently than with datacenter ranges.

  • Geo QA: verifying prices/shipping/currency for multiple regions.
  • Protected JS catalogs: where APIs are called from the browser and IP-based throttling is strict.
  • Soft blocks: empty lists, alternative content, or forced challenges without a clear 403.

Quick comparison table

| Proxy type | Pros | Cons | Best for |
| --- | --- | --- | --- |
| Datacenter | Fast and cheap | Easy to detect, higher block rate | Simple sites, internal monitoring, early testing |
| Residential | Better reputation, geo targeting | More expensive, variable latency | Mid-protection sites and catalogs |
| Mobile (dedicated) | Higher trust signals, "natural" rotation | Slower, costly, city-level geo can drift | Hard JS targets, region-based pricing |

Where to set the proxy in Crawlee “at the profile level”

For browser crawlers, the mental model is: Crawlee picks a proxy URL (via ProxyConfiguration) and then passes it into the browser launch/context. Crawlee docs note that proxy URLs can include username/password (e.g., http://user:pass@host:port).

Required fields and proxy protocols

  • host and port
  • username/password (if your provider requires authentication)
  • protocol: Playwright supports HTTP(S) and SOCKSv5 proxies and lets you specify credentials and bypass hosts.
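These fields typically end up in one of two shapes: Playwright's proxy settings object (`server`, `username`, `password`, `bypass`) or the single-URL form. A small sketch below assembles both; the helper names are invented for illustration, and percent-encoding the credentials avoids breaking the URL when a password contains characters like `@`.

```python
from urllib.parse import quote

def playwright_proxy(host, port, username=None, password=None,
                     scheme='http', bypass=None):
    """Build a dict in the shape Playwright's launch() proxy option
    accepts: keys 'server', 'username', 'password', 'bypass'.
    Credentials go in separate fields, not in the server URL."""
    settings = {'server': f'{scheme}://{host}:{port}'}
    if username:
        settings['username'] = username
        settings['password'] = password or ''
    if bypass:
        settings['bypass'] = bypass  # e.g. 'localhost,*.internal'
    return settings

def proxy_url(host, port, username=None, password=None, scheme='http'):
    """Single-URL form (scheme://user:pass@host:port), percent-encoding
    credentials so special characters don't corrupt the URL."""
    auth = ''
    if username:
        auth = f'{quote(username, safe="")}:{quote(password or "", safe="")}@'
    return f'{scheme}://{auth}{host}:{port}'

# A password containing '@' must be encoded in the URL form:
print(proxy_url('host1', 8000, 'login', 'p@ss'))  # http://login:p%40ss@host1:8000
```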

Crawlee (Node.js): ProxyConfiguration + PlaywrightCrawler

A standard Node setup is: create ProxyConfiguration, pass it into your crawler, and let Crawlee handle assignment. Without a sessionId, proxy URLs rotate round-robin; with a sessionId, the same session can consistently map to the same proxy URL — useful for realistic browsing behavior.

// Node.js (Crawlee + Playwright)
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
  proxyUrls: [
    'http://login:password@host1:port',
    'http://login:password@host2:port',
  ],
});

const crawler = new PlaywrightCrawler({
  proxyConfiguration,
  async requestHandler({ request, session, log }) {
    log.info(`URL: ${request.url}, session: ${session?.id ?? 'none'}`);
    // ... extract data
  },
});

await crawler.run(['https://example.com']);

Tip for mobile proxies: don’t rotate on every request. Keep a sticky session for a batch of pages (for example, one region = one sessionId) and rotate only on block signals or after N pages.
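One way to sketch that batching policy (illustrative, not a Crawlee API): derive the sessionId from the region plus a "generation" counter, and bump the generation only on a block signal or after N pages. A new sessionId then makes Crawlee's session-to-proxy mapping pick a fresh proxy.

```python
class StickySession:
    """Sketch of the tip above: one session per region, rotated only on
    block signals or after max_pages. Names are illustrative."""

    def __init__(self, region, max_pages=50):
        self.region = region
        self.max_pages = max_pages
        self.generation = 0
        self.pages = 0

    @property
    def session_id(self):
        # Changing the id is what triggers a new proxy assignment.
        return f'{self.region}-{self.generation}'

    def record_page(self, blocked=False):
        self.pages += 1
        if blocked or self.pages >= self.max_pages:
            self.generation += 1  # rotate: fresh session id, fresh proxy
            self.pages = 0

s = StickySession('de', max_pages=3)
ids = []
for blocked in [False, False, False, False]:
    ids.append(s.session_id)
    s.record_page(blocked=blocked)
# ids == ['de-0', 'de-0', 'de-0', 'de-1']
```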

Crawlee (Python): ProxyConfiguration, ProxyInfo, SessionPool

In Crawlee for Python, you configure ProxyConfiguration with proxy_urls and Crawlee rotates them for you. In your handler you can inspect which proxy was used via the context's proxy_info (a ProxyInfo object).

# Python (Crawlee + Playwright)
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.proxy_configuration import ProxyConfiguration


async def main() -> None:
    proxy_config = ProxyConfiguration(
        proxy_urls=[
            'http://login:password@host1:port',
            'http://login:password@host2:port',
        ]
    )

    crawler = PlaywrightCrawler(proxy_configuration=proxy_config)

    @crawler.router.default_handler
    async def handle(ctx: PlaywrightCrawlingContext) -> None:
        proxy = ctx.proxy_info.url if ctx.proxy_info else 'none'
        ctx.log.info(f'Using proxy: {proxy}')
        # ... extract data

    await crawler.run(['https://example.com'])


asyncio.run(main())

How to validate proxies before a real crawl

Validation is more than “the IP changed”. For geo QA you should verify: IP/ASN, country/region, stability (timeouts), and whether all traffic is actually routed via the proxy. Tutorials commonly use HTTPBin to confirm the visible IP quickly.

| Check | How | What "good" looks like |
| --- | --- | --- |
| IP is different | HTTPBin IP / "what is my IP" | Not your office/datacenter IP |
| Geo matches plan | Compare in 2–3 IP-geo services | Country matches; city may vary on mobile |
| Auth works | Credentials in URL or proxy fields | No 407 Proxy Authentication Required |
| Stability | Short test run with low concurrency | Low timeout/error rate |
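A minimal pre-flight check along those lines can be written with Python's stdlib and HTTPBin's /ip endpoint (a common tutorial choice; the endpoint and the comma-handling rule below are assumptions to adjust per setup, and the fetch itself needs network access):

```python
import json
import urllib.request

def visible_ip(proxy_url, timeout=15):
    """Fetch the IP that targets see through the given proxy, via
    HTTPBin's /ip endpoint. Requires network access."""
    handler = urllib.request.ProxyHandler({'http': proxy_url,
                                           'https': proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open('https://httpbin.org/ip', timeout=timeout) as resp:
        return parse_ip(resp.read().decode())

def parse_ip(body):
    """HTTPBin returns {"origin": "1.2.3.4"}. Some proxy chains produce
    a comma-separated list; assume the last hop is the exit IP."""
    origin = json.loads(body)['origin']
    return origin.split(',')[-1].strip()
```

Run `visible_ip(...)` once per proxy at low concurrency, compare the result against your office/datacenter IP and your provider's promised geo, and only then start the real crawl.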

Case: scraping product cards from JS catalogs with region-based prices

Goal: collect product name, price, discount, currency, and availability from a JS-rendered catalog that changes pricing by delivery region and applies soft blocks when one IP requests too much.

Practical steps that improve success (without “magic”)

  1. Lock the region beyond IP: align cookies, locale, currency, and shipping settings where possible.
  2. Bind IP to a session: keep “one user” consistent across multiple pages. Crawlee supports mapping proxy URLs to a sessionId.
  3. Detect blocks explicitly: 403/429, challenge pages, empty results, sudden HTML changes — mark the session bad and rotate.
  4. Tune concurrency: mobile proxies are slower; prefer more sessions with lower per-session concurrency.
  5. Log ProxyInfo + page signals to debug region drift and soft blocks.
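Step 3 above can be sketched as a small heuristic classifier. Everything here is an assumption to tune per target site: the marker string, the status codes, and the challenge phrases.

```python
def looks_blocked(status, html, expected_min_items=1,
                  item_marker='class="product-card"'):
    """Heuristic block detector. item_marker and the phrase list are
    illustrative assumptions - tune them for the actual target."""
    if status in (403, 429):
        return True  # explicit block / rate limit
    lowered = html.lower()
    if 'captcha' in lowered or 'access denied' in lowered:
        return True  # challenge page served instead of content
    if html.count(item_marker) < expected_min_items:
        return True  # soft block: page rendered but the list is empty
    return False
```

When this returns True, mark the session bad (so Crawlee retires it and its proxy) rather than retrying the same IP.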

Why widgets/pixels “break” when you crawl via geo proxies

Switching IP geo often triggers different consent flows and third-party script policies. Many consent tools explicitly block third-party content (maps, video embeds, pixels) until the user grants consent.

Some CMP setups also use geo-targeting (based on IP) to decide whether scripts should load at all. If your proxy geo drifts (common with mobile IPs), scripts may initialize inconsistently — causing missing pixels or broken widgets.

Risks, limits, and common mistakes

  • Geo drift on mobile IPs: country is usually stable; city-level precision is not guaranteed.
  • Latency: increase timeouts and reduce concurrency for mobile networks.
  • Wrong protocol: mixing http/https/socks5.
  • Over-rotation: changing IP per request looks abnormal; prefer sticky sessions.
  • Region mismatch: IP says one thing, cookies/locale/currency say another.
  • Compliance: follow site rules, robots, and legal constraints; avoid collecting personal data without a valid basis.

Sources

  • Crawlee JS proxy management and session-based proxy assignment.
  • Crawlee SessionPool guidance (JS/Python).
  • Crawlee Python ProxyConfiguration/ProxyInfo.
  • Playwright proxy support and parameters.
  • Mobile networks/CGNAT overview (provider explainers).
  • Consent/geo-targeting and third-party script blocking examples.