Why a multi-carrier pool matters
When you run mobile proxies, relying on a single carrier creates a single point of failure. Cellular networks have localized outages and degradations: a specific cell becomes congested, latency and packet loss spike, IPv4 availability changes, or DNS/HTTPS starts timing out. Add maintenance windows and peak-hour overload, and uptime becomes unpredictable unless you have redundancy.
A 2–3 carrier pool brings practical benefits:
- Coverage diversity: one carrier is strong in a district, another works better indoors or outside the city.
- Capacity headroom: you can spread load across cells and reduce throttling risk.
- Higher availability: one carrier incident does not take down the whole fleet.
- Policy flexibility: CGNAT, IPv6 behavior and NAT timeouts differ by carrier, which matters for scraping, ads, SMM and QA flows.
What “session drops” actually means
People often mix three different things: (1) the client’s connection to your proxy endpoint, (2) the TCP/TLS connection from your proxy to the target site, and (3) the “login session” at the application layer (cookies/tokens). When an uplink changes (carrier/modem), the public IP and NAT state usually change too, which breaks active TCP/TLS flows. That is why failover can feel like “everything dropped”.
Start by deciding what you want to preserve: a mobile egress IP (the target sees a carrier IP) or seamless continuity for the client (the client does not notice the uplink switch). The design choices are different.
Four failover models
Model 1. Cold failover
Basic multi-WAN routing: carrier A is primary, carrier B is backup. Health check fails → route switch. Easy to implement, but it typically breaks active connections because the path/IP/NAT state changes.
Model 2. Session-aware failover
This is the most realistic approach for mobile egress proxies. Do not migrate active sticky sessions to another carrier automatically. If a modem dies, some active sessions will fail anyway, but you avoid a “storm” caused by moving everyone. New sessions are immediately assigned to healthy carriers. This contains the blast radius.
Model 3. Tunnel-anchored failover
If you anchor traffic inside an overlay tunnel (to an edge/VPS), you can switch between uplinks under the tunnel and keep client-side continuity. SD‑WAN/bonding products often describe “hot failover” that moves traffic to another WAN while keeping session persistence inside the tunnel.
Trade‑off: if your product must provide mobile egress to the target, a tunnel to a VPS makes the egress data-center-like. This model is about seamlessness for the client, not about exposing carrier IPs to the destination.
Model 4. Multipath (MPTCP)
Multipath TCP can maintain a logical connection across multiple subflows and survive interface changes; if one path fails, it can continue over another.
In practice, MPTCP is usually used inside your own infrastructure (edge ↔ tunnel server), because both endpoints must support it or you must terminate it on your gateway.
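As a sketch of what endpoint support looks like on the gateway side, the snippet below tries to open an MPTCP socket and falls back to plain TCP when the kernel refuses. It is Linux-specific and illustrative; the fallback protocol number 262 is the Linux value for MPTCP, used for interpreters older than Python 3.10 that lack the `socket.IPPROTO_MPTCP` constant.

```python
import socket

# socket.IPPROTO_MPTCP was added in Python 3.10; 262 is the Linux protocol
# number, used here as a fallback for older interpreters.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def mptcp_socket():
    """Return (sock, is_mptcp): an MPTCP socket when the kernel supports it,
    otherwise a plain TCP socket so the gateway still works single-path."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP), True
    except OSError:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM), False
```

On kernels with `net.mptcp.enabled=1`, both ends of the edge-to-tunnel hop can use such sockets; peers without MPTCP support still interoperate, because the protocol negotiates down to regular TCP.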
Reference architecture for a "multi-carrier proxy" pool
- Modem/carrier layer: multiple LTE/5G modems (or eSIM routers) split across carriers A/B/C; ideally different locations/cells.
- Edge routing layer: MikroTik/OpenWrt/Linux performs policy routing, health checks and uplink selection per session/user.
- Proxy layer: HTTP(S)/SOCKS with auth, limits and logging.
- Session control: a small service/store that keeps “user/session → modem/carrier” mapping, TTL and error counters.
- Observability: per-carrier RTT/loss/jitter plus DNS/HTTPS synthetic checks.
How to implement failover without chaos
1) Define a session at the product level
For commercial mobile proxies, a “session” is usually a sticky period (5–30 minutes, 2 hours, or “until rotate”), not a single TCP connection. Bind that session to a single modem/carrier until TTL expires. If the modem is down, either end the session (Strict mode) or allow fallback to another carrier (Resilient mode) with an IP change.
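A minimal sketch of that binding, assuming an in-memory store (the names `SessionStore`/`Session` and the 900-second default TTL are illustrative, not from any particular product):

```python
import time
from dataclasses import dataclass

@dataclass
class Session:
    carrier: str       # assigned modem/carrier id
    expires_at: float  # sticky-TTL deadline
    strict: bool       # Strict: end session if carrier dies; Resilient: reassign

class SessionStore:
    """In-memory 'session -> modem/carrier' mapping with TTL."""

    def __init__(self, ttl_seconds=900):
        self.ttl = ttl_seconds
        self.sessions = {}

    def assign(self, session_id, carrier, strict=False):
        self.sessions[session_id] = Session(carrier, time.time() + self.ttl, strict)

    def resolve(self, session_id, healthy_carriers, pick_new):
        """Return the carrier for this session; pick_new() supplies a healthy one."""
        s = self.sessions.get(session_id)
        if s is None or s.expires_at < time.time():
            carrier = pick_new()  # fresh or expired session: new sticky binding
            self.assign(session_id, carrier, strict=s.strict if s else False)
            return carrier
        if s.carrier in healthy_carriers:
            return s.carrier      # sticky: same carrier until TTL expires
        if s.strict:
            raise RuntimeError("assigned carrier is down; Strict mode ends the session")
        # Resilient mode: fall back to another carrier (the egress IP changes)
        s.carrier = pick_new()
        return s.carrier
```

The point of the split is contractual clarity: a Strict client prefers a hard failure over a surprise IP change, a Resilient client prefers continuity.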
2) Use multi-layer health checks
Ping alone is not enough. Cellular links can “ping fine” while HTTPS breaks due to MTU issues, DNS problems, or packet loss. Use:
- Link: interface/IP/service state.
- Network: RTT/loss to multiple control IPs.
- Application: DNS + HTTPS GET to your own health endpoint and one or two independent domains.
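The three layers above then need to be combined into one per-carrier verdict. The sketch below is illustrative: `https_probe` is an assumed helper name, and the RTT/loss thresholds are placeholders you would tune per carrier.

```python
import ssl
import urllib.request

def https_probe(url, timeout=3.0):
    """Application-layer check: a real HTTPS GET catches MTU, DNS and TLS
    failures that a plain ping misses. Point it at your own health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ssl.SSLError):
        return False

def classify(link_up, rtt_ms, loss_pct, https_ok):
    """Combine link, network and application layers into UP / DEGRADED / DOWN.
    Thresholds here are illustrative, not recommendations."""
    if not link_up or rtt_ms is None:
        return "DOWN"
    if not https_ok and loss_pct >= 50:
        return "DOWN"
    if not https_ok or loss_pct >= 5 or rtt_ms >= 300:
        return "DEGRADED"
    return "UP"
```

Keeping DEGRADED as a distinct state matters: a carrier that "pings fine" but fails HTTPS should stop receiving new sessions without being declared fully dead.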
On MikroTik, a common baseline is route monitoring with check-gateway plus recursive routes. Note that check-gateway only tests the next hop, so a link can pass the check while the carrier's upstream path is broken; recursive routes that track a remote host give a more honest signal, and application-level checks still belong on top.
3) “Stop assigning” instead of “move everyone”
When a carrier enters DEGRADED/DOWN, stop assigning new sticky sessions to it. Do not automatically reshuffle existing mappings. This prevents cascading reconnect storms and keeps most users stable.
4) Load distribution: consistent hashing + weights
Consistent hashing (or a deterministic user_id → carrier rule) reduces churn during flaps. Weights let you account for different fleet sizes and real-time quality.
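One way to implement steps 3 and 4 together is weighted rendezvous (HRW) hashing over only the carriers currently in the UP state: existing sticky mappings are untouched, new sessions deterministically avoid degraded uplinks, and removing one carrier only moves the users that were on it. The function name and weight values below are illustrative.

```python
import hashlib
import math

def pick_carrier(user_id, carriers):
    """Weighted rendezvous hashing: deterministic per user, minimal churn on
    carrier removal, load proportional to weight. `carriers` maps carrier id
    -> weight; the caller passes only carriers in the UP state."""
    def score(cid, weight):
        digest = hashlib.sha256(f"{user_id}:{cid}".encode()).digest()
        # Map the first 8 hash bytes to a uniform value in (0, 1).
        u = (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)
        return -weight / math.log(u)  # standard weighted-HRW score
    return max(carriers, key=lambda cid: score(cid, carriers[cid]))
```

A carrier entering DEGRADED simply drops out of the dict the scheduler passes in; nothing is reshuffled for users already bound elsewhere.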
5) Sticky at auth + controlled rotate
Use a session id in the username/token. The proxy looks up the mapping and routes traffic via the assigned modem. Rotate is a session id change or an API call.
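A sketch of the lookup path, assuming a made-up `user-session-<id>` username convention (real proxy software uses its own formats, so treat this purely as illustration):

```python
def parse_username(raw):
    """Split a 'user-session-<id>' proxy username into (user, session_id).
    A missing or empty session id means the client wants a fresh rotating
    session rather than a sticky one."""
    parts = raw.split("-session-", 1)
    user = parts[0]
    session_id = parts[1] if len(parts) == 2 and parts[1] else None
    return user, session_id
```

With this scheme, "rotate" is just authenticating with a new session id, and an API-driven rotate is the server expiring the old mapping.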
If you rely on upstream chaining (local proxy → parent proxies), validate how your software handles multiple parents and fallback. For 3proxy, users have reported nuances with multiple parent entries and include-file usage that can change the expected failover order, so test your exact configuration before relying on it.
CGNAT, IPv6 and silent degradations
CGNAT and short NAT idle timeouts can drop long-lived idle sessions even without any failover event. Either document keep-alive requirements for clients or implement keep-alives at the proxy layer. IPv6 availability and anti-bot behavior also vary by carrier; offering IPv4-only / IPv6-prefer / dual-stack modes reduces surprises.
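At the proxy layer, keep-alives can be enforced with standard Linux TCP socket options; the idle/interval/count values below are illustrative and should sit well under the NAT timeouts you actually observe per carrier.

```python
import socket

def enable_keepalive(sock, idle=60, interval=20, count=3):
    """Linux TCP keepalive: probes start after `idle` seconds of silence,
    repeat every `interval` seconds, and the connection is dropped after
    `count` failed probes. Keeps CGNAT mappings warm on idle sessions."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

`TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` are Linux-specific names; other platforms expose different knobs for the same mechanism.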
About keeping the same WAN IP across carriers
Holding the exact same public WAN IP while switching providers typically requires BGP and provider cooperation; it is rarely feasible without ISP involvement.
Checklist
- At least 2 carriers (3 is better), ideally across different cells/locations.
- Link + network + application health checks.
- UP/DEGRADED/DOWN states and cooldown after recovery.
- Sticky TTL; no forced migration without explicit client opt-in.
- Consistent hashing + weights.
- Clear client modes: Strict vs Resilient; controlled rotate.
- Monitoring plus regular failure drills.
Conclusion
A multi-carrier pool is not “extra SIMs” — it is an SLA strategy. The best results come from session-aware assignment, strong health checks, and predictable failover rules. That is how you build redundancy without turning every outage into a full client-visible incident.