Proxy and Scrape.do Mode

Every request in Scrapeman can route through a proxy. You can use any standard HTTP/HTTPS proxy, or flip into Scrape.do native mode to access residential rotation, JS rendering, geo targeting, and automatic ban retry — all from the Settings tab.

Standard Proxy

Configure per-request in the Settings tab of the request builder.

  • Protocol: HTTP or HTTPS
  • Host and Port
  • Auth: username and password for proxy basic authentication

The proxy is applied via undici's ProxyAgent. All fields support {{var}} variable interpolation, so you can store proxy credentials in environment variables.
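As a rough illustration of the interpolation step, the sketch below resolves {{var}} placeholders in the proxy fields and assembles a proxy URL of the kind that could be handed to undici's ProxyAgent. The ProxySettings shape and the interpolate and buildProxyUrl helpers are hypothetical names chosen for this example, not Scrapeman's actual internals, and leaving unknown variables untouched is an assumption.

```typescript
// Hypothetical shape of the proxy fields from the Settings tab.
interface ProxySettings {
  protocol: string; // "http" or "https"
  host: string;
  port: string;
  username?: string;
  password?: string;
}

// Replace each {{name}} with its value from `vars`.
// Assumption: unknown variables are left untouched rather than blanked.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(
    /\{\{\s*([\w.-]+)\s*\}\}/g,
    (match: string, name: string) => {
      const value = Object.prototype.hasOwnProperty.call(vars, name)
        ? vars[name]
        : undefined;
      return value ?? match;
    },
  );
}

// Build a proxy URL with optional basic-auth credentials.
function buildProxyUrl(
  settings: ProxySettings,
  vars: Record<string, string>,
): string {
  const protocol = interpolate(settings.protocol, vars).toLowerCase();
  const host = interpolate(settings.host, vars);
  const port = interpolate(settings.port, vars);
  let auth = "";
  if (settings.username !== undefined) {
    // Percent-encode credentials so characters like "@" stay unambiguous.
    const user = encodeURIComponent(interpolate(settings.username, vars));
    const pass = encodeURIComponent(interpolate(settings.password ?? "", vars));
    auth = `${user}:${pass}@`;
  }
  return `${protocol}://${auth}${host}:${port}`;
}
```

With an environment such as PROXY_HOST, PROXY_USER, and PROXY_PASS defined, the credentials stay out of the saved request and are only resolved at send time.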

Scrape.do Native Mode

Flip the Scrape.do toggle in the Settings tab to route the request through Scrape.do's infrastructure instead of sending it directly.

When enabled, the main process rewrites the target URL to api.scrape.do and injects the configured parameters. Your Scrape.do token is stored as a secret environment variable and never appears in history on disk.
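The rewrite described above can be sketched roughly as follows. The query parameter names (token, url, super, render, geoCode) reflect Scrape.do's public API as commonly documented and should be checked against the current Scrape.do reference. The ScrapeDoOptions shape and the buildScrapeDoUrl helper are hypothetical and not Scrapeman's actual code. The ban-retry option is left out because this page does not name its parameter.

```typescript
// Hypothetical options object mirroring the toggles in the Settings tab.
interface ScrapeDoOptions {
  token: string;          // resolved from a secret environment variable
  residential?: boolean;  // assumed to map to `super=true`
  render?: boolean;       // JS rendering
  geoCode?: string;       // country code, e.g. "us"
}

// Rewrite a target URL into a request against api.scrape.do.
function buildScrapeDoUrl(targetUrl: string, opts: ScrapeDoOptions): string {
  const params = new URLSearchParams();
  params.set("token", opts.token);
  // URLSearchParams percent-encodes the target URL for us.
  params.set("url", targetUrl);
  if (opts.residential) params.set("super", "true");
  if (opts.render) params.set("render", "true");
  if (opts.geoCode) params.set("geoCode", opts.geoCode);
  return `https://api.scrape.do/?${params.toString()}`;
}
```

Because the token only enters the URL at send time, a history entry can keep the original target URL and settings without ever storing the secret.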

Residential Rotation

Automatically rotates the outgoing IP address from Scrape.do's residential pool on every request. No manual proxy list management required.

JS Rendering

Spins up a headless browser on Scrape.do's infrastructure to fully render the target page before returning the response. Useful for SPAs, lazy-loaded content, and anti-bot pages that check for browser fingerprints.

Geo Targeting

Route the request through a specific country by selecting a country code. The outgoing IP will appear to originate from that country.

Ban Retry

When enabled, Scrape.do automatically retries the request if the target server responds with a block or bot-detection page. Retries are handled server-side and are transparent to Scrapeman: you receive only the final successful response.

Rotating Proxy

Supply a list of proxy URLs and let Scrapeman rotate through them automatically. In the Settings tab, toggle "Rotate through multiple proxies" and add the proxy URLs, one per line. Then pick a strategy:

  • Round-robin — cycles through the list in order. The position is shared across all concurrent slots in a run, so the Collection Runner rotates per request and the Load Runner rotates per concurrent slot.
  • Random — picks a random proxy for each request.

When the rotate list is non-empty, the single URL field is ignored. If the list is empty, the single URL is used.
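The selection rules above can be illustrated with a small sketch. The ProxyRotator class and its constructor arguments are hypothetical names for this example. A single shared instance per run is assumed, which is what makes the round-robin position common to all concurrent slots.

```typescript
type Strategy = "round-robin" | "random";

// Hypothetical rotator: one instance is shared by every slot in a run.
class ProxyRotator {
  private position = 0;
  private readonly list: string[];
  private readonly singleUrl: string | undefined;
  private readonly strategy: Strategy;
  private readonly rng: () => number;

  constructor(
    list: string[],
    singleUrl: string | undefined,
    strategy: Strategy,
    rng: () => number = Math.random, // injectable for deterministic tests
  ) {
    this.list = list;
    this.singleUrl = singleUrl;
    this.strategy = strategy;
    this.rng = rng;
  }

  // Return the proxy URL to use for the next request.
  next(): string | undefined {
    // An empty rotate list falls back to the single URL field.
    if (this.list.length === 0) return this.singleUrl;
    const index =
      this.strategy === "random"
        ? Math.floor(this.rng() * this.list.length)
        : this.position++ % this.list.length;
    return this.list[index];
  }
}
```

For example, a round-robin rotator over three proxies hands them out in list order and wraps back to the first on the fourth call.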

User-Agent Presets

The Settings tab has a User-Agent picker with 9 presets. Select one to set the User-Agent header for that request. A preview shows the exact UA string below the picker. A custom User-Agent in the Headers tab always overrides the preset.

Preset            Label
scrapeman         Scrapeman <version> (default)
chrome-macos      Chrome 124 macOS
chrome-windows    Chrome 124 Windows
firefox-macos     Firefox 125 macOS
firefox-windows   Firefox 125 Windows
safari-macos      Safari 17 macOS
safari-ios        Safari 17 iOS
googlebot         Googlebot 2.1
curl              curl 8.7
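The precedence between a preset and a custom header might look like the sketch below. The resolveUserAgent helper and the preset map are hypothetical, and the UA strings are placeholders rather than the real strings Scrapeman sends. Header names are matched case-insensitively, as HTTP requires.

```typescript
// Placeholder UA strings; the real values are shown in the picker preview.
const EXAMPLE_PRESETS: Record<string, string> = {
  "scrapeman": "example-scrapeman-ua",
  "chrome-macos": "example-chrome-macos-ua",
  "curl": "example-curl-ua",
};

// A custom User-Agent from the Headers tab wins over the selected preset.
function resolveUserAgent(
  customHeaders: Record<string, string>,
  presetId: string,
  presets: Record<string, string> = EXAMPLE_PRESETS,
): string | undefined {
  for (const [name, value] of Object.entries(customHeaders)) {
    if (name.toLowerCase() === "user-agent") return value;
  }
  // No custom header: fall back to the preset, if it is known.
  return Object.prototype.hasOwnProperty.call(presets, presetId)
    ? presets[presetId]
    : undefined;
}
```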

Anti-Bot Detection

After every request, Scrapeman inspects the response for anti-bot signals and shows a dismissable banner above the body when one is found.

Signal         Trigger
Cloudflare     cf-ray header present, or HTTP 403 with a Cloudflare browser-check body
Rate limited   HTTP 429 or a Retry-After header
CAPTCHA        Body contains hcaptcha, recaptcha, captcha-container, or turnstile
Bot block      HTTP 403 with body matching access denied, bot detected, automated access, or automated request

When a Retry-After header is present, the seconds-to-wait countdown shows in the banner. Cloudflare is checked before rate limit, rate limit before CAPTCHA, CAPTCHA before bot block. Only one signal is shown per response.
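The check order above can be expressed as a chain of early returns. The sketch below is illustrative only. The detectAntiBot function and its types are hypothetical, matching is assumed to be case-insensitive, the "checking your browser" marker is an assumed stand-in for the Cloudflare browser-check body, and only the delta-seconds form of Retry-After is parsed.

```typescript
type Signal = "cloudflare" | "rate-limited" | "captcha" | "bot-block";

interface ResponseInfo {
  status: number;
  headers: Record<string, string>;
  body: string;
}

interface Detection {
  signal: Signal;
  retryAfterSeconds?: number;
}

const CAPTCHA_MARKERS = ["hcaptcha", "recaptcha", "captcha-container", "turnstile"];
const BOT_BLOCK_MARKERS = ["access denied", "bot detected", "automated access", "automated request"];
// Assumption: a stand-in for what counts as a "browser-check body".
const CLOUDFLARE_BODY_MARKERS = ["checking your browser"];

function detectAntiBot(res: ResponseInfo): Detection | null {
  // Normalize header names, since HTTP header names are case-insensitive.
  const headers = new Map<string, string>();
  for (const [name, value] of Object.entries(res.headers)) {
    headers.set(name.toLowerCase(), value);
  }
  const body = res.body.toLowerCase();
  const bodyHas = (markers: string[]): boolean =>
    markers.some((marker) => body.includes(marker));

  // 1. Cloudflare is checked first.
  if (headers.has("cf-ray") || (res.status === 403 && bodyHas(CLOUDFLARE_BODY_MARKERS))) {
    return { signal: "cloudflare" };
  }
  // 2. Rate limit: HTTP 429 or any Retry-After header.
  const retryAfter = headers.get("retry-after");
  if (res.status === 429 || retryAfter !== undefined) {
    const trimmed = retryAfter?.trim() ?? "";
    return /^\d+$/.test(trimmed)
      ? { signal: "rate-limited", retryAfterSeconds: Number(trimmed) }
      : { signal: "rate-limited" };
  }
  // 3. CAPTCHA markers anywhere in the body.
  if (bodyHas(CAPTCHA_MARKERS)) return { signal: "captcha" };
  // 4. Generic bot block: HTTP 403 plus a matching phrase.
  if (res.status === 403 && bodyHas(BOT_BLOCK_MARKERS)) return { signal: "bot-block" };
  return null; // no banner
}
```

Returning on the first match is what guarantees that only one signal is reported per response, even when a page would satisfy several rules at once.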

Rate Limiting

Per-request rate limit controls the delay the Collection Runner and Load Runner insert between requests. It has no effect on a single send. Configure under Settings → Rate limit:

  • Fixed delay — wait this many milliseconds after each request.
  • Jitter min / max — add a random extra delay between min and max ms on top.

Run-level delay (from the Load Runner config) and per-request rate limit are not additive: if the run-level delay is greater than 0, the per-request rate limit is not added on top of it.
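One way to read these rules is sketched below. The RateLimit shape and the perRequestDelay and effectiveDelay helpers are hypothetical names. The sketch assumes that when the run-level delay is greater than 0, it is used alone and the per-request settings are skipped entirely.

```typescript
// Hypothetical shape of the per-request rate limit settings (all in ms).
interface RateLimit {
  fixedDelayMs: number;
  jitterMinMs: number;
  jitterMaxMs: number;
}

// Fixed delay plus a random extra delay between jitter min and max.
function perRequestDelay(
  limit: RateLimit,
  rng: () => number = Math.random, // injectable for deterministic tests
): number {
  const lo = Math.min(limit.jitterMinMs, limit.jitterMaxMs);
  const hi = Math.max(limit.jitterMinMs, limit.jitterMaxMs);
  return limit.fixedDelayMs + lo + rng() * (hi - lo);
}

// Delay to wait after a request inside a runner.
function effectiveDelay(
  runLevelDelayMs: number,
  limit: RateLimit,
  rng: () => number = Math.random,
): number {
  // Assumption: a positive run-level delay replaces the per-request limit.
  if (runLevelDelayMs > 0) return runLevelDelayMs;
  return perRequestDelay(limit, rng);
}
```

Under this reading, a run-level delay of 500 ms with a per-request fixed delay of 100 ms yields a 500 ms wait, not 600 ms.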