ScrapeNinja and proxies

All ScrapeNinja requests go through proxies, which are included in every ScrapeNinja plan. You can choose the proxy pool that fits your needs by passing the geo parameter to the ScrapeNinja API: for example, geo: "us" makes ScrapeNinja route your request through the US proxy pool. These proxies are always rotating, so every request goes out through a different IP address (this is a recommended way to scrape websites and avoid IP bans). Even the basic ScrapeNinja proxy pools contain tens of thousands of IP addresses, and they are shared between all ScrapeNinja users.

Of course, these proxy pools are not perfect in terms of quality, and in many cases you need better proxies for optimal results. For example, if you are automating a website that requires a specific country geo, or you are sending a cookie header along with your request to use your website account, you should consider custom premium proxies.

Basic ScrapeNinja geo pools list

These proxy pools are selected via the geo parameter in the ScrapeNinja API request and are available to every ScrapeNinja customer, on both free and paid plans. They are rotating datacenter and ISP proxies, with thousands of IP addresses in every pool. Here are the geos you can specify:

  • us - United States (default proxy pool used if no geo parameter is passed)
  • eu - European Union
  • de - Germany
  • fr - France
  • br - Brazil

If you need wider country selection and better proxy quality, check ScrapeNinja premium proxy packages below.
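For example, a minimal request that selects the German pool could look like this (this sketch uses the same RapidAPI endpoint as the examples further down this page; everything except url and geo is left at its defaults):

curl --request POST \
  --url https://scrapeninja.p.rapidapi.com/scrape \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://example.com",
    "geo": "de"
  }'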

Things to try before purchasing premium or external proxies

If you occasionally see bad results such as captchas or 403 errors, try increasing the retryNum parameter in your ScrapeNinja API request. This makes ScrapeNinja retry the request several times before returning the result. ScrapeNinja retries are smart (e.g. retries for some Cloudflare-blocked and captcha pages are hardcoded), but in many cases ScrapeNinja cannot tell on its own that a response is bad and contains no useful information.

To help ScrapeNinja recognize a bad response, find a unique string in the response that indicates the request was blocked and pass it to the textNotExpected parameter of the ScrapeNinja API request (it accepts an array of strings). This triggers a retry attempt via another proxy. statusNotExpected is a similar setting that enforces retries on specific HTTP response status codes. For more details on these ScrapeNinja API parameters, refer to the API docs.

An example of a ScrapeNinja payload that triggers a retry on the 'Just a moment' page and on 403 and 429 status codes:
{
  "url": "https://example.com",
  "timeout": 8,
  "geo": "eu",
  "followRedirects": 0,
  "statusNotExpected": [
    "403",
    "429"
  ],
  "textNotExpected": [
    "<title>Just a moment"
  ],
  "retryNum": 3
}

How to use ScrapeNinja with custom proxies

Instead of passing the geo parameter to the ScrapeNinja /scrape and /scrape-js endpoints, you can pass the proxy parameter with the proxy URL you want to use for your request. You can pass any proxy URL in http://user:pw@host:port format, including your own proxy server or a premium ScrapeNinja proxy (see the basic custom proxy usage example below). Refer to the API docs for more details on ScrapeNinja API parameters.

caution

If you decide to use your own proxies with ScrapeNinja, make sure these proxies allow connections from any IP address, so ScrapeNinja API servers can make requests through them.
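A quick way to sanity-check a custom proxy from the command line is shown below (a sketch with placeholder credentials and host; it only confirms that the proxy is reachable and the credentials work from your machine, it does not prove that connections from ScrapeNinja's servers are allowed):

# fetch response headers through the proxy to confirm it is reachable and the credentials are valid
curl -x http://user:pw@your-proxy-host:8080 -I https://example.com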

Premium ScrapeNinja proxy service

Premium ScrapeNinja proxy traffic packages are purchased via APIRoad and are billed separately from ScrapeNinja API usage. These datacenter proxies offer a great price-to-quality balance: 100+ country geos, rotating and sticky IPs, and a pool of tens of thousands of IP addresses. Premium proxies cost $1.5 per GB of traffic, and you can purchase any amount of traffic you need.

tip

You need to sign up on APIRoad and purchase ScrapeNinja premium proxies through APIRoad, but you can use the same ScrapeNinja premium proxy credentials whether you call the ScrapeNinja API through the APIRoad or the RapidAPI marketplace. All ScrapeNinja customers automatically get a 50MB gift package of premium proxy traffic on signup at APIRoad.net. Purchase ScrapeNinja premium proxies here

Basic custom proxy usage with ScrapeNinja

Note that there is no geo parameter in the payload; the proxy parameter is used instead.

curl --request POST \
  --url https://scrapeninja.p.rapidapi.com/scrape \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://website.com/post.php",
    "proxy": "http://user-u61ot6f78:[YOUR-PASSWORD]@proxy2.scrapeninja.net:8002",
    "textNotExpected": ["<div id=\"p\">Your request was blocked</div>", "Some other unique text to retry"],
    "retryNum": 3
  }'

Premium proxies: set custom country geo

All requests will be made from a Portuguese IP address.

curl --request POST \
  --url https://scrapeninja.p.rapidapi.com/scrape \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://website.com/post.php",
    "proxy": "http://user-u61ot6f78,geo-pt:[YOUR-PASSWORD]@proxy2.scrapeninja.net:8002",
    "textNotExpected": ["<div id=\"p\">Your request was blocked</div>", "Some other unique text to retry"],
    "retryNum": 3
  }'

Premium proxies: use consistent (sticky) IP address

All requests will re-use the same IP address (this is especially useful for logged-in automation, where the target website expects all requests to come from the same origin IP). Include a unique sess-anyhash string in the proxy credentials: reusing the same anyhash value ensures that the same proxy IP address is used for that group of requests.

curl --request POST \
  --url https://scrapeninja.p.rapidapi.com/scrape \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://website.com/post.php",
    "proxy": "http://user-u61ot6f78,geo-pt,sess-anyhash:[YOUR-PASSWORD]@proxy2.scrapeninja.net:8002",
    "textNotExpected": ["<div id=\"p\">Your request was blocked</div>", "Some other unique text to retry"],
    "retryNum": 3
  }'
caution

Note that the sticky IP address of a premium proxy is released after 20-40 minutes, so the IP will change after this time window.
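If you want a fresh sticky IP for each logical session (for example, one per account you automate), one approach is to generate a new sess- value per session and reuse it for all requests in that session. A minimal shell sketch, reusing the placeholder credentials from the examples above:

# generate a unique session id once per logical session; reusing it keeps the same sticky IP,
# while a new value switches to a different sticky IP
SESS="sess-$(date +%s)"
curl --request POST \
  --url https://scrapeninja.p.rapidapi.com/scrape \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data "{
    \"url\": \"https://website.com/post.php\",
    \"proxy\": \"http://user-u61ot6f78,geo-pt,${SESS}:[YOUR-PASSWORD]@proxy2.scrapeninja.net:8002\",
    \"retryNum\": 3
  }"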