Get started right from your browser: execute scraping requests, both JS and non-JS, by submitting the form below.
Quick examples: HackerNews basic    LinkedIn Job Posting    auto.de results    POST JSON    Click & dump JSON
 
 
Advanced. Can be used to dump sub-requests. See the airlines scraper example. Use with caution: a bad URL mask will result in timeouts, because the scraper will keep waiting for the AJAX call until the timeout and won't be able to dump the required data! Explore .info.catchedAjax of the ScrapeNinja response to retrieve the dumped request. This is a global handler; you can also specify an XHR request catcher per step (see the "Interact with page" section below).
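As a sketch of how the globally dumped request can be read back, assuming a mocked response (the exact shape of the catchedAjax entry — its url and body fields — is an assumption here, not confirmed API documentation):

```javascript
// Mocked ScrapeNinja response; in real code this comes from `await res.json()`.
// The `url`/`body` field names inside catchedAjax are illustrative assumptions.
const body = {
  info: {
    catchedAjax: {
      url: 'https://example.com/api/prices',
      body: '{"price":42}',
    },
  },
};

// Retrieve the dumped sub-request and parse its JSON body.
const price = JSON.parse(body.info.catchedAjax.body).price;
console.log(price); // 42
```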
 
Advanced. Interact with the page: fill in forms and click links!
  Activate if the action triggers a page redirect.
 
The caught XHR response body will be available in the .info.log events array of the returned ScrapeNinja response (find this particular event by filtering: let xhr = body.info.log.find(e => e.type == 'xhr' && e.stepIdx == {{idx+1}})).
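A minimal sketch of that filter against a mocked response (the `response` field on the xhr event, and the step index 2, are illustrative assumptions):

```javascript
// Mocked ScrapeNinja response shape; in real code `body` comes from
// `await res.json()` on the ScrapeNinja API call.
const body = {
  info: {
    log: [
      { type: 'nav', stepIdx: 0 },
      { type: 'xhr', stepIdx: 2, response: '{"items":[1,2,3]}' },
    ],
  },
};

// {{idx+1}} in the form resolves to the step number; here we assume step 2.
const xhr = body.info.log.find(e => e.type == 'xhr' && e.stepIdx == 2);
const items = JSON.parse(xhr.response).items;
console.log(items.length); // 3
```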
New Step [+]
 
 
Custom Proxy
Wait for DOM selector to appear

Examples of valid payloads:

  • JSON payload: {"fefe":"few"}
  • www-encoded payload: key1=val1&key2=val2
Add Content-Type: application/x-www-form-urlencoded to the headers in case of a www-encoded POST, and Content-Type: application/json in case of a JSON POST.
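The two payload styles and their matching Content-Type headers can be sketched like this (the payload values are the same placeholder values as in the examples above):

```javascript
// JSON payload + matching header
const jsonBody = JSON.stringify({ fefe: 'few' });
const jsonHeaders = { 'Content-Type': 'application/json' };

// www-encoded payload + matching header
const formBody = new URLSearchParams({ key1: 'val1', key2: 'val2' }).toString();
const formHeaders = { 'Content-Type': 'application/x-www-form-urlencoded' };

console.log(jsonBody); // {"fefe":"few"}
console.log(formBody); // key1=val1&key2=val2
```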
Develop your extractor function in real time in a dedicated sandbox
JS function to extract data from the scraped HTML/JSON. The function accepts the raw input string as its first argument and a cheerio instance as its second argument. Quick example:
function (input, cheerio) {
  let c = cheerio.load(input);
  return { title: c('#title').text().trim() }
}

Grab the extracted results from the .extractor JSON property of the ScrapeNinja response. Leave empty to parse everything on your side.
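For JSON targets, the extractor can ignore the cheerio argument and parse the raw input directly; a sketch (the `title` field is an illustrative assumption, not a real target's schema):

```javascript
// Extractor for a JSON response: the second (cheerio) argument is unused.
function extract(input, cheerio) {
  const data = JSON.parse(input);
  return { title: (data.title || '').trim() };
}

// Simulate what ScrapeNinja would pass as the raw input string:
console.log(extract('{"title":" Hello "}'));
```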

Raw ScrapeNinja Response:

Latency: {{ responseLatency }}ms HTTP Status: {{ responseBody.info.statusCode }}
Submit the form to scrape the URL.
                {{ responseBodyFormatted }}
                

Screenshot

Click to open full size in a new tab.

Unescaped target website response

Access it in your code via the responseJson.body property. Use {{cmdKey}}+F for a quick search in the response body. Copy & paste this body into the Cheerio Sandbox to develop your extractor.

Quick video overview of the sandbox:

Generated code for ScrapeNinja:

                

Launching the scraper in your local Node.js:

Make sure you are using Node.js v16 or newer (check with node -v)

Step #1. Create the project folder and install node-fetch

mkdir your-project-folder && \
  cd "$_" && \
  npm i -g create-esnext && \
  npm init esnext && \
  npm i node-fetch -y
                  

                  

Step #2. Copy & paste the code above

Create a new empty file, e.g. scraper.js, and paste the code into it.

Step #3. Launch

node ./scraper.js
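If you need a standalone starting point instead of the generated code, a minimal scraper.js request might look like the sketch below. The endpoint URL, header names, and API key handling are assumptions (ScrapeNinja is typically called through RapidAPI) — prefer the generated code above for the exact call:

```javascript
// Node 18+ ships a global fetch; on Node 16, add `import fetch from 'node-fetch'`.
// The commented endpoint and header names are assumptions — use the generated code.
const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // 'X-RapidAPI-Key': 'YOUR_KEY', // required for the real call (assumption)
  },
  body: JSON.stringify({ url: 'https://news.ycombinator.com/' }),
};

// const res = await fetch('https://scrapeninja.p.rapidapi.com/scrape', options);
// const json = await res.json();
// console.log(json.info.statusCode);

console.log(options.method, JSON.parse(options.body).url);
```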