Each line is an HTTP status code (e.g. 403, 404); if the target responds with one of these codes, the response is treated as an error.
Each line is a string; if it is found in the response body, the response is treated as an error.
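For reference, a minimal sketch of how these error conditions might be expressed in the request payload; the parameter names statusNotExpected and textNotExpected are assumptions on our part and should be verified against the current ScrapeNinja API reference:

// Sketch: error conditions in a ScrapeNinja /scrape payload.
// NOTE: statusNotExpected / textNotExpected are assumed parameter names,
// not confirmed by this page; check the ScrapeNinja API docs.
const PAYLOAD = {
    url: 'https://example.com',
    retryNum: 2,
    // treat these target status codes as errors (and retry):
    statusNotExpected: [403, 404],
    // treat responses containing these strings as errors:
    textNotExpected: ['captcha', 'Access Denied']
};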
New: LLM/MCP-ready extractors for Markdown, and for Markdown content only! Choose a preset extractor type, or select "[custom]" to use your own extractor function.
JS function to extract data from scraped HTML/JSON. The function receives the raw input string as its first argument and a Cheerio HTML parser instance as its second argument. You don't have to use Cheerio if you don't need it; you can extract data with a plain regex instead (see the Letterbox rating example extractor, and the regex sketch after the quick starter below). Quick starter for a Cheerio extractor:
function (input, cheerio) {
    let $ = cheerio.load(input);
    return { title: $('#title').text().trim() };
}
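And a minimal sketch of a regex-based extractor that skips Cheerio entirely (the pattern and field name below are illustrative, not taken from the real Letterbox example):

function (input, cheerio) {
    // Cheerio is passed in but unused; a plain regex does the job here.
    // Illustrative pattern: grab the contents of the <title> tag.
    let match = input.match(/<title>(.*?)<\/title>/i);
    return { title: match ? match[1].trim() : null };
}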
Grab extracted results from the .extractor JSON property of the ScrapeNinja response. Leave empty to parse everything on your side.
Uncheck to exclude the full response body (.body property) from the API response (reduces response size).
Uncheck to exclude response headers (.info.headers property) from the API response (reduces response size).
Access it in your code via the responseJson.body property.
Use Ctrl+F for a quick search in the response body. Copy & paste this body into the Cheerio Sandbox to develop your extractor.
Unescaped Markdown extractor response
Access it in your code via the responseJson.extractor.result.markdown property.
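Putting the properties above together, a minimal sketch of picking fields off a parsed ScrapeNinja response (the helper name readResponse is ours, not part of the API):

// responseJson is the parsed JSON returned by the /scrape endpoint.
function readResponse(responseJson) {
    const status = responseJson.info.statusCode;   // target website HTTP status
    const headers = responseJson.info.headers;     // target headers (if not excluded)
    const html = responseJson.body;                // full body (if not excluded)
    // present only when a Markdown extractor is selected:
    const markdown = responseJson.extractor && responseJson.extractor.result
        ? responseJson.extractor.result.markdown
        : undefined;
    return { status, headers, html, markdown };
}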
Code Generator
Use the generated code below to run requests to the ScrapeNinja API from your own server.
import fetch from 'node-fetch';

const url = 'https://scrapeninja.apiroad.net/scrape';

const PAYLOAD = {
    "url": "https://www.apple.com/shop/refurbished/iphone",
    "method": "GET",
    "retryNum": 2,
    "geo": "us",
    "extractor": "function (input, cheerio) {\n let $ = cheerio.load(input);\n let items = $('.rf-refurb-category-grid-no-js h3 a').map(function() {\n return $(this).text();\n }).toArray();\n \n return items;\n // wanna check for specific models? replace the return statement above with:\n // return items.filter(v => !v.includes('iPhone 11'))\n}"
};

const options = {
    method: 'POST',
    headers: {
        'content-type': 'application/json',
        // get your key at https://apiroad.net/marketplace/apis/scrapeninja
        'X-Apiroad-Key': 'YOUR-KEY',
    },
    body: JSON.stringify(PAYLOAD)
};

try {
    let res = await fetch(url, options);
    let resJson = await res.json();

    // Basic error handling. Modify if necessary
    if (!resJson.info || ![200, 404].includes(resJson.info.statusCode)) {
        throw new Error(JSON.stringify(resJson));
    }

    console.log('target website response status: ', resJson.info.statusCode);
    console.log('target website response body: ', resJson.body);
} catch (e) {
    console.error(e);
}
Launching the scraper in your own Node.js environment:
All the code in these instructions has been tested on Ubuntu Linux with Node.js v16; it also works on Node.js v14.8, but not on lower versions, because it uses top-level await. To make it work on older Node.js versions, wrap the try-catch block into (async () => { [CODE GOES HERE] })(); as shown below. Check your Node.js version with node -v.
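A minimal sketch of that wrapper (url and options here stand in for the same constants defined in the generated code above):

// Async IIFE wrapper for Node.js < 14.8, which lacks top-level await.
import fetch from 'node-fetch';

const url = 'https://scrapeninja.apiroad.net/scrape';
const options = { /* same method, headers and body as in the generated code */ };

(async () => {
    try {
        let res = await fetch(url, options);
        let resJson = await res.json();
        console.log('status:', resJson.info && resJson.info.statusCode);
    } catch (e) {
        console.error(e);
    }
})();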
Step 2. Install the requests library in your virtual environment
python3 -m pip install requests
Step 3. Copy & paste the code above
Create a new empty file, e.g. scraper.py, and paste the code into it.
Step 4. Launch
python3 ./scraper.py
# get your subscription key at https://apiroad.net/marketplace/apis/scrapeninja,
# copy and paste it to the 'X-RapidAPI-Key' header below
curl -X POST -H 'X-RapidAPI-Key: YOUR-KEY' \
-H 'content-type: application/json' \
-H 'X-RapidAPI-Host: scrapeninja.p.rapidapi.com' \
-d '{
"url": "https://www.apple.com/shop/refurbished/iphone",
"method": "GET",
"retryNum": 2,
"geo": "us",
"extractor": "function (input, cheerio) {\n let $ = cheerio.load(input);\n let items = $('\''.rf-refurb-category-grid-no-js h3 a'\'').map(function() {\n return $(this).text();\n }).toArray();\n \n return items;\n // wanna check for specific models? replace the return statement above with:\n // return items.filter(v => !v.includes('\''iPhone 11'\''))\n}"
}' \
"https://scrapeninja.p.rapidapi.com/scrape"
Running the cURL Command:
Copy the command and run it in your terminal to execute the request.
Sandbox FAQ
What is the ScrapeNinja Sandbox?
The ScrapeNinja Live Sandbox is an online tool designed to swiftly test the scraping capabilities of a specific target website using the ScrapeNinja Scraping API. It eliminates the need for coding or setting up a local environment. Our goal in creating the Sandbox was to simplify the process of exploring how to scrape a specific website, test various proxy countries for that target, and later bootstrap your project faster with the code generation feature provided by the Sandbox.
Is the ScrapeNinja Sandbox a paid service?
While the ScrapeNinja API operates on a subscription model, offering both free and paid plans, the ScrapeNinja Sandbox is currently available for free. This allows you to fully test its capabilities without any subscription. We hope you find the Sandbox a valuable tool for exploring the ScrapeNinja API.
Why ScrapeNinja?
Websites are getting harder to scrape. Web scraping protection has evolved from checking user agent and other headers, to checking the IP address of the requester, and lately — to TLS fingerprint analysis. While it's easy to start scraping in any programming language, only specialized solutions can provide a reliable way to scrape data at scale.
ScrapeNinja has proven its high scraping reliability through state-of-the-art real-browser TLS fingerprint emulation, rigorous real-time stability monitoring of responses, and advanced technology that helps us find and rotate the best proxies.