Using ScrapeNinja with n8n

n8n is a low-code automation platform that allows you to build automation scenarios without writing code (read a good review of n8n and a comparison with Make here). It is very similar to Zapier and Make.com, though a bit more technical. A huge advantage of n8n is that it can be self-hosted, or you can opt for its cloud offering, which is quite affordable as well.

ScrapeNinja is a high-performance web scraping API that attempts to solve the common challenges web developers face when trying to scrape an arbitrary e-commerce, social networking, or any other website. ScrapeNinja also offers a set of tools to make web scraping easier: a cURL converter, the ScrapeNinja Playground (with many ScrapeNinja scrapers available for demo purposes), and the ScrapeNinja Cheerio Playground.

Integrating ScrapeNinja into n8n is quite easy. Let's see how to do it.

I can scrape via n8n directly, why would I need ScrapeNinja?

n8n can execute HTTP requests and process the result, so technically it is possible to retrieve output by sending the request directly to a target website. It is a great way to start. However, in the real world, this approach quickly becomes limiting:

  • n8n does not support smart retries via rotating proxies,
  • there is no easy way to extract pure JSON data from an HTML DOM tree, and
  • rendering the website in a real browser with JS evaluation is not possible out of the box.

Also, n8n's HTTP requests carry the JA3 (TLS) fingerprint of Node.js, which is far easier to detect than the Chrome TLS fingerprint that ScrapeNinja presents. Cloudflare anti-scraping protection might be triggered by an n8n request, while ScrapeNinja (even its non-JS engine) can bypass it.

How it works:

Instead of making n8n send the HTTP request to the target website directly, you send an HTTP POST request to the ScrapeNinja API and supply the URL you want to get data from.

The examples below use the cURL utility for brevity. The n8n HTTP Request node can do the same thing.

# Let's try to scrape some website... this is the same concept as using the n8n HTTP Request node to send a request directly to the scraped website.
curl https://example.com/product -H "User-Agent: Chrome"

# Output 1, the request failed:
<html><body><h1 style="color: red">Access Denied!</h1><div>This webpage has certain countries forbidden: DE. Your location is: DE</div></body></html>

Okay, this didn't work. Let's try the same request with ScrapeNinja, via the US proxy pool! ScrapeNinja returns JSON with metadata, so we use the jq CLI utility here to extract the JSON `body` property.

curl https://scrapeninja.p.rapidapi.com/scrape -d '{"url": "https://example.com/product", "geo":"us"}' -H "Content-Type: application/json" -H "X-Rapidapi-Key: YOUR-KEY" | jq '.body'

# Output 2, via ScrapeNinja:
<html><body><h1 style="color: green">Product Title #1</h1><div class='price'>Price: <span>$321.4</span></div></body></html>
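To see what the jq step is doing, here is a minimal stand-in for a ScrapeNinja JSON response (illustrative only; the real response carries more metadata fields than just `body`):

```shell
# A minimal stand-in for a ScrapeNinja JSON response; jq -r prints the
# "body" string raw, without the surrounding JSON quotes.
resp='{"body":"<html><h1>Product Title #1</h1></html>"}'
echo "$resp" | jq -r '.body'
# → <html><h1>Product Title #1</h1></html>
```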

ScrapeNinja Cheerio-powered extractors

Much better! Now let's extract useful data by writing a simple JavaScript extractor and sending it to the ScrapeNinja cloud. This feature simplifies data extraction and processing, as you don't need to install the Cheerio npm package into n8n (installing npm packages is not possible on n8n Cloud, for example). Read more about ScrapeNinja extractors here.


curl https://scrapeninja.p.rapidapi.com/scrape -d '{"url": "https://example.com/product", "geo":"fr", "extractor": "function extract(html, c) { let $ = c.load(html); return $('.price span').text(); }"}' -H "Content-Type: application/json" -H "X-Rapidapi-Key: YOUR-KEY" | jq '.extractor'

# Output 3, ScrapeNinja with Extractor. ScrapeNinja Cloud executed Cheerio and used the supplied JS extractor to retrieve the price from the HTML. The extractor output is put into the `extractor` property of the ScrapeNinja JSON response.
{ "result": "$321.4" }

ScrapeNinja real browser engine

ScrapeNinja has a /scrape-js endpoint that uses a real browser to render the website. It can evaluate JavaScript, intercept AJAX calls, and even take screenshots. This is useful when the website relies heavily on JavaScript for rendering content.

tip

Read more about ScrapeNinja architecture on the Intro page.


# Let's take it up a notch. /scrape uses a high-perf scraping engine, but what if real browser rendering is needed?
# Let's use the /scrape-js endpoint instead; it even takes screenshots and waits for a full page load!
curl https://scrapeninja.p.rapidapi.com/scrape-js -d '{"url": "https://example.com/product", "geo":"fr", "waitForSelector":".price"}' -H "Content-Type: application/json" -H "X-Rapidapi-Key: YOUR-KEY" | jq '.info.screenshot'

# Output 4: the screenshot of a website, via ScrapeNinja real browser rendering engine.
https://cdn.scrapeninja.net/screenshots/website-screenshot.png
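The screenshot URL sits nested under `info` in the /scrape-js response. A stand-in response (only the field used by the jq filter above; the real response has more) shows the extraction:

```shell
# Stand-in /scrape-js response; jq drills into the nested object.
resp='{"info":{"screenshot":"https://cdn.scrapeninja.net/screenshots/website-screenshot.png"}}'
echo "$resp" | jq -r '.info.screenshot'
# → https://cdn.scrapeninja.net/screenshots/website-screenshot.png
```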

# Try ScrapeNinja in a browser: https://scrapeninja.net/scraper-sandbox?slug=hackernews


Converting ScrapeNinja cURL command into n8n HTTP Request node: video demo

Now that we know how it works via cURL, let's do the same thing in n8n. It is very easy, thanks to the n8n cURL import feature. Here is a 30-second video showing how you can take a ScrapeNinja RapidAPI cURL command and convert it into an n8n HTTP Request node:

Of course, it is recommended to move the API key into n8n Credentials if you plan to reuse ScrapeNinja requests across multiple scenarios.

Convert raw HTML into JSON in n8n with ScrapeNinja JS extractors

Now you can copy any ScrapeNinja params into the n8n HTTP Request node and enhance your automation scenarios with web scraping capabilities. For example, in this quick video a ScrapeNinja extractor is used to convert HackerNews raw HTML output into structured JSON data:
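As a hedged sketch of what such an extractor could look like (the `.titleline > a` selector is an assumption about HackerNews's current markup; verify it in the ScrapeNinja Cheerio Playground before relying on it), here is a payload that returns an array of `{title, url}` objects, built with jq as before:

```shell
# Hypothetical HackerNews extractor: returns [{title, url}, ...].
# The ".titleline > a" selector is an assumption, not verified markup.
extractor='function extract(html, c) {
  let $ = c.load(html);
  return $(".titleline > a").map(function (i, el) {
    return { title: $(el).text(), url: $(el).attr("href") };
  }).get();
}'
jq -n --arg url "https://news.ycombinator.com/" --arg ex "$extractor" \
  '{url: $url, extractor: $ex}'
```

Paste the resulting JSON into the n8n HTTP Request node body, and the structured array lands in the response's `extractor` property, ready for the next node in your scenario.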