Get started right from your browser: write cheerio extractor code and test it live!

New Jul 2024 Cheerio Sandbox v2: Use AI to write your JS extractor!

Quick examples (each opens in a new page): basic, list of items, 3 ways of accessing elements, text nodes, HackerNews, Auto.de
More cheerio examples can be found in the Cheerio Cheatsheet.

Why?

This sandbox was created to streamline Node.js HTML scraper development. It evolved from the primary ScrapeNinja Live Sandbox, which executed an HTTP request and scraped the target website on every form submission. That was inefficient for rapid testing of HTML extractors, especially against challenging or slow sites. By isolating the HTML extraction step, the sandbox makes iterative, REPL-style extractor coding much quicker. Debugging cheerio extractors locally can be time-consuming, since it often takes many test runs to get the selectors and syntax right. Learn more about how the sandbox was built and how it works in our blog post.

How to write your perfect extractor

Websites change their HTML layouts, and that breaks extractors. The perfect, bulletproof extractor is therefore the one you never had to write: before scraping HTML, check whether the website you are targeting provides some sort of JSON API.
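
One quick way to act on this advice is to request a candidate endpoint (for example, one spotted in the browser's Network tab) and check whether the response is JSON before writing any selectors. A minimal sketch follows; parseIfJson is a hypothetical helper name, and the content type and body are assumed to come from whatever HTTP client you use.

```javascript
// Hypothetical helper: returns parsed data when a response looks like JSON,
// or null when it should be treated as HTML and passed to an extractor.
function parseIfJson(contentType, body) {
  // Matches "application/json" and "+json" variants such as "application/ld+json".
  const isJson = /\bjson\b/i.test(contentType || '');
  if (!isJson) return null;
  try {
    return JSON.parse(body);
  } catch (err) {
    // The header claimed JSON but the body did not parse.
    return null;
  }
}

console.log(parseIfJson('application/json; charset=utf-8', '{"price": 42}'));
console.log(parseIfJson('text/html', '<h2 class="title">Hi</h2>'));
```

If the helper returns data, there is a good chance no HTML extractor is needed for that page at all.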

The extractor uses the cheerio Node.js package, so start by reading its documentation.

Cheerio is similar to jQuery in many cases, but with notable and sometimes annoying differences.
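
As a starting point, here is a minimal extractor sketch in the extract(input, cheerio) shape used by this sandbox. Unlike jQuery in a browser, there is no ambient document: the HTML string has to be parsed with cheerio.load() first, and the returned $ function is then used for queries. The h2.title selector and the title field name are illustrative assumptions.

```javascript
// Sketch of an extractor; the cheerio module is passed in by the caller.
function extract(input, cheerio) {
  // Parse the HTML string; $ then works much like jQuery's $.
  const $ = cheerio.load(input);

  return {
    // .text() returns the text content of the first match; trim stray whitespace.
    title: $('h2.title').first().text().trim()
  };
}
```

With the test input <h2 class="title">YOUR TEST INPUT</h2>, this should return an object whose title is "YOUR TEST INPUT".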

The best tool for finding and testing your CSS selectors is the Chrome DevTools console.
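
In the DevTools console, a selector can be checked against the live page before it goes into an extractor. The sketch below uses only the standard querySelectorAll DOM method; previewSelector is a hypothetical helper name, and you would paste it into the console and pass document as the root.

```javascript
// Hypothetical console helper: lists the trimmed text of every element
// that matches a CSS selector under the given root (usually `document`).
function previewSelector(root, selector) {
  return Array.from(root.querySelectorAll(selector), (el) => el.textContent.trim());
}

// In the Chrome DevTools console:
//   previewSelector(document, 'h2.title')
```

Keep in mind that the console sees the page after client-side JavaScript has run, while cheerio only sees the HTML string it is given, so a selector can match in the browser and still return nothing in the extractor.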

Extracting article/blog/news text data from arbitrary websites

When you need to extract articles and news data from many websites (hundreds), writing and maintaining a generic JS extractor is not feasible. Instead, consider using a specialized tool: the Article Extractor API project, which uses the ScrapeNinja scraping engine under the hood.

How to use in a real project:

You can use this extractor function with a local cheerio installation (this requires Node.js installed on your machine) or in the ScrapeNinja extractor field of the /scrape endpoint.

Running your extractor locally:

Step #1. Create a project folder and install node-fetch and cheerio

mkdir your-project-folder && \
cd "$_" && \
npm i -g create-esnext && \
npm init esnext && \
npm i node-fetch cheerio -y

Step #2. Copy and paste the code

Create a new empty file, e.g. scraper.js, and paste the following code into it:

// cheerio 1.0 has no default export, so import the module namespace
import * as cheerio from 'cheerio'

// paste the extractor function here
function extract(input, cheerio) { ... } // the extractor function can now be called as extract()

// retrieve your input from node-fetch or file system
const input = '<h2 class="title">YOUR TEST INPUT</h2>';

let results = extract(input, cheerio);


// the extracted JSON data is now in the results variable
console.log(results);


Step #3. Launch

node ./scraper.js
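
The script above uses a hard-coded string as input. To run the extractor against a live page, the HTML can be downloaded first. The following is a sketch, not the sandbox's own code: scrapePage is a hypothetical helper, and it assumes a fetch-compatible function (the default export of node-fetch, or the global fetch built into Node.js 18 and later).

```javascript
// Hypothetical helper: downloads a page and passes its HTML to an extractor.
// fetchImpl is any fetch-compatible function; it defaults to the global fetch.
async function scrapePage(url, extract, cheerio, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with HTTP ${response.status}`);
  }
  const html = await response.text();
  return extract(html, cheerio);
}

// Usage inside scraper.js, an ES module (the URL is a placeholder):
//   const results = await scrapePage('https://example.com/', extract, cheerio);
//   console.log(results);
```

Passing the fetch function as a parameter keeps the helper easy to test with a stub and lets you swap in node-fetch on older Node.js versions.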

Running your scraper with extractor in ScrapeNinja:

Just copy and paste the function's code into the "extractor" field in the ScrapeNinja sandbox, then put the generated ScrapeNinja code into your local Node.js script.