New Jul 2024 Cheerio Sandbox v2: Use AI to write your JS extractor!
Three ways to extract data values from HTML using Cheerio (a combined sketch follows the list):
1. Index-based CSS Selector Extraction: The tdByIndex property extracts the text from the second cell of the second row in a table. The :eq() pseudo-class selects elements by their index, starting from 0.
2. Functional Traversing: The tdTraversed property demonstrates a more functional approach to DOM traversal. It starts by selecting a table, finds a td element within it, moves to its parent tr element, navigates to the next tr sibling, and finally selects the second td element within that row.
3. Sibling Element Extraction by Text: The tdByText property showcases how to extract data based on the text content of a preceding element, which is well suited to HTML pages with fully dynamic CSS classes. The selector $('td:contains("Second row title") + td').text() finds a td element containing the text "Second row title", then selects its immediately following td sibling and extracts its text.
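To make the three approaches concrete, here is a minimal runnable sketch; the sample HTML, the exact selectors, and the result values are illustrative assumptions that mirror the descriptions above, not the sandbox's exact demo code:

import * as cheerio from 'cheerio'; // cheerio 1.x namespace import; older versions also exposed a default export

// Sample two-row table; real pages will have different markup.
const html = `
  <table>
    <tr><td>First row title</td><td>First row value</td></tr>
    <tr><td>Second row title</td><td>Second row value</td></tr>
  </table>`;

function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return {
    // 1. Index-based: second cell of the second row, using the 0-based :eq() pseudo-class
    tdByIndex: $('table tr:eq(1) td:eq(1)').text(),

    // 2. Functional traversing: table -> first td -> parent tr -> next tr -> its second td
    tdTraversed: $('table').find('td').first()
      .parent()        // the first tr
      .next()          // the second tr
      .find('td').eq(1)
      .text(),

    // 3. Sibling extraction by text: the td right after the cell containing the title
    tdByText: $('td:contains("Second row title") + td').text(),
  };
}

console.log(extract(html, cheerio));
// => { tdByIndex: 'Second row value', tdTraversed: 'Second row value', tdByText: 'Second row value' }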
Why?
This sandbox was created to streamline Node.js HTML scraper development. It evolved from the primary ScrapeNinja Live Sandbox, which executed HTTP requests and scraped a target website on every form submission. This wasn't efficient for rapid HTML extractor testing, especially with challenging and slow sites. By isolating the HTML extraction component, we've made iterative REPL coding for HTML extraction quicker and more efficient. Debugging cheerio extractors locally can be time-consuming, requiring multiple test runs to ensure consistent syntax. Learn more about the sandbox's creation and functionality in our blog post.
How to write your perfect extractor
Websites change their HTML layouts and break things, so the perfect, bulletproof extractor is the one you never had to write. Before scraping HTML, check whether the website you are scraping exposes some sort of JSON API you can use instead.
The extractor uses the cheerio Node.js package, so start by reading its documentation.
Cheerio is similar to jQuery in many cases, but with notable and sometimes annoying differences.
The best tool for finding and testing your CSS selectors is the Chrome DevTools console.
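As a quick sanity check, you can try a selector in the DevTools console first and then port it to cheerio. The snippet below is a small illustrative sketch; note that jQuery-style pseudo-classes such as :eq() and :contains() work in cheerio but are not valid for the browser's querySelectorAll:

// In the Chrome DevTools console, $$ is an alias for document.querySelectorAll:
//   $$('li.price')[0].textContent
// The same standard CSS selector behaves identically in cheerio:
import * as cheerio from 'cheerio';

const $ = cheerio.load('<ul><li class="price">19.99</li></ul>');
console.log($('li.price').text());          // '19.99'

// jQuery-style extensions exist only on the cheerio side:
console.log($('li:contains("19")').text()); // '19.99' -- querySelectorAll would reject this selector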
Extracting article/blog/news text data from arbitrary websites
To extract article and news data from many (hundreds of) websites, it's not feasible to write and maintain a generic JS extractor. Instead, consider using a specialized tool: the Article Extractor API project, which leverages the ScrapeNinja scraping engine under the hood.
How to use in a real project:
You can use this extractor function with your local cheerio installation (you need a Node.js installation for this) or in the ScrapeNinja extractor field of the /scrape endpoint.
Running your extractor locally:
Step #1. Create a project folder and install node-fetch & cheerio
mkdir your-project-folder && \
cd "$_" && \
npm i -g create-esnext && \
npm init esnext && \
npm i node-fetch cheerio
Step #2. Copy&paste the code
Create a new, empty file such as scraper.js and paste this code into it:
import cheerio from 'cheerio'

// paste the extractor function here
function extract(input, cheerio) { ... }

// the extractor function can now be called as extract()
// retrieve your input from node-fetch or file system
const input = '<h2 class="title">YOUR TEST INPUT</h2>';

let results = extract(input, cheerio);

// the json data is now located in results variable
console.log(results);
Step #3. Launch
node ./scraper.js
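For reference, here is a fuller scraper.js sketch that fetches a live page with node-fetch and runs a simple extractor on it. The URL and the selector are placeholders, and the namespace import assumes cheerio 1.x (older release candidates also exposed a default export):

import fetch from 'node-fetch';
import * as cheerio from 'cheerio';

// Extractor in the same shape as above: raw HTML in, plain JS object out.
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return {
    // example.com exposes a single <h1>; replace with selectors for your target page
    title: $('h1').first().text().trim(),
  };
}

// Top-level await works because the project was initialized as an ES module.
const response = await fetch('https://example.com/');
const html = await response.text();

const results = extract(html, cheerio);
console.log(results); // => { title: 'Example Domain' }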
Running your scraper with an extractor in ScrapeNinja:
Just copy & paste the function code into the "extractor" field in the ScrapeNinja sandbox, then put the generated ScrapeNinja code into your local Node.js script.
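A sketch of what the resulting Node.js call can look like; the endpoint URL and auth headers below assume the RapidAPI distribution of ScrapeNinja and are placeholders, so prefer the exact code generated by the ScrapeNinja sandbox:

import fetch from 'node-fetch';

// The extractor travels as a string in the "extractor" field and is executed
// server-side against the scraped HTML.
const extractor = `function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return { title: $('h1').first().text().trim() };
}`;

const response = await fetch('https://scrapeninja.p.rapidapi.com/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-RapidAPI-Key': 'YOUR_RAPIDAPI_KEY',           // placeholder credentials
    'X-RapidAPI-Host': 'scrapeninja.p.rapidapi.com', // may differ for your setup
  },
  body: JSON.stringify({
    url: 'https://example.com/', // target page to scrape
    extractor,                   // your extractor function, as a string
  }),
});

const data = await response.json();
// Inspect the JSON response for the extractor output.
console.log(data);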