
Creating JS Extractors

JS extractors are small JavaScript functions that you send to the ScrapeNinja API along with the URL of the target website. Each extractor receives the scraped content as a string, runs in the ScrapeNinja cloud, and can use the Cheerio HTML parser to pull clean, structured data out of the website's HTML.
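To make the description above concrete, here is a minimal sketch of what such an extractor might look like. The function name `extract`, its `(input, cheerio)` signature, and the `h1` selector are assumptions chosen for illustration; check the ScrapeNinja API reference for the exact contract before relying on them.

```javascript
// Hypothetical extractor sketch. The (input, cheerio) signature is an
// assumption for illustration, not something this page confirms.
function extract(input, cheerio) {
  // Parse the scraped HTML string into a queryable Cheerio document.
  const $ = cheerio.load(input);

  // Return a plain object so the result serializes cleanly to JSON.
  return {
    // Text of the first <h1> on the page, with surrounding whitespace removed.
    title: $('h1').first().text().trim(),
  };
}
```

With the real Cheerio library passed in, an input such as `<h1> Example Domain </h1>` would yield `{ title: "Example Domain" }`. Returning a plain object rather than a Cheerio node keeps the output JSON-friendly.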

Why use JS extractors?

JS extractors are optional. If you are calling the ScrapeNinja API from your own cloud server (e.g. in a Python or Node.js environment), you don't necessarily need to create a JS extractor. Instead, you can process the raw ScrapeNinja output on your end, using the BeautifulSoup library in Python or a local copy of Cheerio installed via npm, which can be more convenient. However, if you are calling the ScrapeNinja API from a no-code environment, extractors become invaluable because they return clean JSON that no-code environments can process.

Developing and Testing JS Extractors in the Cheerio Playground

ScrapeNinja provides a dedicated Cheerio Playground to make writing extractors easier. The Cheerio Playground expects the website's HTML to be pasted into a dedicated text area. To obtain that HTML, you can fetch the page in the ScrapeNinja Scraper Playground and then copy and paste the result into the Cheerio Playground.