Crawlee · The scalable web crawling, scraping and automation library for JavaScript/Node.js

Runs on JavaScript

JavaScript is the language of the web. Crawlee builds on popular tools like Playwright, Puppeteer and Cheerio to deliver large-scale, high-performance web scraping and crawling of any website. Works best with TypeScript!

Automates any web workflow

Run headless Chrome, Firefox, WebKit or other browsers, manage lists and queues of URLs to crawl, run crawlers in parallel at maximum system capacity. Handle storage and export of results and rotate proxies.

Works on any system

Crawlee can be used stand-alone on your own systems or it can run as a serverless microservice on the Apify Platform.

Automatic scaling

All the crawlers are automatically scaled based on available system resources using the AutoscaledPool class. Advanced options are available to fine-tune scaling behaviour.

Generated fingerprints

Avoid getting blocked with unique browser fingerprints generated from real-world data.

Browser-like requests from Node.js

Crawl with plain HTTP requests that look as if they came from real browsers, thanks to auto-generated headers and TLS fingerprints modeled on real browsers.

Easy crawling

There are three main classes you can use to start crawling the web in no time. Need to crawl plain HTML? Use the blazing-fast CheerioCrawler. For complex websites that are built with React, Vue or other front-end JavaScript libraries and require JavaScript execution, spawn a headless browser with PlaywrightCrawler or PuppeteerCrawler.

Try it out

Install Crawlee into a Node.js project. You must have Node.js 16 or higher installed.

npm install crawlee playwright

Copy the following code into a file in the project, for example main.mjs:

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler();

crawler.router.addDefaultHandler(async ({ request, page, enqueueLinks }) => {
    const title = await page.title();
    console.log(`Title of ${request.loadedUrl} is '${title}'`);

    // save some results
    await Dataset.pushData({ title, url: request.loadedUrl });

    // enqueue all links targeting the same hostname
    await enqueueLinks();
});

await crawler.run(['https://www.iana.org/']);

Execute the following command in the project's folder and watch it recursively crawl IANA with Playwright and Chromium.

node main.mjs