Setup
You’ll need to install the @mozilla/readability and jsdom npm packages:
npm install @mozilla/readability jsdom
The usage example also relies on cheerio for scraping:
npm install cheerio
Usage
The example below scrapes a Hacker News thread, splits it on HTML tags so that chunks follow the semantic grouping of those tags, then extracts content from the individual chunks.

Customization
You can pass the transformer any arguments accepted by the @mozilla/readability package to customize how it works.
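
The scrape, split, extract flow from the Usage section, with custom Readability options passed through, might look roughly like the sketch below. It assumes the transformer is MozillaReadabilityTransformer from @langchain/community, that the page is loaded with HTMLWebBaseLoader, and that @langchain/textsplitters is installed; these import paths are assumptions and may differ between package versions. The helper name extractReadableChunks and the option values are hypothetical, chosen only for illustration.

```javascript
// Illustrative Readability options; the values are assumptions, not recommendations.
const readabilityOptions = {
  charThreshold: 250, // minimum number of characters an extracted article must have
  keepClasses: false, // drop class attributes from the extracted HTML
};

// Hypothetical helper: load a page, split it on HTML boundaries, then run
// each chunk through Readability.
async function extractReadableChunks(url, options = readabilityOptions) {
  // Import paths are assumptions and may differ between package versions.
  // They are loaded lazily so that defining this helper has no side effects.
  const { HTMLWebBaseLoader } = await import(
    "@langchain/community/document_loaders/web/html"
  );
  const { MozillaReadabilityTransformer } = await import(
    "@langchain/community/document_transformers/mozilla_readability"
  );
  const { RecursiveCharacterTextSplitter } = await import(
    "@langchain/textsplitters"
  );

  // Fetch the raw HTML of the page as a list of documents.
  const docs = await new HTMLWebBaseLoader(url).load();

  // Split on HTML-aware separators so chunks follow the tag structure.
  const splitter = RecursiveCharacterTextSplitter.fromLanguage("html");
  const chunks = await splitter.splitDocuments(docs);

  // The options object is forwarded to @mozilla/readability.
  const transformer = new MozillaReadabilityTransformer(options);
  return transformer.transformDocuments(chunks);
}
```

Calling await extractReadableChunks(threadUrl), where threadUrl is a placeholder for the URL of the thread you want to scrape, should resolve to the transformed documents. Passing a different options object as the second argument changes how Readability filters each chunk.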