Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Headless Chrome Crawler | 5,051 | 10 | 9 | 2 years ago | 21 | June 11, 2018 | 28 | mit | JavaScript | |
Distributed crawler powered by Headless Chrome | ||||||||||
Scraperjs | 3,575 | 134 | 33 | 6 years ago | 15 | December 14, 2015 | 27 | mit | JavaScript | |
A complete and versatile web scraper. | ||||||||||
Html Metadata | 115 | 43 | 25 | 4 years ago | 20 | November 20, 2019 | 16 | mit | JavaScript | |
MetaData html scraper and parser for Node.js (supports Promises and callback style) | ||||||||||
Reactphp Blog Series | 94 | 4 years ago | 2 | PHP | ||||||
Examples for posts from ReactPHP series | ||||||||||
Amazon Scraper | 27 | 3 | 3 | 7 years ago | 4 | October 13, 2016 | JavaScript | |||
A price scraper for Amazon website powered by Node.JS x-ray Promises and passion. | ||||||||||
Yahoo Stock Api | 24 | 11 days ago | 6 | mit | TypeScript | |||||
💰 NPM package to get stock and historical price from finance.yahoo.com | ||||||||||
Tiktok Scraper | 22 | a year ago | 2 | PHP | ||||||
TikTok Scraper® | ||||||||||
Youtube Comment Scraper | 15 | 6 years ago | 3 | mit | JavaScript | |||||
Scraping comments from Youtube. | ||||||||||
Wind Scrape | 14 | a year ago | 3 | gpl-3.0 | TypeScript | |||||
Node package for scraping wind forecast from a few websites | ||||||||||
Yt Scraper | 10 | a year ago | 3 | mit | JavaScript | |||||
Modern YouTube scraper capable of retrieving detailed video and channel info |
MetaData html scraper and parser for Node.js (supports Promises and callback style)
The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. It currently supports Schema.org microdata through a third-party library; native implementations cover BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS; and it also collects general metadata that doesn't belong to a particular standard (for instance, the content of the title tag or of meta description tags).
Support for RDFa, AGLS, and other metadata types is planned. Contributions and requests for other metadata types are welcome!
npm install html-metadata
Promise-based:
var scrape = require('html-metadata');
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
scrape(url).then(function(metadata){
    console.log(metadata);
}).catch(function(error){
    console.error(error); // the request or the parsing failed
});
Callback-based:
var scrape = require('html-metadata');
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
scrape(url, function(error, metadata){
    if (error) {
        return console.error(error); // the request or the parsing failed
    }
    console.log(metadata);
});
The scrape method used here invokes the parseAll() method, which runs all the available methods registered in metadataFunctions(). Each of those methods is also available for use separately, for example:
Promise-based:
var cheerio = require('cheerio');
var preq = require('preq'); // Promisified request library
var parseDublinCore = require('html-metadata').parseDublinCore;
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
preq(url).then(function(response){
    var $ = cheerio.load(response.body); // declared locally so $ does not leak as a global
    return parseDublinCore($).then(function(metadata){
        console.log(metadata);
    });
});
Callback-based:
var cheerio = require('cheerio');
var request = require('request');
var parseDublinCore = require('html-metadata').parseDublinCore;
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
request(url, function(error, response, html){
    var $ = cheerio.load(html); // declared locally so $ does not leak as a global
    parseDublinCore($, function(error, metadata){
        console.log(metadata);
    });
});
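The relationship between parseAll() and the individual parsers can be pictured with a small, dependency-free sketch. It is not the library's code: the names `parsers`, `pick`, and `parseAllSketch` are hypothetical, the input is assumed to be an already-extracted list of key/content pairs rather than a cheerio document, and native Promises stand in for bluebird.

```javascript
// Hypothetical helper: collect entries whose key starts with `prefix`
// (case-insensitive) and strip the prefix from the resulting keys.
function pick(entries, prefix) {
  const found = {};
  for (const { key, content } of entries) {
    if (key.toLowerCase().startsWith(prefix)) {
      found[key.slice(prefix.length)] = content;
    }
  }
  // Reject when this metadata type is absent, so the caller can skip it.
  return Object.keys(found).length > 0
    ? Promise.resolve(found)
    : Promise.reject(new Error('No ' + prefix + ' metadata found'));
}

// Hypothetical registry, standing in for the registered parser methods.
const parsers = {
  dublinCore: (entries) => pick(entries, 'dc.'),
  openGraph: (entries) => pick(entries, 'og:'),
  twitter: (entries) => pick(entries, 'twitter:'),
};

// Run every registered parser and keep only the ones that found something.
async function parseAllSketch(entries) {
  const names = Object.keys(parsers);
  const settled = await Promise.allSettled(
    names.map((name) => parsers[name](entries))
  );
  const metadata = {};
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      metadata[names[i]] = result.value;
    }
  });
  if (Object.keys(metadata).length === 0) {
    throw new Error('No metadata found');
  }
  return metadata;
}
```

Keeping the parsers in a registry means a new metadata type only needs one new entry, while each parser stays callable on its own.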
Options object:
You can also pass an options object as the first argument containing extra parameters. Some websites require the user-agent or cookies to be set in order to get the response.
var scrape = require('html-metadata');
var request = require('request');
var options = {
    url: "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/",
    jar: request.jar(), // Cookie jar
    headers: {
        'User-Agent': 'webscraper'
    }
};
scrape(options, function(error, metadata){
    if (error) {
        return console.error(error); // the request or the parsing failed
    }
    console.log(metadata);
});
The method parseGeneral obtains the following general metadata:
<link rel="apple-touch-icon" href="" sizes="" type="">
<link rel="icon" href="" sizes="" type="">
<meta name="author" content="">
<link rel="author" href="">
<link rel="canonical" href="">
<meta name="description" content="">
<link rel="publisher" href="">
<meta name="robots" content="">
<link rel="shortlink" href="">
<title></title>
<html lang="en">
<html dir="rtl">
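To make the list above concrete, here is a rough sketch that reads a few of these tags from an HTML string. It is only an illustration: the library itself works on a cheerio-loaded document, whereas this sketch uses simple regular expressions, and the function names `attr` and `parseGeneralSketch` as well as the result keys are assumptions, not the library's output format.

```javascript
// Read one double-quoted attribute value from a single tag string.
function attr(tag, name) {
  const m = new RegExp('\\b' + name + '\\s*=\\s*"([^"]*)"', 'i').exec(tag);
  return m ? m[1] : undefined;
}

// Hypothetical, simplified counterpart to parseGeneral for illustration.
function parseGeneralSketch(html) {
  const general = {};

  // <title></title>
  const title = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  if (title) general.title = title[1].trim();

  // <html lang="..." dir="...">
  const htmlTag = /<html\b[^>]*>/i.exec(html);
  if (htmlTag) {
    const lang = attr(htmlTag[0], 'lang');
    const dir = attr(htmlTag[0], 'dir');
    if (lang) general.lang = lang;
    if (dir) general.dir = dir;
  }

  // <meta name="author|description|robots" content="...">
  for (const tag of html.match(/<meta\b[^>]*>/gi) || []) {
    const name = attr(tag, 'name');
    if (['author', 'description', 'robots'].includes(name)) {
      general[name] = attr(tag, 'content');
    }
  }

  // <link rel="canonical|publisher|shortlink" href="...">
  for (const tag of html.match(/<link\b[^>]*>/gi) || []) {
    const rel = attr(tag, 'rel');
    if (['canonical', 'publisher', 'shortlink'].includes(rel)) {
      general[rel] = attr(tag, 'href');
    }
  }

  return general;
}
```

Regular expressions are fragile against real-world markup (single-quoted attributes, comments, nested content), which is one reason to rely on a proper HTML parser for anything beyond a demonstration.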
npm test
runs the mocha tests
npm run-script coverage
runs the tests and reports code coverage
Contributions welcome! All contributions should use bluebird promises instead of callbacks, and be .nodeify()-ed in index.js so the functions can be used as either callbacks or Promises.
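The dual callback/Promise style asked for above can be sketched without any dependencies. This is an approximation, not the project's code: the standalone `nodeify` helper below is a hypothetical stand-in for bluebird's `.nodeify(callback)` method, and `parseExample` is an invented parser used only to show the shape.

```javascript
// Hypothetical stand-in for bluebird's promise.nodeify(callback):
// with a callback, route the outcome to it in Node style (error first);
// without one, hand the promise back to the caller.
function nodeify(promise, callback) {
  if (typeof callback !== 'function') {
    return promise;
  }
  promise.then(
    (value) => callback(null, value),
    (error) => callback(error)
  );
}

// Invented parser written Promise-first, then exposed in both styles.
function parseExample(input, callback) {
  const work = new Promise((resolve, reject) => {
    if (typeof input !== 'string' || input.length === 0) {
      reject(new Error('No metadata found'));
    } else {
      resolve({ length: input.length });
    }
  });
  return nodeify(work, callback);
}

// Promise style
parseExample('<title>Hi</title>').then((metadata) => console.log(metadata));

// Callback style
parseExample('<title>Hi</title>', (error, metadata) => {
  if (error) {
    return console.error(error);
  }
  console.log(metadata);
});
```

Writing the logic once as a Promise and adapting it at the boundary keeps the two calling styles from drifting apart.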