Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Opengraph | 120 | 1 | 1 | 10 months ago | 20 | June 08, 2022 | 5 | mit | PHP | |
A Laravel package to fetch Open Graph data of a website. | ||||||||||
Webpage Rs | 31 | 6 | 8 days ago | 11 | January 06, 2023 | 2 | Rust | |||
Small Rust library to fetch info about a web page: title, description, language, HTTP info, RSS feeds, Opengraph, Schema.org, and more | ||||||||||
Metaget | 20 | 2 years ago | mit | JavaScript | ||||||
A Node.js module to fetch HTML meta tags (including Open Graph) from a remote URL | ||||||||||
Open_graph_reader | 18 | 100 | a year ago | 12 | October 13, 2021 | 3 | mit | Ruby | ||
A library to fetch and parse OpenGraph properties from an URL or a given string. | ||||||||||
Puppeteer Prerender | 14 | 1 | 1 | 3 years ago | 67 | September 15, 2020 | mit | JavaScript | ||
Fetch the pre-rendered content, meta and Open Graph of a SPA | ||||||||||
Linkpreviewkit | 12 | 1 | 7 years ago | 5 | February 19, 2016 | mit | HTML | |||
Library to fetch the social media meta tag information from a website URL | ||||||||||
Url_scraper | 9 | 2 | 10 years ago | 3 | July 23, 2013 | 1 | mit | JavaScript | ||
Gangsta | 8 | 6 years ago | apache-2.0 | PHP | ||||||
Fetch OpenGraph data from a url and display in ExpressionEngine templates | ||||||||||
Metadog | 7 | 2 | 7 years ago | 8 | November 02, 2016 | mit | JavaScript | |||
Sniffs out and fetches open graph and schema.org metadata from webpages. | ||||||||||
Metalink | 4 | 7 months ago | 5 | Ruby | ||||||
Metalink extracts information from URLs and provide uniformed and structured data using OpenGraph, JSON+LD, oEmbed and other meta tags. |
Small library to fetch info about a web page: title, description, language, HTTP info, RSS feeds, Opengraph, Schema.org, and more
use webpage::{Webpage, WebpageOptions};
let info = Webpage::from_url("http://www.rust-lang.org/en-US/", WebpageOptions::default())
.expect("Could not read from URL");
// the HTTP transfer info
let http = info.http;
assert_eq!(http.ip, "54.192.129.71".to_string());
assert!(http.headers[0].starts_with("HTTP"));
assert!(http.body.starts_with("<!DOCTYPE html>"));
assert_eq!(http.url, "https://www.rust-lang.org/en-US/".to_string()); // followed redirects (HTTPS)
assert_eq!(http.content_type, "text/html".to_string());
// the parsed HTML info
let html = info.html;
assert_eq!(html.title, Some("The Rust Programming Language".to_string()));
assert_eq!(html.description, Some("A systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.".to_string()));
assert_eq!(html.opengraph.og_type, "website".to_string());
You can also get HTML info about local data:
use webpage::HTML;
let html = HTML::from_file("index.html", None);
// or let html = HTML::from_string(input, None);
If you need to serialize the data provided by the library with serde, enable the serde feature when declaring the dependency in Cargo.toml:
webpage = { version = "1.1", features = ["serde"] }
The curl feature is enabled by default but is optional. Disabling it (by setting default-features = false on the dependency) is useful if you do not need an HTTP client because you already have the HTML data at hand.
pub struct Webpage {
pub http: HTTP, // info about the HTTP transfer
pub html: HTML, // info from the parsed HTML doc
}
pub struct HTTP {
pub ip: String,
pub transfer_time: Duration,
pub redirect_count: u32,
pub content_type: String,
pub response_code: u32,
pub headers: Vec<String>, // raw headers from final request
pub url: String, // effective url
pub body: String,
}
pub struct HTML {
pub title: Option<String>,
pub description: Option<String>,
pub url: Option<String>, // canonical url
pub feed: Option<String>, // RSS feed typically
pub language: Option<String>, // as specified, not detected
pub text_content: String, // all tags stripped from body
pub meta: HashMap<String, String>, // flattened down list of meta properties
pub opengraph: Opengraph,
pub schema_org: Vec<SchemaOrg>,
}
pub struct Opengraph {
pub og_type: String,
pub properties: HashMap<String, String>,
pub images: Vec<OpengraphObject>,
pub videos: Vec<OpengraphObject>,
pub audios: Vec<OpengraphObject>,
}
// Facebook's Opengraph structured data
pub struct OpengraphObject {
pub url: String,
pub properties: HashMap<String, String>,
}
// Google's schema.org structured data
pub struct SchemaOrg {
pub schema_type: String,
pub value: serde_json::Value,
}
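As a rough illustration of how these structures might be consumed, the sketch below builds a small link preview from Open Graph data. The `Opengraph` and `OpengraphObject` types here are local stand-ins that mirror some of the fields listed above so that the snippet compiles on its own. `LinkPreview`, `build_preview`, and the `"title"` property key are hypothetical names chosen for the example and are not taken from the crate.

```rust
use std::collections::HashMap;

// Local stand-ins mirroring fields from the listing above
// (assumption: these are NOT the crate's own types).
#[derive(Debug, Default)]
pub struct OpengraphObject {
    pub url: String,
    pub properties: HashMap<String, String>,
}

#[derive(Debug, Default)]
pub struct Opengraph {
    pub og_type: String,
    pub properties: HashMap<String, String>,
    pub images: Vec<OpengraphObject>,
}

// Hypothetical summary type used only in this example.
#[derive(Debug, PartialEq)]
pub struct LinkPreview {
    pub title: String,
    pub image_url: Option<String>,
}

// Prefer the parsed HTML title, fall back to an Open Graph property,
// and finally to a fixed placeholder.
// The "title" key is an assumption about how properties are named.
pub fn build_preview(html_title: Option<&str>, og: &Opengraph) -> LinkPreview {
    let title = html_title
        .map(str::to_string)
        .or_else(|| og.properties.get("title").cloned())
        .unwrap_or_else(|| "(untitled)".to_string());

    // Use the first image, if any, as the preview image.
    let image_url = og.images.first().map(|img| img.url.clone());

    LinkPreview { title, image_url }
}
```

With no HTML title and an Open Graph property stored under `"title"`, `build_preview(None, &og)` takes the title from the property map. When `images` is empty, `image_url` is `None`, so callers have to handle a missing image explicitly.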
The following configuration options are available, shown here with their default values:
WebpageOptions {
allow_insecure: false,
follow_location: true,
max_redirections: 5,
timeout: Duration::from_secs(10),
useragent: "Webpage - Rust crate - https://crates.io/crates/webpage".to_string(),
}
// usage
let options = WebpageOptions { allow_insecure: true, ..Default::default() };
let info = Webpage::from_url(&url, options).expect("Halp, could not fetch");
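The `..Default::default()` in the usage line is Rust's struct update syntax: fields named explicitly are overridden, and the remaining ones come from the type's `Default` implementation. The sketch below illustrates the pattern with a local stand-in, `Options`. Its field types are assumptions inferred from the default values listed above, and `probe_options` is a hypothetical helper; neither is the crate's own definition.

```rust
use std::time::Duration;

// Local stand-in for illustration only; field types are assumed
// from the default values shown above.
#[derive(Debug, Clone, PartialEq)]
pub struct Options {
    pub allow_insecure: bool,
    pub follow_location: bool,
    pub max_redirections: u32,
    pub timeout: Duration,
    pub useragent: String,
}

impl Default for Options {
    fn default() -> Self {
        Options {
            allow_insecure: false,
            follow_location: true,
            max_redirections: 5,
            timeout: Duration::from_secs(10),
            useragent: "Webpage - Rust crate - https://crates.io/crates/webpage".to_string(),
        }
    }
}

// Hypothetical helper: override two fields and keep every other default.
pub fn probe_options() -> Options {
    Options {
        allow_insecure: true,
        timeout: Duration::from_secs(3),
        ..Default::default()
    }
}
```

Only `allow_insecure` and `timeout` differ from the defaults in the value returned by `probe_options`. If the struct later gains a field, this code still compiles because the new field is filled in from `Default`.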