Webpage Rs

Small Rust library to fetch info about a web page: title, description, language, HTTP info, RSS feeds, Opengraph, Schema.org, and more

Usage

use webpage::{Webpage, WebpageOptions};

let info = Webpage::from_url("http://www.rust-lang.org/en-US/", WebpageOptions::default())
    .expect("Could not read from URL");

// the HTTP transfer info
let http = info.http;

assert_eq!(http.ip, "54.192.129.71".to_string());
assert!(http.headers[0].starts_with("HTTP"));
assert!(http.body.starts_with("<!DOCTYPE html>"));
assert_eq!(http.url, "https://www.rust-lang.org/en-US/".to_string()); // followed redirects (HTTPS)
assert_eq!(http.content_type, "text/html".to_string());

// the parsed HTML info
let html = info.html;

assert_eq!(html.title, Some("The Rust Programming Language".to_string()));
assert_eq!(html.description, Some("A systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.".to_string()));
assert_eq!(html.opengraph.og_type, "website".to_string());

You can also get HTML info about local data:

use webpage::HTML;
let html = HTML::from_file("index.html", None);
// or let html = HTML::from_string(input, None);

Features

Serialization

To serialize the data provided by the library with serde, enable the serde feature when declaring the dependency in Cargo.toml:

webpage = { version = "1.1", features = ["serde"] }

No curl dependency

The curl feature is enabled by default but is optional. Disabling it (by setting default-features = false on the dependency in Cargo.toml) is useful if you do not need an HTTP client because you already have the HTML data at hand.

All fields

pub struct Webpage {
    pub http: HTTP, // info about the HTTP transfer
    pub html: HTML, // info from the parsed HTML doc
}

pub struct HTTP {
    pub ip: String,
    pub transfer_time: Duration,
    pub redirect_count: u32,
    pub content_type: String,
    pub response_code: u32,
    pub headers: Vec<String>, // raw headers from final request
    pub url: String, // effective url
    pub body: String,
}

pub struct HTML {
    pub title: Option<String>,
    pub description: Option<String>,

    pub url: Option<String>, // canonical url
    pub feed: Option<String>, // RSS feed typically

    pub language: Option<String>, // as specified, not detected
    pub text_content: String, // all tags stripped from body

    pub meta: HashMap<String, String>, // flattened down list of meta properties

    pub opengraph: Opengraph,
    pub schema_org: Vec<SchemaOrg>,
}

pub struct Opengraph {
    pub og_type: String,
    pub properties: HashMap<String, String>,

    pub images: Vec<OpengraphObject>,
    pub videos: Vec<OpengraphObject>,
    pub audios: Vec<OpengraphObject>,
}

// Facebook's Opengraph structured data
pub struct OpengraphObject {
    pub url: String,
    pub properties: HashMap<String, String>,
}

// Google's schema.org structured data
pub struct SchemaOrg {
    pub schema_type: String,
    pub value: serde_json::Value,
}
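
As an illustration of how these fields fit together, here is a minimal sketch that picks a preview image and a display title out of parsed metadata. The `Opengraph` and `OpengraphObject` structs below are local stand-ins that mirror the field names listed above, so the snippet compiles without the crate. `preview_image` and `display_title` are hypothetical helpers, not part of the library, and the `title` property key is an assumption about how keys are stored.

```rust
use std::collections::HashMap;

// Local stand-ins mirroring the field names listed above (not the crate's own types).
#[allow(dead_code)]
#[derive(Default)]
struct OpengraphObject {
    url: String,
    properties: HashMap<String, String>,
}

#[allow(dead_code)]
#[derive(Default)]
struct Opengraph {
    og_type: String,
    properties: HashMap<String, String>,
    images: Vec<OpengraphObject>,
}

// Hypothetical helper: URL of the first Opengraph image, if there is one.
fn preview_image(og: &Opengraph) -> Option<&str> {
    og.images.first().map(|img| img.url.as_str())
}

// Hypothetical helper: prefer the Opengraph title, fall back to the HTML <title>.
// Assumption: the property is stored under the key "title" (no "og:" prefix).
fn display_title<'a>(og: &'a Opengraph, html_title: Option<&'a str>) -> Option<&'a str> {
    og.properties.get("title").map(String::as_str).or(html_title)
}

fn main() {
    let mut og = Opengraph {
        og_type: "website".to_string(),
        ..Default::default()
    };
    og.images.push(OpengraphObject {
        url: "https://example.com/cover.png".to_string(),
        ..Default::default()
    });

    println!("{:?}", preview_image(&og));
    println!("{:?}", display_title(&og, Some("Fallback title")));
}
```

Because most fields are `Option` or collections, callers can chain fallbacks like this without unwrapping anything.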

Options

The following options are available, shown here with their default values:

let defaults = WebpageOptions {
    allow_insecure: false,
    follow_location: true,
    max_redirections: 5,
    timeout: Duration::from_secs(10),
    useragent: "Webpage - Rust crate - https://crates.io/crates/webpage".to_string(),
};

// usage
let options = WebpageOptions { allow_insecure: true, ..Default::default() };
let info = Webpage::from_url(&url, options).expect("Halp, could not fetch");
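
The `..Default::default()` pattern used above is plain Rust struct update syntax: any field not named explicitly is taken from the type's `Default` implementation. The sketch below shows the mechanism with a stand-in `Options` struct so that it runs without the crate. The field names and default values are copied from the list above, while the field types (for example `u32` for `max_redirections`) are assumptions.

```rust
use std::time::Duration;

// Stand-in for the crate's options type; field types are assumed, not confirmed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Options {
    allow_insecure: bool,
    follow_location: bool,
    max_redirections: u32,
    timeout: Duration,
    useragent: String,
}

impl Default for Options {
    // Defaults as listed in the Options section above.
    fn default() -> Self {
        Options {
            allow_insecure: false,
            follow_location: true,
            max_redirections: 5,
            timeout: Duration::from_secs(10),
            useragent: "Webpage - Rust crate - https://crates.io/crates/webpage".to_string(),
        }
    }
}

fn main() {
    // Override two fields; every other field keeps its default value.
    let options = Options {
        timeout: Duration::from_secs(3),
        max_redirections: 0,
        ..Default::default()
    };

    assert!(options.follow_location);
    assert!(!options.allow_insecure);
    println!("{:?}", options);
}
```

Overriding only the fields you care about means the call site needs no changes if further options with defaults are added later.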