
Scrape

Structured data extraction from common web resources, using information-retrieval techniques. See the docs.

Installation

The package can be installed by adding scrape to your list of dependencies in mix.exs:

def deps do
  [
    {:scrape, "~> 3.0.0"}
  ]
end

Known Issues

  • This package depends on an outdated version of HTTPoison because of keepcosmos/readability. You can override this in your app with override: true and everything should work.
  • The current version 3.X is a complete rewrite from scratch, so some new issues might occur and the API has changed. When submitting an issue, please include the URL of the HTML or feed document that triggers it, so I can reproduce and fix the bug.
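
The HTTPoison override mentioned above could look roughly like this in mix.exs. This is a minimal sketch: the `"~> 1.8"` requirement is an assumption chosen for illustration, not a version this package prescribes, so pick whichever HTTPoison release your application actually needs.

```elixir
def deps do
  [
    {:scrape, "~> 3.0.0"},
    # Assumption: "~> 1.8" is illustrative only. `override: true` tells Mix to
    # prefer this requirement over the one requested by transitive dependencies.
    {:httpoison, "~> 1.8", override: true}
  ]
end
```

After changing the dependency list, run `mix deps.get` so the overridden version is resolved.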

Usage

  • Scrape.domain!(url) -> get structured data of a domain-type url (like https://bbc.com)
  • Scrape.feed!(url) -> get structured data of an RSS/Atom feed
  • Scrape.article!(url) -> get structured data of an article-type url
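
A minimal sketch of calling the three functions listed above, for example from `iex -S mix`. The example.com URLs are placeholders, and since this README does not describe the shape of the returned data, the results are only inspected. By Elixir convention a trailing `!` means the function raises on failure; that this holds here is an assumption.

```elixir
# Sketch only: requires network access and the :scrape dependency.
# The example.com URLs are hypothetical placeholders; substitute real ones.

# Domain-type URL, as in the usage list above
domain = Scrape.domain!("https://bbc.com")
IO.inspect(domain, label: "domain")

# RSS/Atom feed URL (placeholder)
feed = Scrape.feed!("https://example.com/feed.xml")
IO.inspect(feed, label: "feed")

# Article-type URL (placeholder)
article = Scrape.article!("https://example.com/some-article")
IO.inspect(article, label: "article")
```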

License

LGPLv3. You can use this package any way you want (including commercially), but I want bugfixes and improvements to flow back into this package for everyone's benefit.

