Webpage Scraper

This is a Flask-based application that fetches images, hyperlinks, indented source code, and the text left after stripping HTML tags from a given webpage, and lets you save them to your system in a directory or text file with a name of your choice.

webpage-scraper

webpage-scraper is a Flask-based application that lets users:

  • Enter a URL with or without the protocol and subdomain specifiers.
  • Fetch a list of URLs for all the images on the webpage, with an option to download every image into a directory with a user-specified name.
  • Get a list of all the hyperlinks on the webpage and save them to a text file with a user-specified name.
  • Get the indented HTML source code of the webpage and save it in a .html file with a user-provided name.
  • Fetch the text on the webpage with the HTML stripped out and save it in a text file with a filename of the user's choice.
  • Reuse earlier results: the database is deployed on mLab and uses MongoDB to give fast access to the lists of images, hyperlinks, and text for any URL that another user has already requested, which reduces processing time for subsequent users.
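
The first few features above can be illustrated with a small standard-library sketch. This is not the project's own code: `normalize_url`, `PageExtractor`, and `extract` are hypothetical names, defaulting to `http://` when no protocol is given is an assumption, and the real application may rely on different libraries. The sketch accepts a URL with or without a protocol, then collects image URLs, hyperlinks, and tag-free text from an HTML string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def normalize_url(raw: str) -> str:
    """Return the URL with a scheme, adding one if the user left it out.

    Hypothetical helper; defaulting to http:// is an assumption.
    """
    raw = raw.strip()
    if "://" not in raw:
        raw = "http://" + raw
    if not urlparse(raw).netloc:
        raise ValueError(f"not a usable URL: {raw!r}")
    return raw


class PageExtractor(HTMLParser):
    """Collect image URLs, hyperlinks, and visible text from one page."""

    SKIPPED = {"script", "style"}  # their contents are not page text

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.links = []
        self._text = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img" and attrs.get("src"):
            # Resolve relative paths against the page URL.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside <script>/<style>.
        if not self._skip_depth and data.strip():
            self._text.append(data.strip())

    @property
    def text(self) -> str:
        return "\n".join(self._text)


def extract(html: str, base_url: str) -> PageExtractor:
    """Parse an HTML string and return the populated extractor."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser
```

A caller would fetch the page at `normalize_url(user_input)`, pass the response body to `extract`, and then write `images`, `links`, or `text` to the directory or file the user named.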

Prerequisites

To install requirements:

[sudo] pip install requirements

If you don't have pip installed, a Python installation guide can walk you through the process.

Make sure you also have MongoDB Community Edition installed.

Getting started

git clone http://github.com/mansimarkaur/webpage-scraper 
cd webpage-scraper
python crawler_flask.py

Open http://127.0.0.1:5000/ in your browser, enter a URL, and have fun 👍
