Azlyrics Scraper

🎵 AZLyrics scraper for getting all the song lyrics and publishing to Box (+148k songs)


Box folder URL | Static repo website | Kaggle dataset

Python requirements

This project uses Python 3. All of its requirements are listed in the requirements.lock file:

  1. Requests: retrieves the HTML content of a web page.
  2. BeautifulSoup: parses and scrapes the HTML content.
  3. Tor: anonymizes requests by routing them through other IPs.
  4. Stem: authenticates with the Tor controller so that every request can use a different IP.
  5. Fake User-Agent: supplies a random User-Agent for every request.
  6. Unidecode: cleans strings by transliterating unusual characters to ASCII.
  7. Box SDK: uploads/downloads files to/from Box Cloud Storage.
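
As a rough illustration of how the first few libraries can fit together, here is a minimal sketch. It is not taken from the project's source: the function names `build_headers`, `fetch_html` and `extract_text`, as well as the CSS selector argument, are hypothetical and only show one plausible way to combine Requests, Fake User-Agent, BeautifulSoup and Unidecode.

```python
from typing import Dict, Optional


def build_headers(user_agent: str) -> Dict[str, str]:
    """Return request headers carrying the given User-Agent string."""
    return {"User-Agent": user_agent}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page using a randomly chosen User-Agent (hypothetical helper)."""
    # Third-party imports are local so this module can be imported
    # even when the optional dependencies are not installed.
    import requests
    from fake_useragent import UserAgent

    headers = build_headers(UserAgent().random)
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def extract_text(html: str, css_selector: str) -> Optional[str]:
    """Return the ASCII-cleaned text of the first element matching css_selector."""
    from bs4 import BeautifulSoup
    from unidecode import unidecode

    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(css_selector)
    if node is None:
        return None
    # unidecode transliterates non-ASCII characters to their closest ASCII form.
    return unidecode(node.get_text(separator="\n").strip())
```

The selector is left as a parameter because the page structure of the target site is not described here.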


Using virtualenv is recommended to isolate the package libraries and the runtime.


To run this script, please execute the following from the root directory:

  1. Set up a virtual environment

  2. Install dependencies

    pip3 install -r requirements.lock

  3. Move the JWT configuration file obtained from the Box API into place (see JWT configuration)

  4. Install Tor browser

  5. Configure Tor IP renewal by editing the /etc/tor/torrc file

    ControlPort 9051
    CookieAuthentication 1

  6. Restart Tor to apply the new configuration

    sudo service tor restart

  7. Run the script

    python3 -m src
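
The ControlPort and CookieAuthentication settings in the torrc step are what allow a script to ask Tor for a new circuit. The sketch below shows one way this could be done with Stem. It is an illustration under assumptions, not the project's actual code: the helper names `tor_proxies` and `renew_tor_identity` are hypothetical, the SOCKS port 9050 is Tor's usual default rather than something stated in this README, and Requests needs SOCKS support installed (for example via `requests[socks]`) to use such a proxy.

```python
from typing import Dict


def tor_proxies(host: str = "127.0.0.1", socks_port: int = 9050) -> Dict[str, str]:
    """Build a proxies mapping for Requests that routes traffic through Tor.

    The socks5h scheme makes DNS resolution happen through the proxy as well.
    """
    address = f"socks5h://{host}:{socks_port}"
    return {"http": address, "https": address}


def renew_tor_identity(control_port: int = 9051) -> None:
    """Ask Tor for a new circuit through its control port (hypothetical helper)."""
    # Imported locally so the module loads even if Stem is not installed.
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        # With no arguments, Stem tries the available methods,
        # including the cookie authentication enabled in torrc.
        controller.authenticate()
        controller.signal(Signal.NEWNYM)
```

A request could then be sent with `requests.get(url, proxies=tor_proxies())`, calling `renew_tor_identity()` between requests. Tor may rate-limit NEWNYM signals, so a new identity is not guaranteed on every call, and the user running the script needs permission to read Tor's authentication cookie.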

JWT configuration

To use the Box Cloud Storage API securely, this project authenticates with JWT. Following the Box JWT tutorial produces a configuration file, which must be placed in the data folder under the name jwt_config.json, as the project's configuration specifies:

# Box integration
BOX_CONFIG_FILE_PATH = 'data/jwt_config.json'
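
For orientation, a client could be created from that file roughly as follows. This is a minimal sketch, assuming the `boxsdk` package with its JWT extras installed. The helper name `load_box_client` is hypothetical and the project's own loading code may differ.

```python
import os

# Same path as in the project's configuration shown above.
BOX_CONFIG_FILE_PATH = 'data/jwt_config.json'


def load_box_client(config_path: str = BOX_CONFIG_FILE_PATH):
    """Return an authenticated Box client built from a JWT settings file."""
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f"Box JWT configuration not found: {config_path}")

    # Imported locally so the path check works even without boxsdk installed.
    from boxsdk import Client, JWTAuth

    auth = JWTAuth.from_settings_file(config_path)
    return Client(auth)
```

With a client in hand, a file could be uploaded with something like `load_box_client().folder('0').upload('lyrics.csv')`, where `'0'` is the root folder ID in Box and `lyrics.csv` is a placeholder file name.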



MIT © AZLyrics scraper
