Stweet

Advanced python library to scrap Twitter (tweets, users) from unofficial API
Alternatives To Stweet
Project NameStarsDownloadsRepos Using ThisPackages Using ThisMost Recent CommitTotal ReleasesLatest ReleaseOpen IssuesLicenseLanguage
Carbon32,058
8 days ago51mitJavaScript
:black_heart: Create and share beautiful images of your source code
Twint14,6291310a month ago43April 29, 2020589mitPython
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
Trump2cash6,069
a year ago32mitPython
A stock trading bot powered by Trump tweets
T5,4108924 months ago60December 24, 2016182mitRuby
A command-line power tool for Twitter.
Twitter4,4679,261338a day ago47February 12, 2020125mitRuby
A Ruby interface to the Twitter API.
Twitter Scraper3,4831022 months ago11July 17, 202050mitPython
Scrape the Twitter Frontend API without authentication.
Twitter Text2,8913521024 months ago58March 31, 202080apache-2.0HTML
Twitter Text Libraries. This code is used at Twitter to tokenize and parse text to meet the expectations for what can be used on the platform.
Tweets_analyzer2,573
3 years ago30gpl-3.0Python
Tweets metadata scraper & activity analyzer
Twitterscraper1,8522212 years ago47July 28, 2020136mitPython
Scrape Twitter for Tweets
Refined Twitter1,276
2 years agomitJavaScript
Browser extension that simplifies the Twitter interface and adds useful features
Alternatives To Stweet
Select To Compare


Alternative Project Comparisons
Readme

stweet

Open Source Love Python package PyPI version MIT Licence

A modern fast python library to scrap tweets and users quickly from Twitter unofficial API.

This tool helps you to scrap tweets by a search phrase, tweets by ids and user by usernames. It uses the Twitter API, the same API is used on a website.

Inspiration for the creation of the library

I have used twint to scrap tweets, but it has many errors, and it doesn't work properly. The code was not simple to understand. All tasks have one config, and the user has to know the exact parameter. The last important thing is the fact that Api can change — Twitter is the API owner and changes depend on it. It is annoying when something does not work and users must report bugs as issues.

Main advantages of the library

  • Simple code — the code is not only mine, every user can contribute to the library
  • Domain objects and interfaces — the main part of functionalities can be replaced (eg. calling web requests), the library has basic simple solution — if you want to expand it, you can do it without any problems and forks
  • 100% coverage with integration tests — this advantage can find the API changes, tests are carried out every week and when the task fails, we can find the source of change easily – not in version 2.0
  • Custom tweets and users output — it is a part of the interface, if you want to save tweets and users custom format, it takes you a brief moment

Installation

pip install -U stweet

Donate

If you want to sponsor me, in thanks for the project, please send me some crypto 😁:

Coin Wallet address
Bitcoin 3EajE9DbLvEmBHLRzjDfG86LyZB4jzsZyg
Etherum 0xE43d8C2c7a9af286bc2fc0568e2812151AF9b1FD

Basic usage

To make a simple request the scrap task must be prepared. The task should be processed by ** runner**.

import stweet as st


def try_search():
    search_tweets_task = st.SearchTweetsTask(all_words='#covid19')
    output_jl_tweets = st.JsonLineFileRawOutput('output_raw_search_tweets.jl')
    output_jl_users = st.JsonLineFileRawOutput('output_raw_search_users.jl')
    output_print = st.PrintRawOutput()
    st.TweetSearchRunner(search_tweets_task=search_tweets_task,
                         tweet_raw_data_outputs=[output_print, output_jl_tweets],
                         user_raw_data_outputs=[output_print, output_jl_users]).run()


def try_user_scrap():
    user_task = st.GetUsersTask(['iga_swiatek'])
    output_json = st.JsonLineFileRawOutput('output_raw_user.jl')
    output_print = st.PrintRawOutput()
    st.GetUsersRunner(get_user_task=user_task, raw_data_outputs=[output_print, output_json]).run()


def try_tweet_by_id_scrap():
    id_task = st.TweetsByIdTask('1447348840164564994')
    output_json = st.JsonLineFileRawOutput('output_raw_id.jl')
    output_print = st.PrintRawOutput()
    st.TweetsByIdRunner(tweets_by_id_task=id_task,
                        raw_data_outputs=[output_print, output_json]).run()


if __name__ == '__main__':
    try_search()
    try_user_scrap()
    try_tweet_by_id_scrap()

Example above shows that it is few lines of code required to scrap tweets.

Export format

Stweet uses api from website so there is no documentation about receiving response. Response is saving as raw so final user must parse it on his own. Maybe parser will be added in feature.

Scrapped data can be exported in different ways by using RawDataOutput abstract class. List of these outputs can be passed in every runner – yes it is possible to export in two different ways.

Currently, stweet have implemented:

  • CollectorRawOutput – can save data in memory and return as list of objects
  • JsonLineFileRawOutput – can export data as json lines
  • PrintEveryNRawOutput – prints every N-th item
  • PrintFirstInBatchRawOutput – prints first item in batch
  • PrintRawOutput – prints all items (not recommended in large scrapping)

Using tor proxy

Library is integrated with tor-python-easy. It allows using tor proxy with exposed control port – to change ip when it is needed.

If you want to use tor proxy client you need to prepare custom web client and use it in runner.

You need to run tor proxy -- you can run it on your local OS, or you can use this docker-compose.

Code snippet below show how to use proxy:

import stweet as st

if __name__ == '__main__':
    web_client = st.DefaultTwitterWebClientProvider.get_web_client_preconfigured_for_tor_proxy(
        socks_proxy_url='socks5://localhost:9050',
        control_host='localhost',
        control_port=9051,
        control_password='test1234'
    )

    search_tweets_task = st.SearchTweetsTask(all_words='#covid19')
    output_jl_tweets = st.JsonLineFileRawOutput('output_raw_search_tweets.jl')
    output_jl_users = st.JsonLineFileRawOutput('output_raw_search_users.jl')
    output_print = st.PrintRawOutput()
    st.TweetSearchRunner(search_tweets_task=search_tweets_task,
                         tweet_raw_data_outputs=[output_print, output_jl_tweets],
                         user_raw_data_outputs=[output_print, output_jl_users],
                         web_client=web_client).run()

Divide scrap periods recommended

Twitter on guest client block multiple pagination. Sometimes in one query there is possible to call for 3 paginations. To avoid this limitation divide scrapping period for smaller parts.

Twitter in 2023 block in API putting time range in timestamp – only format YYYY-MM-DD is acceptable. In arrow you can only put time without hours.

Twint inspiration

Small part of library uses code from twint. Twint was also main inspiration to create stweet.

Popular Tweets Projects
Popular Twitter Projects
Popular Social Media Categories
Related Searches

Get A Weekly Email With Trending Projects For These Categories
No Spam. Unsubscribe easily at any time.
Python
Search
Twitter
Scraper
Crawl
Tweets
Twitter Api