| Project Name | Description | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T | A command-line power tool for Twitter. | 5,410 | | 89 | 2 | 4 months ago | 60 | December 24, 2016 | 182 | mit | Ruby |
| Twitterscraper | Scrape Twitter for Tweets. | 1,852 | | 22 | 1 | 2 years ago | 47 | July 28, 2020 | 136 | mit | Python |
| Apps.loklak.org | loklak apps site http://apps.loklak.org | 1,080 | | | | 4 years ago | | | 14 | lgpl-2.1 | JavaScript |
| Getoldtweets Python | A project written in Python to get old tweets; it bypasses some limitations of the official Twitter API. | 1,053 | | | | 3 years ago | | | 160 | mit | Python |
| Twitter Advanced Search | Advanced Search for Twitter. | 885 | | | | 3 months ago | | | 4 | | |
| Rtweet | 🐦 R client for interacting with Twitter's [stream and REST] APIs. | 778 | | 29 | 9 | a day ago | 10 | May 19, 2019 | 18 | other | R |
| Search Tweets Python | Python client for the Twitter 'search Tweets' and 'count Tweets' endpoints (v2/Labs/premium/enterprise). Now supports Twitter API v2 /recent and /all search endpoints. | 738 | | | 2 | 2 months ago | 10 | July 01, 2021 | 20 | mit | Python |
| Tweetscraper | TweetScraper is a simple crawler/spider for Twitter Search without using the API. | 698 | | | | 2 years ago | 8 | April 29, 2018 | 1 | gpl-2.0 | Python |
| Chatterbot | A straightforward Ruby-based Twitter bot framework, using OAuth to authenticate. | 495 | | 59 | | 24 days ago | 35 | May 29, 2021 | | mit | Ruby |
| Emailharvester | Email addresses harvester. | 399 | | | | 3 years ago | | | | gpl-3.0 | Python |
TweetScraper can get tweets from Twitter Search. It is built on Scrapy and does not use Twitter's APIs. The crawled data is not as clean as data obtained through the APIs, but the benefit is that you are free of the API's rate limits and restrictions. Ideally, you can get all the data from Twitter Search.

WARNING: please be polite and follow the crawler's politeness policy.
1. Install conda; you can get it from Miniconda. The tested Python version is 3.7.
2. Install the Selenium Python bindings: https://selenium-python.readthedocs.io/installation.html. (Note: a `KeyError: 'driver'` is caused by an incorrect setup; the smoke test below can help verify it.)
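A quick way to confirm the Selenium setup is to start and stop a headless Firefox session. This is a minimal sketch, not part of TweetScraper itself, and it assumes `firefox` and `geckodriver` are already installed (as `install.sh` does below):

```python
# Smoke test for the Selenium + geckodriver setup (not part of TweetScraper).
# If this raises an exception, the bindings or the driver are not set up correctly.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True  # no browser window needed for a quick check

driver = webdriver.Firefox(options=options)
try:
    driver.get('https://example.org')
    print('Selenium OK, page title:', driver.title)
finally:
    driver.quit()
```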
For Ubuntu or Debian users, run:

```
$ bash install.sh
$ conda activate tweetscraper
$ scrapy list
$ # If the output is 'TweetScraper', then you are ready to go.
```

The `install.sh` script creates a new environment `tweetscraper` and installs all the dependencies (e.g., `firefox-geckodriver` and `firefox`).
Change the `USER_AGENT` in `TweetScraper/settings.py` to identify who you are:

```
USER_AGENT = 'your website/e-mail'
```
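As a concrete example, a filled-in value might look like the following (the website and e-mail here are hypothetical placeholders; use your own):

```python
# TweetScraper/settings.py — hypothetical filled-in value
USER_AGENT = 'example.org/contact@example.org'
```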
In the root folder of this project, run a command like:

```
scrapy crawl TweetScraper -a query="foo,#bar"
```

where `query` is a list of keywords separated by commas and quoted by `"`. The query can be anything (keyword, hashtag, etc.) you want to search for in Twitter Search. `TweetScraper` will crawl the search results of the query and save the tweet content and user information.
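If you prefer to start the spider from Python rather than the command line, Scrapy's `CrawlerProcess` offers an equivalent entry point. This is a sketch under the assumption that it is run from the project root, so the project settings can be found:

```python
# Programmatic equivalent of: scrapy crawl TweetScraper -a query="foo,#bar"
# Run from the project root so get_project_settings() finds settings.py.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl('TweetScraper', query='foo,#bar')  # keyword args map to -a arguments
process.start()  # blocks until the crawl finishes
```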
The tweets are saved to disk in `./Data/tweet/` by default, and `./Data/user/` is for user data. The file format is JSON. Change `SAVE_TWEET_PATH` and `SAVE_USER_PATH` in `TweetScraper/settings.py` if you want another location.
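As a quick way to inspect the output, the saved files can be read back with the standard `json` module. This sketch makes no assumption about the field names inside each file (they depend on the crawler version), so it only lists the keys that are present:

```python
# Minimal sketch: inspect crawled tweets under the default SAVE_TWEET_PATH.
# The JSON schema is not assumed; we only report which keys each file contains.
import json
from pathlib import Path

for path in sorted(Path('./Data/tweet').iterdir()):
    if not path.is_file():
        continue
    with path.open(encoding='utf-8') as f:
        tweet = json.load(f)
    print(f'{path.name}: {sorted(tweet.keys())}')
```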
Keeping the crawler up to date requires continuous effort; please support our work via opencollective.com/tweetscraper.
TweetScraper is released under the GNU GENERAL PUBLIC LICENSE, Version 2