Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Rsshub | 23,832 | 2 | 12 hours ago | 1,820 | September 23, 2022 | 268 | mit | JavaScript | ||
🍰 Everything is RSSible | ||||||||||
Pake | 12,871 | 5 days ago | 3 | mit | Rust | |||||
🤱🏻 Turn any webpage into a desktop app with Rust. 🤱🏻 A very simple way to package a webpage into a tiny desktop app with Rust | ||||||||||
Hitomi Downloader | 11,860 | 20 days ago | 2,096 | Python | ||||||
:cake: Desktop utility to download images/videos/music/text from various websites, and more. | ||||||||||
Alternative Front Ends | 3,765 | 4 days ago | 37 | agpl-3.0 | ||||||
Overview of alternative open source front-ends for popular internet platforms (e.g. YouTube, Twitter, etc.) | ||||||||||
Libredirect | 2,147 | 5 days ago | 14 | gpl-3.0 | JavaScript | |||||
A web extension that redirects popular sites to alternative frontends and backends | ||||||||||
Privacy Redirect | 1,604 | a month ago | 196 | gpl-3.0 | JavaScript | |||||
A simple web extension that redirects Twitter, YouTube, Instagram & Google Maps requests to privacy friendly alternatives. | ||||||||||
Githubposter | 1,390 | 13 hours ago | 27 | February 28, 2022 | 6 | mit | Python | |||
Make everything a GitHub svg poster and Skyline! | ||||||||||
Alternative Frontends | 1,284 | 2 months ago | 10 | gpl-3.0 | ||||||
🔐🌐 Privacy-respecting web frontends for popular services | ||||||||||
Russia It Podcast | 1,128 | a year ago | 11 | |||||||
A list of Russian-language podcasts about information technology | ||||||||||
Mproxy | 863 | 6 years ago | 17 | C | ||||||
A minimal HTTP proxy written in C that supports getting around the Great Firewall |
At present, most journalists treat social sources like they would any other: as individual anecdotes and single points of contact. But to do so with a handful of tweets and Instagram posts is to ignore the potential of hundreds of millions of others.
Many stories lie dormant in the vast amounts of data produced by everyday consumers. Here's a guide and toolbox that may help you. Below you will find a number of scripts developed to mine data from APIs.
Slides that explain the work process can be found here. I'm currently in the process of writing more thorough resources on the subject of social media data mining. Feel free to reach out with questions on Twitter @lamthuyvo!
This is a growing list of scripts we've put together to make social data mining easier.
There are broadly three different ways to harvest data from the social web:
The kind of data that official channels like API data streams provide is very limited. Despite harboring warehouses of data on consumers’ behavior, social media companies only provide a sliver of it through their APIs (for Facebook, developers can only get data for public pages and groups, and for Twitter, this access is often restricted to a set number of tweets from a user’s timeline or to a set time frame for search).
Scripts and instructions related to APIs can be found in the 01-apis directory of this repository.
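Official APIs usually hand back results one page at a time and stop at a platform-defined limit, which is part of why the data they provide is so limited. The sketch below shows one generic way to collect items from a paginated endpoint until either the pages run out or a cap is reached. The fetch_page callable and its (items, next_cursor) return shape are hypothetical stand-ins for a real platform client, not the interface of any particular API or of the scripts in 01-apis.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (items_on_this_page, next_cursor); next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]


def collect(fetch_page: FetchPage, max_items: int) -> List[Any]:
    """Gather items page by page, stopping at max_items or at the last page."""
    results: List[Any] = []
    cursor: Optional[str] = None
    while len(results) < max_items:
        items, cursor = fetch_page(cursor)
        # Keep only as many items as are still allowed under the cap.
        results.extend(items[: max_items - len(results)])
        if cursor is None or not items:
            break
    return results


if __name__ == "__main__":
    # A fake three-page "API" standing in for a real endpoint.
    pages = {None: ([1, 2, 3], "p2"), "p2": ([4, 5, 6], "p3"), "p3": ([7], None)}
    print(collect(lambda c: pages[c], max_items=5))    # [1, 2, 3, 4, 5]
    print(collect(lambda c: pages[c], max_items=100))  # [1, 2, 3, 4, 5, 6, 7]
```

Passing the fetcher in as a function keeps the paging logic separate from authentication and HTTP details, so the same loop can wrap whichever client library a given platform requires.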
There are ways for users of social media platforms to request and download archives of their own online persona and behavior. Some services like Facebook or Twitter will allow users to download a history of the data that constitutes their posts, their messaging, or their profile photos.
Scripts and instructions related to personal archives can be found in the 02-personal-archives directory of this repository.
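Once you have downloaded an archive, the next step is turning its files into something you can analyze. Recent Twitter archive exports, for example, store data as JavaScript files that begin with an assignment such as window.YTD.tweets.part0 = followed by a JSON array. The minimal sketch below assumes that layout, and assumes each record is either wrapped as {"tweet": {...}} or is the tweet object itself with its text under "full_text". The format has changed between export versions, so check your own files first; load_archive_js and tweet_texts are illustrative names, not functions from this repository.

```python
import json
from typing import Any, Dict, List


def load_archive_js(text: str) -> List[Dict[str, Any]]:
    """Parse an archive data file of the assumed form 'name = [ ...JSON... ]'.

    Everything up to the first '=' (the JavaScript variable name) is dropped;
    any '=' characters inside the JSON payload are left untouched.
    """
    _, _, payload = text.partition("=")
    return json.loads(payload)


def tweet_texts(records: List[Dict[str, Any]]) -> List[str]:
    """Extract the text of each record, unwrapping an optional "tweet" key."""
    texts = []
    for record in records:
        tweet = record.get("tweet", record)
        texts.append(tweet.get("full_text", tweet.get("text", "")))
    return texts


if __name__ == "__main__":
    # Hypothetical one-tweet sample in the assumed archive format.
    sample = 'window.YTD.tweets.part0 = [{"tweet": {"id_str": "1", "full_text": "Hello = world"}}]'
    print(tweet_texts(load_archive_js(sample)))  # ['Hello = world']
```

With a real export you would pass in the contents of the file instead, e.g. open("data/tweets.js", encoding="utf-8").read(); that path is an assumption, since archive layouts differ between versions.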
While there's plenty of social media data on display on the sites you browse, extracting social media data from the platforms through scraping is often against the terms of service. Scraping a social media platform can get users booted from a service and potentially even result in a lawsuit.
If you end up wanting to look into scraping data from the social web, you can find related information in the 03-scraping directory of this repository.
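At its core, scraping means taking the HTML a page returns and pulling structured pieces out of it. The sketch below uses only Python's standard-library html.parser to collect every link and its text from an HTML string; the repository's dependencies include beautifulsoup4, which is the more convenient tool for real pages. LinkCollector and extract_links are illustrative names, and the example deliberately works on a hard-coded string rather than a live site, since the terms-of-service caveats above apply to any real target.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # set while we are inside an <a> tag
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; a value may be None.
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link (including nested tags).
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p>See <a href="/a">first <b>link</b></a> and <a href="/b">second</a>.</p>'
    print(extract_links(page))  # [('/a', 'first link'), ('/b', 'second')]
```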
Below is a set of instructions you can follow to get your machine ready to run any of the Python scripts in this repository. While Python is one of the most powerful languages for data gathering and analysis, it can take a few tries to get it installed and running properly. If you're a beginner, don't despair, though: these growing pains are normal and can vary from machine to machine. We promise the payoff is worth it!
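1. Make sure you have git installed. A helpful guide to getting a brand new machine set up can be found here, courtesy of NPR's Visuals Team.
2. Make sure you have pip installed.
3. In your terminal, navigate to the directory where you want to keep this repository using the cd bash command.
4. Clone this repository:
git clone https://github.com/lamthuyvo/social-media-data-scripts.git
5. Move into the new directory:
cd social-media-data-scripts
6. Install the dependencies:
pip install -r requirements.txt
or
sudo pip install -r requirements.txt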
If you have problems installing the dependencies through requirements.txt, you can install them one at a time:
pip install requests
pip install tweepy --ignore-installed six
pip install beautifulsoup4
or
sudo pip install requests
sudo pip install tweepy --ignore-installed six
sudo pip install beautifulsoup4
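To confirm the installation worked, you can run a short check like the one below with the same Python you will use for the scripts. It is a minimal sketch, not part of the repository: it assumes the three packages installed above are the ones you need (requirements.txt may list more) and uses importlib.util.find_spec to look each one up without importing it. Note that beautifulsoup4 is imported under the name bs4.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages installed above (beautifulsoup4 -> bs4).
REQUIRED = ["requests", "tweepy", "bs4"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names this Python environment cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

If a package shows up as missing even though pip reported success, pip and python may be pointing at different installations; running python -m pip install with the same interpreter usually resolves that.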
Hooray! You're ready to get your data now. We have created a directory of scripts for each data source.
You can follow the directions for each set of scripts in its subfolder:
01-apis
02-personal-archives
03-scraping
There are numerous useful resources and tools on the web for gathering social media data. Below is an incomplete list that I'll continue to update.