Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Skraper | 209 | 2 days ago | 13 | August 21, 2023 | 4 | apache-2.0 | Kotlin | |||
Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, Coub, Vimeo, IFunny, VK, Odnoklassniki, Pikabu) | ||||||||||
Socialcounters | 102 | 7 years ago | 7 | other | PHP | |||||
jQuery/PHP - Collection of Social Media APIs that display number of your social media fans. Facebook Likes, Twitter Followers, Instagram Followers, YouTube Subscribers, etc.. | ||||||||||
Network Avatar Picker | 87 | 1 | 2 | 2 years ago | 25 | January 04, 2022 | 1 | apache-2.0 | JavaScript | |
A npm module that returns user's social network avatar. Supported providers: facebook, instagram, twitter, tumblr, vimeo, github, youtube and gmail | ||||||||||
Social Circles | 66 | 2 years ago | 1 | mit | CSS | |||||
Well designed social media buttons. | ||||||||||
Omniauth Rails App | 32 | 4 years ago | 1 | Ruby | ||||||
Example Rails 3.2 + OmniAuth application, connected to Facebook, Twitter, Tumblr, YouTube, Vimeo and more | ||||||||||
Donlod | 2 | 5 months ago | 1 | agpl-3.0 | JavaScript | |||||
DONLOD is a social and media platform downloader that doesn't piss you off. | ||||||||||
2017hosts | 2 | 7 years ago | ||||||||
2017hosts | ||||||||||
Interstellar | 1 | 7 years ago | mit | Clojure | ||||||
Kotlin/Java library and CLI tool for scraping and downloading posts, attachments and other metadata from more than 10 sources without any authorization or full page rendering. Based on jsoup, jackson and kotlin-coroutines.
Repository contains:
Current list of implemented sources:
Unfortunately, each website is subject to change without notice, so the tool may stop working correctly. If that happens, please let me know via an issue.
The CLI tool allows you to download media with the --media-only flag from almost all of the listed sources.
Requirements:
Build the tool:
./mvnw clean package -DskipTests=true
Usage:
./skraper --help
usage: [-h] PROVIDER PATH [-n LIMIT] [-t TYPE] [-o OUTPUT] [-m]
       [--parallel-downloads PARALLEL_DOWNLOADS]
optional arguments:
  -h, --help                               show this help message and exit
  -n LIMIT, --limit LIMIT                  posts limit (50 by default)
  -t TYPE, --type TYPE                     output type, options: [log, csv, json, xml, yaml]
  -o OUTPUT, --output OUTPUT               output path
  -m, --media-only                         scrape media only
  --parallel-downloads PARALLEL_DOWNLOADS  amount of parallel downloads for media items if
                                           enabled flag --media-only (4 by default)
positional arguments:
  PROVIDER                                 skraper provider, options: facebook, instagram,
                                           twitter, youtube, tiktok, telegram, twitch, reddit,
                                           9gag, pinterest, flickr, tumblr, ifunny, vk, pikabu,
                                           vimeo, odnoklassniki, coub
  PATH                                     path to user/community/channel/topic/trend
Examples:
./skraper 9gag /hot
./skraper reddit /r/memes -n 5 -t csv -o ./reddit/posts
./skraper instagram /explore/tags/memes -t json
./skraper flickr /photos/harrythehawk -t yaml
./skraper pinterest /levato/meme -t xml
./skraper youtube /user/JetBrainsTV/videos --media-only -n 2
Maven:
<dependency>
    <groupId>ru.sokomishalov.skraper</groupId>
    <artifactId>skrapers</artifactId>
    <version>x.y.z</version>
</dependency>
Gradle Kotlin DSL:
implementation("ru.sokomishalov.skraper:skrapers:x.y.z")
As mentioned before, the provider implementation list is:
After that, usage is as simple as:
val skraper = InstagramSkraper(client = OkHttpSkraperClient())
Important: it is highly recommended not to use DefaultBlockingSkraperClient. There are more efficient, non-blocking and resource-friendly implementations of SkraperClient. To use them, you just have to put the required dependencies on the classpath.
Current http-client implementation list:
Each scraper is a class that implements the Skraper interface:
interface Skraper {
    val client: SkraperClient
    fun getPosts(path: String): Flow<Post>
    suspend fun getPageInfo(path: String): PageInfo?
    fun supports(media: Media): Boolean
    suspend fun resolve(media: Media): Media
}
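Because getPosts() returns a Flow, posts are produced lazily and the caller decides how many to pull. The sketch below illustrates that with a hypothetical FakeSkraper and a simplified FakePost model (neither is part of the library); it assumes kotlinx-coroutines-core is on the classpath.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.take
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Simplified, hypothetical stand-in for the library's Post model.
data class FakePost(val id: String, val text: String)

// Hypothetical provider that emits posts lazily, like a Flow-returning getPosts().
class FakeSkraper {
    // Counts how many posts the producer actually generated.
    var emitted = 0
        private set

    fun getPosts(path: String): Flow<FakePost> = flow {
        for (i in 1..100) {
            emitted++
            emit(FakePost(id = "$path-$i", text = "post #$i"))
        }
    }
}

// Collects at most `limit` posts; take() cancels the upstream flow once the limit is reached.
fun firstPosts(skraper: FakeSkraper, path: String, limit: Int): List<FakePost> =
    runBlocking { skraper.getPosts(path).take(limit).toList() }

fun main() {
    val skraper = FakeSkraper()
    val posts = firstPosts(skraper, "/memes", 2)
    println(posts.map { it.id })
    println("produced upstream: ${skraper.emitted}")
}
```

In this sketch the producer stops after the second post, which is why limiting with take() is cheaper than collecting everything and trimming the list afterwards.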
Also, there are some provider-specific Kotlin extensions for the implementations. You can find them in each provider's implementation package.
Kotlin coroutines are compiled to continuation-passing style (CPS), which is essentially callbacks. Here is a fairly good Java-side example of how to call Kotlin suspend functions from plain Java.
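To make the CPS point concrete, here is a minimal sketch that starts a suspend function with an explicit Continuation, using only the Kotlin standard library. The names fetchTitle and fetchTitleWithCallback are hypothetical stand-ins, not library functions; a Java caller would pass a similar Continuation object.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical stand-in for a suspend API such as getPageInfo().
suspend fun fetchTitle(path: String): String = "title of $path"

// Bridges the suspend function to a plain callback by supplying the continuation manually.
fun fetchTitleWithCallback(path: String, callback: (Result<String>) -> Unit) {
    val block: suspend () -> String = { fetchTitle(path) }
    block.startCoroutine(object : Continuation<String> {
        override val context: CoroutineContext = EmptyCoroutineContext

        // Called exactly once with either the value or the failure.
        override fun resumeWith(result: Result<String>) = callback(result)
    })
}

fun main() {
    fetchTitleWithCallback("/memes") { result ->
        println(result.getOrNull())
    }
}
```

Since fetchTitle never actually suspends here, the callback runs synchronously; with a real network call it would run whenever the coroutine resumes.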
To scrape the latest posts for a specific user, channel or trend, use skraper like this:
suspend fun main() {
    val skraper = FacebookSkraper()
    val posts = skraper.getUserPosts(username = "memes").take(2).toList() // extension for getPosts()
    // or
    val postsDetected = Skrapers.getPosts(url = "https://facebook.com/memes") // aggregating singleton
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(posts))
}
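The aggregating singleton takes a full URL rather than a provider-specific path, so it has to work out which provider the URL belongs to. As a rough illustration only (this is an assumption, not the library's actual logic), such a lookup could be keyed on the URL host; providersByHost and detectProvider are hypothetical names.

```kotlin
import java.net.URI

// Hypothetical host-to-provider table; provider names are taken from the CLI provider list.
val providersByHost = mapOf(
    "facebook.com" to "facebook",
    "twitter.com" to "twitter",
    "instagram.com" to "instagram",
    "youtube.com" to "youtube",
    "youtu.be" to "youtube"
)

// Returns the provider name for a URL, or null if the URL is malformed or the host is unknown.
fun detectProvider(url: String): String? {
    val host = runCatching { URI(url).host }.getOrNull()
        ?.lowercase()
        ?.removePrefix("www.")
        ?: return null
    return providersByHost[host]
}

fun main() {
    println(detectProvider("https://facebook.com/memes"))
    println(detectProvider("https://www.instagram.com/p/B-flad2F5o7/"))
    println(detectProvider("https://example.org/"))
}
```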
The received data has a similar structure for every provider. Example output:
[
{
"id": "5029851093699104",
"text": "gotta love em!",
"publishedAt": 1580744400000,
"statistics": {
"likes": 79,
"comments": 3
},
"media": [
{
"url": "https://facebook.com/memes/posts/5029851093699104?__xts__%5B0%5D=68.ARA2yRI2YnlXQRKX7Pdphh8ztgvnP11aYE_bZFPNmqLpJZLhwJaG24gDPUTiKDLv-J_E09u2vLjCXalpmEuGSmVR0BkVtcng_i6QV8x5e-aZUv0Mkn1wwKLlhp5NNH6zQWKlqDqRjZrwvcKeUi0unzzulRCHRvDIrbz2leM6PLescFySwMYbMmKFc7ctqaC_F7nJ09Ya0lz9Pqaq_Rh6UsNKom6fqdgHAuoHV894a3QRuyY0BC6fQuXZLOLbRIfEVK3cF9Z5UQiXUYruCySF-WpQEV0k72x6DIjT6B3iovYFnBGHaji9VAx2PByZ-MDs33D1Hz96Mk-O1Pj7zBwO6FvXGhkUJgepiwUOVd0q-pV83rS5EhjtPFDylNoNO2xkDUSIi483p49vumVPWtmab8LX1V6w2anf55kh6pedCXcH3D8rBjz8DaTBnv995u9kk5im-1-HdAGQHyKrCZpaA0QyC-I4oGsCoIJGck3RO8u_SoHcfe2tKjTgPe6j9p1D&__tn__=-R",
"aspectRatio": 0.864,
"duration": 10860.000000000
}
]
},
{
"id": "4990218157662398",
"text": "Interesting",
"publishedAt": 1580742000000,
"statistics": {
"likes": 3092,
"comments": 514
},
"media": [
{
"url": "https://scontent.fhrk1-1.fna.fbcdn.net/v/t1.0-0/p526x296/52333452_10157743612509879_529328953723191296_n.png?_nc_cat=1&_nc_ohc=oNMb8_mCbD8AX-w9zeY&_nc_ht=scontent.fhrk1-1.fna&oh=ca8a719518ecfb1a24f871282b860124&oe=5E910D0C",
"aspectRatio": 0.8960573476702509
}
]
}
]
You can see the full model structure for posts and other entities here.
It is also possible to scrape user/channel/trend info:
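The numeric fields in the sample output are easier to read once converted to java.time types. The sketch below assumes that publishedAt is milliseconds since the Unix epoch and that duration is expressed in seconds; both are inferred from the magnitude of the sample values, not confirmed by the model definition. The helper names are hypothetical.

```kotlin
import java.time.Duration
import java.time.Instant

// Assumption: publishedAt is milliseconds since the Unix epoch.
fun publishedAtToInstant(publishedAt: Long): Instant = Instant.ofEpochMilli(publishedAt)

// Assumption: duration is a number of seconds, possibly with a fractional part.
fun durationFromSeconds(seconds: Double): Duration = Duration.ofMillis((seconds * 1000).toLong())

fun main() {
    println(publishedAtToInstant(1580744400000L)) // 2020-02-03T15:40:00Z
    println(durationFromSeconds(10860.000000000)) // PT3H1M
}
```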
suspend fun main() {
    val skraper = TwitterSkraper()
    val pageInfo = skraper.getUserInfo(username = "memes") // extension for `getPageInfo()`
    // or
    val pageInfoDetected = Skrapers.getPageInfo(url = "https://twitter.com/memes") // aggregating singleton
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(pageInfo))
}
Output:
{
  "nick": "memes",
  "name": "Memes.com",
  "description": "http://memes.com is your number one website for the funniest content on the web. You will find funny pictures, funny memes and much more.",
  "statistics": {
    "posts": 10848,
    "followers": 154718
  },
  "avatar": {
    "url": "https://pbs.twimg.com/profile_images/824808708332941313/mJ4xM6PH_normal.jpg"
  },
  "cover": {
    "url": "https://abs.twimg.com/images/themes/theme1/bg.png"
  }
}
Sometimes you need to know the direct media link:
suspend fun main() {
    val skraper = InstagramSkraper()
    val info = skraper.resolve(Video(url = "https://www.instagram.com/p/B-flad2F5o7/"))
    val serializer = JsonMapper().writerWithDefaultPrettyPrinter()
    println(serializer.writeValueAsString(info))
}
Output:
{
"url": "https://scontent-amt2-1.cdninstagram.com/v/t50.2886-16/91508191_213297693225472_2759719910220905597_n.mp4?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=104&_nc_ohc=27bC52qar_oAX-7J2Zh&oe=5EC0BC52&oh=0aafee2860c540452b76e7b8e336147d",
"aspectRatio": 0.8010012515644556,
"thumbnail": {
"url": "https://scontent-amt2-1.cdninstagram.com/v/t51.2885-15/e35/91435498_533808773845524_5302421141680378393_n.jpg?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=100&_nc_ohc=8gPAcByc6YAAX_kDBWm&oh=5edf6b9d90d606f9c0e055b7dbcbfa45&oe=5EC0DDE8",
"aspectRatio": 0.8010012515644556
}
}
There is a "static" method that allows downloading any media from all known implemented sources:
suspend fun main() {
    val tmpDir = Files.createTempDirectory("skraper").toFile()
    val testVideo = Skrapers.download(
        media = Video("https://youtu.be/fjUO7xaUHJQ"),
        destDir = tmpDir,
        filename = "Gandalf"
    )
    val testImage = Skrapers.download(
        media = Image("https://www.pinterest.ru/pin/89509111320495523/"),
        destDir = tmpDir,
        filename = "Do_no_harm"
    )
    println(testVideo)
    println(testImage)
}
Output:
/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Gandalf.mp4
/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Do_no_harm.jpg
To use the bot, follow the link.