reddit-detective: Play detective on Reddit

pip install reddit_detective

reddit-detective represents Reddit as a graph using Neo4j.

It was created to help researchers, developers, and anyone curious about how Redditors behave.

It helps you:

  • Detect political disinformation campaigns
  • Find trolls manipulating the discussion
  • Find secret influencers and idea spreaders (it might be you!)
  • Detect "cyborg-like" activities
    • "What's that?" Check reddit_detective/analytics/metrics.py for detailed information

Installation and Usage

  • Install Neo4j 4.1.0
  • Neo4j uses Cypher as its query language. Knowing Cypher greatly expands what you can do with reddit-detective, so learning it first is recommended
  • Install reddit-detective with pip install reddit_detective
    • Note: Version 0.1.2 is broken; any other version is fine
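
Since version 0.1.2 is flagged as broken in the note above, it can help to check which version is installed before debugging anything else. The following is a minimal sketch using only the standard library. The helper names `is_broken_version` and `check_installed` are hypothetical, and the distribution name is assumed to match the name used with pip.

```python
from importlib import metadata

# Versions flagged as broken in the installation note above.
BROKEN_VERSIONS = {"0.1.2"}


def is_broken_version(version: str) -> bool:
    """Return True if the version string is one flagged as broken."""
    return version.strip() in BROKEN_VERSIONS


def check_installed(dist_name: str = "reddit_detective") -> str:
    """Describe the installed version of dist_name.

    dist_name is an assumption: it is taken to be the name used with pip.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed"
    if is_broken_version(installed):
        return f"{dist_name} {installed} is broken; install another version"
    return f"{dist_name} {installed} looks fine"


if __name__ == "__main__":
    print(check_installed())
```

If the check reports 0.1.2, installing any other version with pip should resolve it.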

Code Samples

Creating a Reddit network graph

Note: Because of Reddit API limitations, each API call may incur some delay, so collecting large amounts of data with reddit-detective may not scale for now. Functionality to convert tabular Reddit data into a network graph is planned.

import praw
from neo4j import GraphDatabase

from reddit_detective import RedditNetwork, Comments
from reddit_detective.data_models import Redditor

# Create PRAW client instance
api = praw.Reddit(
    client_id="yourclientid",
    client_secret="yourclientsecret",
    user_agent="reddit-detective"
)

# Create Neo4j driver instance
driver = GraphDatabase.driver(
    "url_of_database",  # e.g. "bolt://localhost:7687" for a default local instance
    auth=("your_username", "your_password")
)

# Create network graph
net = RedditNetwork(
    driver=driver,
    components=[
        # Other relationship types are Submissions and CommentsReplies
        # Other data models available as components are Subreddit and Submission
        Comments(Redditor(api, "BloodMooseSquirrel", limit=5)),
        Comments(Redditor(api, "Anub_Rekhan", limit=5)),
    ],
)
net.create_constraints()  # Optional; running it once is enough
net.run_cypher_code()
net.add_karma(api)  # Optional; adds karma as a property of nodes

Output (in Neo4j): [image: Result]

Finding interaction score

# Assuming a network graph is created and database is started

# Interaction score = A / (A + B)
# Where A is the number of comments received in user's submissions
# And B is the number of comments made by the user
from reddit_detective.analytics import metrics

score = metrics.interaction_score(driver, "Anub_Rekhan")
score_norm = metrics.interaction_score_normalized(driver, "Anub_Rekhan")
print("Interaction score for Anub_Rekhan:", score)
print("Normalized interaction score for Anub_Rekhan:", score_norm)

Output:

Interaction score for Anub_Rekhan: 0.375
Normalized interaction score for Anub_Rekhan: 0.057324840764331204
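
The formula in the comments above, A / (A + B), can be checked by hand. The sketch below is not the library's implementation: `interaction_score_from_counts` is a hypothetical helper, the counts are illustrative, and returning 0.0 when a user has no activity is an assumption. The normalized variant is left out because its definition lives in reddit_detective/analytics/metrics.py and is not stated here.

```python
def interaction_score_from_counts(received: int, made: int) -> float:
    """Compute A / (A + B) from raw counts.

    received (A): comments received on the user's submissions
    made (B): comments made by the user
    Returning 0.0 when both counts are zero is an assumption of this sketch.
    """
    if received < 0 or made < 0:
        raise ValueError("counts must be non-negative")
    total = received + made
    if total == 0:
        return 0.0
    return received / total


# Illustrative counts only: 3 comments received and 5 made give 3 / 8.
print(interaction_score_from_counts(3, 5))  # 0.375
```

A score near 1 means the user mostly receives comments on their own submissions, while a score near 0 means the user mostly comments on others' content.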

Finding cyborg score

# Assuming a network graph is created and database is started

# For a user, submission or subreddit, return the ratio of cyborg-like comments to all comments
# A cyborg-like comment is basically a comment posted within 6 seconds of the submission's creation
# Why 6? Can't the user be a fast typer? 
#   See reddit_detective/analytics/metrics.py for detailed information

from reddit_detective.analytics import metrics

score, comms = metrics.cyborg_score_user(driver, "Anub_Rekhan")
print("Cyborg score for Anub_Rekhan:", score)
print("List of Cyborg-like comments of Anub_Rekhan:", comms)

Output:

Cyborg score for Anub_Rekhan: 0.2
List of Cyborg-like comments of Anub_Rekhan: ['q3qm5mo']
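
The ratio described in the comments above can be sketched without a database. This is only an illustration of the definition, not the code in metrics.py: `cyborg_score_from_timings` is a hypothetical helper, the comment IDs and timestamps are made up, and treating the 6-second window as inclusive is an assumption.

```python
from typing import Iterable, List, Tuple

CYBORG_WINDOW_SECONDS = 6  # threshold quoted in the comments above

# (comment_id, comment_created_utc, submission_created_utc), in epoch seconds
CommentTiming = Tuple[str, float, float]


def cyborg_score_from_timings(
    comments: Iterable[CommentTiming],
) -> Tuple[float, List[str]]:
    """Return (ratio of cyborg-like comments, IDs of those comments)."""
    comments = list(comments)
    if not comments:
        return 0.0, []  # assumption: no comments means a score of 0
    cyborg_ids = [
        comment_id
        for comment_id, commented_at, submitted_at in comments
        # Assumption: the window is inclusive of exactly 6 seconds.
        if 0 <= commented_at - submitted_at <= CYBORG_WINDOW_SECONDS
    ]
    return len(cyborg_ids) / len(comments), cyborg_ids


# Made-up sample: one of five comments lands within the window.
sample = [
    ("c1", 1_000_004.0, 1_000_000.0),  # 4 s after the submission
    ("c2", 1_000_090.0, 1_000_000.0),  # 90 s
    ("c3", 1_003_600.0, 1_000_000.0),  # 1 h
    ("c4", 1_000_030.0, 1_000_000.0),  # 30 s
    ("c5", 1_000_007.0, 1_000_000.0),  # 7 s, just outside the window
]
score, ids = cyborg_score_from_timings(sample)
print(score, ids)  # 0.2 ['c1']
```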

Running a Cypher statement

# Assuming a network graph is created and database is started

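# The with-block closes the session even if the query raises
with driver.session() as session:
    result = session.run("Some cypher code")  # replace with a real Cypher statement
    records = list(result)  # consume the result before the session closes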
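
As a concrete statement to try, the sketch below counts every node in the database. It assumes nothing about the labels reddit-detective creates, which is why the query matches all nodes. `count_nodes` is a hypothetical helper, and it expects a driver created with `GraphDatabase.driver` as in the first sample.

```python
def count_nodes(driver) -> int:
    """Return the total number of nodes in the database behind `driver`."""
    query = "MATCH (n) RETURN count(n) AS node_count"
    # The with-block closes the session once the record has been read.
    with driver.session() as session:
        record = session.run(query).single()
        return record["node_count"]


# Usage, given the `driver` from the network graph sample:
#     print("Nodes in graph:", count_nodes(driver))
```

Reading the record inside the with-block matters because results are tied to the session that produced them.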

Upcoming features

  • [ ] Convert any tabular Reddit data to Neo4j Graph with given instructions from the user
  • [ ] UserToUser relationships
    • A relationship linking two users, whose only property is the number of encounters
    • An encounter is defined as two users having ties with the same submission
  • [ ] Add more paper-inspired metrics
  • [ ] Create a wrapper for the centrality metrics of Neo4j GDSC (Graph Data Science library)
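
The encounter definition in the UserToUser item above can be illustrated in plain Python. This is a sketch of the idea only, since the feature is not implemented yet: `count_encounters` is a hypothetical helper, the usernames and submission IDs are made up, and counting one encounter per shared submission is an assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def count_encounters(ties: Iterable[Tuple[str, str]]) -> Counter:
    """Count encounters per unordered user pair.

    ties: (username, submission_id) pairs, one per user tied to a submission.
    Assumption: each shared submission counts as exactly one encounter.
    """
    users_by_submission: Dict[str, Set[str]] = {}
    for user, submission_id in ties:
        users_by_submission.setdefault(submission_id, set()).add(user)

    encounters: Counter = Counter()
    for users in users_by_submission.values():
        # Sorting gives each pair a stable (alphabetical) key.
        for pair in combinations(sorted(users), 2):
            encounters[pair] += 1
    return encounters


# Made-up ties: alice and bob share two submissions, carol joins one of them.
ties = [
    ("alice", "s1"), ("bob", "s1"),
    ("alice", "s2"), ("bob", "s2"), ("carol", "s2"),
]
encounters = count_encounters(ties)
print(encounters[("alice", "bob")])    # 2
print(encounters[("alice", "carol")])  # 1
```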

Inspiration

List of works/papers that inspired reddit-detective:

authors: [Sachin Thukral (TCS Research), Hardik Meisheri (TCS Research),
Arnab Chatterjee (TCS Research), Tushar Kataria (TCS Research),
Aman Agarwal (TCS Research), Lipika Dey (TCS Research),
Ishan Verma (TCS Research)]

title: Analyzing behavioral trends in community driven
discussion platforms like Reddit

published_in: 2018 IEEE/ACM International Conference on Advances in 
Social Networks Analysis and Mining (ASONAM)

DOI: 10.1109/ASONAM.2018.8508687