Awesome Open Source

Twitter Stream Word Count

Uses Apache Storm to ingest live tweets from the Twitter Streaming API and stores word counts in a Postgres database for further analysis.
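As a rough illustration of the counting step, the sketch below splits tweet text into words and tallies them in memory. The names `split_words` and `count_words` and the cleaning rules (skipping URLs, mentions, hashtags, and the "rt" marker) are assumptions made for this example, not the project's actual bolts. In the real topology this logic would live in streamparse components and the totals would go to Postgres.

```python
import re
from collections import Counter

# Hypothetical cleaning rule: keep runs of ASCII letters and apostrophes only.
WORD_RE = re.compile(r"[a-z']+")


def split_words(tweet_text):
    """Lower-case a tweet and return its words, skipping URLs, mentions, hashtags, and 'rt'."""
    words = []
    for token in tweet_text.lower().split():
        if token.startswith(("http", "@", "#")) or token == "rt":
            continue
        # Drop punctuation around the word and any stray leading/trailing apostrophes.
        words.extend(w.strip("'") for w in WORD_RE.findall(token))
    return [w for w in words if w]


def count_words(tweets):
    """Tally word occurrences across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(split_words(text))
    return counts
```

Calling `count_words` on a list of tweet strings returns a `Counter` keyed by word, which is the shape of data the database table is meant to hold.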

Application Architecture

See this document.

Steps to run the application

  1. Create AWS EC2 instance using UCB W205 AMI
  2. Make sure all the dependencies are there:
    • Python 2.7
    • virtualenv
    • lein
    • streamparse
    • psycopg2
    • tweepy
    • redis
  3. Start Postgres DB
  4. Download the project folder to your preferred location
  5. Go into the project folder
  6. Run the dbsetup Python script to create the database and table: $ python
  7. Go into the tweetwordcount folder: $ cd tweetwordcount
  8. Run the Storm application: $ sparse run
  9. You may see the following warning:
    • WARNING: You're currently running as root; probably by accident.
    • Press control-C to abort or Enter to continue as root.
    • Set LEIN_ROOT to disable this warning.
  10. Press Enter to continue
  11. The application should now be running. You can exit with Ctrl-C
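To make the database step more concrete, here is a minimal sketch of what a setup script and a count update could look like. The table name `tweetwordcount`, the columns `word` and `count`, and the helper names are hypothetical; the project's real schema may differ. The functions accept any DB-API cursor that uses `%s` placeholders, such as one obtained from a psycopg2 connection.

```python
# Hypothetical schema: table and column names are assumptions for illustration.
CREATE_TABLE_SQL = (
    "CREATE TABLE IF NOT EXISTS tweetwordcount ("
    "word TEXT PRIMARY KEY, count INTEGER NOT NULL)"
)
UPDATE_SQL = "UPDATE tweetwordcount SET count = count + %s WHERE word = %s"
INSERT_SQL = "INSERT INTO tweetwordcount (word, count) VALUES (%s, %s)"


def create_table(cursor):
    """Create the word-count table if it does not exist yet."""
    cursor.execute(CREATE_TABLE_SQL)


def add_count(cursor, word, increment=1):
    """Add `increment` to the stored count for `word`, inserting the row if it is missing."""
    cursor.execute(UPDATE_SQL, (increment, word))
    # rowcount is 0 when the UPDATE matched no row, i.e. the word is new.
    if cursor.rowcount == 0:
        cursor.execute(INSERT_SQL, (word, increment))
```

The update-then-insert pattern is used here because it does not depend on a particular Postgres version. It is not atomic, so with several concurrent writers a transaction or a single upsert statement would likely be needed. The caller is responsible for committing on the connection.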

Steps to run the serving scripts

  1. Go into the project folder
  2. Go into the serves folder: $ cd serves
  3. Get all the words with their total count of occurrences, sorted alphabetically in ascending order: $ python
  4. Get counts for a particular word: $ python [your word]
  5. Get counts for a range ordered by their total number of occurrences: $ python [lower],[upper]
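The three lookups above could be expressed as parameterized queries along the following lines. This is a sketch under assumptions: the table `tweetwordcount` with columns `word` and `count` is hypothetical, the function names are invented for this example, and the range is interpreted as words whose total count lies between the two bounds, listed in descending order of count. Each function returns an `(sql, params)` pair suitable for a DB-API `cursor.execute` call with `%s` placeholders.

```python
def parse_range(arg):
    """Parse a 'lower,upper' command-line argument into a pair of integers."""
    lower_text, upper_text = arg.split(",")
    lower, upper = int(lower_text), int(upper_text)
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return lower, upper


def all_words_query():
    """All words with their counts, sorted alphabetically in ascending order."""
    return ("SELECT word, count FROM tweetwordcount ORDER BY word ASC", ())


def word_query(word):
    """Count for a single word."""
    return ("SELECT count FROM tweetwordcount WHERE word = %s", (word,))


def range_query(lower, upper):
    """Words whose count falls in [lower, upper], most frequent first (assumed order)."""
    sql = (
        "SELECT word, count FROM tweetwordcount "
        "WHERE count BETWEEN %s AND %s ORDER BY count DESC"
    )
    return (sql, (lower, upper))
```

Passing values as parameters rather than formatting them into the SQL string keeps user-supplied words from being interpreted as SQL.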
