News Media Reliability

Factuality and Bias Prediction of News Media

This repository describes the work published in two papers (see citations below) on predicting the factuality and political bias of news media. Each paper proposes a different set of engineered features collected from sources of information related to the target media.

@inproceedings{baly-etal-2018-predicting,
  author    = {Baly, Ramy and Karadzhov, Georgi and Alexandrov, Dimitar and Glass, James and Nakov, Preslav},
  title     = {Predicting Factuality of Reporting and Bias of News Media Sources},
  booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing},
  series    = {EMNLP~'18},
  month     = {November},
  year      = {2018},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics}
}

@inproceedings{baly-etal-2020-written,
  author    = {Baly, Ramy and Karadzhov, Georgi and An, Jisun and Kwak, Haewoon and Dinkov, Yoan and Ali, Ahmed and Glass, James and Nakov, Preslav},
  title     = {What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  series    = {ACL~'20},
  month     = {July},
  year      = {2020},
  publisher = {Association for Computational Linguistics}
}


The corpus was created by retrieving websites, along with their factuality and bias labels, from the Media Bias/Fact Check (MBFC) website. Two versions of the corpus ("emnlp18" and "acl2020") can be found at ./data/{version}/corpus.tsv; each contains the following fields:

  • source_url: the URL of each website
  • source_url_normalized: a shortened version of the source_url. These normalized URLs are used as IDs to split the data into 5 folds of training and testing (in ./data/splits.txt)
  • ref: the link to the MBFC page analyzing the corresponding website
  • fact: the factuality label of each website (low, mixed, or high)
  • bias: the bias label of each website (extreme-right, right, center-right, center, center-left, left, extreme-left)

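The corpus file described above can be read with the standard library's csv module. The following is a minimal sketch, assuming a tab-separated file with a header row containing the five fields listed; the sample row is illustrative, not taken from the actual corpus.

```python
import csv
import io

def load_corpus(fileobj):
    """Map each source_url_normalized to its full row of corpus fields."""
    reader = csv.DictReader(fileobj, delimiter="\t")
    return {row["source_url_normalized"]: row for row in reader}

# Illustrative in-memory sample standing in for ./data/{version}/corpus.tsv
sample = io.StringIO(
    "source_url\tsource_url_normalized\tref\tfact\tbias\n"
    "http://example-news.com\texample-news.com\t"
    "http://mediabiasfactcheck.com/example\thigh\tcenter\n"
)
corpus = load_corpus(sample)
print(corpus["example-news.com"]["fact"])  # high
print(corpus["example-news.com"]["bias"])  # center
```

For the real files, pass `open("./data/emnlp18/corpus.tsv")` (or the "acl2020" version) in place of the in-memory sample.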

In addition to the corpus, we provide the different features that we used to obtain the results in our papers, along with the script that reads these features, trains the SVM classifier, and writes the performance metrics and output predictions to file. The features can be found at ./data/{version}/features/.

  1. For the "emnlp18" paper, the following features are used:

    • articles_body_glove
    • articles_title_glove
    • has_twitter
    • has_wikipedia
    • twitter_created_at
    • twitter_description
    • twitter_engagement
    • twitter_haslocation
    • twitter_urlmatch
    • twitter_verified
    • url_structure
    • wikipedia_categories
    • wikipedia_content
    • wikipedia_summary
    • wikipedia_toc
  2. For the "acl2020" paper, the following features are used:

    • articles_body_bert
    • articles_title_bert
    • has_facebook
    • has_twitter
    • has_wikipedia
    • has_youtube
    • twitter_profile
    • twitter_followers
    • wikipedia_content
    • youtube_fulltext
    • youtube_nela
    • youtube_numerical
    • youtube_opensmile
    • youtube_subs

Details about each feature can be found in the cited papers. Each feature is stored as a JSON file, where each key corresponds to a source_url (normalized) and its value is a list of numerical values representing that particular feature.
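Given that JSON layout, combining several features for a classifier amounts to concatenating each source's vectors across the selected files. The sketch below illustrates this under that assumption; the feature names and values are examples, and for the real files each dict would come from `json.load(open(...))` on a file under ./data/{version}/features/.

```python
def build_matrix(feature_dicts, sources):
    """Concatenate each source's vectors across the selected features,
    producing one feature row per source."""
    return [
        [value for fd in feature_dicts for value in fd[src]]
        for src in sources
    ]

# Illustrative stand-ins for two feature JSON files
has_twitter = {"example-news.com": [1.0]}
twitter_verified = {"example-news.com": [0.0]}

X = build_matrix([has_twitter, twitter_verified], ["example-news.com"])
print(X)  # [[1.0, 0.0]]
```

The resulting matrix can then be fed to any classifier; the papers use an SVM.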

Training and Classification

To run the training script, use a command line that follows the template below.

python3 -tk [0] -f [1] -ds [2]


  • [0] is the task at hand: "fact" or "bias" prediction
  • [1] is the list of features (from the lists above) used to train the model. Features must be comma-separated.
  • [2] is the name of the dataset we are running the experiment on ("acl2020" or "emnlp18").

The performance metrics and output predictions will be stored in ./data/{version}/results/{task}_{features}/
