Awesome Open Source

Outbrain Click Prediction challenge solution


This part of the solution is a combination of 5 models:

  • SVM and FTRL on basic features:
    • event features: user id, document id, platform id, day, hour and geo
    • ad features: ad document id, campaign, advertiser id
  • XGB and ET on MTV (Mean Target Value) features:
    • all categorical features that the previous models used
    • document features like publisher, source, top category, topic and entity
    • interactions between these features
    • also, the document similarity features: the cosine similarity between the ad document and the page showing the ad
  • FFM with the following features:
    • all categorical features from the above, except document similarity, categories, topics and entities
    • XGB leaves from the previous step (see slide 9 from this presentation for the description of the idea)
  • The models are combined with an XGB model (rank:pairwise objective)
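
As an illustration of the MTV features mentioned above, here is a minimal sketch of mean target value encoding for one categorical column. It is not the code from this repository: the function names, the toy campaign ids and the fallback to the global mean for unseen values are all assumptions made for the example. The means are fitted on one fold and applied to the other, which is one common way to keep a row's own label out of its encoded value.

```python
from collections import defaultdict


def fit_mtv(values, targets):
    """Return (per-value mean target, global mean) for one categorical column."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, target in zip(values, targets):
        sums[value] += target
        counts[value] += 1
    global_mean = sum(targets) / len(targets)
    means = {value: sums[value] / counts[value] for value in counts}
    return means, global_mean


def apply_mtv(values, means, global_mean):
    """Encode values with the fitted means; unseen values get the global mean."""
    return [means.get(value, global_mean) for value in values]


# Toy data: hypothetical campaign ids and click labels for two folds.
fold_a_campaign = ["c1", "c1", "c2", "c2", "c2", "c3"]
fold_a_clicked = [1, 0, 0, 0, 1, 1]
fold_b_campaign = ["c1", "c2", "c4"]

# Fit on fold A only, then encode fold B, so that a row's own label
# never contributes to its encoded value.
means, prior = fit_mtv(fold_a_campaign, fold_a_clicked)
fold_b_mtv = apply_mtv(fold_b_campaign, means, prior)
```

Here `c4` never appears in fold A, so it falls back to the global click rate. A smoothed mean (shrinking rare values towards the prior) would be a natural variation, but whether the original solution does that is not stated here.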

To get the 13th position, the models from diaman should also be added

Files description

  • splits the training dataset into two folds
  • prepares the data for SVM and FTRL
  • and train models on data from
  • and extract the leak
  • calculates TF-IDF similarity between the document the user is on and the ad document
  • and prepare data for MTV features calculation
  • calculates MTV for all features from categorical_features.txt
  • builds an XGB on a small part of the data and selects the best features to be used for XGB and ET
  • trains ET model on MTV features
  • trains XGB model on MTV features and creates leaf features to be used in FFM
  • creates the input file to be read by libffm
  • splits each fold into two subfolds (can't use the original folds because the leaf features are not transferable between folds)
  • runs libffm for training FFM models
  • puts FFM predictions from each fold/subfold together
  • puts all the features and model predictions together for ensembling
  • trains the second level XGB model on top of all these features
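
To make the "leaf features for FFM" steps above more concrete, here is a rough sketch of how categorical columns and XGB leaf indices could be written as one line of libffm input (`label field:index:value ...`). It is not the script from this repository: `to_ffm_line`, `hashed_index`, the hash space size and the toy row are hypothetical. The leaf indices are hard-coded here; with xgboost they can be obtained by calling `Booster.predict` with `pred_leaf=True`.

```python
import zlib

HASH_SPACE = 2 ** 20  # assumed size of the hashed feature space


def hashed_index(field, value):
    """Map a (field, value) pair to a stable index in [0, HASH_SPACE)."""
    key = f"{field}={value}".encode("utf-8")
    return zlib.crc32(key) % HASH_SPACE


def to_ffm_line(label, categorical, leaves):
    """Build one 'label field:index:value' line in libffm's text format.

    categorical: dict of column name -> category value
    leaves: leaf index reached in each tree (one entry per tree)
    """
    parts = [str(label)]
    field_id = 0
    # Sort column names so every row assigns the same field id to a column.
    for name in sorted(categorical):
        parts.append(f"{field_id}:{hashed_index(name, categorical[name])}:1")
        field_id += 1
    # Each tree becomes its own field; the leaf reached is the active feature.
    for tree, leaf in enumerate(leaves):
        parts.append(f"{field_id}:{hashed_index(f'tree{tree}', leaf)}:1")
        field_id += 1
    return " ".join(parts)


# Hypothetical row: two categorical columns and leaves from three trees.
row = {"platform": 2, "campaign": 1234}
line = to_ffm_line(1, row, leaves=[5, 12, 3])
```

Treating every tree as a separate field means FFM learns a separate latent vector for each leaf against each other field. Since the leaf indices only make sense for the XGB model that produced them, this is also why the leaf features cannot be shared between folds.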

The files should be run in the above order

Diaman's features should be included into - and the rest can stay unchanged.
