
Feature Forge

This library provides a set of tools that are useful in many machine learning applications (classification, clustering, regression, etc.). It is particularly helpful if you use scikit-learn, although it also works with other algorithms.

Most machine learning problems involve a step of feature definition and preprocessing. Feature Forge helps you with:

  • Defining and documenting features
  • Testing your features against specified cases and against randomly generated cases (stress-testing). This helps you make your application more robust against invalid or misformatted input data. It also helps you check that low-relevance results in feature analysis occur because the feature is actually bad, and not because there's a slight bug in your feature code.
  • Evaluating your features on a data set, producing a feature evaluation matrix. The evaluator has a robust mode that allows you some tolerance both for invalid data and buggy features.
  • Experimentation: running, registering, classifying and reproducing experiments to determine the best settings for your problems.
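
The ideas in the list above can be illustrated with a minimal plain-Python sketch. It deliberately does not use Feature Forge's own API: every name here (`word_count`, `mean_word_length`, `random_point`, `stress_test`, `evaluate`, and the `tolerant` flag) is hypothetical and chosen only for illustration. It shows features defined as documented functions, a stress test that feeds them random inputs, and an evaluation step that builds a feature matrix while optionally tolerating failures.

```python
import random


# Hypothetical features: plain functions from a data point (a dict) to a number.
def word_count(point):
    """Number of whitespace-separated words in the text."""
    return len(point["text"].split())


def mean_word_length(point):
    """Average word length. Deliberately buggy: fails on empty text."""
    words = point["text"].split()
    return sum(len(w) for w in words) / len(words)


def random_point(rng):
    """Generate a random data point with 0 to 5 words (0 gives empty text)."""
    n = rng.randint(0, 5)
    return {"text": " ".join("x" * rng.randint(1, 8) for _ in range(n))}


def stress_test(feature, trials=200, seed=0):
    """Return the randomly generated inputs on which the feature raised."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        point = random_point(rng)
        try:
            feature(point)
        except Exception:
            failures.append(point)
    return failures


def evaluate(features, data, tolerant=False):
    """Build a matrix with one row per data point and one column per feature.

    With tolerant=True, a data point on which any feature fails is skipped
    instead of aborting the whole evaluation.
    """
    matrix = []
    for point in data:
        try:
            matrix.append([feature(point) for feature in features])
        except Exception:
            if not tolerant:
                raise
    return matrix
```

In this sketch, `stress_test(mean_word_length)` surfaces the empty-text bug before any feature analysis is run, and `evaluate(..., tolerant=True)` still produces a matrix for the remaining data points. This is only an analogy for the robust mode described above, not a description of how the library implements it.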


Just pip install featureforge.


Documentation is available at

Contact information

Feature Forge is copyright 2014 Machinalis. Its primary authors are:

Any contributions or suggestions are welcome. The official channel for them is GitHub pull requests or issues.


  • StatsManager API change (order of arguments swapped)
  • For experimentation, enabled a way of booking experiments forever.
  • Bug fixes related to sparse matrices.
  • Small documentation improvements.
  • Reduced default logging verbosity.
  • Using sparse numpy matrices by default.
  • Removed the need to use a forked version of the Schema library.
  • Added support for running and generating stats for experiments
  • Fixed installer dependencies
  • Added support for Python 3
  • Added support for bag-of-words features
  • Initial release
