A collection of tools for chemometrics and machine learning written in Julia.


This package contains a collection of tools to perform fundamental and advanced chemometric analyses in Julia. It is currently richer than any other free chemometrics package available in any other language. If you are unfamiliar with chemometrics, it could inelegantly be described as the marriage between data science and chemistry. Traditionally it is the symbiosis of applied linear algebra and statistics, disciplined by the physics and meaning of chemical measurements. This is somewhat orthogonal to most specializations of machine learning, where "add more layers" is the modus operandi. Sometimes chemometricians also weigh the pros and cons of black box modelling and break out pure machine learning methods, so some of those techniques are in this package.


Shootouts/Modeling Examples:

Package Status => Closer to Acceptability (v 0.5.8)

ChemometricsTools has been accepted as an official Julia package! Yep, so you can Pkg.add("ChemometricsTools") to install it. A lot of features have been added since the first public release (v0.2.3). In v0.5.8 almost all of the available functionality can be used/abused. If you find a bug or want a new feature, don't be shy - file an issue. In v0.5.1 Plots was removed as a dependency, new plot recipes were added, and now the package compiles much faster! Multilinear modeling, univariate modeling, and DOE functions are now available. I'm making headway into the release plan for v0.6.0. Convenience functions, documentation, bug fixes, refactoring and clean up are in progress, so bear with me. The git repo's master branch typically has the most advanced version, but the features on it may be less reliable because I like to do development on it.
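For reference, here is roughly what installation looks like from the Julia REPL. The first call is the one mentioned above; the commented-out call tracks the master branch through the standard Pkg API and is shown only as an option for the adventurous.

```julia
using Pkg

# Install the registered release (the call mentioned above).
Pkg.add("ChemometricsTools")

# Optional: track the development branch instead. Features there may be less reliable.
# Pkg.add(PackageSpec(name = "ChemometricsTools", rev = "master"))

using ChemometricsTools
```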

Seeking Collaborators

My time and effort for building this package are constrained. I would really like to find some collaborators to help flesh this package out, use it, and find bugs. Even if your interests lean more towards machine learning/statistics, I'd love to hear from you. Please file an issue if you are interested - or send me a message on Julia Discourse (ckneale)!

Version Release Strategy

  • < 0.3.0 : Mapping functionality, prototyping
  • < 0.5.0 : Testing via actual usage on real data, look for missing essentials
  • < 0.6.0 : Bake in convenience functions for ease of use. Flesh out Documentation.
  • < 0.7.5 : Public input (find those bugs!). Adequate Unit Tests.
  • < 1.0.0 : Focus on performance, stability, generalizability, lock down the package syntax.

Package Highlights


Two design choices introduced in this package are "Transformations" and "Pipelines". We can use transformations to treat data from multiple sources the same way. This helps mitigate user error for cases where test data is scaled based on training data, calibration transfer, etc.

Multiple transformations can easily be chained together and stored using "Pipelines". Pipelines aren't "pipes" like those in Bash, R and base Julia. They are flexible, yet immutable, convenience objects that allow sequential preprocessing and data transformations to be reused, chained, or automated for reliable analytic throughput.
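To make the idea concrete, here is a minimal sketch of stored transformations and an immutable pipeline written in plain Julia. It is not the package's implementation: every name in it (`CenterScaleSketch`, `RangeNormSketch`, `PipelineSketch`, `fit_pipeline`, and the `fit_*` helpers) is hypothetical and only illustrates how statistics learned on training data can be reapplied to test data.

```julia
using Statistics

# Hypothetical stored transformation: column means and standard deviations from training data.
struct CenterScaleSketch
    mu::Matrix{Float64}
    sigma::Matrix{Float64}
end
fit_center_scale(X) = CenterScaleSketch(mean(X, dims = 1), std(X, dims = 1))
# Calling the stored object applies the training statistics to any data.
(t::CenterScaleSketch)(X) = (X .- t.mu) ./ t.sigma

# Hypothetical stored transformation: column minima and maxima from training data.
struct RangeNormSketch
    lo::Matrix{Float64}
    hi::Matrix{Float64}
end
fit_range_norm(X) = RangeNormSketch(minimum(X, dims = 1), maximum(X, dims = 1))
(t::RangeNormSketch)(X) = (X .- t.lo) ./ (t.hi .- t.lo)

# Hypothetical immutable pipeline: a tuple of fitted transformations applied in order.
struct PipelineSketch{T<:Tuple}
    steps::T
end

# Fit each step on the output of the previous one, then freeze the result.
function fit_pipeline(X, fitters...)
    steps = ()
    Z = X
    for fitter in fitters
        t = fitter(Z)
        steps = (steps..., t)
        Z = t(Z)
    end
    return PipelineSketch(steps)
end

(p::PipelineSketch)(X) = foldl((Z, t) -> t(Z), p.steps; init = X)

Xtrain = [1.0 10.0; 2.0 20.0; 3.0 30.0]
Xtest = [2.0 40.0]

pipe = fit_pipeline(Xtrain, fit_range_norm, fit_center_scale)
Ztrain = pipe(Xtrain)   # training data, range-normalized then centered and scaled
Ztest = pipe(Xtest)     # test data treated with the training statistics only
```

Because the pipeline stores only fitted statistics, the test data never influences its own scaling, which is the user error the design is meant to prevent.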

Model training

ChemometricsTools offers easy-to-use iterators for K-fold validation and moving window sampling/training. More advanced sampling methods, like Kennard-Stone, are just a function call away. Convenience functions for interval selections, weighting regression ensembles, etc. are also available. These allow ensemble models like SIPLS, P-DS, P-OSC, etc. to be built quickly. With the tools included both in this package and Base Julia, nothing should stand in your way.
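As a rough picture of what a K-fold iterator does, the sketch below yields training and holdout indices for contiguous folds using only Base Julia. The helper name `kfold_indices` is hypothetical and is not the package's API.

```julia
# Hypothetical helper: lazily yields (train_indices, holdout_indices) for k contiguous folds.
function kfold_indices(n::Int, k::Int)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= n"))
    # Fold boundaries, spread as evenly as integer rounding allows.
    bounds = round.(Int, range(0, n; length = k + 1))
    return ((setdiff(1:n, (bounds[i] + 1):bounds[i + 1]),
             collect((bounds[i] + 1):bounds[i + 1])) for i in 1:k)
end

folds = collect(kfold_indices(10, 3))
for (train, holdout) in folds
    # A model would be fit on `train` and scored on `holdout` here.
    println("train on ", length(train), " samples, validate on ", length(holdout))
end
```

Contiguous folds are the simplest choice; shuffling the indices first is a common variation when sample order carries structure.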

Regression Modeling

This package features dozens of regression performance metrics and a few built-in plots (Bland-Altman, QQ, Interval Overlays, etc.). The list of regression methods currently includes: CLS, Ridge, Kernel Ridge, LS-SVM, PCR, PLS(1/2), ELMs, Regression Trees, Random Forest, Monotone Regression... More to come. Chemometricians love regressions! I've also added some convenience functions for univariate calibrations and standard addition experiments, along with some automated plot functions for them.
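For readers new to these methods, here is a minimal sketch of one of them, ridge regression, together with one performance metric (RMSE), using only the LinearAlgebra standard library. The function names `ridge_fit` and `rmse` are hypothetical and do not reflect the package's own API; the sketch also omits an intercept for brevity.

```julia
using LinearAlgebra

# Closed-form ridge solution: b = (X'X + lambda * I)^-1 X'y (hypothetical helper, no intercept).
ridge_fit(X, y, lambda) = (X' * X + lambda * I) \ (X' * y)

# Root mean squared error between observed and predicted values.
rmse(y, yhat) = sqrt(sum(abs2, y .- yhat) / length(y))

X = [1.0 0.0; 0.0 1.0; 1.0 1.0]
y = X * [2.0, -1.0]

b_ols = ridge_fit(X, y, 0.0)     # lambda = 0 reduces to ordinary least squares
b_ridge = ridge_fit(X, y, 10.0)  # a larger lambda shrinks the coefficients

println("RMSE without penalty: ", rmse(y, X * b_ols))
println("RMSE with penalty:    ", rmse(y, X * b_ridge))
```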

Classification Modeling

ChemometricsTools provides in-house classification encodings (one cold/one hot) and easy-to-retrieve global or multiclass performance statistics. It currently includes: LDA/PCA with Gaussian discriminants, Hierarchical LDA, SIMCA, multinomial softmax/logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, Probabilistic Neural Networks, Linear Perceptrons, and more to come. You can also conveniently dump classification statistics to LaTeX/CSV reports!
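If one hot encoding is unfamiliar, the sketch below shows the general idea in Base Julia: each class label becomes a column of an indicator matrix, and the labels can be recovered from it. The helpers `one_hot` and `hot_to_labels` are hypothetical names, not the package's functions.

```julia
# Hypothetical helper: build an indicator matrix with one column per class.
function one_hot(labels::AbstractVector)
    classes = sort(unique(labels))
    Y = zeros(Int, length(labels), length(classes))
    for (i, label) in enumerate(labels)
        Y[i, findfirst(==(label), classes)] = 1
    end
    return Y, classes
end

# Hypothetical helper: recover the labels from the indicator matrix.
hot_to_labels(Y, classes) = [classes[argmax(row)] for row in eachrow(Y)]

labels = ["b", "a", "c", "a"]
Y, classes = one_hot(labels)
recovered = hot_to_labels(Y, classes)
```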

Multiway/Multilinear Modeling

I've been working to fill an obvious gap in the available tooling. Standard methods for Tucker decomposition (HOSVD and HOOI) have been included, along with some preprocessing methods and even an early look at multilinear PLS. There's a lot that could be done here, so please feel free to contribute!
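For a sense of what HOSVD involves, here is a minimal, untruncated sketch using only the LinearAlgebra standard library. It is not the package's implementation, and the helper names (`unfold`, `mode_product`, `hosvd_sketch`, `reconstruct`) are hypothetical.

```julia
using LinearAlgebra

# Mode-n unfolding: mode n becomes the rows, the remaining modes are flattened into columns.
function unfold(T::AbstractArray, n::Int)
    order = [n; setdiff(1:ndims(T), n)]
    return reshape(permutedims(T, order), size(T, n), :)
end

# Mode-n product: multiply every mode-n fiber of T by the matrix M.
function mode_product(T::AbstractArray, M::AbstractMatrix, n::Int)
    order = [n; setdiff(1:ndims(T), n)]
    rest = [size(T, d) for d in order[2:end]]
    folded = reshape(M * unfold(T, n), size(M, 1), rest...)
    return permutedims(folded, invperm(order))
end

# Untruncated HOSVD: factor matrices are the left singular vectors of each unfolding.
function hosvd_sketch(T::AbstractArray)
    factors = [svd(unfold(T, n)).U for n in 1:ndims(T)]
    core = T
    for (n, U) in enumerate(factors)
        core = mode_product(core, U', n)
    end
    return core, factors
end

# Rebuild the tensor from the core and factor matrices.
function reconstruct(core, factors)
    T = core
    for (n, U) in enumerate(factors)
        T = mode_product(T, U, n)
    end
    return T
end

T = reshape(collect(1.0:24.0), 2, 3, 4)
core, factors = hosvd_sketch(T)
That = reconstruct(core, factors)
```

Keeping only the leading columns of each factor matrix would give a truncated Tucker approximation; HOOI then refines those factors iteratively.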

Specialized tools?

This package has tools for specialized fields of analysis. For instance, fractional derivatives for the electrochemists (and the adventurous), a handful of smoothing methods for spectroscopists, curve resolution (unimodal and nonnegativity constraints available) for forensics, process fault detection methods, etc. There are certainly plans for other tools for analyzing chemical data that packages in other languages have seemingly left out. Stay tuned.

Where's the Data?

Please check out ChemometricsData.jl for access to more publicly available datasets.

Right now the 2002 International Diffuse Reflectance Conference Pharmaceutical NIR, iris, Tecator aka 'meat', and ball gear fault detection (NASA) datasets are included in this package. These will eventually be factored out into ChemometricsData.jl.

I'd love for a collaborator to contribute some spectra, chromatograms, etc. Please reach out to me if you wish to collaborate/contribute. In the meantime you can load in your own datasets using the full extent of the Julia ecosystem (XLSX.jl, CSV.jl, JSON.jl, MATLAB.jl, LibPQ.jl, Feather.jl, Arrow.jl, etc).

What about Time Series? Cluster modeling?

Well, I'd love to hammer in some time series methods. That was originally part of the plan. Then I realized OnlineStats.jl already has the essentials for online learning covered, and there are many efforts for actual time series (TimeSeries.jl) modelling in the works.

Similarly, if clustering methods are important to you, check out Clustering.jl. I may add a few supportive odds and ends in here (or contribute to the packages directly), but most of the Julia 1.0+ ecosystem is reliable, well made, and community supported.


  • Clean up.
  • Performance improvements.
  • Syntax improvements.
  • Documentation improvements.
  • Unit tests.


  • Design of Experiment tools (Partial Factorial design, D/I-optimal, etc...)?
  • Convenience functions for propagation of error, multiequilibria, kinetics?
  • Electrochemical simulations and optical simulations (maybe separate packages...)?