What's new in Hyperlearn 2022!
Ask complex questions you never thought possible, like:
How long will I live?
When will the market next crash?
How will climate change affect me?
Would you wait a second for a Google search? How about 5 hours for a prediction?
Hyperlearn makes Moonshot run fast, and makes ML algos run faster and use less memory.
We're building a full Earth simulation to predict the future of everything and make JARVIS reality.
Hyperlearn is under construction! A stable package will be reuploaded mid 2022! Stay tuned!
Moonshot Website (under SEVERE construction)
50 Page Modern Big Data Algorithms PDF
In 2018-2020, I was at NVIDIA helping make GPU ML algos faster! I incorporated Hyperlearn's methods to make TSNE 2000x faster, and sped up other algorithms too. Since then, I have built 50+ fast algos, but didn't have time to update Hyperlearn since Moonshot was priority one! I'll be updating Hyperlearn mid 2022!
Hyperlearn's algorithms, methods and repo have been featured or mentioned in 5 research papers!
+ Microsoft, UW, UC Berkeley, Greece, NVIDIA
Hyperlearn's methods and algorithms have been incorporated into 5 organizations and repositories!
+ Facebook's Pytorch, Scipy, Cupy, NVIDIA, UNSW
During Hyperlearn's development, bugs and issues were reported to GCC!
HyperLearn is written completely in PyTorch, NoGil Numba, Numpy, Pandas, Scipy & LAPACK, C++, C, Python, Cython and Assembly, and mirrors (mostly) Scikit Learn.
HyperLearn also has statistical inference measures embedded, and can be called with the same syntax as Scikit Learn.
Some key current achievements of HyperLearn:
- 70% less time to fit Least Squares / Linear Regression than sklearn + 50% less memory usage
- 50% less time to fit Non Negative Matrix Factorization than sklearn due to new parallelized algo
- 40% faster full Euclidean / Cosine distance algorithms
- 50% less time for LSMR iterative least squares
- New Reconstruction SVD - use SVD to impute missing data! Has .fit AND .transform. Approx 30% better than mean imputation
- 50% faster Sparse Matrix operations - parallelized
- RandomizedSVD is now 20 - 30% faster
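To make the Reconstruction SVD idea in the list above concrete, here is a minimal sketch of SVD-based imputation using plain NumPy. It is an illustration under assumptions, not Hyperlearn's actual implementation: the function name `svd_impute`, the `rank` and `n_iter` parameters, and the iterative refill loop are all hypothetical choices made for this example.

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50):
    """Fill NaNs in X by repeated low-rank SVD reconstruction (illustrative only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from plain column-mean imputation.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        # Truncated SVD of the current filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Overwrite only the entries that were missing in the input.
        filled[missing] = low_rank[missing]
    return filled

# Toy data: an exactly rank-2 matrix with roughly 10% of entries removed.
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8))
mask = rng.random(A.shape) < 0.1
X = A.copy()
X[mask] = np.nan

imputed = svd_impute(X, rank=2)
mean_filled = np.where(mask, np.nanmean(X, axis=0), X)

svd_err = np.sqrt(np.mean((imputed[mask] - A[mask]) ** 2))
mean_err = np.sqrt(np.mean((mean_filled[mask] - A[mask]) ** 2))
print(f"RMSE on missing entries: SVD = {svd_err:.4f}, mean = {mean_err:.4f}")
```

The toy matrix is exactly low rank, which flatters the SVD approach; on real data the gain over mean imputation depends on how well a low-rank model fits.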
Around mid 2022, Hyperlearn will evolve to GreenAI and aims to incorporate:
- New Paratrooper optimizer - fastest SGD variant combining Lookahead, Learning Rate Range Finder, and more!
- 30% faster Matrix Multiplication on CPUs
- Software Support for brain floating point (bfloat16) on nearly all hardware
- Easy compilation on old and new CPU hardware (x86, ARM)
- 100x faster regular expressions
- 50% faster and 50% less memory usage for assembly kernel accelerated methods
- Fast and parallelized New York Times scraper
- Fast and parallelized NYSE Announcements scraper
- Fast and parallelized FRED scraper
- Fast and parallelized Yahoo Finance scraper
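For readers unfamiliar with bfloat16 from the list above: it keeps the sign, the 8 exponent bits and the top 7 mantissa bits of a float32, so it can be emulated in software with integer operations. The sketch below shows one way to do that in NumPy. The helper names are hypothetical, NaN and overflow handling are deliberately left out, and this is not a description of how GreenAI plans to implement it.

```python
import numpy as np

def float32_to_bfloat16_bits(x):
    """Round float32 values to bfloat16 and return the raw 16-bit patterns."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    bits = x.view(np.uint32)
    # Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit.
    # Assumption: inputs are finite and not NaN (those need special casing).
    lsb = (bits >> 16) & 1
    rounded = bits + np.uint32(0x7FFF) + lsb
    return (rounded >> 16).astype(np.uint16)

def bfloat16_bits_to_float32(b):
    """Expand 16-bit bfloat16 patterns back to float32 (exact, no rounding)."""
    b = np.asarray(b, dtype=np.uint16)
    return (b.astype(np.uint32) << 16).view(np.float32)

values = np.array([1.0, -2.5, 0.15625, 3.14159, 1e-3, 12345.678], dtype=np.float32)
round_trip = bfloat16_bits_to_float32(float32_to_bfloat16_bits(values))
rel_err = np.abs(round_trip - values) / np.abs(values)
print(round_trip)
print(rel_err.max())
```

With 8 bits of significand precision, the relative rounding error stays at or below 2^-8, which is the trade bfloat16 makes to keep float32's exponent range.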
I also published a mini 50 page book called "Modern Big Data Algorithms"!
Modern Big Data Algorithms PDF
Comparison of Speed / Memory
[Comparison table: only the entries "QDA (Quad Dis A)" and "Guaranteed stable & fast" survive; the timing and memory figures are missing.]
Time(s) is Fit + Predict. RAM(mb) = max( RAM(Fit), RAM(Predict) )
I've also added some preliminary results for N = 5000, P = 6000
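The Time(s) and RAM(mb) definitions above can be reproduced with a small harness like the following. This is a sketch under assumptions, not the script used for the README's numbers: `measure`, `benchmark`, `fit_ols` and `predict_ols` are hypothetical names, the OLS model is only a stand-in, and `tracemalloc` sees allocations made through Python and NumPy but not internal LAPACK workspace.

```python
import time
import tracemalloc
import numpy as np

def measure(fn, *args):
    """Return (result, seconds, peak_mb) for a single call of fn(*args)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak_bytes / 1e6

def benchmark(fit, predict, X, y):
    """Time(s) = fit + predict; RAM(mb) = max(RAM(fit), RAM(predict))."""
    model, fit_s, fit_mb = measure(fit, X, y)
    preds, pred_s, pred_mb = measure(predict, model, X)
    return preds, fit_s + pred_s, max(fit_mb, pred_mb)

# Stand-in model: ordinary least squares via NumPy.
def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict_ols(beta, X):
    return X @ beta

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = X @ rng.normal(size=20)

preds, seconds, ram_mb = benchmark(fit_ols, predict_ols, X, y)
print(f"Time(s) = {seconds:.4f}, RAM(mb) = {ram_mb:.3f}")
```

Any pair of fit and predict callables with the same signatures can be swapped in to compare two implementations on the same data.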
Help is really needed! Message me!
Key Methodologies and Aims
1. Embarrassingly Parallel For Loops
- Including Memory Sharing, Memory Management
- CUDA Parallelism through PyTorch & Numba
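As a small illustration of an embarrassingly parallel loop with shared memory, the sketch below splits a matrix into row chunks and processes them in a thread pool. It uses the standard library's `concurrent.futures` rather than Numba or PyTorch, so it is a stand-in for the idea and not Hyperlearn's own code; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def row_norms_chunk(chunk):
    # Each chunk is independent of the others, so no locking is needed.
    return np.sqrt(np.einsum("ij,ij->i", chunk, chunk))

def parallel_row_norms(X, n_workers=4):
    # array_split along axis 0 returns views, so the workers share X's memory
    # instead of receiving copies.
    chunks = np.array_split(X, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so results line up with the rows.
        parts = list(pool.map(row_norms_chunk, chunks))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 64))
norms = parallel_row_norms(X)
print(norms.shape)
```

Threads only help when the work inside each chunk releases the GIL, which many compiled NumPy loops do; for pure Python loop bodies a compiled or process-based approach is needed instead.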
2. 50%+ Faster, 50%+ Leaner
3. Why is Statsmodels sometimes unbearably slow?
- Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
- Using Einstein Notation & Hadamard Products where possible.
- Computing only what is necessary to compute (Diagonal of matrix and not entire matrix).
- Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
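The "diagonal only" point above is easiest to see on the leverages of a linear model, which are the diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ and feed into standard errors and prediction intervals. The sketch below compares the naive route with an Einstein-notation version and a Hadamard-product version. It assumes a full-rank X and uses a plain matrix inverse for brevity; the function names are hypothetical and this is not taken from Hyperlearn's source.

```python
import numpy as np

def leverage_full(X):
    # Builds the whole n x n hat matrix just to read its diagonal: O(n^2) memory.
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    return np.diag(H)

def leverage_einsum(X):
    # h_i = sum_{j,k} X_ij * (XtX_inv)_jk * X_ik, with no n x n intermediate.
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

def leverage_hadamard(X):
    # Same quantity as a row sum of an elementwise (Hadamard) product.
    XtX_inv = np.linalg.inv(X.T @ X)
    return ((X @ XtX_inv) * X).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
h_full = leverage_full(X)
h_einsum = leverage_einsum(X)
h_hadamard = leverage_hadamard(X)
print(h_einsum.sum())  # trace of the hat matrix, equal to the number of columns
```

For n rows and p columns, the two diagonal-only versions need O(np) extra memory instead of O(n²), which is the difference that matters once n reaches the millions.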
4. Deep Learning Drop In Modules with PyTorch
- Using PyTorch to create Scikit-Learn like drop in replacements.
5. 20%+ Less Code, Cleaner Clearer Code
- Using Decorators & Functions where possible.
- Intuitive Middle Level Function names like (isTensor, isIterable).
- Handles Parallelism easily through hyperlearn.multiprocessing
6. Accessing Old and Exciting New Algorithms
- Matrix Completion algorithms - Non Negative Least Squares, NNMF
- Batch Similarity Latent Dirichlet Allocation (BS-LDA)
- Correlation Regression
- Feasible Generalized Least Squares FGLS
- Outlier Tolerant Regression
- Multidimensional Spline Regression
- Generalized MICE (any model drop in replacement)
- Using Uber's Pyro for Bayesian Deep Learning
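As one example from the list above, Feasible Generalized Least Squares can be sketched in a few lines of NumPy. This is a textbook-style version under a stated assumption, namely that the log of the error variance is linear in the features; the function name `fgls` is hypothetical and the code does not reflect Hyperlearn's own implementation.

```python
import numpy as np

def fgls(X, y):
    """Two-step FGLS assuming log error variance is linear in X (illustrative)."""
    # Step 1: ordinary least squares to get residuals.
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_ols
    # Step 2: regress log squared residuals on X to estimate each row's variance.
    gamma = np.linalg.lstsq(X, np.log(resid ** 2 + 1e-12), rcond=None)[0]
    var_hat = np.exp(X @ gamma)
    # Step 3: weighted least squares, scaling each row by 1 / estimated std dev.
    w = 1.0 / np.sqrt(var_hat)
    beta_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return beta_ols, beta_fgls

# Toy heteroskedastic data: noise grows with x.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + 0.1 * np.exp(0.5 * x) * rng.normal(size=n)

beta_ols, beta_fgls = fgls(X, y)
print("OLS: ", beta_ols)
print("FGLS:", beta_fgls)
```

Both estimators should land near the true coefficients here; the point of FGLS is lower variance when the noise level changes across observations, not a different target.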
Goals & Development Schedule
Hyperlearn will be revamped in the coming months to become Moonshot GreenAI with over 150 extra optimized algorithms! Stay tuned!!
Also you made it this far! If you want to join Moonshot, complete the secretive quiz!
Extra License Terms
- Hyperlearn is intended for academic, research and personal purposes only. Any explicit commercialisation of the algorithms, methods and anything inside Hyperlearn is strictly prohibited unless explicit notice is given to Daniel Han-Chen. The usage must also be approved by Daniel Han-Chen.
- Hyperlearn uses the BSD 3 License now (previously GNU v3). However, as stated, commercialisation on top of Hyperlearn must be first approved by Daniel Han-Chen.