Awesome Data Science And Engineering

A curated list of Data Science and Engineering frameworks, tools, libraries and related list of tutorials.

Data Science & Engineering

A curated list of Data Science and Engineering frameworks, tools, libraries, and related tutorials. It mostly covers open-source Python projects, ranging from beginner to intermediate level.

Big Data

PySpark - Apache Spark Python API - pypi

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Study Material

Frameworks

Apache Airflow - pypi

Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows.

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
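
The idea of running tasks "while following the specified dependencies" can be sketched without Airflow itself. The snippet below is only an illustration using Python's standard-library `graphlib`; it is not Airflow's API, and the task names (`extract`, `clean`, `validate`, `load`) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"extract"},
    "load": {"clean", "validate"},
}


def run_in_dependency_order(deps):
    """Return the tasks in an order where every task follows its upstream tasks."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        # A real scheduler would hand the task to a worker at this point.
        executed.append(task)
    return executed


order = run_in_dependency_order(dependencies)
print(order)
```

`clean` and `validate` do not depend on each other, so either may come first. A scheduler such as Airflow's can use that freedom to run independent tasks on different workers.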

Advanced Concepts:

Libraries

Pandas - pypi

Library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
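
As a small taste of those data structures, the sketch below builds a `DataFrame`, filters rows, and aggregates by group. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical temperature readings.
df = pd.DataFrame(
    {
        "city": ["Pune", "Pune", "Delhi", "Delhi"],
        "temp_c": [31.0, 29.0, 35.0, 37.0],
    }
)

# Boolean indexing keeps only the rows that match a condition.
hot_days = df[df["temp_c"] > 30]

# Group rows by city and average the temperature within each group.
mean_by_city = df.groupby("city")["temp_c"].mean()

print(hot_days)
print(mean_by_city)
```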

NumPy - pypi

NumPy is the fundamental package for scientific computing with Python. Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
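
The point about arbitrary data types can be illustrated with a structured dtype, which lets one array hold record-like rows similar to a database table. The field names and values below are hypothetical.

```python
import numpy as np

# Hypothetical record layout: a short string, a 32-bit int, and a 64-bit float.
person_dtype = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])

people = np.array(
    [("Alice", 30, 55.5), ("Bob", 45, 82.0), ("Carol", 27, 61.2)],
    dtype=person_dtype,
)

# Fields are accessed by name and behave like ordinary arrays.
ages = people["age"]
older_than_28 = people[people["age"] > 28]
mean_weight = people["weight"].mean()

print(ages)
print(older_than_28["name"])
print(mean_weight)
```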

Alembic - pypi

Alembic is a database migrations tool written by the author of SQLAlchemy. A migrations tool offers the following functionality:

  1. Can emit ALTER statements to a database in order to change the structure of tables and other constructs
  2. Provides a system whereby “migration scripts” may be constructed; each script indicates a particular series of steps that can “upgrade” a target database to a new version, and optionally a series of steps that can “downgrade” similarly, doing the same steps in reverse.
  3. Allows the scripts to execute in some sequential manner.
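
The three points above can be sketched with the standard-library `sqlite3` module. This is only an illustration of the upgrade/downgrade idea, not Alembic's API; the table names, the `MIGRATIONS` list, and the `migrate` helper are assumptions made for the example.

```python
import sqlite3


# Each migration is a pair of functions: one to upgrade, one to reverse it.
def upgrade_1(conn):
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def downgrade_1(conn):
    conn.execute("DROP TABLE users")


def upgrade_2(conn):
    # An ALTER statement that changes the structure of an existing table.
    conn.execute("ALTER TABLE users RENAME TO accounts")


def downgrade_2(conn):
    conn.execute("ALTER TABLE accounts RENAME TO users")


MIGRATIONS = [(upgrade_1, downgrade_1), (upgrade_2, downgrade_2)]


def migrate(conn, current, target):
    """Move the schema from version `current` to `target`, one step at a time."""
    while current < target:
        MIGRATIONS[current][0](conn)  # apply the next upgrade
        current += 1
    while current > target:
        current -= 1
        MIGRATIONS[current][1](conn)  # undo the most recent upgrade
    return current


def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
version = migrate(conn, 0, 2)
tables_after_upgrade = table_names(conn)

version = migrate(conn, version, 1)
tables_after_downgrade = table_names(conn)

print(tables_after_upgrade, tables_after_downgrade)
```

Keeping each downgrade next to its upgrade means the scripts can be replayed in sequence in either direction.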

Tools

JupyterLab

JupyterLab is the next-generation web-based user interface for Project Jupyter.

JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. You can arrange multiple documents and activities side by side in the work area using tabs and splitters. Documents and activities integrate with each other, enabling new workflows for interactive computing.

Google Colab

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.
