Awesome Open Source
Search results for dask
149 search results found
Dask
⭐
11,711
Parallel computing with task scheduling
Cudf
⭐
6,936
cuDF - GPU DataFrame Library
Ibis
⭐
3,404
The flexibility of Python with the scale and performance of modern SQL.
Xarray
⭐
3,322
N-D labeled arrays and datasets in Python
Stumpy
⭐
2,901
STUMPY is a powerful and scalable Python library for modern time series analysis
Mars
⭐
2,664
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Swifter
⭐
2,407
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Fugue
⭐
1,821
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Distributed
⭐
1,514
A distributed task scheduler for Dask
Optimus
⭐
1,446
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Eliot
⭐
1,074
Eliot: the logging system that tells you *why* it happened
Satpy
⭐
980
Python package for earth-observing satellite data processing
Mlforecast
⭐
635
Scalable machine 🤖 learning for time series forecasting.
Traceml
⭐
490
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Pystore
⭐
404
Fast data store for Pandas time-series data
Dask Sql
⭐
350
Distributed SQL Engine in Python using Dask
Datacompy
⭐
339
Pandas and Spark DataFrame comparison for humans and more!
Pyresample
⭐
320
Geospatial image resampling in Python
Hypergbm
⭐
306
A full pipeline AutoML tool for tabular data
Xclim
⭐
281
Library of derived climate variables, i.e., climate indicators, based on xarray.
Nebari
⭐
255
🪴 Nebari - your open source data science platform
Paperboy
⭐
231
A web frontend for scheduling Jupyter notebook reports
Dask Jobqueue
⭐
229
Deploy Dask on job schedulers like PBS, SLURM, and SGE
Models
⭐
224
Merlin Models is a collection of deep learning recommender system model reference implementations
Amazon Sagemaker Local Mode
⭐
220
Amazon SageMaker Local Mode Examples
Climpred
⭐
213
🌎 Verification of weather and climate forecasts 🌍
Xesmf
⭐
213
Universal Regridder for Geospatial Data
Stackstac
⭐
212
Turn a STAC catalog into a dask-based xarray
Orochi
⭐
177
The Volatility Collaborative GUI
Aicsimageio
⭐
177
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
Kartothek
⭐
163
A consistent table management library in python
Geowombat
⭐
154
GeoWombat: Utilities for geospatial data
Cubo
⭐
137
On-Demand Earth System Data Cubes (ESDCs) in Python
Bumblebee
⭐
124
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Python Big Data
⭐
124
Python and Pandas are known to have issues around scalability and efficiency. You will learn how to use libraries such as Modin, Dask, Ray, and Vaex to overcome the problems faced by Pandas.
Xgboost_ray
⭐
121
Distributed XGBoost on Ray
Flox
⭐
112
Fast & furious GroupBy operations for dask.array
Xarray Beam
⭐
110
Distributed Xarray with Apache Beam
Autoray
⭐
106
Abstract your array operations.
Dask Ec2
⭐
104
Start a cluster in EC2 for dask.distributed
Geoxarray
⭐
94
Geolocation utilities for xarray
Mloperator
⭐
89
Machine Learning Operator & Controller for Kubernetes
Xeofs
⭐
81
Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
Timeeval
⭐
68
Evaluation Tool for Anomaly Detection Algorithms on Time Series
Lens
⭐
67
Summarise and explore Pandas DataFrames
Ncar Python Tutorial
⭐
63
Numerical & Scientific Computing with Python Tutorial
Xgrads
⭐
61
Parse and read ctl and associated binary files commonly used by GrADS into xarray
Framequery
⭐
59
SQL on dataframes - pandas and dask
Dask Rasterio
⭐
58
Read and write rasters in parallel using Rasterio and Dask
Graphchain
⭐
56
⚡️ An efficient cache for the execution of dask graphs.
Cowait
⭐
54
Containerized distributed programming framework for Python
Dask Awkward
⭐
54
Native Dask collection for awkward arrays, and the library to use it.
Knit
⭐
54
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Agatha
⭐
50
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Xmitgcm
⭐
45
Read MITgcm mds binary files into xarray
Lazycluster
⭐
43
🎛 Distributed machine learning made simple.
Big Data
⭐
37
Python tools for big data
Learn Data Munging
⭐
37
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Dask Scaling Dataframe
⭐
36
Python and Dask: Scaling the Dataframe
Dask Deltatable
⭐
34
A Delta Lake reader for Dask
Madpy Dask
⭐
34
MadPy Dask talk materials
Oocgcm
⭐
32
oocgcm is a Python library for the analysis of large gridded geophysical datasets.
Coiled Resources
⭐
31
Notebooks that support blog posts and tech talks on Dask / Coiled.
Daskos
⭐
29
Apache Mesos backend for Dask scheduling library
Cesm Lens Aws
⭐
29
Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask
Dask Pytorch Ddp
⭐
28
dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.
Arboreto
⭐
27
A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
Gaia
⭐
27
Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.
Daskperiment
⭐
27
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
Cog_worker
⭐
26
Scalable arbitrary analysis on COGs
Mpes
⭐
26
Distributed data processing routines for multidimensional photoemission spectroscopy (MPES)
Python For Hpc
⭐
25
Repository for participants of the "Python for HPC" training
Dask Snowflake
⭐
24
Dask integration for Snowflake
Mdio Python
⭐
24
Cloud native, scalable storage engine for various types of energy data.
Adaptive Scheduler
⭐
24
Run many functions (adaptively) on many cores (>10k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. 🎉
Pmda
⭐
23
Parallel algorithms for MDAnalysis
Esmlab
⭐
23
Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️
Daskmaskrcnn
⭐
22
Running Mask-RCNN on Dask with PyTorch
Bytehub
⭐
22
ByteHub: making feature stores simple
Xsar
⭐
22
Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing
Geospatial Data Analysis Python
⭐
21
This repo contains the most common tools used in geospatial analysis using Python!
Dask Histogram
⭐
21
Histograms with task scheduling.
Php Uavt Adreskodu Botu
⭐
20
UAVT address code bot in PHP
Awesome Pandas Alternatives
⭐
19
Awesome list of alternative dataframe libraries in Python.
Mercat
⭐
18
MerCat: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis for meta-ome data
Dask Ms
⭐
17
Implementation of a dask/xarray dataset backed by a CASA MS
Nyc Taxi Analysis
⭐
17
Analyzing 200 GB of NYC taxi dataset.
Coffea Casa
⭐
16
Repository with configuration setup of a prototype of analysis facility - "coffea-casa"
Lexcube
⭐
15
Lexcube: 3D Data Cube Visualization in Jupyter Notebooks
Svoe
⭐
15
A scalable, declarative, low-code framework for real-time and batch feature calculation/management (quant finance, anomaly/fraud detection, etc.), predictive ML training/inference and simulation. Built on top of Ray
Prefect Saturn
⭐
15
Python client for using Prefect Cloud with Saturn Cloud
Austin Ml Change Detection Demo
⭐
15
A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.
Distributed Compute Operator
⭐
14
🤖 Kubernetes operator providing Ray|Spark|Dask|MPI clusters on-demand
Stemgraphic
⭐
14
stemgraphic python package for visualization of data and text
Nebari Docs
⭐
14
📖 Documentation for nebari
Codex Africanus
⭐
13
Radio Astronomy Algorithms Library
Pangeo Binder
⭐
13
Pangeo + Binder (dev repo for a binder/pangeo fusion concept)
Dvc_dask_use_case
⭐
13
A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.
Distributed Compute On Aws With Cross Regional Dask
⭐
12
Perform I/O intensive workloads on high-volume data sparsely located across multiple AWS regions through the use of Dask.
Ngff Zarr
⭐
12
A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.
1-100 of 149 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.