Awesome Open Source
Search results for dataset big data
big-data
dataset
15 search results found
Img2dataset
⭐
2,986
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Fluid
⭐
1,488
Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
Mobius
⭐
937
C# and F# language bindings and extensions to Apache Spark
Spark Movie Lens
⭐
757
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Oie Resources
⭐
435
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Knowage Server
⭐
387
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Video2dataset
⭐
379
Easily create large video datasets from video URLs
Cc2dataset
⭐
264
Easily convert Common Crawl to a dataset of captions and documents: image/text, audio/text, video/text, ...
Musescore Dataset
⭐
217
The dataset of all music sheets and users on musescore.com (unmaintained/discontinued since Sep 30, 2021)
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Cloud Volume
⭐
120
Read and write Neuroglancer datasets programmatically.
Jhdf
⭐
105
A pure Java HDF5 library
Flink Book
⭐
38
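Learning materials for big data, stream computing, real-time computing, and the Flink framework. Companion code for the best-selling book 《深入理解Flink核心设计与实践原理》; every Flink feature explained in the book comes with complete, runnable code for readers to run and test. The whole project contains [182 Jav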
Telemetry Batch View
⭐
32
A Scala framework to build derived datasets, aka batch views, of Telemetry data.
Vehicleorientationdataset
⭐
30
The vehicle orientation dataset is a large-scale dataset containing more than one million annotations for vehicle detection with simultaneous orientation classification using a standard object detection network.
Enceladus
⭐
28
Dynamic Conformance Engine
Aws Redshift Spectrum Poc
⭐
27
CloudFormation and SQL scripts used to replicate a POC environment from the "Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum" post
Detecting Malicious Url Machine Learning
⭐
23
Pytorch_lmdb_dataset
⭐
22
PyTorch LMDB dataset with protobuf
Spark.sas7bdat
⭐
21
Read in SAS data in parallel into Apache Spark
Social Network Analysis In Python
⭐
19
Social Network Facebook Analysis (Python, NetworkX)
Pyspark
⭐
15
Spark (Scala and Python)
X Wines
⭐
15
A world wines dataset with user ratings for recommendation systems and general use.
Gmql
⭐
14
GMQL - GenoMetric Query Language
Clusterindices
⭐
9
This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.
Bigarrays.jl
⭐
9
Store and access large Julia arrays locally or in cloud storage without a server.
Bigdata Net Hadoop Mapreduce
⭐
6
Big data analysis for huge NHS datasets (Hadoop MapReduce) #BigData #MapReduce #.Net #HDFS