Awesome Open Source
Search results for dataset spark
66 search results found
Deequ
⭐
3,044
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Spark Cassandra Connector
⭐
1,929
DataStax Connector for Apache Spark to Apache Cassandra
Petastorm
⭐
1,693
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Mobius
⭐
937
C# and F# language binding and extensions to Apache Spark
Spark Movie Lens
⭐
757
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Cdap
⭐
735
An open source framework for building data analytic applications.
Machinelearning
⭐
684
Machine learning resources, including algorithms, papers, datasets, examples, and so on.
Complete Life Cycle Of A Data Science Project
⭐
499
Complete-Life-Cycle-of-a-Data-Science-Project
Whylogs Java
⭐
179
Profile and monitor your ML data pipeline end-to-end
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Qb
⭐
160
QANTA Quiz Bowl AI
Spark Iforest
⭐
147
Isolation Forest on Spark
Spatialspark
⭐
141
Big Spatial Data Processing using Spark
Distributed Dataset
⭐
107
A distributed data processing framework in Haskell.
Phrase At Scale
⭐
84
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.
Tiledb Vcf
⭐
79
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Dataframecheatsheet
⭐
74
Cheatsheet for Spark DataFrame
Mltoolkits
⭐
65
learningOrchestra is a distributed Machine Learning integration tool that facilitates and streamlines iterative processes in a Data Science project.
Spark Tutorial
⭐
55
This tutorial provides a quick introduction to using Spark
Spark Examples
⭐
54
RAPIDS Spark examples
Examples
⭐
47
These are some code examples
Spark Mail
⭐
45
Tutorial on parsing Enron email to Avro and then exploring the email set using Spark.
Spark Xgboost Examples
⭐
43
XGBoost GPU accelerated on Spark example applications
Vagrant Spark Zeppelin
⭐
43
Vagrant, Apache Spark and Apache Zeppelin VM for teaching
Spark Anomaly Detection
⭐
43
Detecting outliers in a dataset using Spark
C4 Dataset Script
⭐
39
Inspired by Google's C4, this is a series of Colossal Clean data-cleaning scripts focused on CommonCrawl data processing, including Chinese data processing and the cleaning methods from MassiveText.
Spark Java8
⭐
38
Java 8 and Spark learning through examples
Telemetry Batch View
⭐
32
A Scala framework to build derived datasets, aka batch views, of Telemetry data.
Sparkdataset
⭐
28
Instant search for and access to many datasets in Pyspark.
Enceladus
⭐
28
Dynamic Conformance Engine
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Kraps Haskell
⭐
26
Experimental Haskell bindings to Spark Datasets and DataFrames
Archived Sansa Owl
⭐
25
SANSA Stack OWL (Web Ontology Language) API
Blog Spark Naive Bayes Reuters
⭐
23
Simple example on how to use Naive Bayes on Spark using the popular Reuters 21578 dataset
Dac
⭐
23
A Distributed Associative Classifier for Apache Spark, mirror of
Patchwork
⭐
23
Highly Scalable Grid-Density Clustering Algorithm for Spark MLLib
Pygotham2018_graphmining
⭐
22
Large-scale Graph Mining with Spark
Spark.sas7bdat
⭐
21
Read in SAS data in parallel into Apache Spark
Sparkcourse
⭐
19
Taming Big Data with Apache Spark and Python - Hands On - Udemy
Tweetsets
⭐
19
Service for creating Twitter datasets for research and archiving.
How To Process Kdd 99 Dataset
⭐
17
Use Introduction
Spark Sql Gdelt
⭐
16
Scripts and code to import the GDELT dataset into Spark SQL for analysis
Pyspark
⭐
15
Spark (Scala and Python)
Rdds Dataframes Datasets Presentation 2016
⭐
15
Source for "RDDs, DataFrames and Datasets in Apache Spark" NEScala presentation
Mongodb_spark_course
⭐
14
Code materials for the MongoDB Spark Course
Tedsds
⭐
14
Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Pyconca 2016 Spark Tutorial
⭐
13
Materials for Mike's PyCon Canada 2016 PySpark Tutorial
Mumu Spark
⭐
13
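mumu-spark is a learning project, mainly for understanding and learning the basic usage and working principles of Spark: SQL, the machine learning library MLlib, real-time workflows with Streaming, and the graph database GraphX. Through these modules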
Pyspark Ml Examples
⭐
13
Spark ML Tutorial and Examples for Beginners
Sparktf
⭐
12
R interface to Spark TensorFlow Connector
Wikibrain
⭐
11
Wikipedia graph mining: dynamic structure of collective memory
Spark Scala Tutorial Ko
⭐
11
Tutorial for Scala on Spark only
Mongodb Hadoop Workshop
⭐
11
MongoDB-Hadoop Workshop Exercises
Spark Constraints
⭐
10
SQL constraints in Spark!
Aadhaar Dataset Analysis
⭐
10
An analysis on Aadhaar dataset using Mapreduce and Spark
Bdp 05 Large Scale Clustering
⭐
9
BDP 05: Clustering of Large Unlabeled Datasets. Overview: Real-world data is frequently unlabeled and can seem completely random. In these sorts of situations, unsupervised learning techniques are a great way to find underlying patterns. This project looks at one such algorithm, KMeans clustering, which searches for boundaries separating groups of points based on their differences in some features. The goal of the project is to implement an unsupervised clustering algorithm using a distributed co
Amoeba
⭐
9
Imb Sampling Ros_and_rus
⭐
9
Spark implementations of two data sampling methods (random oversampling and random undersampling) for imbalanced classification datasets
Communitydetection Spark Aws
⭐
9
A Spark application, written in Python, that finds strongly connected components with the Bi-directional Label Propagation algorithm. The project processed a 1.3 GB Twitter network dataset on an AWS EMR cluster.
Big Data Analysis With Python
⭐
8
Combine Spark and Python to process large datasets and unlock the power of parallel computing and machine learning
Intrusion Detection Spark Conv Lstm
⭐
8
Intrusion detection system with Apache Spark and deep learning
The 7 Ways Wordcount Apache Spark Snippets
⭐
8
The 7 Ways to Code WordCount in Spark 2.0: Understanding RDDs, DataFrames, Datasets & Spark SQL by Example
Project1
⭐
7
Music Recommender System using Apache Spark and Python
Dataset Transform
⭐
7
Strongly typed Scala operations for working with Spark Datasets
Mondrian
⭐
7
Spark-based Mondrian implementation
Sfo_fire_service_call_analysis_using_spark
⭐
6
To understand Spark performance and tuning, we created a Spark application using the RDD, DataFrame, Spark SQL, and Dataset APIs to answer the following questions about the SFO Fire Department service call dataset: How many different types of calls were made to the Fire Department? How many incidents of each call type were there? How many years of Fire Service Calls are in the data file? How many service calls were logged in the past 7 days? Which neighborhood in SF generated the
Geomatch
⭐
6
Spark User Feedback
⭐
6
Cfzoo
⭐
6
Weighted Regularized Matrix Factorization for Collaborative Filtering Benchmark
Spark Analysis
⭐
6
Analyzing large datasets using Spark
Cookie Datasets
⭐
6
Read well-known ML datasets in Apache Spark
Android_malware_capstone
⭐
6
Investigation of Android mobile Malware using the SherLock dataset
Bigchurn
⭐
6
Spark Datasetops
⭐
6
A tiny library that aims to make Spark SQL Dataset more developer friendly by bringing back the operators we all love to use on key-value RDDs
Distributed Smartml
⭐
5
Machine Learning Course
⭐
5
Machine Learning and Deep Learning Course
Strata2017
⭐
5
Repository for the "Exploration and visualization of large, complex datasets with R, Hadoop, and Spark" tutorial at Strata Hadoop World 2017
Big Data Project
⭐
5
big-data-project
Mambo
⭐
5
A simple in-memory, configuration driven, data processing pipeline for Apache Spark.
Selective Search
⭐
5
Selective search partitions a large-scale dataset into subsets (shards) such that only a few shards need to be searched for a query, thus improving search efficiency and effectiveness.
Yelp_dataset
⭐
5
Sample analysis of the latest Yelp dataset using Spark
N5 Spark
⭐
5
Spark-driven processing utilities for N5 datasets.
Msd
⭐
5
Processing the Million Song Dataset with Apache Spark
Yelp Data Explorer
⭐
5
Yelp open dataset explorer using Spark and Cassandra
Copyright 2018-2024 Awesome Open Source. All rights reserved.