Awesome Open Source
Search results for dataset scala
48 search results found
Deequ
⭐
3,044
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Spark Cassandra Connector
⭐
1,929
DataStax Connector for Apache Spark to Apache Cassandra
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Spark Iforest
⭐
147
Isolation Forest on Spark
Wayeb
⭐
145
Wayeb is a Complex Event Processing and Forecasting (CEP/F) engine written in Scala.
Spatialspark
⭐
141
Big Spatial Data Processing using Spark
Flink Maven Scala
⭐
58
Flink study notes
Spark Tutorial
⭐
55
This tutorial provides a quick introduction to using Spark
Spark Examples
⭐
54
RAPIDS Spark examples
Locis
⭐
44
Implementation of "A Parallel Spatial Co-location Mining Algorithm Based on MapReduce" paper
Spark Xgboost Examples
⭐
43
XGBoost GPU accelerated on Spark example applications
Spark Anomaly Detection
⭐
43
Detecting outliers in a dataset using Spark
Vagrant Spark Zeppelin
⭐
43
Vagrant, Apache Spark and Apache Zeppelin VM for teaching
Qamr
⭐
40
Question-Answer Meaning Representation
Flink Book
⭐
38
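Study materials on big data, stream computing, real-time computing, and the Flink framework. Companion code for the best-selling book 《深入理解Flink核心设计与实践原理》; every Flink feature covered in the book comes with complete, runnable code for readers to run and test. The whole project contains [182 Jav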
Hmda_data_science_kit
⭐
37
Telemetry Batch View
⭐
32
A Scala framework to build derived datasets, aka batch views, of Telemetry data.
Enceladus
⭐
28
Dynamic Conformance Engine
Quantumlearn
⭐
28
💭 An improved Machine Learning library in Scala.
Kudu Learning
⭐
27
Kudu learning materials, plus integration with Spark and Impala
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Levar
⭐
25
Machine learning evaluation database
Blog Spark Naive Bayes Reuters
⭐
23
Simple example on how to use Naive Bayes on Spark using the popular Reuters 21578 dataset
Dac
⭐
23
A Distributed Associative Classifier for Apache Spark, mirror of
Rdfrules
⭐
23
RDFRules: Analytical Tool for Rule Mining from RDF Knowledge Graphs
Patchwork
⭐
23
Highly Scalable Grid-Density Clustering Algorithm for Spark MLLib
Salt Examples
⭐
21
Example projects using Uncharted Salt
Ac Blstm
⭐
21
MXNet implementation of AC-BLSTM
Soda Fountain
⭐
18
Server for the Socrata Open Data API
Titanic
⭐
17
Predicting survival on the Titanic
Latis
⭐
16
Unified scientific data model implementation and modular framework for data access, processing, and output.
Spark Sql Gdelt
⭐
16
Scripts and code to import the GDELT dataset into Spark SQL for analysis
Rdds Dataframes Datasets Presentation 2016
⭐
15
Source for "RDDs, DataFrames and Datasets in Apache Spark" NEScala presentation
Pyspark
⭐
15
spark (scala and python)
Gmql
⭐
14
GMQL - GenoMetric Query Language
Akkamapreducesample
⭐
14
An example of Akka MapReduce to process huge datasets in real time.
Linkedhypernymsdataset
⭐
12
Covid19 Knowledge Graph
⭐
11
Builds a knowledge graph from the [COVID-19 Open Research Dataset (CORD-19)](https://pages.semanticscholar.org/coron dataset.
Spark Scala Tutorial Ko
⭐
11
Tutorial for Scala on Spark only
Wikibrain
⭐
11
Wikipedia graph mining: dynamic structure of collective memory
Random Forest
⭐
10
Implementation of a Random Forest classifier in both Python and Scala
Spark Constraints
⭐
10
SQL constraints in Spark!
Clusterindices
⭐
9
This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.
Imb Sampling Ros_and_rus
⭐
9
Spark implementations of two data sampling methods (random oversampling and random undersampling) for imbalanced classification datasets
Esc
⭐
9
Scala client for ElasticSearch
Newsleak
⭐
9
Science and Data-Driven Journalism: Data Extraction and Interactive Visualization of Unexplored Textual Datasets for Investigative Data-Driven Journalism
Stsc
⭐
9
An implementation of the Self-Tuning Spectral Clustering algorithm, and more.
Dataset Transform
⭐
7
Strongly typed Scala operations for working with Spark Datasets
Spark Datasetops
⭐
6
A tiny library that aims to make Spark SQL Dataset more developer friendly by bringing back the operators we all love to use on key-value RDDs
Cookie Datasets
⭐
6
Read well-known ML datasets in Apache Spark
Calcite Map Demo
⭐
6
Demonstration of integrating Calcite with a hierarchical dataset
Sfo_fire_service_call_analysis_using_spark
⭐
6
To understand Spark performance and tune the application, we created a Spark application using the RDD, DataFrame, Spark SQL and Dataset APIs to answer the following questions from the SFO Fire Department service call dataset: How many different types of calls were made to the Fire Department? How many incidents of each call type were there? How many years of Fire Service Calls are in the data file? How many service calls were logged in the past 7 days? Which neighborhood in SF generated the
Scrubjay
⭐
6
A framework for automatic, scalable data integration of HPC performance data sources
Geomatch
⭐
6
Spark User Feedback
⭐
6
Yelp_dataset
⭐
5
Sample analysis of the latest Yelp dataset using Spark
Scala Wurfl
⭐
5
WURFL Scala API
Metadata Manager
⭐
5
Public dataset downloader for GMQL framework
Databus Mods
⭐
5
Databus Mods (How To and Mod Ontology and Reference Implementation)
Distributed Smartml
⭐
5
Machine Learning Course
⭐
5
Machine Learning and Deep Learning Course
Big Data Opinion Spam Detection Using Scala And Machine Learning
⭐
5
Detecting opinion spammers and classifying Amazon food reviews using sentiment analysis and machine learning
Data Knoller
⭐
5
data-knoller is a library that provides data preparation for user-specified datasets.
Mambo
⭐
5
A simple in-memory, configuration driven, data processing pipeline for Apache Spark.
Big Data Project
⭐
5
big-data-project
Selective Search
⭐
5
Selective search partitions a large-scale dataset into subsets (shards) such that only a few shards need to be searched for a query, thus improving search efficiency and effectiveness
Msd
⭐
5
Processing the Million Song Dataset with Apache Spark
Copyright 2018-2024 Awesome Open Source. All rights reserved.