Awesome Open Source
Search results for python hadoop
430 search results found
Spark
⭐
35,911
Apache Spark - A unified analytics engine for large-scale data processing
Data Science Ipython Notebooks
⭐
25,025
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Luigi
⭐
16,527
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Deeplearning4j
⭐
12,957
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
It_book
⭐
8,543
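This project collects over a thousand good, commonly used books that I have read or heard about over the years; the book you are looking for may well be here. It covers the internet industry…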
H2o 3
⭐
6,299
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
School Of Sre
⭐
6,008
At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Bigdl
⭐
4,222
Fast, distributed, secure AI for Big Data
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Ibis
⭐
2,768
The flexibility of Python with the scale and performance of modern SQL.
Mrjob
⭐
2,584
Run MapReduce jobs on Hadoop or Amazon Web Services
Ambari
⭐
1,922
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Nagios Plugins
⭐
1,098
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Studybooks
⭐
999
My study materials, including books, websites, etc.
Coding Now
⭐
925
Study notes, along with some eBooks I have read, video resources, and blogs I have collected that I consider good, …
Snakebite
⭐
854
A pure python HDFS client
Cdap
⭐
707
An open source framework for building data analytic applications.
Devops Python Tools
⭐
659
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Flintrock
⭐
604
A command-line tool for launching Apache Spark clusters.
Aws Glue Libs
⭐
514
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Bigtop
⭐
500
Mirror of Apache Bigtop
Tuiblogs
⭐
443
Excellent computer programming blogs and articles
Gather Deployment
⭐
345
Gathers scalable Tensorflow, Python infrastructure deployment and practices, 100% Docker.
Elasticluster
⭐
317
Create clusters of VMs on the cloud and configure them with Ansible.
Sagemaker Spark
⭐
263
A Spark library for Amazon SageMaker.
Maas
⭐
257
Official MAAS repository mirror (may be out of date). Development happens in Launchpad (https://git.launchpad.net/maas/).
Hadoopy
⭐
244
Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
Bisheserver
⭐
242
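This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It mainly uses Django as…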
Hadoop Attack Library
⭐
200
A collection of pentest tools and resources targeting Hadoop environments
Zohmg
⭐
175
Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase.
Avenir
⭐
164
Set of Machine Learning and Stochastic Optimization tools based on Hadoop, Spark and Storm https://pkghosh.wordpress.com/
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Cc Mrjob
⭐
157
Demonstration of using Python to process the Common Crawl dataset with the mrjob framework
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Ipython Spark Docker
⭐
151
Spatialanalytics
⭐
134
Where 2.0 Workshop Code: Spatial Analysis of Tweets using Hadoop, Pig, Python & Mechanical Turk. Slides here: http://www.slideshare.net/kevinweil/spatial-analyt
Griffon Vm
⭐
129
Griffon Data Science Virtual Machine
Skein
⭐
126
A tool and library for easily deploying applications on Apache YARN
Aut
⭐
122
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Hprof2flamegraph
⭐
108
Flame Graph visualization for Java (HPROF, Honest-profiler)
Introtohadoopandmr__udacity_course
⭐
103
🐘 Source code for assignments of Udacity course "Introduction to Hadoop and MapReduce"
Hadoop Yarn Api Python Client
⭐
99
Python client for Hadoop® YARN API
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Ambari Hue Service
⭐
97
Ambari stack service for easily installing and managing Hue on HDP cluster
Annotated Wikiextractor
⭐
88
Simple Wikipedia plain text extractor with article link annotations and Hadoop support.
Tf Yarn
⭐
86
Train TensorFlow models on YARN in just a few lines of code!
Briefly
⭐
85
Briefly - A Python Meta-programming Library for Job Flow Control
Dlflow
⭐
84
DLFlow is a deep learning framework.
Pyhdfs
⭐
83
Python HDFS client
Solutions Google Compute Engine Cluster For Hadoop
⭐
81
This sample app will get you up and running quickly with a Hadoop cluster on Google Compute Engine. For more information on running Hadoop on GCE, read the papers at https://cloud.google.com/resources/.
Data Pipeline
⭐
79
Data pipeline is a tool to run Data loading pipelines. It is an open sourced app engine app that users can extend to suit their own needs. Out of the box it will load files from a source, transform them and then output them (output might be writing to a file or loading them into a data analysis tool). It is designed to be modular and support various sources, transformation technologies and output types. The transformations can be chained together to form complex pipelines.
Cqu_bigdata
⭐
77
Labs and slides (PPT) for the "Big Data Course Group" at the College of Computer Science, Chongqing University
Resilient Ml Research Platform
⭐
76
Templates
⭐
73
DevOps Templates for Kubernetes, AWS, GCP, Terraform, Docker, Packer, Jenkins, CircleCI, GitHub Actions, Lambda, AWS CodeBuild, GCP Cloud Build, Vagrant, Puppet, Python, Bash, Go, Perl, Java, Scala, Groovy, Maven, SBT, Gradle, Make, Jenkinsfile, Makefile, Dockerfile, docker-compose.yml, Vagrantfile, M4 etc...
Geoprocessing Tools For Hadoop
⭐
68
The Hadoop GP Toolbox provides tools to exchange features between a Geodatabase and Hadoop and run Hadoop workflow jobs.
Python Hdfs
⭐
67
HDFS client for Python
Airflow Spark
⭐
64
Docker with Airflow and Spark standalone cluster
Apache Spark Hands On
⭐
64
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Pystratus
⭐
63
Python-based utility for managing various distributed services on cloud providers
Dask Yarn
⭐
61
Deploy dask on YARN clusters
Coursework
⭐
59
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Pybigdata
⭐
56
Working with various big data components using Python
Snabler
⭐
54
Parallel Algorithms in Python for Hadoop/Mapreduce
Mylearningnotes
⭐
54
Because it's never too late to start taking notes and publishing them...
Knit
⭐
54
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Spark Training
⭐
52
Repository used for Spark Trainings
Til
⭐
51
Today I Learned
Qds Sdk Py
⭐
51
Python SDK for accessing Qubole Data Service
Hadoop_vision
⭐
49
Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining"
Pydrill
⭐
46
Python Driver for Apache Drill.
Ossocr
⭐
46
gathering point for open source OCR scripts and diffs
Serverless Spark Workshop
⭐
45
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Ibis
⭐
44
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Medicare Demo
⭐
44
A demo of how to use PageRank with Hadoop and SociaLite to identify anomalies in Healthcare Data
Machine_learning_in_action_py3
⭐
44
An important book on machine learning algorithms that introduces how these algorithms and tools are applied in a real environment. Where other books are long on machine learning theory, this one focuses more on how to implement the algorithms in code.
Icwsm2010_tutorial
⭐
44
example code for "Large-scale social media analysis with Hadoop" tutorial presented at ICWSM 2010
Webhdfs Py
⭐
43
Python Client for WebHDFS REST API
Python Devops
⭐
41
Gathers a Python stack for DevOps; these are usually the basic templates I use for my implementations, so feel free to use and evolve them. Everything is Docker!
Slinky
⭐
39
Slinky, a high-performance web crawler / text analytics in Python, Redis, Hadoop, R, Gephi
Hadoop Multi Server Ansible
⭐
38
Multi-server deployment of Hadoop using Ansible
Big Data Exploration
⭐
37
[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
Swordfish
⭐
37
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Test.fm
⭐
36
Testing framework for Collaborative Filtering
Big Data
⭐
35
Python tools for big data
Luiti
⭐
35
A time-based task management framework that supports multiple projects, built on top of Luigi.
Hadoop_exporter
⭐
35
A Hadoop exporter for Prometheus that scrapes Hadoop metrics (including HDFS, YARN, MapReduce, HBase, etc.) from the JMX URLs of Hadoop components.
Ansible Ambari
⭐
33
Quickly deploy Hadoop with the help of Ansible and Apache Ambari
Telemetry Analysis Service
⭐
33
Telemetry Analysis Service
Bigdata Docker Compose
⭐
33
Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.
Hortonworks Sandbox
⭐
33
hortonworks-sandbox
Awesome Tools
⭐
32
curated list of awesome tools and libraries for specific domains
Jydoop
⭐
32
Efficient Hadoop Map-Reduce in Python
Jmxtrans Lib
⭐
32
JMXTrans configuration for hadoop/cassandra/zookeeper
Ambari Metrics
⭐
32
Apache Ambari Metrics is a sub project of Apache Ambari.
Hadoop Job Analyzer
⭐
31
Matrix Hadoop Tutorial
⭐
31
A set of tutorial codes about matrix methods in Hadoop
Gparml
⭐
31
Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models.
1-100 of 430 search results