Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Devops Python Tools | 709 | | | | 4 months ago | | | 37 | mit | Python |
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc. | ||||||||||
Bigdata Playground | 154 | | | | 5 years ago | | | 4 | apache-2.0 | TypeScript |
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL | ||||||||||
Boxball | 99 | | | | 5 months ago | 8 | October 07, 2023 | 9 | apache-2.0 | Python |
Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, ClickHouse, Drill, Parquet, and CSV. | ||||||||||
Spark Mail | 45 | | | | 5 years ago | | | 3 | other | HTML |
Tutorial on parsing the Enron email dataset into Avro and then exploring it with Spark. | ||||||||||
Etl Light | 38 | | | | 7 years ago | | | | mit | Scala |
A lightweight Kafka-to-HDFS/S3 ETL library based on Apache Spark. | ||||||||||
Deephaven Parquet Viewer | 21 | | | | 4 months ago | | | 4 | | Shell |
A browser-based Parquet file viewer | ||||||||||
Spark Lucenerdd Examples | 15 | | | | 7 months ago | | | 2 | apache-2.0 | Scala |
Examples of spark-lucenerdd | ||||||||||
Greatex | 10 | | | | 2 years ago | | | 1 | | Python |
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow. | ||||||||||
Bigdata Platform | 6 | | | | 4 months ago | | | | apache-2.0 | Jupyter Notebook |
End-to-end big data project that aims to show how to implement the different big data layers, from the infrastructure layer to the end-user layer. [Hadoop][Spark][Kafka][Cassandra][Ansible][Jupyter][Docker] | ||||||||||