Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Pipeline | 4,140 | 5 months ago | 85 | July 18, 2017 | 1 | apache-2.0 | Jsonnet | |||
PipelineAI Kubeflow Distribution | ||||||||||
Dataspherestudio | 2,474 | 1 | a month ago | 5 | July 14, 2022 | 402 | apache-2.0 | Java | ||
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling. | ||||||||||
Around Dataengineering | 926 | 5 months ago | 2 | Python | ||||||
A Data Engineering & Machine Learning Knowledge Hub | ||||||||||
Goodreads_etl_pipeline | 593 | 3 years ago | mit | Python | ||||||
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform. | ||||||||||
Agile_data_code_2 | 423 | a year ago | 10 | mit | Jupyter Notebook | |||||
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition | ||||||||||
Data Engineering Projects | 322 | a month ago | 5 | Jupyter Notebook | ||||||
Personal Data Engineering Projects | ||||||||||
Beginner_de_project | 276 | a month ago | 1 | mit | HCL | |||||
Beginner data engineering project - batch edition | ||||||||||
Airflow Pipeline | 160 | 7 months ago | 4 | apache-2.0 | Python | |||||
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR | ||||||||||
Data Engineering Interview Questions | 147 | 7 months ago | ||||||||
More than 2,000 data engineering interview questions. | ||||||||||
Streamify | 97 | a year ago | Python | |||||||
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! |
Integrating Apache Airflow and Spark
This template helps you initialize Airflow and set up a Spark runtime environment. It also helps you build Scala code and submit it to Spark.
$ airflow/init-airflow.sh
$ airflow/start-airflow.sh
$ ./pull_build.sh
......
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 59.949 s
[INFO] Finished at: 2019-07-05T19:14:38+09:00
[INFO] Final Memory: 30M/304M
[INFO] ------------------------------------------------------------------------
real 1m1.529s
......
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO -
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO - ----------------------------
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO - ENV: real OUTPUT_BUCKET: /user/isearch/score
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO - START_DATE: 20190703
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO - NUM_DAYS: 1
[2019-07-05 19:15:16,914] {bash_operator.py:127} INFO - ----------------------------
[2019-07-05 19:15:16,914] {bash_operator.py:127} INFO -
[2019-07-05 19:15:16,919] {bash_operator.py:127} INFO - ----------------------------------------
[2019-07-05 19:15:16,920] {bash_operator.py:127} INFO - HDFS_OUTPUT = /user/isearch/score/kr/summary/template/20190703
[2019-07-05 19:15:17,345] {bash_operator.py:131} INFO - Command exited with return code 0
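The log above shows the job receiving OUTPUT_BUCKET, START_DATE and NUM_DAYS and writing to a dated HDFS_OUTPUT path. The Python sketch below illustrates how such a path and a spark-submit command line could be derived from those parameters. It is not the template's actual script: the helper names, the jar path, the main class, and the choice to count days forward from START_DATE are all assumptions made for illustration. Only the `kr/summary/template` sub-path and the parameter values come from the log.

```python
import shlex
from datetime import datetime, timedelta
from typing import List


def date_partitions(start_date: str, num_days: int) -> List[str]:
    """Return num_days consecutive yyyymmdd strings, starting at start_date.

    Counting forward from START_DATE is an assumption; the log only shows
    the NUM_DAYS = 1 case, where the partition equals START_DATE.
    """
    first = datetime.strptime(start_date, "%Y%m%d")
    return [(first + timedelta(days=i)).strftime("%Y%m%d") for i in range(num_days)]


def hdfs_output(output_bucket: str, day: str,
                sub_path: str = "kr/summary/template") -> str:
    """Join the bucket, the sub-path seen in the log, and the date partition."""
    return "/".join([output_bucket.rstrip("/"), sub_path.strip("/"), day])


def spark_submit_command(jar: str, main_class: str, day: str, output: str) -> str:
    """Build a shell-quoted spark-submit command line.

    jar and main_class are hypothetical placeholders supplied by the caller.
    """
    return shlex.join(["spark-submit", "--class", main_class, jar, day, output])


if __name__ == "__main__":
    # Values taken from the log; the jar and class names are made up.
    for day in date_partitions("20190703", 1):
        out = hdfs_output("/user/isearch/score", day)
        print(spark_submit_command("target/template.jar", "example.Template", day, out))
```

A command string built this way could be passed to an Airflow BashOperator as its `bash_command`, which matches the `bash_operator.py` entries in the log.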