Airflow_spark

Simply Integrating Apache Airflow and Spark


airflow template

This template helps you initialize Airflow and set up a Spark runtime environment. It also builds the Scala code and submits the resulting job to Spark.
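As a rough illustration of the build-and-submit step, the shell sketch below packages a Maven project and hands the jar to `spark-submit`. It is not taken from the template: the jar path, the main class (`com.example.ScoreJob`), the `SPARK_MASTER` default of `local[*]` and the `DRY_RUN` switch are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical build-and-submit sketch; not the template's actual script.

# submit_job JAR MAIN_CLASS [JOB_ARGS...]: run spark-submit for one jar.
# SPARK_MASTER defaults to local[*]; DRY_RUN=1 prints the command instead of running it.
submit_job() {
  local jar=$1 main_class=$2
  shift 2
  # Rebuild the positional parameters as the full spark-submit command line.
  set -- spark-submit --class "$main_class" --master "${SPARK_MASTER:-local[*]}" "$jar" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_and_submit JAR MAIN_CLASS [JOB_ARGS...]: package with Maven (tests skipped), then submit.
build_and_submit() {
  mvn -q -DskipTests package && submit_job "$@"
}
```

For example, `DRY_RUN=1 submit_job target/job.jar com.example.ScoreJob 20190703 1` only prints the command, which makes it easy to check the arguments before a real run.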

requirements

  • Python >= 3
  • Maven >= 3
  • JDK >= 1.8 (Java 8)
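The version requirements above can be checked before running any of the scripts. This is a hedged sketch, not part of the template: it assumes GNU `sort -V` for version comparison and parses the usual output formats of `python3 --version`, `mvn -v` and `java -version`, which may differ across vendors and releases.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check for the listed requirements (not part of the template).

# version_ge A B: exit status 0 if version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN FOUND: report whether the FOUND version satisfies MIN.
check_tool() {
  if [ -z "$3" ]; then
    echo "missing: $1"
  elif version_ge "$3" "$2"; then
    echo "ok: $1 $3 (>= $2)"
  else
    echo "too old: $1 $3 (need >= $2)"
  fi
}

# Parse installed versions; each probe is skipped when the tool is not on PATH.
py_ver=""; mvn_ver=""; java_ver=""
command -v python3 >/dev/null && py_ver=$(python3 --version 2>&1 | awk '{print $2}')
command -v mvn >/dev/null && mvn_ver=$(mvn -v 2>/dev/null | awk 'NR==1 {print $3}')
command -v java >/dev/null && java_ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')

check_tool python 3 "$py_ver"
check_tool maven 3 "$mvn_ver"
check_tool jdk 1.8 "$java_ver"
```

Comparing with `sort -V` instead of plain string comparison keeps versions such as `1.8.0_292` and `11.0.2` ordered correctly against `1.8`.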

install & init airflow

$ airflow/init-airflow.sh
$ airflow/start-airflow.sh
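The two scripts are not reproduced here, so the following is only a guess at what an Airflow init/start sequence usually involves: pointing `AIRFLOW_HOME` at a directory, creating the metadata database, then starting the webserver and scheduler. The `airflow_home` directory and the helper names are hypothetical, and the `webserver` command applies to Airflow 1.10 and 2.x, not Airflow 3.

```shell
#!/usr/bin/env bash
# Hypothetical init/start sequence; airflow/init-airflow.sh and
# airflow/start-airflow.sh in the template may differ.

# Keep Airflow's config and metadata DB inside the project (Airflow's own default is ~/airflow).
export AIRFLOW_HOME="${AIRFLOW_HOME:-$PWD/airflow_home}"

# airflow_init_args VERSION: print the CLI subcommand that creates/upgrades the metadata DB.
airflow_init_args() {
  local major=${1%%.*} rest=${1#*.}
  local minor=${rest%%.*}
  if [ "$major" -lt 2 ]; then
    echo "initdb"        # Airflow 1.10.x
  elif [ "$major" -eq 2 ] && [ "$minor" -lt 7 ]; then
    echo "db init"       # Airflow 2.0 to 2.6
  else
    echo "db migrate"    # Airflow 2.7 and later
  fi
}

# start_airflow VERSION: set up the DB, then run webserver and scheduler in the background.
start_airflow() {
  # Word splitting of the subcommand (e.g. "db init") is intentional.
  airflow $(airflow_init_args "$1") || return 1
  airflow webserver --port 8080 --daemon
  airflow scheduler --daemon
}
```

With a 2019-era install this would be called as `start_airflow 1.10.3`, which runs `airflow initdb` before starting the daemons.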

build the spark job & test it with airflow

$ ./pull_build.sh
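Testing a Spark task through Airflow without the scheduler is typically done with Airflow's single-task test command, whose name changed between major versions. The sketch below is hypothetical (it is not the contents of `pull_build.sh`); the DAG id, task id and date are placeholders to replace with the ones in your DAG.

```shell
#!/usr/bin/env bash
# Hypothetical build-then-test sketch; not the template's pull_build.sh.

# airflow_test_args VERSION: print the subcommand that runs one task instance once.
airflow_test_args() {
  if [ "${1%%.*}" -lt 2 ]; then
    echo "test"          # Airflow 1.x: airflow test DAG_ID TASK_ID DATE
  else
    echo "tasks test"    # Airflow 2.x and later: airflow tasks test DAG_ID TASK_ID DATE
  fi
}

# build_and_test VERSION DAG_ID TASK_ID DATE: package the Scala job, then test one Airflow task.
build_and_test() {
  mvn -q -DskipTests package || return 1
  # Word splitting of the subcommand ("tasks test") is intentional.
  airflow $(airflow_test_args "$1") "$2" "$3" "$4"
}
```

For instance, `build_and_test 1.10.3 my_dag spark_task 2019-07-03` (placeholder ids) would package the jar and then run `airflow test my_dag spark_task 2019-07-03`, which prints task logs similar to the `bash_operator.py` lines in the result below.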

result

......

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 59.949 s
[INFO] Finished at: 2019-07-05T19:14:38+09:00
[INFO] Final Memory: 30M/304M
[INFO] ------------------------------------------------------------------------

real    1m1.529s

......

[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO -
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO -         ----------------------------
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO -         ENV: real OUTPUT_BUCKET: /user/isearch/score
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO -         START_DATE: 20190703
[2019-07-05 19:15:16,913] {bash_operator.py:127} INFO -         NUM_DAYS: 1
[2019-07-05 19:15:16,914] {bash_operator.py:127} INFO -         ----------------------------
[2019-07-05 19:15:16,914] {bash_operator.py:127} INFO -
[2019-07-05 19:15:16,919] {bash_operator.py:127} INFO - ----------------------------------------
[2019-07-05 19:15:16,920] {bash_operator.py:127} INFO - HDFS_OUTPUT = /user/isearch/score/kr/summary/template/20190703
[2019-07-05 19:15:17,345] {bash_operator.py:131} INFO - Command exited with return code 0
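The log shows the job parameters (`ENV`, `OUTPUT_BUCKET`, `START_DATE`, `NUM_DAYS`) and an `HDFS_OUTPUT` path that appends `kr/summary/template/<START_DATE>` to `OUTPUT_BUCKET`. A BashOperator command could print similar lines with a helper like the one below. This is a sketch inferred from that single log, the function names are hypothetical, and the `kr/summary/template` segments are copied as-is because their meaning is not documented.

```shell
#!/usr/bin/env bash
# Hypothetical helper inferred from the sample log above; not the template's code.

# is_yyyymmdd VALUE: succeed if VALUE is exactly eight digits (e.g. 20190703).
is_yyyymmdd() {
  case $1 in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# hdfs_output OUTPUT_BUCKET START_DATE: print the output directory for one run.
# The fixed kr/summary/template segments are copied verbatim from the log.
hdfs_output() {
  echo "$1/kr/summary/template/$2"
}

# print_params ENV OUTPUT_BUCKET START_DATE NUM_DAYS: echo the run parameters and output path.
print_params() {
  is_yyyymmdd "$3" || { echo "bad START_DATE: $3" >&2; return 1; }
  echo "ENV: $1 OUTPUT_BUCKET: $2"
  echo "START_DATE: $3"
  echo "NUM_DAYS: $4"
  echo "HDFS_OUTPUT = $(hdfs_output "$2" "$3")"
}
```

Running `print_params real /user/isearch/score 20190703 1` ends with `HDFS_OUTPUT = /user/isearch/score/kr/summary/template/20190703`, matching the path in the log.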