We will be using Airflow to orchestrate the pipeline that produces user behavior metric data.
Read this post for information on setting up CI/CD, DB migrations, IaC (Terraform), "make" commands, and automated testing.
Run these commands to set up your project locally and in the cloud.
# Clone and cd into the project directory.
git clone https://github.com/josephmachado/beginner_de_project.git
cd beginner_de_project

# Local run & test
make up # start the Docker containers on your computer & run migrations under ./migrations
make ci # run auto formatting, lint checks, & all the test files under ./tests

# Create AWS services with Terraform
make tf-init # only needed on your first Terraform run (or if you add new providers)
make infra-up # type in yes after verifying the changes TF will make

# Create Redshift Spectrum tables (tables with data in S3)
make spectrum-migration
# Create Redshift tables
make redshift-migration

# Wait until the EC2 instance is initialized; you can check this via the AWS UI.
# See "Status Check" on the EC2 console; it should be "2/2 checks passed" before proceeding.
# Wait another 5 mins; Airflow takes a while to start up.

make cloud-airflow # forwards the Airflow port from EC2 to your machine and opens it in the browser
# the username and password are both airflow

make cloud-metabase # forwards the Metabase port from EC2 to your machine and opens it in the browser
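Rather than refreshing the EC2 console, the "2/2 checks passed" wait can be scripted with the AWS CLI's built-in waiters. This is a hedged sketch, not part of the project: it assumes the AWS CLI is installed and configured for the same account and region Terraform used, it finds the instance through the ec2_public_dns Terraform output (the same output used for REMOTE_HOST in the CI/CD setup), and the function names are hypothetical. It does not cover the extra few minutes Airflow needs after the checks pass.

```bash
#!/usr/bin/env bash
# Sketch: block until the project's EC2 instance passes both status checks
# ("2/2 checks passed" in the console). Function names are hypothetical;
# assumes the AWS CLI is configured for the account/region Terraform used.

# Print the ID of the instance whose public DNS name Terraform reports.
ec2_instance_id() {
  local dns id
  dns=$(terraform -chdir=./terraform output -raw ec2_public_dns) || return 1
  id=$(aws ec2 describe-instances \
    --filters "Name=dns-name,Values=${dns}" \
    --query 'Reservations[0].Instances[0].InstanceId' \
    --output text) || return 1
  # The CLI prints "None" when no instance matches the filter.
  if [ -z "$id" ] || [ "$id" = "None" ]; then
    echo "no EC2 instance found for ${dns}" >&2
    return 1
  fi
  echo "$id"
}

# Wait for the system check, then the instance check. Each built-in waiter
# polls describe-instance-status and exits non-zero if it gives up.
wait_for_ec2_checks() {
  local id
  id=$(ec2_instance_id) || return 1
  echo "waiting for status checks on ${id}..." >&2
  aws ec2 wait system-status-ok --instance-ids "$id" &&
    aws ec2 wait instance-status-ok --instance-ids "$id"
}
```

For example, after sourcing the file from the project directory, wait_for_ec2_checks && sleep 300 && make cloud-airflow keeps the five-minute Airflow start-up allowance while replacing the manual console check.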
To get the Redshift connection credentials for Metabase, use this command.
make infra-config
# use redshift_dns_name as host
# use redshift_user & redshift_password
# dev as database
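If you would rather copy the values field by field into Metabase's Redshift form, a small helper can read them one at a time. This sketch assumes redshift_dns_name, redshift_user, and redshift_password are Terraform outputs (which is what make infra-config appears to print) and that the cluster listens on Redshift's default port, 5439; check the port against your Terraform config. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the fields Metabase asks for when adding a Redshift database.
# Assumes the Terraform output names below and Redshift's default port (5439).

# Read a single Terraform output as a raw string (no surrounding quotes).
tf_output() {
  terraform -chdir=./terraform output -raw "$1"
}

print_metabase_redshift_settings() {
  local host user password
  host=$(tf_output redshift_dns_name) || return 1
  user=$(tf_output redshift_user) || return 1
  password=$(tf_output redshift_password) || return 1
  printf 'Host: %s\nPort: %s\nDatabase name: %s\nUsername: %s\nPassword: %s\n' \
    "$host" "5439" "dev" "$user" "$password"
}
```

The output contains the password in plain text, so avoid running it where your terminal is being shared or logged.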
Since we cannot replicate AWS components locally, we have not set them up here. To learn more about how to set up components locally, read this article.
Create database migrations as shown below.
make db-migration # enter a description, e.g., create some schema
# make your changes to the newly created file under ./migrations
make redshift-migration # to run the new migration on your warehouse
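Because make db-migration leaves a new file under ./migrations for you to edit, a tiny helper can locate it, assuming it is the most recently modified file in that directory. latest_migration is a hypothetical convenience function, not a make target from the project.

```bash
#!/usr/bin/env bash
# Sketch: print the most recently modified file in a migrations directory,
# which right after `make db-migration` should be the file it just created.
# latest_migration is a hypothetical helper, not a project make target.
latest_migration() {
  local dir=${1:-./migrations} newest
  # -t sorts newest first; -p marks directories with a trailing "/" so
  # grep can drop them (e.g. a __pycache__ folder).
  newest=$(ls -tp -- "$dir" | grep -v '/$' | head -n 1)
  [ -n "$newest" ] || return 1
  printf '%s/%s\n' "$dir" "$newest"
}
```

Usage: "${EDITOR:-vi}" "$(latest_migration)" opens the new migration straight after make db-migration. If you have edited an older migration since, that file will be picked instead, because the helper goes by modification time.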
For continuous delivery to work, set up the infrastructure with Terraform and define the following repository secrets. You can set up the repository secrets by going to Settings > Secrets > Actions > New repository secret.
SERVER_SSH_KEY: Get this by running terraform -chdir=./terraform output -raw private_key in the project directory, and paste the entire output into a new Actions secret called SERVER_SSH_KEY.
REMOTE_HOST: Get this by running terraform -chdir=./terraform output -raw ec2_public_dns in the project directory.
REMOTE_USER: The value for this is ubuntu.
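If you use the GitHub CLI (gh), the three secrets can also be set from the terminal instead of the web UI. This sketch assumes gh is installed and authenticated for your copy of the repository and that you run it from the project directory; gh secret set reads the value from standard input when no --body flag is given. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: set the repository secrets needed for continuous delivery with gh.
# Assumes an authenticated GitHub CLI and a checkout of this repository.
set_deploy_secrets() {
  local key host
  # Read both outputs first and refuse to continue if either is missing,
  # so a Terraform error never uploads an empty secret.
  key=$(terraform -chdir=./terraform output -raw private_key) || return 1
  host=$(terraform -chdir=./terraform output -raw ec2_public_dns) || return 1
  [ -n "$key" ] && [ -n "$host" ] || return 1
  # Command substitution strips the PEM key's final newline; printf restores it.
  printf '%s\n' "$key" | gh secret set SERVER_SSH_KEY || return 1
  gh secret set REMOTE_HOST --body "$host" || return 1
  gh secret set REMOTE_USER --body "ubuntu"
}
```

Piping the key through standard input keeps it out of your shell history and out of the process list, which passing it with --body would not.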
We have a DAG validity test defined here.
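The project's DAG validity test is the authoritative check. As a much cruder, Airflow-free pre-check, the sketch below only confirms that every Python file in a DAG folder parses; it will not catch import errors, missing dependencies, or cycles the way loading the DAGs into Airflow's DagBag does. The ./dags default path and the function name are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: syntax-only check of DAG files (not a substitute for a DagBag test).
# Assumes python3 is on PATH; ./dags is an assumed default location.
check_dag_syntax() {
  local dir=${1:-./dags} failed=0 file
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # -print0 / read -d '' keep file names with spaces intact.
  while IFS= read -r -d '' file; do
    # ast.parse raises SyntaxError (non-zero exit) on an unparsable file.
    if ! python3 -c 'import ast, sys; ast.parse(open(sys.argv[1], encoding="utf-8").read(), sys.argv[1])' "$file"; then
      echo "failed to parse: $file" >&2
      failed=1
    fi
  done < <(find "$dir" -name '*.py' -print0)
  return "$failed"
}
```

Parsing with ast rather than importing means no DAG code is executed, which is why this check is fast but shallow.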
After you are done, make sure to destroy your cloud infrastructure.
make down # Stop docker containers on your computer
make infra-down # type in yes after verifying the changes TF will make
This will tear down all the AWS services. Please double-check this by going to the S3, EC2, EMR, & Redshift consoles in the AWS UI.
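To supplement the console check, the AWS CLI can list what is still around after make infra-down. This is a sketch, not part of the project: it assumes the CLI is configured for the region Terraform used, it reports every matching resource in that account and region (not only this project's), and it leaves S3 to the console because buckets from other projects would show up too. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list EC2, Redshift and EMR resources that still exist in the
# current AWS account/region. Returns 0 if none, 1 if some, 2 on CLI errors.
report_leftover_resources() {
  local found=0 out
  # Terminated instances stay visible for a while, so filter them out.
  out=$(aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=pending,running,stopping,stopped" \
    --query 'Reservations[].Instances[].InstanceId' --output text) || return 2
  if [ -n "$out" ]; then echo "EC2 instances: $out"; found=1; fi

  out=$(aws redshift describe-clusters \
    --query 'Clusters[].ClusterIdentifier' --output text) || return 2
  if [ -n "$out" ]; then echo "Redshift clusters: $out"; found=1; fi

  # --active limits the listing to clusters that have not yet terminated.
  out=$(aws emr list-clusters --active --query 'Clusters[].Id' --output text) || return 2
  if [ -n "$out" ]; then echo "Active EMR clusters: $out"; found=1; fi

  return "$found"
}
```

A return code of 2 means the CLI itself failed (for example, expired credentials), which is different from "nothing left", so check it before concluding the teardown worked.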
Contributions are welcome. If you would like to contribute, you can help by opening a GitHub issue or putting up a PR.