Completing DataTalksClub’s Data Engineering Zoomcamp: My Module 1 Journey with Docker and Terraform
I just wrapped up Module 1 of DataTalksClub’s Data Engineering Zoomcamp, and it’s been an exciting dive into the fundamentals of modern data engineering! This module focused on Docker, Postgres, and Terraform – tools every data engineer should know. Here’s a breakdown of my experience, key takeaways, and the code I built along the way.
What I Learned
Docker Basics:
Containerizing applications (like Postgres) to avoid "it works on my machine" chaos.
Writing a Dockerfile and a docker-compose.yaml for multi-service setups.
Data Ingestion with Python:
Using pandas to load NYC Taxi data into Postgres, and pgcli to query it from the command line.
Writing reusable scripts for dataset validation and pipeline automation.
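The core ingestion pattern is to read a large file in fixed-size chunks and append each chunk to the table, so the whole dataset never sits in memory at once. The sketch below is illustrative, not the script from my repo: to stay runnable without a database server it swaps Postgres for Python's built-in sqlite3 and pandas for the csv module. With pandas and Postgres, the same idea is read_csv(..., chunksize=...) plus to_sql(..., if_exists="append"). The table name yellow_taxi_data and the function names are hypothetical.

```python
import csv
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive lists of at most `chunk_size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def ingest_csv(conn: sqlite3.Connection, csv_file: TextIO, table: str,
               chunk_size: int = 100_000) -> int:
    """Load a CSV into `table` chunk by chunk and return the number of rows inserted."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)  # sqlite3 uses '?'; psycopg2 would use '%s'

    # Create an empty table from the header first (the pandas equivalent is
    # df.head(0).to_sql(table, con=engine, if_exists="replace")), then append chunks.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({cols})')

    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        conn.executemany(f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})', chunk)
        conn.commit()  # commit per chunk so already-loaded rows survive a later failure
        total += len(chunk)
    return total
```

Committing after every chunk trades a little speed for restartability, which matters once a monthly taxi file runs to millions of rows.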
Infrastructure as Code (Terraform):
Provisioning Google Cloud Platform (GCP) resources (BigQuery, GCS) via Terraform.
Managing state files and modular configurations.
My Code & Setup
Check out my GitHub repo for Module 1 here:
👉 DTC_dataEngg / module1-hw
Highlights:
Docker Workflow:
```bash
# Spin up Postgres + pgAdmin
docker-compose up -d

# Run ingestion script
docker run -it --network=hw_default taxi_ingest:v1 \
  --user=root --password=root --host=pgdatabase --port=5432 --db=ny_taxi
```
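The flags after the image name are plain command-line arguments that the ingestion script turns into a database connection. A minimal sketch of that step, assuming an argparse-based script and a SQLAlchemy-style postgresql:// URL; the real script in the repo may differ, and parse_args/connection_url are hypothetical names:

```python
import argparse
from urllib.parse import quote_plus


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the same flags that the docker run command passes to the image."""
    parser = argparse.ArgumentParser(description="Ingest NYC taxi data into Postgres (sketch)")
    parser.add_argument("--user", required=True, help="Postgres user")
    parser.add_argument("--password", required=True, help="Postgres password")
    parser.add_argument("--host", required=True,
                        help="Postgres host; on the Compose network this is the service name")
    parser.add_argument("--port", type=int, default=5432, help="Postgres port")
    parser.add_argument("--db", required=True, help="Database name")
    return parser.parse_args(argv)


def connection_url(args: argparse.Namespace) -> str:
    """Build a postgresql:// URL; quote_plus escapes characters like '@' in credentials."""
    user = quote_plus(args.user)
    password = quote_plus(args.password)
    return f"postgresql://{user}:{password}@{args.host}:{args.port}/{args.db}"
```

With the flags above this yields postgresql://root:root@pgdatabase:5432/ny_taxi, which is the kind of URL SQLAlchemy's create_engine() accepts. Inside the hw_default network, pgdatabase resolves to the Postgres container; running the same script directly on the host would use localhost and the port that docker-compose publishes instead.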
Terraform for GCP:
Defined modules for BigQuery datasets and GCS buckets to ensure reproducibility.
Challenges & Wins
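Docker Networking: Debugging container communication (e.g., Python script → Postgres) was tricky at first. Containers on the same Compose network (here hw_default) reach Postgres by its service name (pgdatabase), not by localhost.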
Terraform State: Learned to manage .tfstate files properly to avoid config drift.
BigQuery Schema Auto-Detection: Tweaked my Python script to handle datatype mismatches.
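My exact fix isn't reproduced here, but one common way to tame datatype mismatches is to stop relying on inferred types and coerce every column to a declared schema before loading. A sketch, assuming rows arrive as dicts of strings (for example from csv.DictReader) and using a hypothetical SCHEMA mapping over a few NYC yellow taxi columns:

```python
from datetime import datetime
from typing import Any, Callable, Dict, Optional


def _missing(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def to_int(value: Optional[str]) -> Optional[int]:
    """Parse '1' or '1.0' to 1, '' to None; reject non-integral values like '1.5'."""
    if _missing(value):
        return None
    number = float(value)
    if not number.is_integer():
        raise ValueError(f"not an integer: {value!r}")
    return int(number)


def to_float(value: Optional[str]) -> Optional[float]:
    return None if _missing(value) else float(value)


def to_timestamp(value: Optional[str]) -> Optional[datetime]:
    return None if _missing(value) else datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Hypothetical target schema: column name -> converter.
SCHEMA: Dict[str, Callable[[Optional[str]], Any]] = {
    "VendorID": to_int,
    "tpep_pickup_datetime": to_timestamp,
    "tpep_dropoff_datetime": to_timestamp,
    "passenger_count": to_int,
    "trip_distance": to_float,
    "fare_amount": to_float,
}


def coerce_row(row: Dict[str, str], schema: Dict[str, Callable] = SCHEMA) -> Dict[str, Any]:
    """Convert every column listed in `schema`; other columns pass through unchanged."""
    return {col: schema[col](val) if col in schema else val for col, val in row.items()}
```

Values like passenger_count = "1.0" then land as integers and empty fields as NULL, so auto-detection (or an explicit schema on the load job) sees one consistent type per column. Raising on "1.5" instead of truncating is deliberate: a silent cast would hide real data problems.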
Win: Successfully orchestrated a local-to-cloud pipeline using free-tier tools!
Why This Matters
Module 1 taught me how containerization and IaC solve critical problems in data engineering:
Reproducibility: Docker ensures pipelines run identically across environments.
Scalability: Terraform automates cloud resource provisioning, saving hours of manual setup.
Join the Discussion!
Are you also doing the Zoomcamp? How did your Module 1 go? Let’s connect on:
- GitHub: @Deathslayer89
Tags: #DataEngineering #Docker #Terraform #Zoomcamp #DataPipeline #GCP