Raw data sitting in disconnected silos is not a data platform — it is a liability. Building systems that ingest, transform, reconcile, version, and serve data reliably at enterprise scale is what separates engineers who prototype from architects who build infrastructure teams depend on. This program teaches you how to do the latter.
Pipeline Architects is an intermediate program designed for data engineers, analytics engineers, and data platform professionals who want to build complete, production-ready data engineering skills. Across ten focused courses, you will master the full data engineering stack: mapping data flows; ingesting from relational databases, streaming platforms, and REST APIs; building modular pipelines that cleanse and transform data; evaluating storage formats; loading warehouses incrementally; implementing SCD2 historical tracking; applying data lake transactions and versioning; building lakehouse architectures; automating workflows with Apache Airflow; and unifying data through SQL MERGE reconciliation and performance tuning.
You will work with industry-standard tools including Python, SQL, Apache Airflow, dbt, Snowflake, Apache Kafka, Airbyte, Delta Lake, Iceberg, and Hudi, applying hands-on techniques to realistic production data engineering scenarios.
By the end of the program, you will be equipped to architect, build, and operate data pipelines from raw ingestion through lakehouse delivery with the reliability and performance that modern analytics infrastructure demands.
Applied Learning Project
Throughout this program, you will complete hands-on projects that reflect real data engineering workflows. You will design end-to-end data flow diagrams, configure Airbyte connectors for relational databases, Kafka topics, and REST APIs, and build modular pipeline stages for ingestion, cleansing, transformation, and loading using Python, dbt, and Airflow. You will benchmark columnar and row-oriented storage formats, implement incremental warehouse loading using Snowflake MERGE INTO, and apply SCD2 logic to build historical dimension models. You will convert raw files to transactional formats, execute time-travel queries, manage schema evolution, and register external tables across Delta Lake, Iceberg, and Hudi. You will configure production-grade Airflow DAGs with retry logic, SLA alerting, and Slack integration, and apply SQL MERGE upsert operations with field-level conflict resolution and performance tuning. Each project produces a defensible, production-applicable artifact.
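The short sketches that follow illustrate a few of the techniques above under stated assumptions; every table name, path, credential, and webhook in them is a hypothetical placeholder rather than part of the program materials. First, incremental warehouse loading with Snowflake's MERGE INTO, where a timestamp guard acts as a simple form of field-level conflict resolution. This is a minimal sketch using the snowflake-connector-python cursor API:

```python
# Minimal sketch of incremental loading via Snowflake MERGE INTO.
# The analytics.orders / raw.orders_stage tables are hypothetical.
import snowflake.connector

MERGE_SQL = """
MERGE INTO analytics.orders AS tgt
USING raw.orders_stage AS src
    ON tgt.order_id = src.order_id
WHEN MATCHED AND src.updated_at > tgt.updated_at THEN UPDATE SET
    status     = src.status,
    amount     = src.amount,
    updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, status, amount, updated_at)
    VALUES (src.order_id, src.status, src.amount, src.updated_at)
"""

def run_incremental_load() -> None:
    # Connection parameters are placeholders; in production they would
    # come from a secrets manager or environment variables.
    conn = snowflake.connector.connect(
        account="my_account",
        user="loader",
        password="...",
        warehouse="LOAD_WH",
        database="ANALYTICS",
    )
    cur = conn.cursor()
    try:
        # The updated_at guard keeps the newest record when source and
        # target conflict, so replayed batches stay idempotent.
        cur.execute(MERGE_SQL)
    finally:
        cur.close()
        conn.close()
```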
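SCD2 historical tracking is often implemented as a two-step expire-and-insert pattern: close out current rows whose attributes changed, then insert the new versions. The sketch below writes it as Snowflake-style SQL held in Python strings; dim_customer, stg_customer, and the tracked columns are illustrative, and the two statements would run in order inside a single transaction:

```python
# Two-step SCD Type 2 pattern as ordered SQL statements.
# Table and column names are illustrative placeholders.
SCD2_STEPS = [
    # Step 1: expire current rows whose tracked attributes changed.
    """
    UPDATE dim_customer
       SET effective_to = CURRENT_TIMESTAMP(), is_current = FALSE
      FROM stg_customer s
     WHERE dim_customer.customer_id = s.customer_id
       AND dim_customer.is_current
       AND (dim_customer.segment <> s.segment
            OR dim_customer.region <> s.region)
    """,
    # Step 2: insert a fresh current row for every customer that now
    # lacks one (new customers, plus the rows expired in step 1).
    """
    INSERT INTO dim_customer
        (customer_id, segment, region,
         effective_from, effective_to, is_current)
    SELECT s.customer_id, s.segment, s.region,
           CURRENT_TIMESTAMP(), NULL, TRUE
      FROM stg_customer s
      LEFT JOIN dim_customer d
        ON d.customer_id = s.customer_id AND d.is_current
     WHERE d.customer_id IS NULL
    """,
]
```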
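Delta Lake's transaction log is what makes time-travel queries and schema evolution possible. Assuming a Spark session with the Delta extensions enabled and a hypothetical table path, a sketch of both looks like this:

```python
# Sketch of Delta Lake time travel and schema evolution in PySpark.
# The table path and sample data are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("time-travel-demo")
    # These two configs enable Delta Lake on a stock Spark build.
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "s3://my-bucket/bronze/orders"  # hypothetical table location

# Read the table as of an earlier version to audit or reproduce a load.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

# Or pin to a wall-clock timestamp instead of a version number.
snapshot = (spark.read.format("delta")
            .option("timestampAsOf", "2024-01-01 00:00:00").load(path))

# Schema evolution: mergeSchema lets an append add new columns safely.
new_rows = spark.createDataFrame(
    [(1, "shipped", "express")], ["order_id", "status", "ship_tier"])
(new_rows.write.format("delta").mode("append")
    .option("mergeSchema", "true").save(path))
```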
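For orchestration, retry logic, SLAs, and failure alerting in Airflow are mostly declarative. The following sketch assumes Airflow 2.x and a placeholder Slack webhook URL; the notification is sent from a plain on_failure_callback rather than a provider operator to keep it self-contained:

```python
# Sketch of an Airflow 2.x DAG with retries, an SLA, and Slack alerting.
# The webhook URL and task bodies are hypothetical placeholders.
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder

def notify_slack(context):
    # Airflow passes the task context to failure callbacks.
    ti = context["task_instance"]
    requests.post(
        SLACK_WEBHOOK,
        json={"text": f"Task {ti.task_id} in DAG {ti.dag_id} failed."},
        timeout=10,
    )

default_args = {
    "retries": 3,                          # retry transient failures
    "retry_delay": timedelta(minutes=5),   # back off between attempts
    "sla": timedelta(hours=1),             # record an SLA miss if late
    "on_failure_callback": notify_slack,   # alert the team on failure
}

def extract():
    ...  # pull from the source system

def load():
    ...  # merge into the warehouse

with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract",
                                  python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```

In production the webhook would live in an Airflow connection or secrets backend rather than a module-level constant, and the SLA miss would typically feed a dedicated callback or email route as well.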