Microsoft
Data Storage and Management for Big Data



Instructor: Microsoft

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

2 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Manage big data storage and pipelines with Azure services.
  • Process and analyze large datasets using Apache Spark and Databricks.

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

January 2026

Assessments

41 assignments¹

AI-graded (see disclaimer)
Taught in English

See how employees at top companies are mastering in-demand skills


Build your Data Analysis expertise

This course is part of the Microsoft Big Data Management and Analytics Professional Certificate
When you enroll in this course, you'll also be enrolled in this Professional Certificate.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate from Microsoft

There are 5 modules in this course

Data Storage Technologies (SQL vs NoSQL) guides learners through the core principles of modern data storage and the trade-offs that shape today’s big data systems. The module examines how relational databases manage structured data, where they encounter limitations at scale, and how techniques such as partitioning, indexing, and lakehouse architectures mitigate performance gaps. Learners compare major NoSQL categories—including document, key-value, and column-family databases—to understand how flexible schemas and distributed designs support high-volume, high-velocity workloads. Through hands-on activities with SQL Server, Azure Synapse, and Azure Cosmos DB, learners practice essential operations, evaluate storage technologies based on workload requirements, and build the skills needed to select and implement effective database solutions for big data environments.
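The relational-versus-document trade-off described above can be sketched in a few lines. This is a minimal stdlib illustration, not the Azure services themselves: `sqlite3` stands in for a relational store such as SQL Server or a Synapse SQL pool, and plain Python dictionaries stand in for documents in an Azure Cosmos DB container. All names (`readings`, `sensor-1`, etc.) are invented for the example.

```python
import json
import sqlite3

# Relational storage: a fixed schema is enforced up front (SQL Server / Synapse style).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, ts INTEGER, temp_c REAL)")
conn.execute("INSERT INTO readings VALUES ('sensor-1', 1700000000, 21.5)")

# Document storage: each record carries its own (possibly different) shape,
# as in a Cosmos DB container -- new fields need no schema migration.
documents = [
    {"device_id": "sensor-1", "ts": 1700000000, "temp_c": 21.5},
    {"device_id": "sensor-2", "ts": 1700000060, "humidity": 0.43,
     "tags": ["outdoor", "rooftop"]},  # extra, nested fields are fine
]

row = conn.execute(
    "SELECT temp_c FROM readings WHERE device_id = 'sensor-1'"
).fetchone()
print(row[0])                            # relational lookup against the fixed schema
print(json.dumps(documents[1]["tags"]))  # nested data lives inside the document
```

The point of the contrast: the relational side rejects rows that don't match the declared columns, while the document side accepts heterogeneous records, which is what makes flexible-schema stores attractive for high-velocity, evolving data.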

What's included

6 videos · 3 readings · 8 assignments

Working with Data Formats (Structured, Semi-structured, Unstructured) helps learners build a clear understanding of how different data formats function within big data systems and why format selection matters for performance, storage, and analytical success. The module introduces structured formats, such as CSV and TSV, and explores flexible semi-structured formats, including JSON and XML. It also examines optimized file types, including Parquet, Avro, and ORC, that support large-scale analytics. Learners practice transforming data between formats using Azure Data Factory, working with nested structures, applying schema inference, and evaluating performance trade-offs across file types. Through demonstrations, code exercises, and hands-on labs, this module equips learners to select, convert, and manage data formats effectively for diverse big data scenarios.
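The format distinctions above can be made concrete with the standard library alone. The sketch below uses `csv` for the flat structured case, `json` for the nested semi-structured case, and a toy `infer` function to mimic the kind of type inference a tool like Azure Data Factory performs on ingest. The column names and values are invented for illustration; columnar formats such as Parquet, Avro, and ORC would additionally store typed, compressed columns, which stdlib alone cannot show.

```python
import csv
import io
import json

# Structured: CSV has a flat, positional layout -- every row, the same columns.
csv_text = "order_id,customer,amount\n1001,Ada,19.99\n1002,Grace,5.50\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Naive schema inference: narrow each string to int, then float, else keep text.
def infer(value: str):
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

typed = [{k: infer(v) for k, v in row.items()} for row in rows]

# Semi-structured: JSON can nest what CSV must flatten into separate files/columns.
order = {
    "order_id": typed[0]["order_id"],
    "customer": {"name": typed[0]["customer"]},
    "lines": [{"amount": typed[0]["amount"]}],
}
print(json.dumps(order))
```

Note that CSV carries no type information at all, so every value arrives as a string; the inference step is where format choice starts to matter for downstream analytics.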

What's included

6 videos · 3 readings · 8 assignments

Data Lakes and Data Warehouses Implementation guides learners through the architectural foundations and hands-on skills needed to build modern analytical environments. The module explores the purpose and structure of data lakes, highlighting the zones of raw, cleaned, enriched, and curated data, and demonstrates how thoughtful design supports flexibility, governance, and large-scale analytics. Learners also study core data warehouse concepts, including dimensional modeling, star schemas, and data marts, to understand how structured storage enables high-performance querying. Through practical work with Azure Data Lake Storage Gen2 and Azure Synapse Analytics, learners design zone architectures, implement dimensional models, configure SQL pools, and apply best practices for partitioning, distribution, and optimization. By the end, they gain the ability to organize, govern, and integrate data across both lake and warehouse environments, supporting scalable, enterprise-ready analytics.
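The dimensional-modeling ideas above can be sketched with an in-memory `sqlite3` database standing in for a Synapse dedicated SQL pool. The table and column names (`dim_product`, `fact_sales`) are hypothetical, but the shape is the standard star schema: a fact table of measures joined to descriptive dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes, one row per product.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              name TEXT, category TEXT);
    -- Fact table: measures plus foreign keys into the dimensions.
    CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'),
                                   (2, 'Gadget', 'Hardware');
    INSERT INTO fact_sales VALUES (1, 20260101, 100.0),
                                  (2, 20260101, 40.0),
                                  (1, 20260102, 60.0);
""")

# A typical star-schema query: join facts to a dimension, then aggregate.
total_by_product = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(total_by_product)
```

In a real warehouse the same schema is paired with the partitioning and distribution choices the module covers, so that the fact-to-dimension join stays fast as the fact table grows.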

What's included

6 videos · 3 readings · 7 assignments

Building Data Pipelines (ETL/ELT with Azure Data Factory) equips learners with the skills to design, implement, and manage scalable data integration workflows using modern, cloud-native approaches. The module examines the differences between ETL and ELT, helping learners understand when each methodology delivers the best performance, flexibility, and cost efficiency. Learners gain hands-on experience with Azure Data Factory, configuring linked services, datasets, activities, and core orchestration components, and practice building both simple and advanced pipelines. The module also introduces transformation logic, control flow patterns, parameterization, and error handling strategies that support production-ready data engineering solutions. Through walkthroughs, labs, code exercises, and scenario-based decisions, learners learn to monitor pipelines, troubleshoot failures, and design reliable data workflows that support enterprise-scale analytics.
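The orchestration concepts above (chained activities, parameterization, retry-based error handling) can be reduced to a small runner. This is a hypothetical sketch in plain Python, not the Azure Data Factory API; the function names `run_pipeline`, `extract`, `transform`, and `load` are all invented for the example.

```python
import time

def run_pipeline(activities, parameters, retries=2, delay=0.0):
    """Run activities in order, passing each output to the next.

    Each activity is retried on failure, ADF-style; an exhausted retry
    budget re-raises so a monitor can flag the pipeline run as failed.
    """
    data = parameters
    for activity in activities:
        for attempt in range(retries + 1):
            try:
                data = activity(data)
                break
            except Exception:
                if attempt == retries:
                    raise
                time.sleep(delay)
    return data

# Example activities: extract -> transform -> load (all stand-ins).
def extract(params):
    return [{"id": i, "value": i * params["scale"]} for i in range(3)]

def transform(rows):
    return [r for r in rows if r["value"] > 0]   # drop zero-valued rows

def load(rows):
    return {r["id"]: r["value"] for r in rows}   # stand-in for a warehouse write

result = run_pipeline([extract, transform, load], {"scale": 10})
print(result)   # {1: 10, 2: 20}
```

The parameterization shows up as the `parameters` dict threaded into `extract`, mirroring how ADF pipeline parameters flow into datasets and activities at trigger time.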

What's included

6 videos · 3 readings · 9 assignments

Batch and Real-Time Processing Fundamentals introduces learners to the core processing models that power modern big data systems, helping them understand when each approach delivers the most value. The module explores batch architectures, scheduling methods, and optimization strategies for large-scale historical processing, while also examining real-time stream processing concepts, including event handling, latency trade-offs, and throughput requirements. Learners gain hands-on experience implementing both models—building batch workflows with Azure Data Factory and configuring streaming pipelines using Event Hubs and Stream Analytics. Through architectural analysis, code exercises, and practical labs, learners learn to evaluate business needs, select the right processing approach, and design hybrid systems that combine batch and streaming for comprehensive analytics.
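The batch-versus-streaming contrast above can be shown with one dataset processed both ways. This is a plain-Python sketch, not Event Hubs or Stream Analytics: the batch path scans the full history at once, while the streaming path updates per-key state one event at a time, the way a streaming job maintains a running aggregate. Event names and values are invented.

```python
# A small event log: (key, value) pairs.
events = [("sensor-1", 20.0), ("sensor-1", 22.0), ("sensor-2", 18.0)]

# Batch: process the whole historical dataset in one pass.
batch_avg = {}
for key in {k for k, _ in events}:
    vals = [v for k, v in events if k == key]
    batch_avg[key] = sum(vals) / len(vals)

# Streaming: fold each event into incremental state as it arrives.
state = {}  # key -> (count, running_total)

def on_event(key, value):
    count, total = state.get(key, (0, 0.0))
    state[key] = (count + 1, total + value)

for key, value in events:
    on_event(key, value)

stream_avg = {k: total / count for k, (count, total) in state.items()}
print(batch_avg == stream_avg)   # same answer, different latency profile
```

Both paths compute identical averages; what differs is when the answer is available (after the batch window closes versus after every event), which is exactly the latency/throughput trade-off the module asks learners to weigh.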

What's included

6 videos · 3 readings · 9 assignments

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

278 courses · 2,140,106 learners

Offered by

Microsoft


Why people choose Coursera for their career

Felipe M.
Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."
Jennifer J.
Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."
Larry W.
Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."
Chaitanya A.
"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."


¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.