-
Notifications
You must be signed in to change notification settings - Fork 152
added blog about fabric #1381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
added blog about fabric #1381
Changes from 1 commit
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,191 @@ | ||
| --- | ||
| title: "Microsoft Fabric: One Platform, One Lake, Every Data Workload" | ||
| authors: [Aditya-Singh-Rathore] | ||
| sidebar_label: "Microsoft Fabric" | ||
| tags: [microsoft-fabric, onelake, data-engineering, lakehouse, delta-lake, big-data, cloud, power-bi] | ||
| date: 2026-05-01 | ||
|
|
||
| description: Microsoft Fabric is a unified analytics platform that brings together data engineering, data science, real-time analytics, and business intelligence under a single roof — all built on OneLake. Learn how Fabric is architected, how data flows through it, and why it matters for modern data teams. | ||
|
|
||
| draft: false | ||
| canonical_url: https://www.recodehive.com/blog/microsoft-fabric-one-platform-one-lake-every-data-workload | ||
|
|
||
| meta: | ||
| - name: "robots" | ||
| content: "index, follow" | ||
| - property: "og:title" | ||
| content: "Microsoft Fabric: One Platform, One Lake, Every Data Workload" | ||
| - property: "og:description" | ||
| content: "Microsoft Fabric unifies data engineering, analytics, and BI into one platform built on OneLake. Here's how the architecture works and why it matters." | ||
| - property: "og:type" | ||
| content: "article" | ||
| - property: "og:url" | ||
| content: "https://www.recodehive.com/blog/microsoft-fabric-one-platform-one-lake-every-data-workload" | ||
| - property: "og:image" | ||
| content: "/static/imag/blogs/microsoft-farbic-cover.jpg" | ||
| - name: "twitter:card" | ||
| content: "summary_large_image" | ||
| - name: "twitter:title" | ||
| content: "Microsoft Fabric: One Platform, One Lake, Every Data Workload" | ||
| - name: "twitter:description" | ||
| content: "Microsoft Fabric unifies data engineering, analytics, and BI into one platform built on OneLake. Here's how the architecture works and why it matters." | ||
| - name: "twitter:image" | ||
| content: "/assets/images/microsoft-fabric-cover.jpg" | ||
|
|
||
| --- | ||
|
|
||
| <!-- truncate --> | ||
| # Microsoft Fabric: One Platform, One Lake, Every Data Workload | ||
|
|
||
| Modern data teams don't struggle because of a lack of tools, they struggle because of too many. | ||
|
|
||
| A typical data stack today might include a cloud data warehouse, an object store, a managed Spark environment, a pipeline orchestration tool, and a BI layer on top. Each of these tools is powerful on its own. But getting them to work together, moving data across systems, keeping governance consistent, and debugging failures across layers — often becomes a bigger challenge than the actual data work itself. | ||
|
|
||
| Microsoft Fabric takes a different approach. | ||
|
|
||
| Instead of adding another tool to the stack, it brings everything together into a single, unified platform. From ingestion and storage to transformation, analytics, machine learning, and reporting — everything runs on a shared foundation. | ||
|
|
||
| In this article, we'll walk through how Fabric is structured, how data flows through it, and why this architecture matters for data engineers and teams evaluating modern data platforms. | ||
|
|
||
|  | ||
|
|
||
| ## The Foundation: OneLake | ||
|
|
||
| Every component in Fabric is built on top of **OneLake** - the platform's unified, logical data lake. Think of OneLake as the single source of truth for your entire Fabric workspace. | ||
|
|
||
| Every workload, whether it's a Spark notebook, a SQL warehouse query, a Power BI report, or a machine learning experiment reads from and writes to the same underlying storage. There is no data movement between services. There is no export-and-reload step when a data scientist needs access to a table that a data engineer just built. The data exists once. | ||
|
|
||
| OneLake stores everything in **Delta Parquet format**, an open-source table format that supports ACID transactions, schema enforcement, time travel, and versioning. This is worth emphasizing: it is not a proprietary format. Data stored in Fabric is readable by Spark, DuckDB, Pandas, Polars, and most modern query engines. If you ever need to access your data outside of Fabric, the files are already in a portable, open format. | ||
|
|
||
| OneLake is the architectural idea that makes Fabric coherent. Everything else is an experience layer built on top of it — and in practice, this is where Fabric starts to feel genuinely different from most data platforms. | ||
|
|
||
|
|
||
|
|
||
| ## Data Engineering: Lakehouses, Spark, and Notebooks | ||
|
|
||
| Fabric's data engineering experience is organized around the **Lakehouse** .A storage construct that lives in OneLake and combines the flexibility of a data lake with the structure and query capabilities of a data warehouse. | ||
|
|
||
| When you create a Lakehouse, you get a two-zone storage structure: | ||
|
|
||
| - 1. A **Files area** for raw, unstructured, or semi-structured data (CSV, JSON, images, logs) | ||
| - 2. A **Tables area** where data is stored as managed Delta tables | ||
|
|
||
| Once data lands in the Tables zone, it is immediately queryable by SQL, Spark, and Power BI — no registration step, no schema definition overhead. | ||
|
|
||
| For transformation workloads, Fabric provides a fully managed **Apache Spark** environment. You write notebooks in Python, Scala, SQL, or R. Clusters are serverless by default, they start on demand, require no configuration, and shut down automatically when idle. | ||
|
|
||
| For production workloads, you can promote notebooks to **Spark Job Definitions** for scheduled execution, and manage library dependencies across your team using **Environments** - versioned, shareable Spark configurations that eliminate the classic "works on my cluster" problem. | ||
|
|
||
| The data engineering layer is where raw data enters the platform, gets cleaned and transformed, and gets shaped into Delta tables that every other Fabric service can then consume directly. | ||
|
|
||
|
|
||
|
|
||
| ## Data Ingestion and Orchestration: Data Factory | ||
|
|
||
| Getting data from external systems into the Lakehouse is the job of **Data Factory** — Fabric's data integration and orchestration layer. | ||
|
|
||
| Data Factory offers two primary patterns: | ||
|
|
||
| - 1. **Pipelines**: The activity-based orchestration tool, familiar to anyone who has used Azure Data Factory or Apache Airflow. You build directed acyclic graphs of copy activities, transformation steps, conditional logic, and triggers. Fabric pipelines support hundreds of connectors to external databases, REST APIs, cloud storage, and SaaS applications. | ||
|
|
||
| - 2. **Dataflows Gen2**: A code-free alternative using a visual, Power Query-based interface. Transformations defined in Dataflows compile to Spark or SQL execution under the hood, making them a practical option for analysts who need to express transformation logic without writing code. | ||
|
|
||
| One key integration point: neither pipelines nor dataflows require a separate connection configuration to reach your Lakehouse because the Lakehouse is already in the same workspace. You select it from a dropdown. It sounds minor, but it's the kind of thing that saves considerable time when you're building pipelines regularly. | ||
|
|
||
|
|
||
|
|
||
| ## SQL Analytics: The Data Warehouse | ||
|
|
||
| Fabric's **Data Warehouse** is a fully managed, T-SQL analytics engine, but with an important architectural distinction from traditional warehouses. It stores its data in Delta Parquet on OneLake, not in a proprietary internal format. | ||
|
|
||
| This means that tables written by your Spark notebooks in the Lakehouse are directly readable by warehouse SQL queries, and warehouse tables are readable by Spark, without any copy or ETL step in between. For teams that maintain both engineering and analyst workflows, this eliminates an entire category of synchronization problem. | ||
|
|
||
| The warehouse supports standard SQL constructs: views, stored procedures, cross-database queries, and row-level security. | ||
|
|
||
| **A practical decision guide:** | ||
|
|
||
| | Use the Lakehouse when... | Use the Warehouse when... | | ||
| |---|---| | ||
| | Workloads are Spark-heavy | Consumers are SQL analysts | | ||
| | Data is schema-flexible | Structured, governed tables are needed | | ||
| | Programmatic transformation logic is required | Strong query performance with SQL semantics is the priority | | ||
|
|
||
|
|
||
|
|
||
| ## Real-Time Intelligence: Streaming and Event Data | ||
|
|
||
| **Real-Time Intelligence** is Fabric's answer to streaming data workloads and it is one of the most complete streaming experiences available within a unified platform. | ||
|
|
||
| **Eventstreams** act as a managed event streaming layer. You connect to sources like Azure Event Hubs, Kafka, IoT Hub, or custom HTTP endpoints, apply in-flight transformations using a visual stream-processing editor, and route the output to multiple destinations simultaneously - including Lakehouses, KQL databases, or other downstream consumers. | ||
|
|
||
| The destination for high-frequency event data is typically an **Eventhouse**, which contains one or more **KQL databases**. KQL (Kusto Query Language) is optimized for time-series and log data, and is significantly faster than SQL for streaming analytics queries, things like "show me all anomalies in sensor readings in the last 15 minutes, grouped by device." | ||
|
|
||
| For observability, IoT, clickstream, and security analytics workloads, the Eventhouse + KQL combination is a compelling part of the Fabric story. And crucially, Eventhouse data also lives in OneLake, meaning historical event data can be joined with batch data from the Lakehouse or Warehouse without a separate data movement step. | ||
|
|
||
|
|
||
|
|
||
| ## Data Science and Machine Learning | ||
|
|
||
| Fabric's **Data Science** experience is designed for the full ML lifecycle — from exploratory analysis through model training, evaluation, and deployment. | ||
|
|
||
| The primary workspace is familiar to any data scientist: Jupyter-style notebooks backed by managed Spark, with access to the full Python ML ecosystem (scikit-learn, XGBoost, PyTorch, TensorFlow) and **SynapseML**, Microsoft's open-source library for distributed ML on Spark. | ||
|
|
||
| Fabric integrates **MLflow** natively for experiment tracking, model registration, and versioning. When you train a model, metrics, parameters, and artifacts are automatically logged and stored. Models are registered in the workspace's model registry and can be used for batch scoring directly against Lakehouse tables using the `PREDICT` function in Spark SQL — no separate serving infrastructure required for batch inference use cases. | ||
|
|
||
| The deeper value for data scientists is what OneLake enables: feature tables built by data engineers in the Lakehouse are immediately accessible in ML notebooks without copying or re-ingesting data. The gap between data engineering and data science shrinks considerably when both disciplines are working against the same underlying tables. | ||
|
|
||
| --- | ||
|
|
||
| ## Security and Governance: Built In | ||
|
|
||
| One of the more understated strengths of Fabric's unified architecture is what it enables for governance. When all your data lives in one place, you define access policies once — not once per service. | ||
|
|
||
| Fabric integrates with **Microsoft Entra ID** for identity and access management, and with **Microsoft Purview** for data cataloging, lineage tracking, and sensitivity labeling. Row-level security, column-level security, and workspace-level access controls are applied uniformly across all Fabric experiences. | ||
|
|
||
| A sensitivity label applied to a table in the Lakehouse is respected when that same table is queried from the Warehouse or visualized in Power BI. For data engineers operating in regulated industries — finance, healthcare, public sector — this centralized governance model is a significant operational advantage over managing access policies across a fragmented stack. | ||
|
|
||
| --- | ||
|
|
||
| ## Power BI: Reporting Without Data Duplication | ||
|
|
||
| Power BI is the reporting and visualization layer — and in Fabric, it gains a capability that addresses one of its longest-standing pain points: **DirectLake mode**. | ||
|
|
||
| Traditionally, Power BI reports could either: | ||
| - 1. Query live data (slow, puts load on source systems), or | ||
| - 2. Import data into an in-memory semantic model (fast, but creates a stale copy requiring scheduled refreshes) | ||
|
|
||
| DirectLake is a third mode that reads directly from Delta Parquet files in OneLake at query time — delivering import-speed performance without maintaining a separate copy of the data. | ||
|
|
||
| For data engineers, this changes the publication model entirely. Once your pipeline writes a clean Delta table to the Lakehouse, a Power BI report can query it in DirectLake mode immediately — no refresh schedule, no import process, no synchronization lag. The report is always current as of the last pipeline run. | ||
|
|
||
| --- | ||
|
|
||
| ## Bringing It All Together | ||
|
|
||
| The reason Fabric is worth serious evaluation is not any individual component, it's what the unified architecture enables across all of them. | ||
|
|
||
| A data pipeline built in Data Factory writes to a Lakehouse. A Spark notebook transforms it into a clean Delta table. A data scientist trains a model against that table in a notebook. A warehouse analyst queries the same table in SQL. A Power BI report visualizes it in DirectLake mode. An Eventstream feeds real-time data into the same Lakehouse alongside the batch data. Throughout all of this, Purview is tracking lineage and Entra is enforcing access policies. | ||
|
|
||
| None of these steps require a separate connector, a data copy, or a cross-service authentication configuration. They are all reading from OneLake. | ||
|
|
||
| For teams that have spent years managing the operational overhead of a fragmented data stack, that's a genuinely meaningful shift — one where the platform handles the integration, and engineers can spend their time on the work that actually matters. | ||
|
|
||
| --- | ||
|
|
||
| ## Try It Yourself | ||
|
|
||
| - **Microsoft Fabric Free Trial** → [app.fabric.microsoft.com](https://app.fabric.microsoft.com/) | ||
| - **Full Documentation** → [learn.microsoft.com/fabric](https://learn.microsoft.com/fabric) | ||
|
|
||
| --- | ||
|
|
||
| ## About the Author | ||
|
|
||
| I'm **Aditya Singh Rathore**, a Data Engineer passionate about building modern, scalable data platforms. I write about Microsoft Fabric, Azure data tools, and real-world data engineering on [RecodeHive](https://www.recodehive.com/) — breaking down complex concepts into practical, actionable content. | ||
|
|
||
| If this article helped you understand Microsoft Fabric better, consider sharing it with your network. And if you're building something with Fabric or just getting started, I'd love to hear about it. | ||
|
|
||
| 🔗 Connect with me on [LinkedIn](https://www.linkedin.com/in/aditya-singh-rathore0017/) | [GitHub](https://github.com/Adez017) | ||
| 📩 Have a topic you'd like me to cover? Drop it in the comments below. | ||
|
|
||
| <GiscusComments/> | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add more real dashboard ss from your account to make it friendly, long text people wont read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can do that , I'll work around that @sanjay-kv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Adez017 will it be okay if i pay through github sponsorship for the blogs once you complete 5, helps me with tax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure