Data Factory in Microsoft Fabric: A Powerful Data Integration Solution

Introduction

Data is the fuel of the modern world, and data integration is the process of transforming, combining, and moving data from various sources to a destination for analysis and insights. Data integration can be challenging, especially when dealing with large volumes, complex formats, and diverse systems. That’s why you need a data integration solution that is easy to use, powerful, and enterprise-grade.

Data Factory in Microsoft Fabric is the next generation of Azure Data Factory, which provides cloud-scale data movement and data transformation services that allow you to solve the most complex ETL (extract, transform, and load) scenarios. It’s intended to make your data integration experience easy to use, powerful, and truly enterprise-grade.

Data Factory in Fabric empowers you with: 

  • Seamless connectivity to more than 170 data stores (including on-premises data sources, cloud databases, analytical platforms, line-of-business applications, and more).
  • Next-generation Power BI dataflows are now available as part of Data Factory in Fabric, providing 300+ out-of-the-box data transformations, including AI transformations, and scalable dataflows that run on Fabric compute. In addition, you can output the transformed data to various data destinations.
  • Data pipelines in Fabric are an evolution of Azure Data Factory pipelines and provide a rich set of integration capabilities. In addition, the copy assistant enables you to jumpstart any copy task from data sources to data destinations.
  • Built-in AI enables you to accelerate and automate common data integration tasks.

ETL Before Fabric

Before Fabric, Azure Data Factory had already been available for several years for running ETL jobs. It is a data integration service that orchestrates and automates data movement and transformation. You can create and schedule automated workflows in Data Factory. It empowers organizations to make data-driven decisions by moving and transforming data in a flexible and scalable manner.

Power Query is the Power BI component for data transformations. It was later offered as Power Query dataflows, which were also used in Power Apps for data migrations.

Why use Fabric Data Factory

Data Factory is the data integration component of Microsoft Fabric, bringing the power of Azure Data Factory and Power Query dataflows into one place. For many years, these two technologies handled data transformations separately. Now they are combined in Fabric as Data Factory, with Data Pipelines and Dataflow Gen2: Data Pipelines carry the power of ADF (Azure Data Factory), and Dataflow Gen2 carries the power of Power Query.

It is the future of Data Factory because it lets you pull data from numerous sources, apply complex transformations, and analyze and visualize the data in reports for end users and customers. All of this data can be managed through a single hub, the OneLake data hub. This is its biggest advantage, because it removes the pain of fetching data from multiple containers in a data lake. With everything in a single Fabric workspace, it offers an end-to-end solution that takes raw data through transformations to clean, structured data and Power BI reporting.

Components in Data Factory Fabric

  • Dataflows: Dataflows enable you to build data transformations using a visual graphical interface that lets you use hundreds of connectors, functions, and transformations to clean, shape, and enrich your data, without writing any code. Data Factory in Microsoft Fabric introduces Dataflow Gen2, which is a new and improved version of Power Query Dataflows.
  • Data Pipelines: Pipelines enable you to orchestrate and automate your data movement and transformation activities. You can use various activities to perform different tasks, such as copying data, executing dataflows, running SQL queries, calling REST APIs, and more. You can also use the Copy Assistant feature to quickly create a copy activity that moves data from one source to another, without creating a pipeline.
  • Data Connectors: Connectors in Data Factory in Fabric enable you to connect to various data sources and destinations, such as cloud databases, on-premises data sources, analytical platforms, and more. Data Factory in Microsoft Fabric supports connectivity to more than 170 data stores, including generic interfaces like REST APIs, OData, and more. Connections have similar functionality to linked services in Azure Data Factory, but connections in Microsoft Fabric are more intuitive to create and manage.
  • Table Clones: Table Clones enable you to clone data warehouse tables within Microsoft Fabric as of the current point in time. This feature allows you to create a snapshot of your data warehouse table, and use it for reporting, analytics, development, testing, and more. You can also restore your data warehouse table from a clone, in case of any data loss or corruption.
  • Dynamic Data Masking: Dynamic Data Masking (DDM) enables you to protect sensitive data from unauthorized access by masking it with predefined rules. For example, you can mask the credit card numbers or email addresses in your data, and only show the last four digits or the domain name. DDM helps you comply with data privacy and security regulations and prevent data leakage. Data Factory in Microsoft Fabric supports DDM for Fabric Warehouse and SQL Endpoint in Lakehouse.
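
The masking behaviour described for Dynamic Data Masking can be illustrated with a small Python sketch. This is not how Fabric implements DDM (there, masking rules are defined on warehouse columns); the function names `mask_card_number` and `mask_email` are hypothetical and only mimic the two examples above: showing just the last four digits of a card number and just the domain of an email address.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "X") -> str:
    """Hide every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_hide = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_hide > 0:
            masked.append(mask_char)
            digits_to_hide -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def mask_email(email: str, mask_char: str = "X") -> str:
    """Hide the local part of an address and keep only the domain."""
    local, sep, domain = email.partition("@")
    if not sep:
        # Not an email address: mask everything rather than leak it.
        return mask_char * len(email)
    return mask_char * len(local) + "@" + domain


if __name__ == "__main__":
    print(mask_card_number("4111-1111-1111-1234"))  # XXXX-XXXX-XXXX-1234
    print(mask_email("jane.doe@example.com"))       # XXXXXXXX@example.com
```

The point of the sketch is that the stored value is unchanged and only the presented value is masked, which is why an unauthorized reader sees `XXXX-XXXX-XXXX-1234` while the underlying data stays intact.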

Comparison: Azure Data Factory vs. Data Factory in Fabric

The table below maps the main Azure Data Factory components to their counterparts in Data Factory in Fabric.

Azure Data Factory | Data Factory in Fabric | Description
--- | --- | ---
Pipeline | Data Pipeline | Data pipelines in Fabric are tightly integrated with the unified data platform, including Lakehouse, Data Warehouse, and others.
Mapping Dataflow | Dataflow Gen2 | Dataflow Gen2 provides an easier experience for working with and transforming data.
Dataset | Not Applicable | Data Factory in Fabric does not have a dataset feature.
Linked Service | Connections | Connections have similar functionality to linked services, but connections in Fabric are created in the Manage connections and gateways section.
Publish | Save, Run | In Fabric, you click the Save button to save a pipeline directly. When you click Run, it runs the pipeline.
Azure IR | Not Applicable | In Fabric, the integration runtime feature is not applicable.
SHIR | On-premises Data Gateway | It works much like the Self-Hosted Integration Runtime to connect to on-premises sources.
Export and Import ARM | Save as | Save as is available in Fabric pipelines to duplicate a pipeline.
Monitoring | Monitoring, Run history | The monitoring hub in Fabric has more advanced functions and a modern experience, such as monitoring across different workspaces for better insights.

OneLake as Single Storage

OneLake is a single, unified, logical data lake for your whole organization. Like OneDrive, OneLake comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data. In short, OneLake gives customers one data lake for the entire organization.

Advantages:

  • Before OneLake, customers often created multiple lakes for different business groups rather than collaborating on a single lake, despite the extra overhead of managing multiple resources.
  • OneLake removes these challenges by improving collaboration. Every customer tenant has exactly one OneLake, provisioned automatically, with no extra resources to set up or manage.
  • It reduces the pain of storing data in multiple storage locations.
  • Using OneLake can reduce cost for users, because all types of data are stored in a single, centralized location.
  • It is a one-stop solution that brings every analytical capability into a single ecosystem.
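
Because every item lives in the same logical lake, data is addressed by workspace and item rather than by separate storage accounts. The following is a minimal sketch of building such an address, assuming the OneLake URI shape `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>`. The helper name `onelake_uri` and the workspace, lakehouse, and file names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # assumed global OneLake endpoint


def onelake_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an abfss:// URI for a file or folder stored in OneLake."""
    for name, value in (("workspace", workspace), ("item", item), ("item_type", item_type)):
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty name without '/'")
    clean_path = path.strip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{clean_path}"


if __name__ == "__main__":
    # Two business groups share one lake; only the workspace and item differ.
    print(onelake_uri("Sales", "Orders", "Lakehouse", "Files/raw/orders.csv"))
    print(onelake_uri("Finance", "Ledger", "Lakehouse", "/Tables/invoices"))
```

Both addresses point at the same host, which is the practical meaning of "one data lake for the entire organization": teams differ only in the workspace and item segments of the path.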

In this blog series, we’re going to show you how to load data from different sources to different destinations using Data Factory in Microsoft Fabric. We’ll guide you through the steps of using these tools and how they work together to create something cool.

Next blog in the series – How to load data from Blob into Lakehouse with an auto mail implementation.

VNB Consulting has extensive expertise in Microsoft Fabric strategy, implementation, support, and training. Our skilled professionals design and implement robust data analytics solutions tailored to specific business needs. We specialize in end-to-end data governance, data integration, data warehousing, data science, and data visualization using Microsoft Fabric. Contact us today for a free consultation and harness the full potential of Microsoft Fabric to boost operational efficiency and drive digital innovation within your organization.
