Blog

Create Dataflow to Load Data from Excel Web Source to Lakehouse

Dataflow Lakehouse

In our last Data Factory series using Fabric blog, we saw how to create a Dataflow in Data Factory to load the data from Web API connector to warehouse. We also discussed how to load the CSV files to a warehouse. In this blog, let’s explore a new use case. In this article, we will explore our knowledge on importing data from excel file to Lakehouse with simple transformations using Dataflow.

Setting up Dataflow in Data Factory

Click on Data Factory from the Fabric workspace. It will open a new window from where you can choose Dataflow Gen2 and Data pipeline. Select the Dataflow Gen2. You will see the editor like below.

Fabric Data Dactory - Dataflow Gen2

Click on Import from Excel option from the screen. You will be see a window where you can enter the SharePoint location of the Excel files.

Connect to Data Source

Add the Excel file to the dataflow editor and choose the required files. Select the appropriate one as shown below.

Add Excel File to Data Flow

You will see the Dataflow editor as below for all data and fields that were added.

Data Flow Editor

Adding Lakehouse as Destination Source

To add a destination, click on Add to Destination at the bottom right on the Query Setting pane. Choose Lakehouse from the list of options.

Adding Lakehouse Destination SourceThis will open a window to create a connection to Lakehouse. Click on the Next button.

Create Lakehouse Connection

 

Clicking Next will redirect you to choose the destination target (similar to warehouse). Choose the table names from your Lakehouse, if already created, or create the new table and enter the table names accordingly. In our case, let’s select new table and enter the table name,

Select New Table

Click the Next button to view the predefined mapping. Choose Replace.

View Predefined Mapping

Save the settings. You will be redirected to the Dataflow editor.

Dataflow Editor with Destination Settings

Create custom column in the table

Next, we will see how to create a new custom column from existing columns in the same table. In the data pane, scroll to the right on the selected table. You will see “Exit Date” as the last column. To add a new custom column, click on “Add Column” tab at the top of navigation bar and click on custom column.

Create Custom Column from Existing Column

In the new window to create the custom column, enter the column name as Monthly Salary. Choose the datatype as Currency from the dropdown. Select the Annual Salary column from the list and click on Insert column so that it will be added to its left canvas and divide by 12. Click OK as shown below. This will create a new column in the existing table.

Create custom column - Insert new column

To see the new column, scroll to the extreme right. Also, if you notice, for every transformation, it generates the power queries in the top query editor.

View the newly added column in Power Query Editor

Click the Publish button. As we already configured the destination, you will see the Lakehouse as target endpoint.

Lakehouse as a Datapoint

Viewing the data in Lakehouse

Once you Publish your changes, it will be saved, and the Dataflow will do an auto refresh by its own. This will load the data into target table TBL_Employees. To see the data in the Lakehouse, just open Lakehouse in a separate tab. You can see that the table is loaded with data automatically.

Dataflow autoloading data into table

View Dataflow History

To see the history of dataflow, go to Monitoring Hub pane in your workspace. Select the most recent dataflow activity, click on menu and select Historical runs.

Monitoring Hub - Dataflow History

This will show the details and the status of the run.

Dataflow details and history of runs

Wrapping Up

We have taken a look at how we can import data from excel file to Lakehouse with simple transformations using Dataflow. In case you missed out, make sure to check out our previous blogs in this Data Factory using Microsoft Fabric series by clicking the below links. Our next blog in this series will explain how you can perform incremental data load with data pipelines in Microsoft Fabric.

Data Factory in Microsoft Fabric: A Powerful Data Integration Solution

Create Pipeline in Data Factory to Copy Data from Azure Blob Storage to Lakehouse with Auto Mail Implementation

Create Dataflow in Data Factory to Copy Data from Excel Web Source to Warehouse

Related Posts


Snowflake Cloud Data Platform

February 8, 2024

Snowflake 101: Why Choose Snowflake Cloud Data Platform for your Business?

This blog is the first in the series of articles on Snowflake. In this blog, let’s take a look at the basics of Snowflake Cloud Data Platform and why businesses should choose Snowflake for managing their data. What is Snowflake Cloud Data Platform? Snowflake Cloud Data Platform is a cloud-based data warehousing platform that helps

Microsoft Power Platform 2024 Release Wave 1 Updates

February 5, 2024

Microsoft Power Platform 2024: Release Wave 1 Plan Announcement

Microsoft announced the 2024 Release Wave 1 plans for Microsoft Power Platform and Microsoft Dynamics 365. This details the features and enhancements scheduled for rollout from April 2024 to September 2024. In the release plan, Microsoft reveals lot of new upcoming features that are planned to be released during the timeframe. Their aim is to