Blog

Create Dataflow to Load Data from Excel Web Source to Lakehouse

Dataflow Lakehouse

In our last Data Factory series using Fabric blog, we saw how to create a Dataflow in Data Factory to load the data from Web API connector to warehouse. We also discussed how to load the CSV files to a warehouse. In this blog, let’s explore a new use case. In this article, we will explore our knowledge on importing data from excel file to Lakehouse with simple transformations using Dataflow.

Setting up Dataflow in Data Factory

Click on Data Factory from the Fabric workspace. It will open a new window from where you can choose Dataflow Gen2 and Data pipeline. Select the Dataflow Gen2. You will see the editor like below.

Fabric Data Dactory - Dataflow Gen2

Click on Import from Excel option from the screen. You will be see a window where you can enter the SharePoint location of the Excel files.

Connect to Data Source

Add the Excel file to the dataflow editor and choose the required files. Select the appropriate one as shown below.

Add Excel File to Data Flow

You will see the Dataflow editor as below for all data and fields that were added.

Data Flow Editor

Adding Lakehouse as Destination Source

To add a destination, click on Add to Destination at the bottom right on the Query Setting pane. Choose Lakehouse from the list of options.

Adding Lakehouse Destination SourceThis will open a window to create a connection to Lakehouse. Click on the Next button.

Create Lakehouse Connection

 

Clicking Next will redirect you to choose the destination target (similar to warehouse). Choose the table names from your Lakehouse, if already created, or create the new table and enter the table names accordingly. In our case, let’s select new table and enter the table name,

Select New Table

Click the Next button to view the predefined mapping. Choose Replace.

View Predefined Mapping

Save the settings. You will be redirected to the Dataflow editor.

Dataflow Editor with Destination Settings

Create custom column in the table

Next, we will see how to create a new custom column from existing columns in the same table. In the data pane, scroll to the right on the selected table. You will see “Exit Date” as the last column. To add a new custom column, click on “Add Column” tab at the top of navigation bar and click on custom column.

Create Custom Column from Existing Column

In the new window to create the custom column, enter the column name as Monthly Salary. Choose the datatype as Currency from the dropdown. Select the Annual Salary column from the list and click on Insert column so that it will be added to its left canvas and divide by 12. Click OK as shown below. This will create a new column in the existing table.

Create custom column - Insert new column

To see the new column, scroll to the extreme right. Also, if you notice, for every transformation, it generates the power queries in the top query editor.

View the newly added column in Power Query Editor

Click the Publish button. As we already configured the destination, you will see the Lakehouse as target endpoint.

Lakehouse as a Datapoint

Viewing the data in Lakehouse

Once you Publish your changes, it will be saved, and the Dataflow will do an auto refresh by its own. This will load the data into target table TBL_Employees. To see the data in the Lakehouse, just open Lakehouse in a separate tab. You can see that the table is loaded with data automatically.

Dataflow autoloading data into table

View Dataflow History

To see the history of dataflow, go to Monitoring Hub pane in your workspace. Select the most recent dataflow activity, click on menu and select Historical runs.

Monitoring Hub - Dataflow History

This will show the details and the status of the run.

Dataflow details and history of runs

Wrapping Up

We have taken a look at how we can import data from excel file to Lakehouse with simple transformations using Dataflow. In case you missed out, make sure to check out our previous blogs in this Data Factory using Microsoft Fabric series by clicking the below links. Our next blog in this series will explain how you can perform incremental data load with data pipelines in Microsoft Fabric.

Data Factory in Microsoft Fabric: A Powerful Data Integration Solution

Create Pipeline in Data Factory to Copy Data from Azure Blob Storage to Lakehouse with Auto Mail Implementation

Create Dataflow in Data Factory to Copy Data from Excel Web Source to Warehouse

Related Posts


Power Platform Community Conference Banner

September 6, 2024

Power Platform Community Conference 2024 #PPCC24

VNB is returning to the Power Platform Community Conference #PPCC24 as an exhibitor for the third time in a row!

Copilot Studio

June 1, 2024

Microsoft Build 2024: A Look at the Important Announcements

Let’s dive deeper and explore the key takeaways from Microsoft Build 2024! We’ll cover the key updates to Microsoft Copilot Studio and Power Platform in this blog.