Azure Synapse Analytics – Part 4: Predictive Analytics
In this part 4 of the series of articles on Azure Synapse Analytics, we will take a detailed look at the predictive analytics capabilities of Synapse. In case you missed reading our previous articles, check them out in the below links –
Azure Synapse Analytics – Part 1: An Overview
Azure Synapse Analytics –Part 2: Creating a Workspace
Azure Synapse Analytics –Part 3: Benefits and Limitations of Synapse Reporting
What is Predictive Analytics?
Predictive analytics is a modeling technique of using statistical algorithms and machine learning (ML) techniques to ascertain future outcomes, based on the historical data. This decision making technique helps businesses to forecast the inventory and manage resources, improve business operations and retain their customers. Predictive analytics is mostly used in data forecasting.
Azure Synapse and Predictive Analytics
Azure Synapse integrates deeply with Azure Machine Learning. The ML workflow helps businesses to deliver predictive analytics. Data analysts can take advantage of Azure Machine Learning’s central model registry to access the different available models. In addition, they can also build their own models easily within Azure Synapse. These models can be used on the data to deliver analytics and enrich the data.
The Azure ML can generate ‘n’ number of predictions in just seconds. The Automated Machine Learning (a.k.a. AutoML) allows to automate time consuming and iterative tasks in developing ML models. With AutoML, you can speed up the process of generating production-ready ML models easily with improved efficiency. Azure ML also offers code-free ML scoring – to generate latest values from new input data based on a specific trained ML model.
Setting up Linked Services for Predictive Analytics
As explained in the previous article, Synapse offers excellent end-to-end system integration capabilities. To set up this integration, the primary step is to create a “Linked Service”. Linked Services help to set up a seamless collaboration between Azure Synapse and other services. To enable predictive analytics, the primary step is to set up linked services to enable a seamless collaboration between Azure Synapse and Azure Machine Learning workspace.
To connect Azure Synapse and Azure Machine Learning workspace, go to your Synapse Studio and click “Manage” from the left menu. Click “Linked Services” under “External Connections”. Click “New”, search for “Azure Machine Learning”. Select “Azure Machine Learning”.
Creating the Linked Service
Before creating the Azure Machine Learning linked service, make sure you have created the Service Principal. Follow the steps explained in this article to create the service principal. If you already have a service principal, you can use the existing one. Make sure you add the service principal as a contributor to the Azure Machine Learning workspace.
Let’s create the linked service. In the Azure Machine Learning, enter the name of the linked service, select the subscription, enter the service principal ID and service principal key (from the app that you created in the previous step). Click “Test Connection” to make sure the entered configuration is correct. If the connection is successful, click Save to save the linked service. If the connection fails, make sure you have entered the correct service principal ID and secret key.
Enrich the data with Predictive ML Models
In this section, we will take a look at how easily you can enrich the data within SQL pool. The prerequisite for this is that you already have an Azure Synapse Analytics workspace, a dedicated SQL Pool and Azure ML linked service within the Azure Synapse Analytics workspace (refer above steps). The first step is to train a model inside Azure Machine Learning. Open the Azure Machine Learning Studio by clicking Launch Studio.
We will use the “Predict NYC Taxi Tips” sample for this blog series. This sample will predict the tip amount for a taxi ride in New York City. Download the file from here. Within the Azure Machine Learning workspace, go to Notebooks > Upload files. Select the downloaded .ipynb file and upload it to the notebook.
If you have not created a compute space, you can create one on the go. Allow the compute instance to get completely setup and wait for the Jupiter kernel to show idle status. Click the fast forward button (Run all cells). Follow the run result.
IMPORTANT: Make sure you uncomment the “%pip install scikit-learn==0.20.3” line in the code and restart the kernel to make the script work as expected.
The notebook will train the ONNX model and register it. You can find the new model is created successfully under the Models tab.
The test data will also be exported into a CSV file. You can download the CSV to your system and import it back into the SQL pool to test the model.
Navigate to the Synapse Studio and go to Data > Linked > Storage Accounts. Upload the CSV file.
In the Develop > SQL Scripts screen, create a new SQL Script to load the CSV into the SQL pool. Remember to change the URL of the file in the script. The URL is nothing but the URL of the linked storage account.
Run the script. This will create the table under the Data > Workspace section. Now, to view the SQL scoring wizard, right-click on the dedicated SQL pool table. Select Machine Learning > Enrich with existing model. This option will only appear if you have created a linked service for Azure Machine Learning.
You will see the model that we trained in the previous steps. If you do not see a model, follow the steps explained to create and train a model. Select the model from the list and click Continue.
Next, map the source table columns to the expected model inputs and specify the model outputs also. In case you are using the MLflow format, this mapping will be done automatically based on an internal logic. The logic will compare the similarity of names and perform the mapping. Click Continue.
In the next step, enter a name for the stored procedure that will be created after you run the script. The model binary and the metadata will be made available from Azure Machine Learning to a dedicated SQL pool table. Therefore, you must specify a table where the model has to be saved. You can also choose to create a new target table or choose an existing table. Once done, click Deploy model + open script.
Click Run to execute the scoring and get the predictions from the model.
Therefore, as you can observe, with the help of Azure Machine Learning and Synapse analytics, you can train a predictive machine learning model and register it in the Azure Machine Learning model registry. With the help of the SQL scoring wizard, you can enrich the existing data within a dedicated SQL pool.
The strong integration capability between Microsoft Azure and Azure Machine Learning ensures collaboration between the teams to build predictive analytics solutions. Azure Machine Learning supports portable ONNX model format and users can add various models into Synapse. You can build the data models within Synapse (where the actual data resides) for batch scoring. As a result, we can ensure the data remains within the Azure Synapse boundary. This also gives the teams the ability to enrich the data and deliver more actionable insights. You can use these predicted values to populate rich Power BI reports. This integration helps businesses to leverage the machine learning skills to analyze data and deliver insights.