Stream Analytics does some things very well when you build intelligent systems, but it is still based on programmed logic. For example, you can define simple fraud detection logic directly in Stream Analytics: if a credit card is used for a purchase in Russia right now but the same card was used in the USA an hour earlier, the query can flag it. Another example is detecting temperature deviations in an aquarium compared to the last three readings, which could indicate a problem with the thermostat. Sometimes, however, you need more advanced, prediction-based logic, and this post shows how to add your Machine Learning experiments to Stream Analytics.
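To make the two rule-based examples concrete, here is a minimal Python sketch of the logic. It is only an illustration: in a real job these rules would be written in the Stream Analytics query language, and the field names (`card`, `country`, `time`) and the tolerance of 1.5 degrees are assumptions I made up for the example.

```python
from collections import deque
from datetime import datetime, timedelta


def card_used_in_two_countries(events, window=timedelta(hours=1)):
    """Return purchases made in a different country than the previous
    purchase on the same card, when both fall within `window`."""
    last_seen = {}  # card number -> (time, country) of the previous purchase
    flagged = []
    for event in sorted(events, key=lambda e: e["time"]):
        previous = last_seen.get(event["card"])
        if previous is not None:
            prev_time, prev_country = previous
            if prev_country != event["country"] and event["time"] - prev_time <= window:
                flagged.append(event)
        last_seen[event["card"]] = (event["time"], event["country"])
    return flagged


def temperature_deviations(readings, tolerance=1.5):
    """Return readings that differ from the mean of the previous three
    readings by more than `tolerance`."""
    recent = deque(maxlen=3)  # sliding window of the last three readings
    deviations = []
    for value in readings:
        if len(recent) == 3 and abs(value - sum(recent) / 3) > tolerance:
            deviations.append(value)
        recent.append(value)
    return deviations


# Made-up sample data for illustration only.
events = [
    {"card": "1234", "country": "US", "time": datetime(2016, 1, 1, 12, 0)},
    {"card": "1234", "country": "RU", "time": datetime(2016, 1, 1, 12, 45)},
    {"card": "9999", "country": "SE", "time": datetime(2016, 1, 1, 12, 50)},
]
suspicious = card_used_in_two_countries(events)
deviations = temperature_deviations([24.0, 24.1, 23.9, 27.5, 24.0])
```

Both rules are easy to write because every condition is known up front, which is exactly what stops being true in the scenarios discussed next.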
The main issue is that you have to define all the logic yourself in Stream Analytics queries (which are very powerful). With machine learning algorithms you can incorporate trained, prediction-based logic into your stream to further improve your stream processing. Such logic could cover normal user buying patterns, or how many temperature deviations are normal in the aquarium before the thermostat needs to be replaced. It could save time and money and increase goodwill, but it is often difficult to express as rule-based (coded) predictions, which is where machine learning algorithms that learn from observations are put to good use.
Consider the cost of not just replacing the aquarium thermostat but also replacing all the fish because they died from the temperature deviation. If you could predict the problem, you could save the consumer money. This scenario is sometimes referred to as proactive maintenance and would probably involve decision factors such as tank size, thermostat brand, thermostat model, water type (salt %) and fish sensitivity. Writing all this logic in code would be difficult, but if you have historical data and can train a machine learning model with it, you can fill in the missing logic and base the predictions in your Stream Analytics definition on actual events rather than manually guessed logic.
This post does not go through the complex process of training a model; for an introduction to Azure Machine Learning I recommend starting with Machine Learning Studio. It assumes you have already defined an experiment and generated a Web Service from it. Nor does it go through the basics of setting up Stream Analytics inputs, queries and outputs. It assumes you have all of this in place and want to combine your Stream Analytics job and your experiment into a working whole.
You likely have a scenario similar to the following.
You have one or several input streams (likely event hubs or storage blobs). Your analytical queries then analyze the stream, and the results are sent to one or several output streams: for example an archiving stream that saves all results to a blob storage file, and another stream that sends detected deviations to, say, a service bus queue for further processing.
But when you incorporate Machine Learning, you typically want to enrich your data with predictions from one or several Machine Learning Web Services before you make your decisions.
This post will focus on how to set this up in Azure.
Set things up
Stream Analytics has no visual drag-and-drop designer; it is driven by a query syntax. The first thing to do is therefore to make sure that your inputs and outputs are defined correctly and that you know their names. The example I will go through has the following inputs and outputs. For simplicity it analyzes each message on its own and does not use any time-based queries, but in reality you will likely combine the strength of advanced Stream Analytics queries with machine learning.
- inputCreditCardTrans: This is the Event Hub input stream which contains information about credit card transactions (including amount, purchase country etc.) and payer person information (name, age, country etc.).
- ArchiveOutput: All transactions (fraud or not) are stored in this output. This preserves the data that the routing was based on, so that you can rerun it or use it for further machine learning training.
- fraudsToQueue: The detected frauds are sent to a service bus queue whose consumer should take action on the possible frauds, for example by sending a message to the customer contact to call the credit card owner and verify whether it is a fraud or not.
- outputFraud: All likely frauds are logged to a special Blob in storage.
Add your Machine learning web service to Stream Analytics
You can do this in two ways. If it is your own algorithm and it exists in the same subscription as your Stream Analytics job, you can browse to it through the UI and just select the right method in the drop-downs. In all other scenarios you enter it manually. More about that later, but be prepared that you might need some information about the ML Web Method if you cannot browse to it.
Either way, go to the Functions tab in Stream Analytics.
At the bottom of the portal, press the Add Function button.
Pressing Add Function starts the wizard.
You are now at the step that I described earlier, where you can connect to Machine Learning in one of two ways. I will show how to set this up both ways below.
Before I do, note that the Alias will be the name of the function in your query, so use a descriptive name, especially if you will have more than one function, to keep your query easy to interpret later.
Add function from Current Subscription
This is of course the simplest method and is done within seconds.
You simply select the Workspace, the Web Service (yes, I know, my experiment is poorly named) and the endpoint, and the wizard itself sets the credentials and finishes the configuration.
Provide Machine Learning settings manually
First you need to obtain the URL and credentials. This requires you to go through a series of steps. I will show you how to do this from the Azure portal, but you can also go through Machine Learning Studio instead.
First log on to the Azure portal with the subscription that has the ML algorithm, then go to Machine Learning in the portal and select your ML project.
Then go to the web services management.
Then select the Web Service that you want to include in your Stream Analytics.
Then choose the endpoint (could be several endpoints to choose from) that you want to use from Stream Analytics.
You can then get the API key from the Dashboard, in the bottom right corner of the screen (see 1. API Key). After you have saved it in your favourite text editor, press the Request/Response link (see 2. URL) to obtain the URL.
A new page opens and you can find the URL in the Request/POST section. This is the URL that you need to enter in the wizard.
So now we can go back to the Add Function part in Stream Analytics again and enter these two parameters like below…
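If you want to sanity-check the URL and API key outside the wizard, a request can be put together along the following lines. Treat this as a sketch under assumptions: the URL and key are placeholders, the five column names are my guess at what the function takes, and the body layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) follows the sample request I would expect the Request/Response help page to show. Copy the exact layout from your own help page if it differs.

```python
import json
import urllib.request

# Placeholders: paste the values you copied from the web service pages.
SCORING_URL = "https://example.invalid/paste-request-response-url-here"
API_KEY = "paste-api-key-here"

# Assumed input columns; use the ones your own web service expects.
COLUMN_NAMES = [
    "Age",
    "HasCreditRemark",
    "CreditCardIssuedInCtyCode",
    "CreditCardPurchaseInCtyCode",
    "PurchaseAmount",
]


def build_scoring_request(url, api_key, column_names, row):
    """Build (but do not send) a Request/Response call for one input row."""
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": [row]}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The API key is sent as a bearer token.
            "Authorization": "Bearer " + API_KEY if api_key is None else "Bearer " + api_key,
        },
        method="POST",
    )


request = build_scoring_request(
    SCORING_URL, API_KEY, COLUMN_NAMES, ["42", "0", "US", "RU", "4500"]
)
# To actually call the service, send it with:
#   urllib.request.urlopen(request).read()
```

Nothing is sent until you call `urlopen`, so you can inspect `request.data` first and compare it with the sample on the help page.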
How to use the added function from Stream Analytics
Either way you have now added the function. Stream Analytics will now run for a few seconds to verify the connection to the web service. Once it is done it will look like below.
Notice the OK status, which indicates that the connection has been verified.
You can now look at the details of the added function by clicking on the Alias.
From here you can see that the function takes 5 input parameters and returns a RECORD that contains the output of the function (i.e. multiple response values). This is the signature that we have to use in the query that processes the stream input.
Extend your Query with the Added Function
So what do we intend to do now? We must extend the original data from the input stream with data from the Machine Learning function. Say the original data contains the columns FirstName, LastName, Age, City, HasCreditRemark, CreditCardNo, CreditCardIssuedInCtyCode, CreditCardPurchaseInCtyCode, PurchaseAmount and PurchaseDateTime. I then want the web service's FraudEvaluation output column included in the record before I do the stream analysis.
Do so by selecting the Query tab in the portal. At the top of the query, add the following WITH statement.
So what is the above supposed to mean? This drawing might explain it a little better.
- After the WITH statement we name a new “dataset” so we can run further queries on it later. In this case we name it subquery to distinguish it from the original input stream.
- We still want the columns from the input stream available in the new “dataset”, so we select all these columns.
- We call the added function, passing columns from the original input as parameters.
- We name the output of the function result. To check the outcome of the fraud algorithm we must go through the result column in the subquery “dataset”. This can be compared to a complex data type in some relational databases.
- We select from the input stream named inputCreditCardTrans.
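The steps in the list above can be mimicked locally in a few lines of Python. This is not Stream Analytics code, just a sketch of what the WITH step does to each event. The function `fraud_function` is a hypothetical stand-in for the web service, its five parameters and its scoring rule are assumptions of mine, and so is the `No Fraud` label.

```python
def fraud_function(age, has_credit_remark, issued_in, purchased_in, amount):
    """Hypothetical stand-in for the ML function: 5 inputs, returns a record."""
    suspicious = issued_in != purchased_in and amount > 1000
    return {"FraudEvaluation": "Likely Fraud" if suspicious else "No Fraud"}


def subquery(input_stream):
    """Mimic the WITH step: keep every input column and add a `result` record."""
    for event in input_stream:
        enriched = dict(event)  # all original columns stay available
        enriched["result"] = fraud_function(
            event["Age"],
            event["HasCreditRemark"],
            event["CreditCardIssuedInCtyCode"],
            event["CreditCardPurchaseInCtyCode"],
            event["PurchaseAmount"],
        )
        yield enriched


# Made-up events with a subset of the columns, for illustration only.
input_stream = [
    {"FirstName": "Ann", "Age": 34, "HasCreditRemark": 0,
     "CreditCardIssuedInCtyCode": "SE", "CreditCardPurchaseInCtyCode": "SE",
     "PurchaseAmount": 120},
    {"FirstName": "Bob", "Age": 51, "HasCreditRemark": 1,
     "CreditCardIssuedInCtyCode": "US", "CreditCardPurchaseInCtyCode": "RU",
     "PurchaseAmount": 4500},
]
rows = list(subquery(input_stream))
```

Each row now carries its original columns plus a nested `result`, which is why later queries read the evaluation through the result column rather than as a top-level column.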
So what we have to do now is to define the rest of the queries. In my simplified case it ended up this way.
All the subsequent queries now select from the subquery “dataset” instead of the input stream.
- The first select just archives the results from all evaluations to the Output Stream named ArchiveOutput that logs all data to the archive
- The second logs all suspected frauds to another Output Stream, a Blob named outputFraud. To find these we check the column named FraudEvaluation in the output of the ML function, referenced with square brackets on the result column (result.[‘FraudEvaluation’]), and keep the rows where it equals ‘Likely Fraud’.
- The third does the same as the second except that it writes to a service bus queue instead
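The routing described in the three points above can be sketched the same way. Again this is plain Python rather than a Stream Analytics query, and the sample rows are made up. The output names and the ‘Likely Fraud’ value come from the post, while the ‘No Fraud’ label is an assumption.

```python
def route(subquery_rows):
    """Send every row to the archive, and suspected frauds to the two
    fraud outputs as well."""
    outputs = {"ArchiveOutput": [], "outputFraud": [], "fraudsToQueue": []}
    for row in subquery_rows:
        outputs["ArchiveOutput"].append(row)  # first select: archive everything
        if row["result"]["FraudEvaluation"] == "Likely Fraud":
            outputs["outputFraud"].append(row)    # second select: blob log
            outputs["fraudsToQueue"].append(row)  # third select: service bus queue
    return outputs


# Made-up rows shaped like the subquery "dataset".
subquery_rows = [
    {"CreditCardNo": "1111", "result": {"FraudEvaluation": "No Fraud"}},
    {"CreditCardNo": "2222", "result": {"FraudEvaluation": "Likely Fraud"}},
    {"CreditCardNo": "3333", "result": {"FraudEvaluation": "No Fraud"}},
]
outputs = route(subquery_rows)
```

Because all three selects read from the same enriched rows, the web service's answer is looked up once per row and reused for every output.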
This is really all you have to do. If it fails, the errors will be visible in the portal and you will have to analyze them. Most often they are caused by syntax problems in the query.