Machine Learning in SQL Server





ML Toolkit Inside SQL Server



This article walks through creating a stored procedure that uses the mtcars dataset included with R to fit a simple generalized linear model (GLM) that predicts the likelihood of a vehicle being equipped with a manual transmission. A second procedure handles prediction: it calls the model created by the first procedure and outputs a set of predictions for new data.



Explanation of terms



SQL (Structured Query Language) is the language used to query and modify data in a relational database.



SQL Server is a relational database management system from Microsoft.



Machine Learning Services is a component of SQL Server that enables you to execute Python and R scripts on data.



A trigger in SQL is a special stored procedure that runs automatically in response to an event in the database, such as an INSERT, UPDATE, or DELETE on a table.



Scripts are small programs designed for a narrow, most often periodic, range of tasks.



R is a programming language created specifically for statistical computing and data analysis.



Reasons Why SQL Server is Beneficial for Machine Learning



Let's look at the benefits of running ML inside SQL Server.



One of the most important reasons is the convenience of keeping SQL commands and ML code in one place, which lets you take full advantage of both technologies.



Another important reason is security: if the database server is in one place and the code that processes the data runs somewhere else, the data can be intercepted in transit. When the database engine both stores the data and runs the code, the data never has to leave the server.



Among other things, SQL Server has good support for the R programming language, both in the libraries it ships with and in performance. According to published benchmarks, the database engine can compute about a million R predictions per second.



Checking the health of SQL Server



First, let's make sure that the machine learning service and extensions for R are working as expected. To do this, run the following code.



EXEC sp_execute_external_script @language = N'R'
, @script = N'
OutputDataSet <- data.frame(installed.packages()[,c("Package", ...;'
WITH result sets((Package NVARCHAR(255)
, Version NVARCHAR(100)...));


Result:







Let's briefly analyze the SQL command:



1.  EXEC sp_execute_external_script @language = N'R'
This tells the database engine to run the script with the R runtime.



2.  @script = N'OutputDataSet <- data.frame(installed...'
@script holds the R code to run. Inside it, OutputDataSet is the default name of the data frame that is returned to SQL Server as the result set.



3.  WITH result sets((Package NVARCHAR(255)..
WITH RESULT SETS declares the column names and data types of the table that the script returns.
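
To see these three parts working together, here is a minimal sketch that sends a one-row query to R and gets it back unchanged. It relies on the default variable names InputDataSet and OutputDataSet; the column name hello is only an illustration.

```sql
-- Pass the result of a T-SQL query to R and return it unchanged.
EXEC sp_execute_external_script
      @language = N'R'
    -- InputDataSet is the default R data frame holding the @input_data_1 rows;
    -- OutputDataSet is the default data frame returned to SQL Server.
    , @script = N'OutputDataSet <- InputDataSet;'
    , @input_data_1 = N'SELECT 1 AS hello'
-- Describe the shape of the table that comes back from R.
WITH RESULT SETS ((hello INT NOT NULL));
```

If Machine Learning Services is working, the call should return a single row with one integer column.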



An example of using ML in SQL Server



Create a table to store the data with which we will train the model:



CREATE TABLE dbo.MTCars(
hp int NOT NULL,
...
wt decimal(10, 3) NOT NULL,
am int NOT NULL);


We load the mtcars data into it:



INSERT INTO dbo.MTCars
EXEC sp_execute_external_script @language = N'R'
    , @script = N'MTCars <- mtcars;'
    , @input_data_1 = N''
    , @output_data_1_name = N'MTCars';


Result:







Create and train the model:



CREATE PROCEDURE generate_GLM
... , @script = N'carsModel <- glm(... data = MTCarsData, ...;
        trained_model <- ...'
    , @input_data_1 = N'SELECT hp, wt, am FROM MTCars'
    , @input_data_1_name = N'MTCarsData'
    ...;


At this stage, it is worth looking at how R receives data from the SQL Server engine: @input_data_1 is the T-SQL query whose result set is passed to R, and @input_data_1_name is the name of the R data frame that holds those rows inside the script (MTCarsData here).
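
The same mechanism can be tried in isolation. The sketch below assumes the dbo.MTCars table created earlier; the names Cars and Summary, and the output columns, are hypothetical and chosen only for this example.

```sql
-- Name the input and output data frames explicitly instead of relying
-- on the defaults (InputDataSet / OutputDataSet).
EXEC sp_execute_external_script
      @language = N'R'
    , @script = N'
        # Cars is the data frame built from the @input_data_1 query.
        Summary <- data.frame(row_count = nrow(Cars), mean_hp = mean(Cars$hp));'
    , @input_data_1 = N'SELECT hp, wt, am FROM dbo.MTCars'
    , @input_data_1_name = N'Cars'        -- R variable that receives the rows
    , @output_data_1_name = N'Summary'    -- R variable returned to SQL Server
WITH RESULT SETS ((row_count INT, mean_hp FLOAT));
```

Naming the data frames makes the R code easier to read when a script grows beyond a line or two.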



Create a table for the model:



CREATE TABLE GLM_models (
    model_name varchar(30) not null default('default model') primary key,
    model varbinary(max) not null
);


We save the model:



INSERT INTO GLM_models(model)
EXEC generate_GLM;


Result:







SQL Server can store trained models in a table as binary values (here, in the varbinary(max) column model), so that they can be quickly reused later without retraining.
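
One way to confirm that the model was saved is to inspect the table. The sketch below uses only the GLM_models table defined above; the new name GLM_v1 is a hypothetical value picked for illustration.

```sql
-- Confirm the model row exists and see how large the serialized model is.
SELECT model_name, DATALENGTH(model) AS model_bytes
FROM GLM_models;

-- model_name is the primary key and defaults to 'default model', so a
-- second INSERT ... EXEC generate_GLM would collide with the first row.
-- Renaming the stored model (GLM_v1 is a hypothetical name) avoids that.
UPDATE GLM_models
SET model_name = 'GLM_v1'
WHERE model_name = 'default model';
```

Keeping one row per named model makes it possible to retrain later and still compare against an earlier version.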



We create a table where data for analysis will be stored:



CREATE TABLE dbo.NewMTCars(
    hp INT NOT NULL
    , wt DECIMAL (10,3) NOT NULL
    , am INT NULL)


We fill it with arbitrary data within the expected range:




INSERT INTO dbo.NewMTCars(hp, wt) VALUES (110, 2.634)
INSERT INTO dbo.NewMTCars(hp, wt) VALUES (72, 3.435)
INSERT INTO dbo.NewMTCars(hp, wt) VALUES (220, 5.220)
INSERT INTO dbo.NewMTCars(hp, wt) VALUES (120, 2.800)


Result:



We predict the result for new data:



DECLARE @glmmodel varbinary(max) = ...
    ...    , @script = N'
            --     
            '
   --    ;


Result:







The closer the predicted_am value is to one, the greater the chance that the car has a manual transmission.



Conclusion



Beyond this example, machine learning inside the database can be used to separate useful information from noise, find dependencies between columns, and much more.



You can also apply more advanced prediction methods and set up triggers that fire every time new data arrives, for example:



CREATE TRIGGER add_car ON dbo.NewMTCars AFTER INSERT AS


… Apply the forecasting procedure.
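
As a rough sketch of what such a trigger could look like in T-SQL: the procedure dbo.predict_am below is hypothetical and stands in for whatever procedure wraps the prediction script; it is not defined in this article.

```sql
-- dbo.predict_am is a hypothetical stored procedure that wraps the
-- prediction step; replace it with your own forecasting procedure.
CREATE TRIGGER add_car
ON dbo.NewMTCars
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- The "inserted" pseudo-table holds the rows added by the statement
    -- that fired the trigger; skip the call when nothing was inserted.
    IF EXISTS (SELECT 1 FROM inserted)
        EXEC dbo.predict_am;
END;
```

Keep in mind that the trigger runs as part of every INSERT, so a slow prediction step slows down every write to the table; whether that trade-off is acceptable depends on the workload.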


