
Model Serving Platform - Use Case: Predicting Wine Quality

Information

Note that this tutorial should be taken only as an illustrative guide for earlier Sidra versions, since from version 1.13 (2022.R3) onwards pysidra is no longer used. For more information, you can contact the Sidra Support Team.

This tutorial presents a use case of the Model Serving Platform: Predicting Wine Quality.

Wine Quality is one of the most widely used datasets in Machine Learning for exploring a wide range of regression models. The goal is to model wine quality based on physicochemical tests.

Requirements:

- Databricks (it is a core service in Sidra)
- A Python notebook in Databricks

Where to run the notebook?

The intake clusters are generic and immutable, with predictable loads, so it is not recommended to run model project notebooks and experiments on the data intake cluster. We recommend having a separate cluster just for ML, or using the DataLabs environment: sync the data there and use that cluster for the model activities.

Register Model

This first step is just to register a new model using the Model Serving API.

First, we need to authenticate against the Sidra API:

import json
from time import sleep

from pysidra.api.auth.authentication import Authentication
from pysidra.api.client import Client
from pysidra.api.models.modelserving import Model

class Credentials:
  def __init__(self):
    self.auth_url = dbutils.secrets.get(
        scope="api", key="auth_url"
    ).lower()  # Authority
    self.client_id = dbutils.secrets.get(scope="api", key="client_id")  # ClientId
    self.scope = dbutils.secrets.get(scope="api", key="scope")  # Scope
    self.client_secret = dbutils.secrets.get(
        scope="api", key="client_secret"
    )  # ClientSecret
    self.api_url = dbutils.secrets.get(scope="api", key="api_url")


def get_token(credentials, json_error_sleep_t=30, json_error_retry_times=10):
  authenticated = False
  token = None
  while not authenticated and json_error_retry_times > 0:
    try:
      token = Authentication(
        base_url=credentials.auth_url,
        scope=credentials.scope,
        client_id=credentials.client_id,
        client_secret=credentials.client_secret,
      ).get_token()
      authenticated = True
    except json.JSONDecodeError:
      print(
        f"Error while authenticating against Sidra API. Retrying in {json_error_sleep_t} seconds"
      )
      sleep(json_error_sleep_t)
      json_error_retry_times -= 1

  return token


def get_pysidra_client():
  credentials = Credentials()
  return Client(credentials.api_url, get_token(credentials))

Then, we can create a new Model, which is simply a way to group experiments that address the same problem:

modelName="wineClassifier"

pysidra_client = get_pysidra_client()

model = Model(name=modelName, description="Example of model registered in Sidra ML platform")

model = pysidra_client.ModelServing.Model.create(model)

Track a new experiment

In the notebook, import the required packages. In this example, the ElasticNet approach is used to train the model:

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

The next step consists of preparing the data to train and evaluate the model. It is important to separate the features from the target column. To do so, the train_test_split method from the Scikit-learn package is used:

csv_url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(csv_url, sep=';')

# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

Before training the model, it is necessary to configure the experiment in which MLflow will track all the information. In our case, it is created under the ML_Demo folder, using the model name:

mlflow.set_experiment('/ML_Demo/{name}/experiment'.format(name=modelName))

Finally, it is necessary to create a new MLflow run to track the parameters, metrics and the model. The model parameters, alpha and l1_ratio, are both set to 0.5.

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2  

alpha=0.5
l1_ratio=0.5

with mlflow.start_run():
    # Execute ElasticNet
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    # Evaluate Metrics
    predicted_qualities = lr.predict(test_x)
    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    # Print out metrics
    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    # Log parameter, metrics, and model to MLflow
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.sklearn.log_model(lr, "model")

MLflow's logging capabilities are used here, which makes it much easier to track the parameters, metrics and the trained model.

After executing the previous piece of code, a new experiment and run appear in the MLflow UI:

[Screenshot: the new experiment and run in the MLflow UI]

Then, it is necessary to access that run and copy the run id, which appears both in the URL and in the upper section of the run's details.

[Screenshot: run details showing the run id]
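
As an alternative to copying the run id from the UI, it can also be retrieved programmatically through the MLflow tracking client. The following is a minimal sketch, assuming the experiment path used above:

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Look up the experiment created above and take its most recent run
experiment = client.get_experiment_by_name('/ML_Demo/{name}/experiment'.format(name=modelName))
latest_runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
    max_results=1,
)

run_id = latest_runs[0].info.run_id
print(run_id)

# The logged parameters and metrics can be inspected in the same way
print(latest_runs[0].data.params)
print(latest_runs[0].data.metrics)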

The created Model and its related metadata are stored in the Model and ModelVersion management tables in the Sidra Core metadata.

Create a ModelVersion and Docker image from MLflow experiment

Now it is time to create a ModelVersion to track the experiment in our platform.

It can be performed in two ways:

  1. Create a ModelVersion with a POST request and then create a Docker image from that ModelVersion.
  2. Create a Docker image directly from the MLflow run.

This tutorial follows the second approach. First, we gather the required information about the experiment:

client = MlflowClient()

run_id = 'a198de769b314371b48138efce96d44e'
run = client.get_run(run_id)

experiment = client.get_experiment_by_name("/ML_Demo/{name}/experiment".format(name=modelName))

Finally, we can create the Docker image:

from pysidra.api.controllers.modelserving.modelversion import CreateImageRequest, DeployRequest, InferenceRequest

# We need to provide a Data Storage Unit in which everything is executed. 
dsu = pysidra_client.Datacatalog.DataStorageUnit.get_list().items[0]

image_request = CreateImageRequest(runId=run_id, idModel=model.id, imageName=modelName)

modelversion = pysidra_client.ModelServing.ModelVersion.create_image(dsu['id'], image_request)

Deploy model

At this point, it is time to deploy our model. There are two choices: Azure Container Instances (ACI) and Azure Kubernetes Services (AKS). For the sake of simplicity, we go ahead with ACI deployment.

This is the easiest step in Model Serving Platform:

modelversion = pysidra_client.ModelServing.ModelVersion.deploy(dsu['id'], DeployRequest(modelVersionId = modelversion.id))

Now, it is possible to query our model by means of the recently created web service.

Let's define a method to query the endpoint:

import requests
import json

def query_endpoint_example(scoring_uri, inputs, service_key=None):
  headers = {
    "Content-Type": "application/json",
  }
  if service_key is not None:
    headers["Authorization"] = "Bearer {service_key}".format(service_key=service_key)

  print("Sending batch prediction request with inputs: {}".format(inputs))
  response = requests.post(scoring_uri, data=json.dumps(inputs), headers=headers)
  preds = json.loads(response.text)
  print("Received response: {}".format(preds))
  return preds

And let's prepare the data to query the model:

# Import the libraries required to build the sample request
import json
import logging
import pandas as pd

logger = logging.getLogger(__name__)

csv_url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
try:
  data = pd.read_csv(csv_url, sep=';')
except Exception as e:
  logger.exception("Unable to download training & test CSV, check your internet connection. Error: %s", e)

# Take the first row of features (without the target column) as a sample input
sample = data.drop(["quality"], axis=1).iloc[[0]]

# Convert the sample to the split-oriented JSON format and drop the index field
query_input = json.loads(sample.to_json(orient='split'))
query_input.pop('index', None)
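
For reference, the resulting payload follows pandas' split orientation, i.e. the column names plus one row of feature values, which is the format used to query the deployed web service:

print(json.dumps(query_input, indent=2))

# Expected structure (feature values elided):
# {
#   "columns": ["fixed acidity", "volatile acidity", ..., "alcohol"],
#   "data": [[...]]
# }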

Finally, execute the request:

prod_prediction1 = query_endpoint_example(scoring_uri=modelversion.endPoint, inputs=query_input)

Undeploy model

In a normal scenario, a model is undeployed when it is not used anymore or when a new, improved version is deployed.

To undeploy a model, it is as simple as:

pysidra_client.ModelServing.ModelVersion.undeploy(modelversion, dsu["id"])

Delete ModelVersion

In addition to the previous step, it is also possible to delete a Model or a ModelVersion. If a Model is deleted, all of its related ModelVersions are deleted as well. If a ModelVersion is deleted, all of its Docker images and deployments are also removed. Additionally, the related MLflow experiments and runs can optionally be removed as part of the deletion.

For example, to remove our ModelVersion without removing the MLflow experiment or run:

# deleteMode controls the optional MLflow clean-up; NOTHING keeps both the MLflow experiment and the run
NOTHING = 0
RUN = 1
ALL = 2

pysidra_client.ModelServing.ModelVersion.delete(modelversion, dsu["id"], deleteMode=NOTHING)

And finally, remove the Model:

# We can ignore deleteMode parameter since its default value is 0
pysidra_client.ModelServing.Model.delete(model, dsu["id"])

