Deploy a machine learning model to expose an endpoint URL for API requests.
To deploy a machine learning model, first configure a model deployment that defines how you want the deployment to run. When you start the model deployment, Model Serve provisions the required cloud resources and deploys the machine learning model. After the model deployment becomes available, you can make requests to the endpoint URL.
To deploy a machine learning model, perform the following tasks:
1. Configure runtime properties in a model deployment.
2. Start the model deployment.
3. Optionally, test that the model generates predictions as you expect.
Configure a model deployment
Configure a model deployment to define the runtime properties used to deploy a machine learning model.
1. Select New > Model Deployment.
2. On the General tab, configure general properties for the model deployment. You can configure the following properties:
• Model deployment name. Required. Name of the model deployment. The name must begin with a letter and can contain alphanumeric characters, hyphens (-), and underscores (_). Maximum length is 100 characters.
• Description. Description of the model deployment.
• Location. Required. Project folder in which the model deployment resides. If the Explore page is currently active and a project or folder is selected, the default location for the model deployment is the selected project or folder. Otherwise, the Default folder is the default location.
• Machine learning model. Required. Machine learning model to deploy.
3. On the Compute tab, set the Compute Units property to the maximum number of compute units that you want the model deployment to use.
One compute unit includes 4 CPUs and 16 GB of memory. The default is 4 compute units, which provides up to 16 CPUs and 64 GB of memory.
4. Save the model deployment.
After you configure and save the model deployment, you can start the deployment or close the editor to view the model deployment overview page.
Note: You can't edit a model deployment while it's running. To edit it, stop the model deployment and wait until it becomes unavailable.
Start the model deployment
When you start a model deployment, Model Serve deploys your machine learning model and makes the endpoint URL available for requests.
You can start a model deployment in any of the following ways:
• In the model deployment editor, select Start Deployment after you save a valid deployment.
• On the model deployment overview page, select Start Deployment.
• On the Monitor page, open the Actions menu for a model deployment and select Start.
If no other deployments are available when you start a model deployment, Model Serve provisions the necessary cloud resources, which can take up to 10 minutes.
When you request predictions from the model deployment, a request can run for up to five minutes. If the model doesn't return a response within that time, the request times out.
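If you call the endpoint programmatically, the request is a standard HTTP POST to the endpoint URL. The following Python sketch is a minimal example; the endpoint URL, token, and input fields are hypothetical placeholders, so substitute the values shown on your model deployment overview page:

    import requests

    # Hypothetical placeholders: copy the real endpoint URL and credentials
    # from your model deployment overview page.
    ENDPOINT_URL = "https://example.com/model-deployments/my-deployment/predict"
    API_TOKEN = "your-api-token"

    # The input fields depend on the model you deployed.
    payload = {"inputs": [[5.1, 3.5, 1.4, 0.2]]}

    response = requests.post(
        ENDPOINT_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=300,  # requests time out after five minutes
    )
    response.raise_for_status()
    print(response.json())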
Test the model deployment
Send a request to verify that your model is generating predictions as you expect. You can generate test predictions for model deployments that are available.
1. Navigate to the Test tab on the model deployment overview page.
2. Enter a request call in JSON format. Ensure that your call includes all of the necessary input fields, as shown in the example after these steps.
3. Select Send Request.
Model Serve sends the request and then displays the response from the model. Model Serve returns the response in JSON format if possible. Otherwise, it returns the response as a string.
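For example, a test request for a model that expects four numeric input features might look like the following. The field name is hypothetical; include the input fields that your model expects:

    {
      "inputs": [[5.1, 3.5, 1.4, 0.2]]
    }

A classification model might then return a JSON response such as {"predictions": [0]}.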