Merged
54 commits
85cf573
[HWORKS-2662] Extensive improvements to serving docs
javierdlrm Mar 5, 2026
373132d
Fix lint
javierdlrm Mar 5, 2026
2601ae0
Fix lint
javierdlrm Mar 5, 2026
68637d0
Fix lint
javierdlrm Mar 5, 2026
8b363c6
Add info about init params and feature logging
javierdlrm Mar 9, 2026
503a3e8
Update docs/concepts/mlops/serving.md
javierdlrm Mar 10, 2026
5712ae5
Update docs/concepts/mlops/serving.md
javierdlrm Mar 10, 2026
d81e4e4
Update docs/concepts/mlops/serving.md
javierdlrm Mar 10, 2026
8a354de
Update docs/concepts/mlops/serving.md
javierdlrm Mar 10, 2026
debf029
Update docs/user_guides/mlops/serving/autoscaling.md
javierdlrm Mar 10, 2026
ec289c3
Update docs/user_guides/mlops/serving/deployment.md
javierdlrm Mar 10, 2026
207cb7d
Update docs/user_guides/mlops/serving/deployment.md
javierdlrm Mar 10, 2026
773abf1
Update docs/user_guides/mlops/serving/deployment.md
javierdlrm Mar 10, 2026
8c2856e
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 10, 2026
a9638dc
Update docs/user_guides/mlops/serving/troubleshooting.md
javierdlrm Mar 10, 2026
9d9cbf6
Update docs/user_guides/mlops/serving/scheduling.md
javierdlrm Mar 10, 2026
0fe80c0
Update docs/user_guides/mlops/serving/resources.md
javierdlrm Mar 10, 2026
90fa7f1
Update docs/user_guides/mlops/serving/deployment.md
javierdlrm Mar 10, 2026
60a5562
Address comments
javierdlrm Mar 10, 2026
ecd0d73
Fix trailing space
javierdlrm Mar 11, 2026
10bfd8d
Update docs/concepts/mlops/serving.md
javierdlrm Mar 12, 2026
5d72a72
Update docs/user_guides/mlops/serving/external-access.md
javierdlrm Mar 12, 2026
c8fb64e
Update docs/user_guides/mlops/serving/resources.md
javierdlrm Mar 12, 2026
3a1176d
Update docs/user_guides/mlops/serving/resources.md
javierdlrm Mar 12, 2026
281a6ab
Update docs/user_guides/mlops/serving/scheduling.md
javierdlrm Mar 12, 2026
666ec3e
Update docs/user_guides/mlops/serving/scheduling.md
javierdlrm Mar 12, 2026
bd36445
Update docs/user_guides/mlops/serving/transformer.md
javierdlrm Mar 12, 2026
019c2fe
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
e5e0e41
Update docs/user_guides/mlops/serving/scheduling.md
javierdlrm Mar 12, 2026
6fd88e9
Update docs/user_guides/mlops/serving/transformer.md
javierdlrm Mar 12, 2026
b706493
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
b4e1262
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
c6b2c63
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
2e35129
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
b2a6499
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
6d5fd68
Update docs/user_guides/projects/python-deployment/rest-api.md
javierdlrm Mar 12, 2026
26de22c
Update docs/user_guides/projects/python-deployment/rest-api.md
javierdlrm Mar 12, 2026
0c567b1
Update docs/user_guides/projects/python-deployment/troubleshooting.md
javierdlrm Mar 12, 2026
50f9b68
Update docs/user_guides/projects/python-deployment/rest-api.md
javierdlrm Mar 12, 2026
29e95cc
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
769c89d
Update docs/user_guides/projects/python-deployment/rest-api.md
javierdlrm Mar 12, 2026
81439d9
Update docs/user_guides/mlops/serving/autoscaling.md
javierdlrm Mar 12, 2026
f9083a0
Update docs/user_guides/mlops/serving/autoscaling.md
javierdlrm Mar 12, 2026
b35c0a4
Update docs/user_guides/mlops/serving/autoscaling.md
javierdlrm Mar 12, 2026
048aac6
Update docs/user_guides/mlops/serving/autoscaling.md
javierdlrm Mar 12, 2026
6011e45
Update docs/user_guides/mlops/serving/autoscaling.md
javierdlrm Mar 12, 2026
550f50c
Update docs/user_guides/mlops/serving/rest-api.md
javierdlrm Mar 12, 2026
1b5dc34
Update docs/user_guides/mlops/serving/rest-api.md
javierdlrm Mar 12, 2026
8aebbeb
Update docs/user_guides/mlops/serving/scheduling.md
javierdlrm Mar 12, 2026
9a03246
Update docs/user_guides/mlops/serving/scheduling.md
javierdlrm Mar 12, 2026
1f0ed83
Update docs/user_guides/projects/python-deployment/python-deployment.md
javierdlrm Mar 12, 2026
d3cccd0
Update docs/user_guides/mlops/serving/transformer.md
javierdlrm Mar 12, 2026
92613f4
Update docs/user_guides/mlops/serving/transformer.md
javierdlrm Mar 12, 2026
60fa9dc
Update docs/user_guides/mlops/serving/transformer.md
javierdlrm Mar 12, 2026
2 changes: 1 addition & 1 deletion docs/concepts/hopsworks.md
Original file line number Diff line number Diff line change
@@ -32,4 +32,4 @@ Data can be also be securely shared between projects.
## Data Science Platform

You can develop feature engineering, model training and inference pipelines in Hopsworks.
There is support for version control (GitHub, GitLab, BitBucket), Jupyter notebooks, a shared distributed file system, many bundled modular project python environments for managing python dependencies without needing to write Dockerfiles, jobs (Python, Spark, Flink), and workflow orchestration with Airflow.
There is support for version control (GitHub, GitLab, BitBucket), Jupyter notebooks, a shared distributed file system, many bundled modular project Python environments for managing Python dependencies without needing to write Dockerfiles, jobs (Python, Spark, Flink), and workflow orchestration with Airflow.
25 changes: 17 additions & 8 deletions docs/concepts/mlops/serving.md
@@ -1,32 +1,41 @@
In Hopsworks, you can easily deploy models from the model registry in KServe or in Docker containers (for Hopsworks Community).
KServe is the defacto open-source framework for model serving on Kubernetes.
You can deploy models in either programs, using the HSML library, or in the UI.
In Hopsworks, you can easily deploy models from the model registry using [KServe](https://kserve.github.io/website/latest/), the standard open-source framework for model serving on Kubernetes.
You can deploy models programmatically using [`Model.deploy`][hsml.model.Model.deploy] or via the UI.
A KServe model deployment can include the following components:

**`Transformer`**
**`Predictor (KServe component)`**

: A ^^pre-processing^^ and ^^post-processing^^ component that can transform model inputs before predictions are made, and predictions before these are delivered back to the client.
: A predictor runs a model server (Python, TensorFlow Serving, or vLLM) that loads a trained model, handles inference requests and returns predictions.

**`Predictor`**
**`Transformer (KServe component)`**

: A predictor is a ML model in a Python object that takes a feature vector as input and returns a prediction as output.
: A ^^pre-processing^^ and ^^post-processing^^ component that can transform model inputs before predictions are made, and predictions before these are delivered back to the client.
Not available for vLLM deployments.

**`Inference Logger`**

: Hopsworks logs inputs and outputs of transformers and predictors to a ^^Kafka topic^^ that is part of the same project as the model.
Not available for vLLM deployments.

**`Inference Batcher`**

: Inference requests can be batched to improve throughput (at the cost of slightly higher latency).

**`Istio Model Endpoint`**

: You can publish a model over ^^REST(HTTP)^^ or ^^gRPC^^ using a Hopsworks API key.
: You can publish a model over REST (HTTP) or gRPC using a Hopsworks API key, accessible via **path-based routing** through Istio.
API keys have scopes to ensure the principle of least privilege access control to resources managed by Hopsworks.
For more details on path-based routing of requests through Istio, see [REST API Guide](../../user_guides/mlops/serving/rest-api.md).

!!! warning "Host-based routing"
The Istio Model Endpoint supports host-based routing for inference requests; however, this approach is considered legacy.
Path-based routing is recommended for new deployments.

Models deployed on KServe in Hopsworks can be easily integrated with the Hopsworks Feature Store using either a Transformer or a Predictor Python script that builds the predictor's input feature vector from the application input and pre-computed features from the Feature Store.

<img src="../../../assets/images/concepts/mlops/kserve.svg">

!!! info "Model Serving Guide"
More information can be found in the [Model Serving guide](../../user_guides/mlops/serving/index.md).

!!! tip "Python deployments"
For deploying Python scripts without a model artifact, see the [Python Deployments](../../user_guides/projects/python-deployment/python-deployment.md) page.
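To make the predictor component concrete, here is a minimal sketch of a predictor script of the kind a Python model server loads. The `Predict` class name and `predict(self, inputs)` signature follow the convention used in the Hopsworks Python deployment guides; the model itself is a stand-in (a plain Python callable) rather than a real artifact loaded from the deployment's artifact directory:

```python
class Predict:
    """Minimal sketch of a Hopsworks-style predictor script.

    In a real deployment, __init__ would load a trained model from the
    artifact directory (e.g. with joblib or a framework-specific loader);
    here the model is stubbed with a plain Python callable.
    """

    def __init__(self):
        # Stand-in "model": sums the features of each instance
        self.model = lambda instances: [sum(features) for features in instances]

    def predict(self, inputs):
        # `inputs` corresponds to the "instances" field of an inference request
        return self.model(inputs)


if __name__ == "__main__":
    predictor = Predict()
    print(predictor.predict([[1, 2, 3], [4, 5]]))  # [6, 9]
```

A transformer script follows the same pattern, exposing pre- and post-processing hooks that run before and after `predict`.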
2 changes: 1 addition & 1 deletion docs/tutorials/index.md
@@ -25,7 +25,7 @@ This is a quick-start of the Hopsworks Feature Store; using a fraud use case we

### Batch

This is a batch use case variant of the fraud tutorial, it will give you a high level view on how to use our python APIs and the UI to navigate the feature groups.
This is a batch use-case variant of the fraud tutorial; it will give you a high-level view of how to use our Python APIs and the UI to navigate the feature groups.

| Notebooks |
| --- |
3 changes: 0 additions & 3 deletions docs/user_guides/fs/feature_view/feature-vectors.md
@@ -239,7 +239,6 @@ However, you can retrieve the untransformed feature vectors without applying mod
entry=[{"id": 1}, {"id": 2}], transform=False
)


```

## Retrieving feature vector without on-demand features
@@ -258,7 +257,6 @@ To achieve this, set the parameters `transform` and `on_demand_features` to `Fa
entry=[{"id": 1}, {"id": 2}], transform=False, on_demand_features=False
)


```

## Passing Context Variables to Transformation Functions
@@ -274,7 +272,6 @@ After [defining a transformation function using a context variable](../transform
entry=[{"pk1": 1}], transformation_context={"context_parameter": 10}
)


```

## Choose the right Client
5 changes: 0 additions & 5 deletions docs/user_guides/fs/feature_view/helper-columns.md
@@ -41,7 +41,6 @@ for computing the [on-demand feature](../../../concepts/fs/feature_group/on_dema
inference_helper_columns=["expiry_date"],
)


```

### Inference Data Retrieval
@@ -88,7 +87,6 @@ However, they can be optionally fetched with inference or training data.
]
]


```

#### Online inference
@@ -129,7 +127,6 @@ However, they can be optionally fetched with inference or training data.
passed_features={"days_valid": days_valid},
)


```

## Training Helper columns
@@ -156,7 +153,6 @@ For example one might want to use feature like `category` of the purchased produ
training_helper_columns=["category"],
)


```

### Training Data Retrieval
@@ -190,7 +186,6 @@ However, they can be optionally fetched.
training_dataset_version=1, training_helper_columns=True
)


```

!!! note
@@ -55,7 +55,6 @@ Additionally, Hopsworks also allows users to specify custom names for transforme
transformation_functions=[add_two, add_one_multiple],
)


```

### Specifying input features
@@ -77,7 +76,6 @@ The features to be used by a model-dependent transformation function can be spec
],
)


```

### Using built-in transformations
@@ -106,7 +104,6 @@ The only difference is that they can either be retrieved from the Hopsworks or i
],
)


```

To attach built-in transformation functions from the `hopsworks` module they can be directly imported into the code from `hopsworks.builtin_transformations`.
@@ -134,7 +131,6 @@ To attach built-in transformation functions from the `hopsworks` module they can
],
)


```

## Using Model Dependent Transformations
@@ -160,7 +156,6 @@ Model-dependent transformation functions can also be manually applied to a featu
# Apply Model Dependent transformations
encoded_feature_vector = fv.transform(feature_vector)


```

### Retrieving untransformed feature vector and batch inference data
@@ -185,5 +180,4 @@ To achieve this, set the `transform` parameter to False.
# Fetching untransformed batch data.
untransformed_batch_data = feature_view.get_batch_data(transform=False)


```
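Conceptually, applying a model-dependent transformation manually means running the attached function, parameterized by training-dataset statistics, over the raw feature values. The following stand-in illustrates this with a min-max scaler in plain Python (no Hopsworks connection; the feature name and statistics are hypothetical):

```python
def min_max_scaler(value, min_value, max_value):
    # Rescales a value to [0, 1] using training-dataset statistics,
    # mirroring what a built-in min-max transformation does
    return (value - min_value) / (max_value - min_value)


# Hypothetical training-dataset statistics for a feature "amount"
stats = {"min": 0.0, "max": 200.0}

raw_feature_vector = {"id": 1, "amount": 50.0}

encoded_feature_vector = dict(raw_feature_vector)
encoded_feature_vector["amount"] = min_max_scaler(
    raw_feature_vector["amount"], stats["min"], stats["max"]
)
print(encoded_feature_vector)  # {'id': 1, 'amount': 0.25}
```

This is why the same statistics must be used at training and inference time: `fv.transform` handles that bookkeeping for you.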
1 change: 0 additions & 1 deletion docs/user_guides/fs/feature_view/training-data.md
@@ -154,7 +154,6 @@ Once you have [defined a transformation function using a context variable](../tr
transformation_context={"context_parameter": 10},
)


```

## Read training data with primary key(s) and event time
21 changes: 3 additions & 18 deletions docs/user_guides/mlops/serving/api-protocol.md
@@ -3,13 +3,9 @@
## Introduction

Hopsworks supports both REST and gRPC as API protocols for sending inference requests to model deployments.
While REST API protocol is supported in all types of model deployments, support for gRPC is only available for models served with [KServe](predictor.md#serving-tool).
While the REST API protocol is supported in all types of model deployments, gRPC is currently supported only for **Python model deployments**.

!!! warning
At the moment, the gRPC API protocol is only supported for **Python model deployments** (e.g., scikit-learn, xgboost).
Support for Tensorflow model deployments is coming soon.

## GUI
## Web UI

### Step 1: Create a new deployment

@@ -40,17 +36,7 @@ To navigate to the advanced creation form, click on `Advanced options`.

### Step 3: Select the API protocol

Enabling gRPC as the API protocol for a model deployment requires KServe as the serving platform for the deployment.
Make sure that KServe is enabled by activating the corresponding checkbox.

<p align="center">
<figure>
<img style="max-width: 85%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_adv_form_kserve.png" alt="KServe enabled in advanced deployment form">
<figcaption>Enable KServe in the advanced deployment form</figcaption>
</figure>
</p>

Then, you can select the API protocol to be enabled in your model deployment.
You can select the API protocol to be enabled in your model deployment in the advanced deployment form.

<p align="center">
<figure>
@@ -102,7 +88,6 @@ Once you are done with the changes, click on `Create new deployment` at the bott
my_deployment = ms.create_deployment(my_predictor)
my_deployment.save()


```

### API Reference
154 changes: 154 additions & 0 deletions docs/user_guides/mlops/serving/autoscaling.md
@@ -0,0 +1,154 @@
# How To Configure Scaling For A Deployment

## Introduction

This guide explains how to set up **autoscaling** for model deployments using either the [web UI](#web-ui) or the [Python API](#code).

Deployments use [Knative Pod Autoscaler (KPA)](https://knative.dev/docs/serving/autoscaling/) to automatically scale the number of replicas based on traffic.
Autoscaling enables the deployment to use resources more efficiently by growing and shrinking the allocated resources according to its actual, real-time usage.

See [Scale metrics](#scale-metrics) and [Scaling parameters](#scaling-parameters) for details on the available scaling options.

## Web UI

### Step 1: Create new deployment

If you have at least one model already trained and saved in the Model Registry, navigate to the deployments page by clicking on the `Deployments` tab on the navigation menu on the left.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployments_tab_sidebar.png" alt="Deployments navigation tab">
<figcaption>Deployments navigation tab</figcaption>
</figure>
</p>

Once on the deployments page, you can create a new deployment by either clicking on `New deployment` (if there are no existing deployments) or on `Create new deployment` at the top-right corner.
Both options will open the deployment creation form.

### Step 2: Go to advanced options

A simplified creation form will appear, including the most common deployment fields from all available configurations.
Autoscaling is part of the advanced options of a deployment.
To navigate to the advanced creation form, click on `Advanced options`.

<p align="center">
<figure>
<img style="max-width: 55%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_simple_form_adv_options.png" alt="Advanced options">
<figcaption>Advanced options button, opening the advanced deployment creation form</figcaption>
</figure>
</p>

### Step 3: Configure autoscaling

In the `Autoscaling` section of the advanced form, you can configure the scaling parameters for the predictor and/or the transformer (if available).
You can set the scale metric, target value, minimum and maximum instances, as well as the panic and stable window parameters.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_adv_form_scaling.png" alt="Autoscaling configuration for the predictor and transformer components">
<figcaption>Autoscaling configuration for the predictor and transformer</figcaption>
</figure>
</p>

Once you are done with the changes, click on `Create new deployment` at the bottom of the page to create the deployment for your model.

## Code

### Step 1: Connect to Hopsworks

=== "Python"

```python
import hopsworks

project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()
```

### Step 2: Define the predictor scaling configuration

You can use the [`PredictorScalingConfig`][hsml.scaling_config.PredictorScalingConfig] class to configure the scaling options according to your preferences.
Default values for scaling metrics and parameters are listed in the [Scale metrics](#scale-metrics) and [Scaling parameters](#scaling-parameters) sections below.

=== "Python"

```python
from hsml.scaling_config import PredictorScalingConfig

predictor_scaling = PredictorScalingConfig(
min_instances=1, max_instances=5, scale_metric="RPS", target=100
)
```

### Step 3 (Optional): Define the transformer scaling configuration

If a transformer script is also provided, you can use the [`TransformerScalingConfig`][hsml.scaling_config.TransformerScalingConfig] class to configure the scaling options according to your preferences.
Default values for scaling metrics and parameters are listed in the [Scale metrics](#scale-metrics) and [Scaling parameters](#scaling-parameters) sections below.

=== "Python"

```python
from hsml.scaling_config import TransformerScalingConfig

transformer_scaling = TransformerScalingConfig(
min_instances=1, max_instances=3, scale_metric="CONCURRENCY", target=50
)
```

### Step 4: Create a deployment with the scaling configuration

=== "Python"

```python
my_model = mr.get_model("my_model", version=1)

# optional
my_transformer = ms.create_transformer(
script_file="Resources/my_transformer.py",
scaling_configuration=transformer_scaling
)

my_deployment = my_model.deploy(
scaling_configuration=predictor_scaling,
# optional:
transformer=my_transformer
)
```

### API Reference

[`PredictorScalingConfig`][hsml.scaling_config.PredictorScalingConfig]

[`TransformerScalingConfig`][hsml.scaling_config.TransformerScalingConfig]

## Scale metrics

The autoscaler supports two metrics to determine when to scale.
See [Knative autoscaling metrics](https://knative.dev/docs/serving/autoscaling/autoscaling-metrics/) for more details.

| Scale Metric | Default Target | Description |
| ------------ | -------------- | ------------------------------- |
| RPS | 200 | Requests per second per replica |
| CONCURRENCY | 100 | Concurrent requests per replica |

## Scaling parameters

The following parameters can be used to fine-tune the autoscaling behavior.
See [scale bounds](https://knative.dev/docs/serving/autoscaling/scale-bounds/), [autoscaling concepts](https://knative.dev/docs/serving/autoscaling/autoscaling-concepts/) and [scale-to-zero](https://knative.dev/docs/serving/autoscaling/scale-to-zero/) in the Knative documentation for more details.

| Parameter | Default | Range | Description |
| ----------------------------- | ------- | ------ | ------------------------------------------- |
| `minInstances` | — | ≥ 0 | Minimum replicas (0 enables scale-to-zero) |
| `maxInstances` | — | ≥ 1 | Maximum replicas (cannot be less than min) |
| `panicWindowPercentage` | 10.0 | 1–100 | Panic window as percentage of stable window |
| `stableWindowSeconds` | 60 | 6–3600 | Stable window duration in seconds |
| `panicThresholdPercentage` | 200.0 | > 0 | Traffic threshold to trigger panic mode |
| `scaleToZeroRetentionSeconds` | 0 | ≥ 0 | Time to retain pods before scaling to zero |

!!! note "Cluster-level constraints"
==Administrators== can set cluster-wide limits on the maximum and minimum number of instances. When the minimum is set to 0, scale-to-zero is enforced for all deployments.
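To build intuition for the metrics and bounds above, the core of a KPA-style scaling decision can be sketched in a few lines. This is a deliberate simplification of Knative's actual algorithm (which averages the metric over the stable and panic windows and applies panic-mode logic); it only shows how the target and the instance bounds interact:

```python
import math


def desired_replicas(observed_metric, target, min_instances, max_instances):
    """Simplified KPA-style decision: one replica per `target` units of the
    scale metric (RPS or concurrency), clamped to the configured bounds."""
    desired = math.ceil(observed_metric / target)
    return max(min_instances, min(desired, max_instances))


# With scale_metric="RPS" and target=100, 430 observed requests/s -> 5 replicas
print(desired_replicas(430, target=100, min_instances=1, max_instances=8))  # 5

# With min_instances=0, zero traffic allows scale-to-zero
print(desired_replicas(0, target=100, min_instances=0, max_instances=8))  # 0
```

In practice, windowed averaging and the panic threshold smooth out the bursts that a naive point-in-time calculation like this would overreact to.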