When it comes to generative AI projects, I’d argue that the hardest and most tedious part has moved into a new area: hosting and serving your models. Whether you’re working with CPU-intensive models or models that require GPU horsepower, sourcing the hardware, building out deployment pipelines, configuring monitoring, and then securing everything is real, serious work that requires everyone to lean in to get it right.
And then there’s the real question of how you’re going to use those models: will you be setting up automation and doing batch processing with your models and infrastructure? Or do you want to get really serious and offer real-time inference? If the latter, you can add one more thing to solve for: building and managing the front-end APIs needed to support that use case.
Where I work, though, this is a solved problem: when you load your data, analyze it, build your feature stores, and then develop your models while following mature MLOps practices, you can also host those models on infrastructure that is provisioned, secured, and managed for you. Seems like a great thing, right?
Well, almost, because while it solves all the other problems I mentioned, there’s still one it doesn’t: how do you wrangle your model serving endpoints in Databricks and integrate them into your larger API management strategy? After all, most IT shops that develop and manage API services probably have really good tooling around their in-house endpoints, so shouldn’t you follow the same rules and integrate with their solution?
Fortunately for you, Databricks has thought of this and makes an OpenAPI specification available for every endpoint you deploy. Better yet, the specifications are bespoke to your models. The question is: how do you make your API management solution part of your current model deployment framework? It’s easier than you think, so let’s dive in!
One solution that’s popular, especially for people in the Microsoft ecosystem, is Azure API Management (APIM). If you’ve never heard of it before, the easiest way to think of it is as a place to gather all your enterprise APIs so you can govern, observe, and even document the various endpoints in your organization. It also supports an API gateway feature, which acts as a facade for all your APIs and, through the use of policies, can even transform requests in flight on their way to the back-end services that serve the interfaces. It’s very powerful and useful to organizations that leverage it.
“What about other API management solutions?”
APIM is just one example of an API management solution, and there are plenty of others. Even if you don’t use APIM, the workflow described in this post should still work for you; you’ll just need to figure out which features in your tool do the same things.
So how do you, a Databricks machine learning developer or engineer, fit into this? Simple: the goal is to get your endpoints hosted in Databricks into this ecosystem. Before that, though, we need to do a few things, like make sure an APIM Workspace is deployed in your Azure environment. With that, let’s explore some of the terms used when talking about APIM:
Once your APIM workspace is ready (or you get access to one), we then need to add an API. We could manually provide every detail of the API here, such as the different requests, the types of responses, and any required parameters, but there’s an easier way. To see how easy, let’s jump back over to Databricks. In my example, I deployed the example RAG chatbot provided by Databricks here. If you run the first and second “Simple App” notebooks in that example end-to-end, they will take care of building a simple chatbot that answers questions about Databricks, register the model in Unity Catalog, and then create a model serving endpoint for you. It’ll take a few minutes to get ready, but when it’s done, if you check the “Serving” page, you should see the endpoint ready to answer questions:
And you can query it directly from that page:
Now for the cool part: remember how I said every model serving endpoint has an OpenAPI spec? Well, all we have to do now is get it. First, let’s do it manually through the Databricks CLI:
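If you have the CLI installed and authenticated to your workspace, the call looks something like this sketch (recent CLI versions expose it under the serving-endpoints command group; the endpoint name is a placeholder):

```bash
# Fetch the OpenAPI spec for a model serving endpoint (substitute your endpoint's name)
databricks serving-endpoints get-open-api <your model serving endpoint name>
```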
The `get-open-api` command will retrieve the OpenAPI spec of the model serving endpoint. There are other ways to get this too, which we’ll come back to in a moment. Take a look at what comes back: this is our OpenAPI spec document.
{ "openapi": "3.1.0", "info": { "title": "agents_main-dbdemos_rag_chatbot-dbdemos_rag_demo", "version": "1" }, "servers": [ { "url": "https://adb-3436974546539940.0.azuredatabricks.net/serving-endpoints/agents_main-dbdemos_rag_chatbot-dbdemos_rag_demo" } ], "paths": { "/served-models/main-dbdemos_rag_chatbot-dbdemos_rag_demo_1/invocations": { "post": { "requestBody": { "content": { "application/json": { "schema": { "oneOf": [ { "type": "object", "properties": { "dataframe_split": { "type": "object", "properties": { "index": { "type": "array", "items": { "type": "integer" } }, "columns": { "description": "required fields: messages", "type": "array", "items": { "type": "string", "enum": [ "messages" ] } }, "data": { "type": "array", "items": { "type": "array", "prefixItems": [ { "type": "array", "items": { "required": [ "content", "role" ], "type": "object", "properties": { "content": { "type": "string" }, "role": { "type": "string" } } } } ] } } } } } }, { "type": "object", "properties": { "dataframe_records": { "type": "array", "items": { "required": [ "messages" ], "type": "object", "properties": { "messages": { "type": "array", "items": { "required": [ "content", "role" ], "type": "object", "properties": { "content": { "type": "string" }, "role": { "type": "string" } } } } } } } } } ] } } } }, "responses": { "200": { "description": "Successful operation", "content": { "application/json": { "schema": { "type": "object", "properties": { "predictions": { "type": "array", "items": { "type": "string" } } } } } } } } } }, "/served-models/feedback/invocations": { "post": { "requestBody": { "content": { "application/json": { "schema": { "oneOf": [ { "type": "object", "properties": { "dataframe_split": { "type": "object", "properties": { "index": { "type": "array", "items": { "type": "integer" } }, "columns": { "description": "required fields: request_id, source, text_assessments, retrieval_assessments", "type": "array", "items": { "type": "string", "enum": [ "request_id", "source", "text_assessments", "retrieval_assessments" ] } }, "data": { "type": "array", "items": { "type": "array", "prefixItems": [ { "type": "string" }, { "type": "string" }, { "type": "string" }, { "type": "string" } ] } } } } } }, { "type": "object", "properties": { "dataframe_records": { "type": "array", "items": { "required": [ "request_id", "source", "text_assessments", "retrieval_assessments" ], "type": "object", "properties": { "request_id": { "type": "string" }, "source": { "type": "string" }, "text_assessments": { "type": "string" }, "retrieval_assessments": { "type": "string" } } } } } } ] } } } }, "responses": { "200": { "description": "Successful operation", "content": { "application/json": { "schema": { "type": "object", "properties": { "predictions": { "type": "array", "items": { "type": "object", "properties": { "result": { "type": "string" } } } } } } } } } } } } } }%
With the spec document in hand, we could save it as a file and then upload it in our APIM portal to add the API. But let’s go one step further and automate it: back in our notebook code, let’s add a couple of cells:
```python
%pip install --upgrade databricks-sdk
```

```python
%restart_python
```
These two commands, run in separate notebook cells, will install the latest Databricks Python SDK and restart the Python kernel to make sure the latest version is loaded. Next, we’ll use the SDK to get our OpenAPI spec from our endpoint, just like the Databricks CLI did:
```python
from databricks.sdk import WorkspaceClient
import json

w = WorkspaceClient()

# Fetch the OpenAPI spec for the serving endpoint; "contents" is a stream we can read
srv = w.serving_endpoints.get_open_api("<your model serving endpoint name>").as_dict()["contents"]
json_spec = json.loads(srv.read().decode())
```
Our variable, `json_spec`, now has the OpenAPI spec inside it. We’re almost ready to add it to API Management, but first we need to clean it up a bit. The spec documents generated for the endpoint are OpenAPI 3.1 compliant; however, Azure API Management seems to choke on the import because of the `prefixItems` definition, for whatever reason. I searched all over the place but couldn’t figure out why, so if you know, leave a comment, because I have no idea. No matter: we can remove the offending bits of the spec, submit it, and it will still work, since we (probably) won’t be using the `dataframe_split` method to submit to this endpoint anyhow. The Python snippet below takes care of it:
for k in json_spec["paths"].keys(): x = 0 for o in json_spec["paths"][k]["post"]["requestBody"]["content"]["application/json"]["schema"]["oneOf"]: if "dataframe_split" in o["properties"].keys(): json_spec["paths"][k]["post"]["requestBody"]["content"]["application/json"]["schema"]["oneOf"].pop(x) x = x + 1
Now, we’re ready to go. Our last step is to import the relevant Azure SDKs for authenticating to Azure and then working with API Management:
```python
%pip install azure-mgmt-apimanagement
%pip install azure-identity
```
```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.apimanagement import ApiManagementClient

# Authenticate with a Unity Catalog service credential (here named "APIM"),
# which is backed by an Azure managed identity.
credential = dbutils.credentials.getServiceCredentialsProvider("APIM")

apim_client = ApiManagementClient(
    credential=credential,
    subscription_id="<subscription ID where APIM is deployed>",
)
```
Those two cells will first install the libraries and then create the connection to APIM using a Unity Catalog service credential. These credentials can be Azure managed identities, which we can grant the required permissions inside the APIM workspace. The final step, then, is to actually create the API definition in APIM, which can be done as follows:
```python
apim_client.api.begin_create_or_update(
    resource_group_name="<resource group name where the APIM workspace is deployed>",
    service_name="<the APIM workspace name>",
    api_id="dublindata-chatbots",
    parameters={
        "properties": {
            "description": "RAG Chatbot Example",
            "displayName": "agents_main-dbdemos_rag_chatbot-dbdemos_rag_demo",
            "format": "openapi+json",
            "value": json_spec,
            "serviceUrl": json_spec["servers"][0]["url"],
            "path": next(iter(json_spec["paths"])),
        }
    },
)
```
For this command to work, you need to provide a few properties; the import will supply (and correct) everything else. First, look at the main call, `api.begin_create_or_update`. We need to provide the resource group that the API Management workspace lives in, and then the name of the service itself. Next, we specify the API ID, which needs to be unique; a single API can have multiple paths and methods. Then, for this particular API, we provide the description, the display name (what it will be called when you reference it in the Azure portal), and the value of our `json_spec`, which holds the actual definition. Lastly, we need at least one of the paths from our spec doc; the call won’t work without one, and the import will add any remaining paths from the specification. When the command finishes, if you check the APIM console, you should see the result:
Each method of the API should be present. There’s just one thing left to do: test it out.
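Before heading to the portal, if you’d like to confirm the import from the notebook as well, the same management client can list the operations APIM created from the spec. A quick sketch, using the same placeholder names as before:

```python
# List the operations (HTTP methods and URL templates) imported into the API.
for op in apim_client.api_operation.list_by_api(
    resource_group_name="<resource group name where the APIM workspace is deployed>",
    service_name="<the APIM workspace name>",
    api_id="dublindata-chatbots",
):
    print(op.method, op.url_template)
```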
Inside our APIM instance, we can go to the “test” tab to submit a request to our invocation endpoint. And since our API spec has all the required information, it helpfully tells us the parameters in our request. We can also modify the request here as well, adding any other headers we need to provide. Let’s set up our test:
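For the invocations operation, a request body that matches the `dataframe_records` shape from the spec looks something like this (the question itself is just an example):

```json
{
  "dataframe_records": [
    {
      "messages": [
        { "role": "user", "content": "What is Databricks Model Serving?" }
      ]
    }
  ]
}
```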
A note about testing your APIs through the portal: when you use the Test tab, you may get an error saying you don’t have a valid subscription key. Remember, subscription keys are how calling applications (in this case, the test page in the Azure portal) identify themselves to APIM, so if you see this, add a header to the test request called `Ocp-Apim-Subscription-Key` with one of the subscription keys associated with your product. See here for more info: https://learn.microsoft.com/en-us/azure/api-management/api-management-subscriptions
When we click “Send” however, we’re going to get an error back from our endpoint:
That’s because our API call doesn’t know how to authenticate to the Databricks model serving endpoint. This is where things get a little beyond the scope of this blog post, as there are a lot of different options: should calls to this API through the API gateway pass through a JSON Web Token? Should there be some other form of OAuth flow first? That’s where your enterprise authentication strategy comes in. For our test, though, we’ll use a Databricks personal access token for now, and we’ll use a policy definition on the request to add it automatically.
Once you create a personal access token in your Databricks workspace, we’ll next create a Named Value in APIM. From the Named Values blade in your APIM workspace, add a new value, set its type to “Secret,” and paste in the personal access token. Note the “display name” of the value you create (not the name of the named value, weirdly, as the tooltip shows when you create it); that’s what we’ll reference in the policy:
Then, return to the API section, and now we can add a policy setting on the inbound processing to set the `Authorization` header to use the personal access token value:
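The policy itself is a small XML document that overrides the `Authorization` header with the named value. You can paste it into the inbound processing editor in the portal, or, to stay with the notebook automation theme, push it with the same management client. The sketch below is an assumption-laden example: it assumes the named value’s display name is `databricks-pat`, and the resource names are placeholders:

```python
# Inbound policy that injects the Databricks PAT (stored as an APIM named value,
# referenced here by its display name "databricks-pat") into the Authorization header.
policy_xml = """
<policies>
  <inbound>
    <base />
    <set-header name="Authorization" exists-action="override">
      <value>Bearer {{databricks-pat}}</value>
    </set-header>
  </inbound>
  <backend><base /></backend>
  <outbound><base /></outbound>
  <on-error><base /></on-error>
</policies>
"""

apim_client.api_policy.create_or_update(
    resource_group_name="<resource group name where the APIM workspace is deployed>",
    service_name="<the APIM workspace name>",
    api_id="dublindata-chatbots",
    policy_id="policy",  # APIM uses the fixed identifier "policy" for API-level policies
    parameters={"properties": {"format": "rawxml", "value": policy_xml}},
)
```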
Now, let’s go back and test our call again:
And that’s it! Your Databricks model serving endpoints are now integrated into Azure API Management. From here, you can access other cool features, like the Developer Portal, where your developers can view, test, and even mock responses against your endpoints without actually hitting them.
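And when an application is ready to call the model for real, it talks to the APIM gateway instead of the Databricks workspace directly. A rough sketch with `requests`, where the URL and subscription key are placeholders you can copy from the portal:

```python
import requests

# Placeholders: copy the request URL for the invocations operation and one of
# your subscription keys from the APIM portal.
url = "https://<your-apim-gateway>.azure-api.net/<api path>/<invocations operation path>"
headers = {
    "Ocp-Apim-Subscription-Key": "<your subscription key>",
    "Content-Type": "application/json",
}

# Body in the dataframe_records shape from the OpenAPI spec shown earlier.
payload = {
    "dataframe_records": [
        {"messages": [{"role": "user", "content": "What is Databricks Model Serving?"}]}
    ]
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["predictions"])
```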
One thing to take away, however, is that API management is a really big deal, and you should be confident that, just like everything else with Databricks, you can easily integrate with these tools thanks to open standards.