It’s time to celebrate! All the tests on your data scientist workstation prove that this is really it: your machine learning model finally makes excellent predictions about the future, classifies the input text or images into the right categories, or does whatever else your model is supposed to do. The day after the celebrations, your project manager asks how other applications can start using your supermodel in production. Hmmm… it turns out you haven’t thought about that at all.

This article is about how to do just that: how to publish your model for other applications to use. You’ll create a small web application that uses your model and offers a REST API to the outside world. Callers send in new input values and get the model output as the response. No rocket science here.

The process from a distance

You’re already familiar with the traditional machine learning workflow. You collect and cleanse the input data, set up your model, do the training and maybe even run a bit of hyperparameter tuning. When you’re happy with the result, you export the model by serializing it into a file. In the good old days, another app would then run on a schedule on a server and use your model to make predictions as a batch operation. This method is still valid for certain use cases, but when we talk about publishing the model today, we mean producing an HTTP REST API for multiple client applications to call. For this you need a model wrapper app, which loads your serialized model and offers the API to client apps. Client apps send in new data, and out comes the prediction.

Image: Process from a distance

Publishing as REST API

REST APIs are today’s de facto method for publishing functionality for other applications to use. Client apps call the API with HTTP requests, and input and output parameters are passed as text-formatted JSON objects. The client application can be almost anything: a mobile app on your smartphone, a JavaScript-based browser user interface, or an integration app that’s responsible for moving data from place A to B. The client can even be another API app that uses your API to enrich its own data.

To make all of this happen, you’ll need three parts: 1) your model trainer app, which stores the finished model in 2) a common storage, from where 3) the publishing API app picks it up and serves the incoming API calls. Simple as that. You can implement these in any language you prefer or are used to.
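To make the three parts concrete, here is a minimal, standard-library-only sketch of the flow. It is not the sample repo's code: the toy linear model, the `model.json` file name and all function names are made up for illustration, and a temporary directory stands in for the common storage.

```python
import json
import tempfile
from pathlib import Path

MODEL_FILE = "model.json"  # hypothetical file name


def train_and_store(storage_dir):
    """Part 1: 'train' a toy model and write it to the common storage (part 2)."""
    # Stand-in for real training: a fixed linear model y = slope * x + intercept.
    model = {"slope": 2.0, "intercept": 1.0}
    path = Path(storage_dir) / MODEL_FILE
    path.write_text(json.dumps(model))
    return path


def load_model(storage_dir):
    """Part 3, at startup: pick the model up from the common storage."""
    return json.loads((Path(storage_dir) / MODEL_FILE).read_text())


def predict(model, x):
    """Part 3, per request: turn one input value into a prediction."""
    return model["slope"] * x + model["intercept"]


def demo():
    # A temporary directory plays the role of the common storage here.
    with tempfile.TemporaryDirectory() as common_storage:
        train_and_store(common_storage)
        model = load_model(common_storage)
        return predict(model, 3)
```

Calling `demo()` runs the trainer, loads the stored model the way a publishing app would at startup, and returns the prediction for the input 3. The trainer and the publisher only share the storage location and the file format, which is what lets them run on different machines.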

Python sample apps

Get the source code!


The repo contains a sample model trainer (in the PySampleModelCreator dir) and a minimal model-publishing Python Flask app (in the PyFlaskApp dir). Both are fully working skeleton apps, which you can use as a base for your own model publishing. The end result will look like the following image:

Image: The apps & common file storage python flask example

Prerequisite: The sample apps use Azure storage as the common model storage, so running them requires an Azure subscription (you can get one for free).

Common Storage

The idea of common storage is to have a location for your model files where both the trainer app and the publishing app can access them. Traditionally, the trainer app has serialized the finished model files to a local disk drive (for example, the /tmp directory) and used the files from there when needed. In this exercise, the aim is to create a publishing app that runs on a server (or in a container). The server or container has no access to the local filesystem of your data science workstation, so we need a place to store the model files where they can be loaded when needed.

For this exercise I chose Azure blob storage as the common storage. It’s cheap, well secured, and super easy to use. So start by creating a storage account and a blob container to store your files. If you are not familiar with how to do this, start with the Azure documentation on storage account basics, and then follow its instructions for creating a new storage account and a private blob container.

For the sample apps, you need to know your storage account name, access key and container name. The screenshot below helps you locate these in the Azure portal.

Image: Storage account name, access keys location and blob storage containers.

Model trainer app

This is the data scientists’ comfort zone. The model trainer app is responsible for reading the data, training the models and then serializing the output to a file. We need to make one small addition to your existing model trainer: when finished, it must copy the model files to the common storage.

Our sample model trainer first creates a simple numpy array as the model to publish. The finished model is then stored in a local file using numpy serialization. Finally, upload_model_to_blob() copies the local file to our common storage (Azure blob storage).

To run the sample model trainer, fill in your own storage account settings at the beginning of the file.

  • storagename = “mystorageaccountname”. Set your storage account name here.
  • storagekey = “mystorageaccountkey”. The key can be obtained from your storage account.
  • storagecontainer = “mycontainername”. The container name you gave to your blob container.
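As a rough sketch of what such a trainer could look like: the three settings and the `upload_model_to_blob()` name come from the text above, but everything else is an assumption. That includes the function signatures, the `model.npy` file name, the `blob_account_url()` helper, and the use of the current `azure-storage-blob` (v12) client, so the sample repo may well do this differently.

```python
# Placeholder settings; replace them with your own values.
storagename = "mystorageaccountname"
storagekey = "mystorageaccountkey"
storagecontainer = "mycontainername"

MODEL_FILE = "model.npy"  # hypothetical file name, not taken from the repo


def blob_account_url(account_name):
    """Build the blob endpoint URL of a storage account (public Azure cloud)."""
    return f"https://{account_name}.blob.core.windows.net"


def create_and_save_model(local_path=MODEL_FILE):
    """Create a toy numpy 'model' and serialize it to a local file."""
    import numpy as np  # imported lazily so the helper above works without numpy

    model = np.arange(10, dtype=float)  # stand-in for a trained model
    np.save(local_path, model)
    return local_path


def upload_model_to_blob(local_path=MODEL_FILE, blob_name=MODEL_FILE):
    """Copy the local model file to the blob container, overwriting any old copy."""
    # Requires: pip install azure-storage-blob
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url=blob_account_url(storagename), credential=storagekey
    )
    blob = service.get_blob_client(container=storagecontainer, blob=blob_name)
    with open(local_path, "rb") as model_file:
        blob.upload_blob(model_file, overwrite=True)
```

With real settings filled in, calling `create_and_save_model()` followed by `upload_model_to_blob()` would run the whole trainer flow. Passing `overwrite=True` means a re-run simply replaces the previous model file.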

Flask app for publishing the API

This app (in the PyFlaskApp dir of the sample repo) is responsible for offering your model as a REST API for other applications to use. The app:

  1. Copies the model files from common storage to local disk.
  2. Reads the local model files and initializes the model for use.
  3. Starts up the flask based http-server, which serves responses to client calls.
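Steps 1 and 2 of this startup sequence could be sketched roughly as follows. The function names and signatures are hypothetical, and the download uses the current `azure-storage-blob` (v12) client as an assumption. The sample repo may use a different SDK version.

```python
from pathlib import Path


def fetch_model_bytes(account_name, account_key, container, blob_name):
    """Download one blob from Azure blob storage and return its raw bytes."""
    # Requires: pip install azure-storage-blob
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )
    blob = service.get_blob_client(container=container, blob=blob_name)
    return blob.download_blob().readall()


def save_model_locally(data, local_path):
    """Step 1: write the downloaded bytes to local disk."""
    path = Path(local_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def load_model(local_path):
    """Step 2: deserialize the numpy model from the local file."""
    import numpy as np  # imported lazily; only needed once a model file exists

    return np.load(local_path)
```

Keeping the download separate from the local write makes it easy to try the local part with any bytes, without touching Azure at all.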

Flask is a “python microframework” which offers a fairly simple way to implement the HTTP server. Using Flask is just my preference; you could use any other available library you prefer.

In the sample app, calling the root address “/” returns a simple welcome page, just so you can see that the app is running OK. API calls to the URL /api/v1/fake are routed to the fakeApi() function, which is responsible for:

  1. Reading the input parameters from the HTTP request
  2. Feeding the received input to the model
  3. Picking up the prediction from the model output
  4. Formulating the reply JSON object
  5. Returning status 200 (OK) and the generated JSON response
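A minimal sketch of such a handler is shown below. The routes and the fakeApi() name come from the text above. The rest is assumed for illustration: the fake prediction formula, the reply field names, the `build_reply()` and `create_app()` helpers, and the 400 reply for bad input, which is not part of the five steps.

```python
def predict(model, value):
    """Hypothetical fake prediction: the input scaled by the model's mean value."""
    return float(value * sum(model) / len(model))


def build_reply(model, raw_input):
    """Steps 1-4: parse the input, run the model, build the reply body and status."""
    try:
        value = float(raw_input)
    except (TypeError, ValueError):
        # Not in the original list: reject missing or non-numeric input.
        return {"error": "query parameter 'input' must be a number"}, 400
    return {"input": value, "prediction": predict(model, value)}, 200


def create_app(model):
    """Wire the handlers into a Flask app (requires: pip install flask)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello, the model API is running."

    @app.route("/api/v1/fake")
    def fakeApi():
        body, status = build_reply(model, request.args.get("input"))
        # Step 5: return the JSON body together with its HTTP status code.
        return jsonify(body), status

    return app
```

With a loaded model at hand, `create_app(model).run(port=5000)` would start Flask's development server. Keeping `build_reply()` free of Flask imports means the request logic can be checked without starting a server.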

You can run this server app with the normal “python3” command. It starts an HTTP server on port 5000 of your machine, and you can use a browser to call the API.

  • http://localhost:5000 gives you the hello page in your browser.
  • http://localhost:5000/api/v1/fake?input=3 calls the API and returns the JSON object containing the prediction.
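A client application would make the same call in code. Here is a small standard-library sketch. The helper names are made up, and since the exact fields of the reply depend on the app, the parsed JSON is returned as is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_api_url(base_url, input_value):
    """Build the URL of the fake API call for one input value."""
    return f"{base_url}/api/v1/fake?{urlencode({'input': input_value})}"


def call_api(base_url, input_value, timeout=5.0):
    """Call the API and return the decoded JSON reply."""
    with urlopen(build_api_url(base_url, input_value), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Flask app running locally, `call_api("http://localhost:5000", 3)` requests the same URL as the second bullet above.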

That’s pretty much it. You have now published your model as a Python Flask application, and successfully called the API that returns the model prediction.


Now you have your trainer app, common storage for model files and a python flask app to publish the model. You can re-run your trainer app whenever you want to update your model. The model files are just overwritten in common storage.

You can now also create a (virtual machine) server and make it run your publisher Flask app. This server would then respond to other applications’ HTTP requests.

What next?

Running the publisher Flask app in a dedicated virtual machine is not a very handy solution, although this was the way it was done in the pre-cloud era. The next step is to make the app friendlier to deploy and maintain.

In real life, the machine learning community hasn’t yet reached a consensus on how you should deploy the API apps. While trying to avoid making assumptions too early, it would seem that containers are going to be the winning bet. The method presented here is very simple and straightforward, and fully supports containerization. A model published now is better than waiting for the rest of the world to decide how the publishing (or model management) should be done.

Next: Package your publisher app into a Docker container (for your convenience, I’ve included the API app Dockerfile in the repo if you’d like to experiment), upload it to a container registry and then run it on your favorite container execution platform. For ML models, Azure Web App for Containers is an excellent runtime platform.

Read the sequel: Run Python Web Apps on Azure