← Back to all posts

Ollama Models on Cloud Run

2 May, 2025

Ollama Models on Cloud Run

This blog is a read-along for the repository xprilion/ollama-cloud-run which shows how to deploy various models using the Ollama API on Cloud Run, to run inference using CPU only on a serverless platform - incurring bills only when you use them.

Ollama is a framework that makes it easy for developers to prototype apps with open models. It comes with a REST API, and this repository provides Dockerfiles and deployment scripts for each model.

Google Cloud Run is a fully managed compute platform that automatically scales your stateless containers. You can run code in any language, and all dependencies are included in a container image, which Google Cloud Run handles deployment and scaling for.

Ispiration (and gemma2b code) from wietsevenema/samples.

Models

Model Version Folder Link
codegemma 2b codegemma/2b
codegemma 7b codegemma/7b
gemma 2b gemma/2b
gemma 7b gemma/7b
gemma2 9b gemma2/9b
llama3 8b llama3/8b
llava 7b llava/7b
mistral 7b mistral/7b
phi3 3.8b phi3/3.8b
qwen2 0.5b qwen2/0.5b

Usage

To build the container with a specific model included and deploy the Ollama API to a publicly accessible URL on Cloud Run, use the following command from the corresponding model's directory. For example, to deploy gemma:2b:

bash gemma/2b/deploy.sh

Respond to any prompts the command gives you. You might need to enable a few APIs and choose a region to deploy to.

Building the container takes roughly 3-20 minutes, depending on model size.

Once the command completes, the deploy command shows the public URL of the service.

Explore the API

Ask the deployed model a question:

curl <PUBLIC_URL>/api/generate -d \
 '{ 
    "model": "gemma:2b", 
    "prompt": "Why is the sky blue?" 
  }'

The first request to a new instance will take some extra setup time because the model is loaded into memory. Ollama keeps the model in memory for 5 minutes.

For the full Ollama API, refer to the API docs.

Clean Up

To clean up after following this short tutorial, you can do the following:

  • In Artifact Registry, find the cloud-run-source-deploy repository and remove the container image used by the Cloud Run service you created.
  • In Cloud Run, delete the service you created.

Links


Use for research, exploration, and prototyping.

© Anubhav Singh 2026

Powered by Whiz.pub. Create your own minimal blog today.