Self-Hosting an Open WebUI ChatGPT Clone with Model Rotation

If you have ever wanted a ChatGPT-style web app that you fully control, with the freedom to point each conversation at a different model on a different backend, the open-source stack has caught up to that wish. The combination most teams converge on right now is Open WebUI for the front end, Ollama for local model hosting, and LiteLLM for proxying remote providers behind an OpenAI-compatible API. In this post I will walk through that exact setup, including a Docker Compose file, document upload, and multi-user access.

This is not a theoretical post. It is the boring middle of self-hosting where you actually wire things together. If you want a wider survey of the tools in this space, my colleague has a piece on self-hosted AI coding tools that pairs well with this one.

Why this stack

Open WebUI started life as the Ollama Web UI but has grown into something much closer to a polished ChatGPT clone. Its README lists a long set of features that go well beyond a chat box: a local RAG integration that supports multiple vector databases, including ChromaDB, PGVector, Qdrant, Milvus, and Pinecone; web search through providers like SearXNG, Brave, Kagi, and DuckDuckGo; and a hands-free voice and video call mode that can use Whisper or OpenAI for speech to text. There is also role-based access control, LDAP and SSO, and SCIM 2.0 provisioning for Okta, Azure AD, and Google Workspace.

Ollama is the friendlier face of llama.cpp. You install it with a one-line script, run ollama pull llama3.1, and it exposes a REST API on port 11434 with both its native shape and an OpenAI-compatible endpoint. That OpenAI-compatible piece is what lets it slot in cleanly behind any client that already speaks the OpenAI protocol.

LiteLLM is the glue when you want to mix local and remote models. It sits in front of providers like Anthropic, OpenAI, Bedrock, Vertex AI, and Ollama, and exposes one OpenAI-shaped API. You can also skip it and use Open WebUI's built-in support for OpenAI-compatible APIs to add Anthropic or OpenRouter directly. I will show both approaches.

A docker compose starting point

Open WebUI ships an official image at ghcr.io/open-webui/open-webui:main. The README also documents a bundled image that packages Ollama in the same container, but I find it cleaner to keep the services separate so you can restart or upgrade one without touching the other.

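services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      # Persist pulled models across container restarts
      - ollama:/root/.ollama
    ports:
      # Optional: only needed to reach the Ollama API from the host
      - "11434:11434"
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    depends_on:
      - ollama
    environment:
      # Reach Ollama by its compose service name on the internal network
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    volumes:
      # Persist users, chats, and uploaded documents
      - open-webui:/app/backend/data
    ports:
      # The UI listens on 8080 in the container; published on host port 3000
      - "3000:8080"
    restart: unless-stopped

volumes:
  ollama:
  open-webui: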

Bring it up with docker compose up -d, open http://localhost:3000, and the first account you create becomes the admin. Then pull a model, either from the admin settings in the UI or from the command line with docker compose exec ollama ollama pull llama3.1 (or whatever model fits your hardware). Open WebUI lists it automatically because it queries the Ollama instance at the OLLAMA_BASE_URL you set.

For GPU support, the Open WebUI README documents adding --gpus all to the run command and using the cuda tagged image. The compose translation is to add a deploy block with nvidia device reservations on the Ollama service, since that is the container actually running the model.
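
As a minimal sketch of that deploy block, assuming an NVIDIA GPU and the NVIDIA Container Toolkit already installed on the host, the Ollama service might gain the following. The count: all value is an assumption for the example; use a number if you want to reserve fewer GPUs.

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            # Requires the NVIDIA Container Toolkit on the host
            - driver: nvidia
              # Assumption: hand every GPU to Ollama; set e.g. 1 to limit it
              count: all
              capabilities: [gpu]
```

Merge the deploy key into the existing ollama service rather than defining the service twice. The Open WebUI container itself does not need the reservation in this layout, because inference runs inside Ollama.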

Adding remote models

Open WebUI supports OpenAI-compatible APIs out of the box. In the admin settings under Connections you can add a base URL and API key for OpenAI itself, for Anthropic via a compatible shim, or for any other provider that exposes an OpenAI-compatible endpoint. Each connection becomes a model source, and you can pick a model from any of them per chat.
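
The same kind of connection can also be seeded from the environment instead of the admin UI. The sketch below assumes the OPENAI_API_BASE_URL and OPENAI_API_KEY variables that Open WebUI reads at startup, with the key supplied from your shell or a .env file next to the compose file. Treat the values as placeholders.

```yaml
services:
  open-webui:
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
      # Assumed: seeds one OpenAI-compatible connection at first start
      - OPENAI_API_BASE_URL=https://api.openai.com/v1
      # Substituted by compose from the shell or .env; do not commit the key
      - OPENAI_API_KEY=${OPENAI_API_KEY}
```

Connections you later edit in the admin UI are stored in Open WebUI's own data volume and may take precedence over these variables, so check the Connections page first if an environment change appears to have no effect.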

That works fine for two or three providers. Once you are mixing local models with multiple cloud providers and want unified spend tracking, LiteLLM earns its keep. Run it as a sidecar by adding a service under services: in the same compose file:

  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./litellm.config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    ports:
      - "4000:4000"

The LiteLLM README describes virtual key management, per user and per project spend tracking, load balancing across deployments, and request routing with retry and fallback logic. Their config takes a list of model entries, each pointing at an upstream provider, and exposes them under a single OpenAI-shaped endpoint. In Open WebUI, point a single OpenAI-compatible connection at http://litellm:4000 and every model you defined shows up.
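
For illustration, a litellm.config.yaml for the sidecar above might look like the following. The model_name aliases (gpt-4o, claude-sonnet, local-llama) and the upstream model IDs are assumptions chosen for the example; substitute whatever your accounts and hardware support.

```yaml
model_list:
  # Alias shown in Open WebUI -> upstream model served by OpenAI
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      # Read from the OPENAI_API_KEY variable passed in by compose
      api_key: os.environ/OPENAI_API_KEY

  # Hypothetical alias for an Anthropic model; the model ID is an example
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

  # Local model served by the ollama container on the compose network
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3.1
      api_base: http://ollama:11434
```

Each model_name is what appears in the Open WebUI model picker. As I understand LiteLLM's router, giving two entries the same model_name groups them as deployments of one logical model and spreads requests across them, which is the simplest route to model rotation; verify the routing behavior against the LiteLLM docs before relying on it.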

Documents and RAG

Click the plus icon in any chat and you can upload a PDF, a Word doc, or a folder of text files. Open WebUI's RAG integration handles the chunking and embedding for you. By default it uses a bundled ChromaDB store, but the README lists PGVector, Qdrant, Milvus, Elasticsearch, OpenSearch, Pinecone, S3Vector, and Oracle 23ai as alternatives you can switch to via environment variables. For a small team, the default is fine. For a larger workspace, point it at PGVector and your existing Postgres and call it a day.
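
A hedged sketch of the PGVector switch follows. It assumes the VECTOR_DB and PGVECTOR_DB_URL variable names from Open WebUI's environment configuration; the postgres hostname, the openwebui database and user, and the PGVECTOR_PASSWORD variable are hypothetical stand-ins for your own Postgres instance.

```yaml
services:
  open-webui:
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      # Assumed variable name: selects the vector store backend
      - VECTOR_DB=pgvector
      # Hypothetical host, database, and credentials; the pgvector
      # extension must be available in the target database
      - PGVECTOR_DB_URL=postgresql://openwebui:${PGVECTOR_PASSWORD}@postgres:5432/openwebui
```

Switching backends does not move existing embeddings for you, so expect to re-upload or re-index documents that were stored under the previous vector store.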

There is also a Knowledge feature where admins can build curated collections that all users can query. Think of it as a shared folder of source material that any chat can pull from by typing # and selecting the collection.

Multi-user setup

Set WEBUI_AUTH=True in the environment, as in the compose file above, and Open WebUI will require login. The first registered account is the admin; subsequent registrations are queued for approval by default. From the admin panel you can create user groups, set per group model access, and toggle features like web search or document upload. The README documents granular permissions for who can pull models, who can create custom prompts, and who can manage knowledge bases.
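
To make the approval behavior explicit rather than relying on defaults, the registration settings can be pinned in the environment. This sketch assumes the ENABLE_SIGNUP and DEFAULT_USER_ROLE variables from Open WebUI's environment configuration; confirm the names and accepted values against the version you deploy.

```yaml
services:
  open-webui:
    environment:
      - WEBUI_AUTH=True
      # Assumed: allow self-registration from the login page
      - ENABLE_SIGNUP=True
      # Assumed: new accounts stay "pending" until an admin approves them
      - DEFAULT_USER_ROLE=pending
```

Setting ENABLE_SIGNUP to False after the team has registered is one way to close the front door while leaving existing accounts untouched.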

For an actual production rollout, plug in your identity provider. Open WebUI supports OAuth, trusted header SSO, LDAP, and SCIM 2.0. The SCIM piece matters if you have an existing IdP that should be the source of truth for who joins or leaves the team.

Where it bites

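Two practical notes. First, the Ollama base URL needs to be reachable from inside the Open WebUI container. If you skip docker compose and run Ollama on the host, use http://host.docker.internal:11434 on Mac and Windows. On Linux, either run the container with --network host and point it at http://127.0.0.1:11434 (the UI is then served on port 8080 rather than a mapped port), or map host.docker.internal to the host gateway with --add-host=host.docker.internal:host-gateway. Second, RAG quality depends on your embedding model and chunk size. The defaults are reasonable, but if your documents are technical or non-English, swap the embedding model in the admin settings.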
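
For the first note, a compose-based sketch of the host-Ollama case might look like this. It assumes Ollama is installed directly on a Linux host and listening on port 11434; the extra_hosts entry relies on Docker's host-gateway alias.

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    extra_hosts:
      # Makes host.docker.internal resolve to the host on Linux
      - "host.docker.internal:host-gateway"
    environment:
      # Ollama runs on the host here, not as a compose service
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"
    restart: unless-stopped

volumes:
  open-webui:
```

If the connection is still refused, check which address Ollama is bound to. It listens on 127.0.0.1 by default, so you may need to set OLLAMA_HOST=0.0.0.0 on the host before containers can reach it, ideally with a firewall rule so the port is not exposed beyond the machine.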

For the official source, the Open WebUI repo lives at https://github.com/open-webui/open-webui and tracks new features fast.

Tools mentioned in this post

Open WebUI: self-hosted ChatGPT-style web app with RAG, voice, RBAC, and OpenAI-compatible API support.
Ollama: local model runtime with REST API and OpenAI-compatible endpoint, built on llama.cpp.
LiteLLM: proxy gateway that unifies 100+ LLM providers behind one OpenAI-shaped API with routing, fallbacks, and spend tracking.
