Llamafile

Single-file executable LLMs by Mozilla that run on any OS without installation.

Open SourceSelf HostedOffline Capable

0.0 (0)

About

Mozilla's llamafile collapses a language model and its inference engine into one executable file that runs on Windows, macOS, Linux, and BSD systems without any installation. It achieves this by combining llama.cpp with Cosmopolitan Libc, which produces polyglot binaries that work across operating systems and CPU architectures from a single build. Downloading one file yields a local chat interface, an OpenAI-compatible API server, and GGUF model support; the project also ships whisperfile, which applies the same approach to speech recognition. One practical limit: Windows caps executables at 4 GB, so larger models are loaded as external weight files alongside the runner. Pre-built llamafiles for popular open models are published for direct download. The project is maintained under Mozilla and released under the Apache 2.0 license, with its llama.cpp modifications kept MIT for upstream compatibility. It suits anyone who wants to distribute or run open models with zero setup.