Infinity Embedding Server
Fast embedding inference server supporting many embedding models.
About
Infinity is a high-throughput, low-latency REST API server for text embedding and reranking models. It also supports CLIP, CLAP, and ColPali multimodal embeddings. It provides dynamic batching, caching, and an OpenAI-compatible interface, and can serve many embedding models from Hugging Face. It is installed with pip and launched from a command-line interface, which makes it suitable for production deployment. It is released under the MIT license.
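Because the interface is OpenAI-compatible, a client can send the same JSON request body used for OpenAI's embeddings endpoint (`{"model": ..., "input": [...]}`) and read vectors from the `data` list of the response. The sketch below uses only the Python standard library. The base URL `http://localhost:7997`, the `/embeddings` path, and the example model ID `BAAI/bge-small-en-v1.5` are assumptions for illustration. Match them to how your server is actually launched.

```python
import json
import urllib.request

# Assumption: a local Infinity server is already running at this address.
# Adjust host, port, and path to match your own deployment.
BASE_URL = "http://localhost:7997"


def build_embedding_request(texts, model, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style POST request for embeddings."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",  # assumed path; may differ per setup
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(body):
    """Return embedding vectors ordered by each item's 'index' field."""
    doc = json.loads(body)
    items = sorted(doc["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, model, base_url=BASE_URL, timeout=30.0):
    """Send the request to the server and return one vector per input text."""
    req = build_embedding_request(texts, model, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embedding_response(resp.read())
```

With a server running, `embed(["hello", "world"], model="BAAI/bge-small-en-v1.5")` would return two vectors. Sorting by `index` keeps the output aligned with the input order even if the server returns items in a different order.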
Details
- Category: Vector Databases & Embeddings
- Price: Free
- Platform: Local/Desktop
- Difficulty: Easy (2/5)
- License: MIT
- Added: Apr 3, 2026
Related Tools
Python client library for Qdrant vector database.
Approximate nearest neighbor library by Spotify optimized for memory usage.
Efficient similarity search library by Meta for dense vector clustering and retrieval.
Open-source big data serving engine with built-in vector search and ML inference.
Open-source vector similarity search extension for PostgreSQL.
End-to-end vector search engine with built-in model inference.