TEI (Text Embeddings Inference)

High-performance embedding server by Hugging Face for production deployment.

Open Source · Self Hosted · Offline Capable · GPU Required (4GB+ VRAM)

About

Text Embeddings Inference, or TEI, is Hugging Face's high-performance server for deploying text embedding, reranking, and sequence classification models in production. Written in Rust, it skips model graph compilation for fast startup, ships small Docker images, and applies token-based dynamic batching alongside optimized inference paths built on Flash Attention, Candle, and cuBLASLt. Supported architectures span BERT-style encoders, XLM-RoBERTa, CamemBERT, JinaBERT, ModernBERT, and MPNet, plus embedding families such as Nomic, GTE, E5, Qwen, Mistral, and Gemma, loaded from Safetensors or ONNX weights. It exposes both HTTP and gRPC APIs, emits OpenTelemetry traces and Prometheus metrics, and runs on x86 and ARM CPUs, NVIDIA GPUs from Turing onward (compute capability 7.5 and above), Apple Silicon via Metal, and experimentally on AMD ROCm. Engineers building RAG pipelines, semantic search, and reranking services reach for TEI when they need scalable embedding throughput without writing their own serving layer. TEI is released under the Apache 2.0 license.
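As a rough illustration of the HTTP API mentioned above, the following minimal sketch posts a batch of texts to a TEI server and compares the returned vectors. It assumes a server is already running and reachable at http://localhost:8080 (a hypothetical address; adjust to your deployment), and that the /embed route accepts a JSON body with an "inputs" field and returns one vector per input, as in TEI's published HTTP API. The helper names build_embed_request, embed, and cosine are made up for this example.

```python
import json
import math
import urllib.request

# Assumption: a TEI server is listening here; change to match your deployment.
TEI_URL = "http://localhost:8080"


def build_embed_request(texts, base_url=TEI_URL):
    """Build a POST request for the /embed route with a JSON 'inputs' body."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url=TEI_URL, timeout=30):
    """Send the texts to the server and return the parsed JSON response,
    expected to be a list with one embedding (list of floats) per input."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Requires a running server; the sentences are arbitrary sample inputs.
    vectors = embed(["What is deep learning?", "Neural networks learn from data."])
    print(len(vectors), "embeddings of dimension", len(vectors[0]))
    print("cosine similarity:", cosine(vectors[0], vectors[1]))
```

Sending several texts in one request lets the server's dynamic batching group them, which is usually preferable to one request per text.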