Featured Tool

Triton Inference Server

NVIDIA's inference-serving platform for deploying AI models at scale.

Open Source, Self Hosted, Offline Capable, GPU Required (8GB+ VRAM)

About

Triton Inference Server by NVIDIA is an inference-serving platform for deploying models from many frameworks and backends, including TensorRT, PyTorch, TensorFlow, ONNX, OpenVINO, Python, and vLLM. It offers dynamic batching, model ensembles, concurrent model execution, and metrics, and it runs in cloud, data center, edge, and embedded environments on NVIDIA GPUs or on x86 and ARM CPUs. It is released under the BSD-3-Clause license.
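As a rough illustration of what calling a deployed model can look like, the sketch below builds a request for Triton's HTTP inference endpoint, which follows the KServe v2 protocol (`POST /v2/models/<model>/infer`). The model name `my_model`, the tensor name `INPUT0`, the shape, and the `localhost:8000` address are all assumptions for the example; real values come from the deployed model's configuration. The helper functions are hypothetical and not part of any Triton client library, and the sketch only constructs the request without sending it.

```python
import json


def build_infer_request(input_name, shape, datatype, data):
    """Build a KServe-v2-style JSON body for a single input tensor.

    `data` is the tensor flattened in row-major order.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(
            f"shape {list(shape)} needs {expected} values, got {len(data)}"
        )
    return {
        "inputs": [
            {
                "name": input_name,      # assumed tensor name
                "shape": list(shape),
                "datatype": datatype,    # e.g. "FP32", "INT64"
                "data": list(data),
            }
        ]
    }


def infer_url(base_url, model_name):
    """Return the v2 inference endpoint for a model (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


# Assumed deployment details, for illustration only.
body = build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
url = infer_url("http://localhost:8000", "my_model")
payload = json.dumps(body)

print(url)
print(payload)
```

To actually send the request, the payload would be posted to `url` with a `Content-Type: application/json` header, for example via `urllib.request`. When dynamic batching is enabled for the model, the server may combine several such requests into one batch on its side, so the client request itself does not change.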

Details

Category: AI Deployment & MLOps
Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
License: BSD-3-Clause
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Tags

inference serving nvidia multi-framework production gpu batching

Related Tools

Featured

BentoML

AI Deployment & MLOps

Framework for building production-ready AI application services.

Open Source, Self Hosted, Offline

Easy

Cog

AI Deployment & MLOps

Container tool by Replicate for packaging ML models as standard Docker images.

Open Source, Self Hosted, Offline

Easy

Cortex

AI Deployment & MLOps

Local AI API platform that runs LLMs on your own hardware with an OpenAI-compatible API.

Open Source, Self Hosted, Offline

Easy

TensorFlow Serving

AI Deployment & MLOps

Production model serving system for TensorFlow models.

Open Source, Self Hosted, Offline

Intermediate

TorchServe

AI Deployment & MLOps

PyTorch model serving framework for production deployment.

Open Source, Self Hosted, Offline

Intermediate

Cortex (Cortex Labs)

AI Deployment & MLOps

Open-source ML deployment platform for Kubernetes.

Open Source, Self Hosted, Offline

Intermediate
