NeMo

NVIDIA framework for building and training speech AI models including ASR, TTS, and speech LLMs.

Open Source | Self Hosted | Offline Capable | GPU Required

About

NVIDIA NeMo is a framework for researchers and PyTorch developers building speech AI. In its current form it concentrates on automatic speech recognition (ASR), text-to-speech (TTS), and speech-focused large language models (speech LLMs). It builds on pretrained checkpoints published on Hugging Face. Recent releases include multilingual ASR models with controllable latency, unified offline and streaming recognition, and multilingual TTS, so developers can choose their own point on the trade-off between latency and accuracy. The speech LLM work extends to full-duplex, interruptible voice conversation models. NeMo installs with pip on systems running recent Python and PyTorch versions and is designed to slot into an existing PyTorch environment rather than replace it. CUDA acceleration is optional but expected for serious training. The framework is released under the Apache 2.0 license and is used both in academic speech research and in production voice pipelines built on NVIDIA hardware.
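As a rough sketch of the pip-install-then-load-a-pretrained-checkpoint workflow, the Python below transcribes audio files with a downloaded ASR checkpoint. It assumes NeMo was installed with `pip install "nemo_toolkit[asr]"`. The checkpoint name `nvidia/parakeet-tdt-0.6b-v2` is only an example (an English model, not necessarily one of the multilingual releases), and the helper names `transcribe_files` and `hypotheses_to_text` are hypothetical. Because the return type of `transcribe()` has varied across NeMo versions, the helper normalizes it instead of assuming a single shape.

```python
from typing import Any, List, Sequence


def hypotheses_to_text(outputs: Any) -> List[str]:
    """Normalize ASRModel.transcribe() output to plain strings (hypothetical helper)."""
    # Some older RNNT/TDT releases return a (best_hypotheses, all_hypotheses) tuple;
    # keep only the best hypotheses.
    if isinstance(outputs, tuple):
        outputs = outputs[0]
    # Newer releases return Hypothesis objects with a `.text` attribute;
    # older ones return plain strings, which pass through unchanged.
    return [getattr(h, "text", h) for h in outputs]


def transcribe_files(
    paths: Sequence[str],
    model_name: str = "nvidia/parakeet-tdt-0.6b-v2",  # example checkpoint (assumption)
) -> List[str]:
    """Download a pretrained ASR checkpoint and transcribe audio files (hypothetical helper)."""
    # Imported lazily so this module can be loaded even where NeMo is not installed.
    import nemo.collections.asr as nemo_asr

    # Downloads (and caches) the named pretrained checkpoint.
    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    # The first positional argument is the list of audio files to transcribe.
    outputs = asr_model.transcribe(list(paths))
    return hypotheses_to_text(outputs)
```

A call such as `transcribe_files(["sample.wav"])` would return one transcript string per file; `sample.wav` is a placeholder, and the example checkpoint expects 16 kHz mono audio. Deferring the NeMo import keeps the pure-Python normalization logic usable and testable on machines without a GPU or a NeMo install.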

Reviews (0)

No reviews yet. Be the first to review!

Details

Category: AI Frameworks & Libraries
Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
License: Apache-2.0
Added: May 7, 2026

Website | GitHub

Related Tools

ColossalAI

Guidance

Hugging Face Datasets

Equinox

Keras

CatBoost