Usage Examples¶
Load ONNX model from Hugging Face¶
Load an ONNX model from Hugging Face and recognize a WAV file:
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3")
print(model.recognize("test.wav"))
API reference: onnx_asr.load_model, recognize
Warning
Supported WAV formats: PCM_U8, PCM_16, PCM_24, and PCM_32. For other formats, either convert the file first or use a library that can read it into a NumPy array.
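For instance, if another library hands you raw 16-bit PCM samples, a minimal sketch of scaling them to the float32 range that recognize accepts (the sample values below are purely illustrative):

```python
import numpy as np

# Illustrative 16-bit PCM samples in [-32768, 32767]
pcm16 = np.array([0, 16384, -32768], dtype=np.int16)

# Scale to float32 in [-1.0, 1.0) before calling model.recognize(...)
waveform = pcm16.astype(np.float32) / 2**15
```

The resulting array can then be passed to recognize together with its sample rate, as in the soundfile example below.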
Supported model names¶
- `gigaam-v2-ctc` for GigaChat GigaAM v2 CTC (origin, onnx)
- `gigaam-v2-rnnt` for GigaChat GigaAM v2 RNN-T (origin, onnx)
- `gigaam-v3-ctc` for GigaChat GigaAM v3 CTC (origin, onnx)
- `gigaam-v3-rnnt` for GigaChat GigaAM v3 RNN-T (origin, onnx)
- `gigaam-v3-e2e-ctc` for GigaChat GigaAM v3 E2E CTC (origin, onnx)
- `gigaam-v3-e2e-rnnt` for GigaChat GigaAM v3 E2E RNN-T (origin, onnx)
- `nemo-fastconformer-ru-ctc` for Nvidia FastConformer-Hybrid Large (ru) with CTC decoder (origin, onnx)
- `nemo-fastconformer-ru-rnnt` for Nvidia FastConformer-Hybrid Large (ru) with RNN-T decoder (origin, onnx)
- `nemo-parakeet-ctc-0.6b` for Nvidia Parakeet CTC 0.6B (en) (origin, onnx)
- `nemo-parakeet-rnnt-0.6b` for Nvidia Parakeet RNNT 0.6B (en) (origin, onnx)
- `nemo-parakeet-tdt-0.6b-v2` for Nvidia Parakeet TDT 0.6B V2 (en) (origin, onnx)
- `nemo-parakeet-tdt-0.6b-v3` for Nvidia Parakeet TDT 0.6B V3 (multilingual) (origin, onnx)
- `nemo-canary-1b-v2` for Nvidia Canary 1B V2 (multilingual) (origin, onnx)
- `istupakov/canary-180m-flash-onnx` for Nvidia Canary 180M Flash (multilingual) (origin, onnx)
- `istupakov/canary-1b-flash-onnx` for Nvidia Canary 1B Flash (multilingual) (origin, onnx)
- `whisper-base` for OpenAI Whisper Base exported with onnxruntime (origin, onnx)
- `alphacep/vosk-model-ru` for Alpha Cephei Vosk 0.54-ru (origin)
- `alphacep/vosk-model-small-ru` for Alpha Cephei Vosk 0.52-small-ru (origin)
- `t-tech/t-one` for T-Tech T-one (origin)
- `onnx-community/whisper-tiny`, `onnx-community/whisper-base`, `onnx-community/whisper-small`, `onnx-community/whisper-large-v3-turbo`, etc. for OpenAI Whisper exported with Hugging Face optimum (onnx-community)
Warning
Some onnx-community models that were converted long ago have a broken fp16 version.
Using soundfile¶
import onnx_asr
import soundfile as sf
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3")
waveform, sample_rate = sf.read("test.wav", dtype="float32")
model.recognize(waveform, sample_rate=sample_rate)
API reference: onnx_asr.load_model, recognize
Batch processing¶
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3")
print(model.recognize(["test1.wav", "test2.wav", "test3.wav", "test4.wav"]))
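To batch a whole directory, the file list can be built with pathlib before the recognize call (a sketch; the "recordings" directory name is an assumption for illustration):

```python
from pathlib import Path

# Gather all WAV files in a directory, in a stable sorted order.
# "recordings" is a hypothetical directory name.
files = sorted(str(p) for p in Path("recordings").glob("*.wav"))

# Then recognize them in a single batch, as above:
# print(model.recognize(files))
```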
API reference: onnx_asr.load_model, recognize
Quantized models¶
Most models have quantized versions:
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3", quantization="int8")
print(model.recognize("test.wav"))
API reference: onnx_asr.load_model, recognize
Timestamps and log probabilities¶
Return tokens, timestamps and log probabilities:
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3").with_timestamps()
print(model.recognize("test1.wav"))
API reference: onnx_asr.load_model, with_timestamps, recognize, TimestampedResult
TensorRT¶
Run an ONNX model on the TensorRT execution provider with fp16 precision:
import onnx_asr
import tensorrt_libs  # required if TensorRT was installed via the tensorrt-cu12-libs pip package

providers = [
    (
        "TensorrtExecutionProvider",
        {
            "trt_max_workspace_size": 6 * 1024**3,  # for big models
            "trt_fp16_enable": True,  # for automatic conversion to fp16
        },
    )
]
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3", providers=providers)
print(model.recognize("test.wav"))
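onnxruntime tries providers in order and falls back to the next entry when one is unavailable, so a fallback chain can be passed instead (a sketch; the fp16 option repeats the assumption above):

```python
# Providers are tried in order; onnxruntime falls back to the next entry
# when a provider is not available in the current environment.
providers = [
    ("TensorrtExecutionProvider", {"trt_fp16_enable": True}),
    "CUDAExecutionProvider",  # used when TensorRT is missing
    "CPUExecutionProvider",   # final fallback
]
# Pass as before: onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3", providers=providers)
```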
API reference: onnx_asr.load_model, recognize
VAD (Voice Activity Detection)¶
Load a VAD ONNX model from Hugging Face and recognize a WAV file:
import onnx_asr
vad = onnx_asr.load_vad("silero")
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3").with_vad(vad)
for res in model.recognize("test.wav"):
    print(res)
API reference: onnx_asr.load_vad, onnx_asr.load_model, with_vad, recognize, SegmentResult
Tip
You will most likely need to adjust the VAD parameters to get correct results.
Supported VAD names¶
- `silero` for Silero VAD
CLI¶
The package provides a simple command-line interface:
onnx-asr nemo-parakeet-tdt-0.6b-v3 test.wav
For full usage parameters, see help:
onnx-asr -h
Gradio¶
Create a simple web interface with Gradio:
import onnx_asr
import gradio as gr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3")
def recognize(audio):
    if not audio:
        return None
    sample_rate, waveform = audio
    waveform = waveform / 2**15  # scale int16 samples to [-1, 1) float
    return model.recognize(waveform, sample_rate=sample_rate, channel="mean")

demo = gr.Interface(fn=recognize, inputs="audio", outputs="text")
demo.launch()
API reference: onnx_asr.load_model, recognize
Load ONNX model from local directory¶
Load an ONNX model from a local directory and recognize a WAV file:
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3", "models/parakeet-v3")
print(model.recognize("test.wav"))
API reference: onnx_asr.load_model, recognize
Note
If the directory does not exist, it will be created and the model will be downloaded into it.
Load a custom ONNX model from Hugging Face¶
Load the Canary 180M Flash model from a Hugging Face repo and recognize a WAV file:
import onnx_asr
model = onnx_asr.load_model("istupakov/canary-180m-flash-onnx")
print(model.recognize("test.wav"))
API reference: onnx_asr.load_model, recognize
Supported model types¶
- All models from supported model names
- `kaldi-rnnt` or `vosk` for Kaldi Icefall Zipformer with stateless RNN-T decoder
- `nemo-conformer-ctc` for NeMo Conformer/FastConformer/Parakeet with CTC decoder
- `nemo-conformer-rnnt` for NeMo Conformer/FastConformer/Parakeet with RNN-T decoder
- `nemo-conformer-tdt` for NeMo Conformer/FastConformer/Parakeet with TDT decoder
- `nemo-conformer-aed` for NeMo Canary with Transformer decoder
- `t-one-ctc` for T-Tech T-one with CTC decoder
- `whisper-ort` for Whisper (exported with onnxruntime)
- `whisper` for Whisper (exported with optimum)
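A model type is what you pass as the first argument when loading a custom export from a local directory, as in the local-directory example above (a sketch; the directory path "models/my-conformer" is hypothetical):

```python
import onnx_asr

# Load a custom NeMo CTC export by model type rather than by a known
# model name; the directory must already contain the exported ONNX files.
model = onnx_asr.load_model("nemo-conformer-ctc", "models/my-conformer")
print(model.recognize("test.wav"))
```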