Troubleshooting / FAQ¶

Common Issues¶

Model download fails: Ensure Hugging Face is accessible. To improve download speed set the HF_TOKEN environment variable.
Model loading fails: Ensure you have the latest onnxruntime version compatible with your setup. For GPU, verify CUDA / TensorRT installation. Try a different provider (not all models compatible with all providers).
Model loading fails on onnxruntime 1.24.1: ONNX Runtime 1.24.1 does not support symlinks to data files used in the HuggingFace cache for large models. Please upgrade to 1.24.2+ or downgrade to 1.23.
Audio loading issues: Check that your WAV file is in a supported format (PCM_U8, PCM_16, PCM_24, PCM_32). Use soundfile for other formats.
Audio recognition fails: Most models support up to 20-30 seconds of audio. For longer files, use VAD for segmentation.
Slow performance: Try quantized models (e.g., quantization="int8") on CPU or TensorRT for GPU acceleration.
Incorrect segmentation with VAD: Adjust VAD parameters like threshold or min_speech_duration_ms for your audio.

For more help, check the GitHub Issues or open a new one.