API Reference


onnx_asr

A lightweight Python package for Automatic Speech Recognition using ONNX models.

Modules:

adapters: ASR adapter classes.
asr: Base ASR classes.
cli: CLI for speech recognition from WAV files.
loader: Loader for ASR models.
models: ASR and VAD model implementations.
onnx: Helpers for ONNX.
preprocessors: ASR preprocessor implementations.
utils: Utils for ASR.
vad: Base VAD classes.

Functions:

load_model: Load ASR model.
load_vad: Load VAD model.

load_model

load_model(model: str | ModelNames | ModelTypes, path: str | Path | None = None, *, quantization: str | None = None, sess_options: SessionOptions | None = None, providers: Sequence[str | tuple[str, dict[Any, Any]]] | None = None, provider_options: Sequence[dict[Any, Any]] | None = None, cpu_preprocessing: bool | None = None, asr_config: OnnxSessionOptions | None = None, preprocessor_config: PreprocessorRuntimeConfig | None = None, resampler_config: OnnxSessionOptions | None = None) -> TextResultsAsrAdapter

Load ASR model.

Parameters:

model (str | ModelNames | ModelTypes, required):
    Model name or type (download from Hugging Face is supported if the full model name is provided):

    GigaAM v2 (`gigaam-v2-ctc` | `gigaam-v2-rnnt`)
    GigaAM v3 (`gigaam-v3-ctc` | `gigaam-v3-rnnt` |
               `gigaam-v3-e2e-ctc` | `gigaam-v3-e2e-rnnt`)
    Kaldi Transducer (`kaldi-rnnt`)
    NeMo Conformer (`nemo-conformer-ctc` | `nemo-conformer-rnnt` | `nemo-conformer-tdt` |
                    `nemo-conformer-aed`)
    NeMo FastConformer Hybrid Large Ru P&C (`nemo-fastconformer-ru-ctc` |
                                            `nemo-fastconformer-ru-rnnt`)
    NeMo Parakeet 0.6B En (`nemo-parakeet-ctc-0.6b` | `nemo-parakeet-rnnt-0.6b` |
                           `nemo-parakeet-tdt-0.6b-v2`)
    NeMo Parakeet 0.6B Multilingual (`nemo-parakeet-tdt-0.6b-v3`)
    NeMo Canary (`nemo-canary-1b-v2`)
    T-One (`t-one-ctc` | `t-tech/t-one`)
    Vosk (`vosk` | `alphacep/vosk-model-ru` | `alphacep/vosk-model-small-ru`)
    Whisper Base exported with onnxruntime (`whisper-ort` | `whisper-base-ort`)
    Whisper from onnx-community (`whisper` | `onnx-community/whisper-large-v3-turbo` |
                                 `onnx-community/*whisper*`)

path (str | Path | None, default None): Path to directory with model files.
quantization (str | None, default None): Model quantization (`None` | `int8` | ...).
sess_options (SessionOptions | None, default None): Default SessionOptions for onnxruntime.
providers (Sequence[str | tuple[str, dict[Any, Any]]] | None, default None): Default providers for onnxruntime.
provider_options (Sequence[dict[Any, Any]] | None, default None): Default provider_options for onnxruntime.
cpu_preprocessing (bool | None, default None): Deprecated and ignored; use `preprocessor_config` and `resampler_config` instead.
asr_config (OnnxSessionOptions | None, default None): ASR ONNX config.
preprocessor_config (PreprocessorRuntimeConfig | None, default None): Preprocessor ONNX and concurrency config.
resampler_config (OnnxSessionOptions | None, default None): Resampler ONNX config.

Returns:

TextResultsAsrAdapter: ASR model class.

Raises:

ModelLoadingError: Model loading error (onnx-asr specific).

Source code in src/onnx_asr/loader.py
def load_model(
    model: str | ModelNames | ModelTypes,
    path: str | Path | None = None,
    *,
    quantization: str | None = None,
    sess_options: rt.SessionOptions | None = None,
    providers: Sequence[str | tuple[str, dict[Any, Any]]] | None = None,
    provider_options: Sequence[dict[Any, Any]] | None = None,
    cpu_preprocessing: bool | None = None,
    asr_config: OnnxSessionOptions | None = None,
    preprocessor_config: PreprocessorRuntimeConfig | None = None,
    resampler_config: OnnxSessionOptions | None = None,
) -> TextResultsAsrAdapter:
    """Load ASR model.

    Args:
        model: Model name or type (download from Hugging Face supported if full model name is provided):

                GigaAM v2 (`gigaam-v2-ctc` | `gigaam-v2-rnnt`)
                GigaAM v3 (`gigaam-v3-ctc` | `gigaam-v3-rnnt` |
                           `gigaam-v3-e2e-ctc` | `gigaam-v3-e2e-rnnt`)
                Kaldi Transducer (`kaldi-rnnt`)
                NeMo Conformer (`nemo-conformer-ctc` | `nemo-conformer-rnnt` | `nemo-conformer-tdt` |
                                `nemo-conformer-aed`)
                NeMo FastConformer Hybrid Large Ru P&C (`nemo-fastconformer-ru-ctc` |
                                                        `nemo-fastconformer-ru-rnnt`)
                NeMo Parakeet 0.6B En (`nemo-parakeet-ctc-0.6b` | `nemo-parakeet-rnnt-0.6b` |
                                       `nemo-parakeet-tdt-0.6b-v2`)
                NeMo Parakeet 0.6B Multilingual (`nemo-parakeet-tdt-0.6b-v3`)
                NeMo Canary (`nemo-canary-1b-v2`)
                T-One (`t-one-ctc` | `t-tech/t-one`)
                Vosk (`vosk` | `alphacep/vosk-model-ru` | `alphacep/vosk-model-small-ru`)
                Whisper Base exported with onnxruntime (`whisper-ort` | `whisper-base-ort`)
                Whisper from onnx-community (`whisper` | `onnx-community/whisper-large-v3-turbo` |
                                             `onnx-community/*whisper*`)
        path: Path to directory with model files.
        quantization: Model quantization (`None` | `int8` | ... ).
        sess_options: Default SessionOptions for onnxruntime.
        providers: Default providers for onnxruntime.
        provider_options: Default provider_options for onnxruntime.
        cpu_preprocessing: Deprecated and ignored, use `preprocessor_config` and `resampler_config` instead.
        asr_config: ASR ONNX config.
        preprocessor_config: Preprocessor ONNX and concurrency config.
        resampler_config: Resampler ONNX config.

    Returns:
        ASR model class.

    Raises:
        utils.ModelLoadingError: Model loading error (onnx-asr specific).

    """
    if cpu_preprocessing is not None:
        warnings.warn(
            "The cpu_preprocessing argument is deprecated and ignored (use preprocessor_config and resampler_config).",
            stacklevel=2,
        )

    loader = AsrLoader(model, path)

    default_onnx_config: OnnxSessionOptions = {
        "sess_options": sess_options,
        "providers": providers or rt.get_available_providers(),
        "provider_options": provider_options,
    }

    if asr_config is None:
        asr_config = update_onnx_providers(default_onnx_config, excluded_providers=loader.get_excluded_providers())

    if preprocessor_config is None:
        preprocessor_config = {
            **update_onnx_providers(
                default_onnx_config,
                new_options={"TensorrtExecutionProvider": {"trt_fp16_enable": False, "trt_int8_enable": False}},
                excluded_providers=OnnxPreprocessor._get_excluded_providers(),
            ),
            "max_concurrent_workers": 1,
        }

    if resampler_config is None:
        resampler_config = update_onnx_providers(
            default_onnx_config, excluded_providers=Resampler._get_excluded_providers()
        )

    return loader.create_model(asr_config, preprocessor_config, resampler_config, quantization=quantization)
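
A minimal usage sketch ("example.wav" is an illustrative path to a PCM WAV file):

import onnx_asr

# Load a supported model by name (downloaded from Hugging Face on first use)
model = onnx_asr.load_model("gigaam-v2-ctc")

# Recognize a single WAV file and print the recognized text
print(model.recognize("example.wav"))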

load_vad

load_vad(model: VadNames = 'silero', path: str | Path | None = None, *, quantization: str | None = None, sess_options: SessionOptions | None = None, providers: Sequence[str | tuple[str, dict[Any, Any]]] | None = None, provider_options: Sequence[dict[Any, Any]] | None = None) -> Vad

Load VAD model.

Parameters:

model (VadNames, default 'silero'): VAD model name (supports download from Hugging Face).
path (str | Path | None, default None): Path to directory with model files.
quantization (str | None, default None): Model quantization (`None` | `int8` | ...).
sess_options (SessionOptions | None, default None): Optional SessionOptions for onnxruntime.
providers (Sequence[str | tuple[str, dict[Any, Any]]] | None, default None): Optional providers for onnxruntime.
provider_options (Sequence[dict[Any, Any]] | None, default None): Optional provider_options for onnxruntime.

Returns:

Vad: VAD model class.

Raises:

ModelLoadingError: Model loading error (onnx-asr specific).

Source code in src/onnx_asr/loader.py
def load_vad(
    model: VadNames = "silero",
    path: str | Path | None = None,
    *,
    quantization: str | None = None,
    sess_options: rt.SessionOptions | None = None,
    providers: Sequence[str | tuple[str, dict[Any, Any]]] | None = None,
    provider_options: Sequence[dict[Any, Any]] | None = None,
) -> Vad:
    """Load VAD model.

    Args:
        model: VAD model name (supports download from Hugging Face).
        path: Path to directory with model files.
        quantization: Model quantization (`None` | `int8` | ... ).
        sess_options: Optional SessionOptions for onnxruntime.
        providers: Optional providers for onnxruntime.
        provider_options: Optional provider_options for onnxruntime.

    Returns:
        VAD model class.

    Raises:
        utils.ModelLoadingError: Model loading error (onnx-asr specific).

    """
    loader = VadLoader(model, path)

    onnx_options = update_onnx_providers(
        {"providers": rt.get_available_providers()}, excluded_providers=loader.get_excluded_providers()
    ) | {
        "sess_options": sess_options,
        "providers": providers,
        "provider_options": provider_options,
    }

    return loader.create_model(onnx_options, quantization=quantization)
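
A sketch of combining load_vad with an ASR model for long recordings ("long_audio.wav" is an illustrative path):

import onnx_asr

model = onnx_asr.load_model("gigaam-v2-ctc")
vad = onnx_asr.load_vad("silero")

# recognize() on a VAD adapter yields SegmentResult objects
for segment in model.with_vad(vad).recognize("long_audio.wav"):
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")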

ModelNames module-attribute

ModelNames = Literal['gigaam-v2-ctc', 'gigaam-v2-rnnt', 'gigaam-v3-ctc', 'gigaam-v3-rnnt', 'gigaam-v3-e2e-ctc', 'gigaam-v3-e2e-rnnt', 'nemo-fastconformer-ru-ctc', 'nemo-fastconformer-ru-rnnt', 'nemo-parakeet-ctc-0.6b', 'nemo-parakeet-rnnt-0.6b', 'nemo-parakeet-tdt-0.6b-v2', 'nemo-parakeet-tdt-0.6b-v3', 'nemo-canary-1b-v2', 'alphacep/vosk-model-ru', 'alphacep/vosk-model-small-ru', 't-tech/t-one', 'whisper-base']

Supported ASR model names (can be automatically downloaded from Hugging Face).

ModelTypes module-attribute

ModelTypes = Literal['kaldi-rnnt', 'nemo-conformer-ctc', 'nemo-conformer-rnnt', 'nemo-conformer-tdt', 'nemo-conformer-aed', 't-one-ctc', 'vosk', 'whisper-ort', 'whisper']

Supported ASR model types.
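
Model types are used when loading a locally exported model from a directory. A sketch, assuming "models/my-conformer" is an illustrative directory containing the exported ONNX files:

import onnx_asr

model = onnx_asr.load_model("nemo-conformer-ctc", "models/my-conformer")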

VadNames module-attribute

VadNames = Literal['silero']

Supported VAD model names (can be automatically downloaded from Hugging Face).

OnnxSessionOptions typed-dict

OnnxSessionOptions(*, sess_options: SessionOptions | None = ..., providers: Sequence[str | tuple[str, dict[Any, Any]]] | None = ..., provider_options: Sequence[dict[Any, Any]] | None = ...)

Bases: TypedDict

Options for onnxruntime InferenceSession.

Parameters:

sess_options (SessionOptions | None): ONNX Session options.
providers (Sequence[str | tuple[str, dict[Any, Any]]] | None): ONNX providers.
provider_options (Sequence[dict[Any, Any]] | None): ONNX provider options.
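
Since OnnxSessionOptions is a TypedDict, a plain dict can be passed. A sketch that prefers CUDA with a CPU fallback for the ASR session (assuming omitted keys are allowed and the CUDA provider is available in the installed onnxruntime build):

import onnx_asr

asr_config = {"providers": ["CUDAExecutionProvider", "CPUExecutionProvider"]}
model = onnx_asr.load_model("gigaam-v2-ctc", asr_config=asr_config)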

PreprocessorRuntimeConfig

PreprocessorRuntimeConfig(*, sess_options: SessionOptions | None = ..., providers: Sequence[str | tuple[str, dict[Any, Any]]] | None = ..., provider_options: Sequence[dict[Any, Any]] | None = ...)

Bases: OnnxSessionOptions

Preprocessor runtime config.

Parameters:

sess_options (SessionOptions | None): ONNX Session options.
providers (Sequence[str | tuple[str, dict[Any, Any]]] | None): ONNX providers.
provider_options (Sequence[dict[Any, Any]] | None): ONNX provider options.

Attributes:

max_concurrent_workers (int | None): Max parallel preprocessing threads (None: auto; 1: no parallel processing).

max_concurrent_workers instance-attribute

max_concurrent_workers: int | None

Max parallel preprocessing threads (None: auto; 1: no parallel processing).
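
A sketch that keeps preprocessing on CPU and disables parallel workers (mirrors the default config built by load_model; omitted keys are assumed optional):

import onnx_asr

preprocessor_config = {
    "providers": ["CPUExecutionProvider"],
    "max_concurrent_workers": 1,
}
model = onnx_asr.load_model("gigaam-v2-ctc", preprocessor_config=preprocessor_config)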

TensorRtOptions

Options for onnxruntime TensorRT providers.

Methods:

add_profile: Add TensorRT profile options.
get_provider_names: Get TensorRT provider names.
is_fp16_enabled: Check whether the TensorRT provider uses fp16 precision.

Attributes:

profile_max_shapes (dict[str, int]): Maximal values for model input shapes.
profile_min_shapes (dict[str, int]): Minimal values for model input shapes.
profile_opt_shapes (dict[str, int]): Optimal values for model input shapes.

profile_max_shapes class-attribute

profile_max_shapes: dict[str, int] = {'batch': 16, 'waveform_len_ms': 30000}

Maximal values for model input shapes.

profile_min_shapes class-attribute

profile_min_shapes: dict[str, int] = {'batch': 1, 'waveform_len_ms': 50}

Minimal values for model input shapes.

profile_opt_shapes class-attribute

profile_opt_shapes: dict[str, int] = {'batch': 1, 'waveform_len_ms': 20000}

Optimal values for model input shapes.

add_profile classmethod

add_profile(onnx_options: OnnxSessionOptions, transform_shapes: Callable[..., str]) -> OnnxSessionOptions

Add TensorRT profile options.

Source code in src/onnx_asr/onnx.py
@classmethod
def add_profile(cls, onnx_options: OnnxSessionOptions, transform_shapes: Callable[..., str]) -> OnnxSessionOptions:
    """Add TensorRT profile options."""
    return update_onnx_providers(
        onnx_options,
        default_options={
            "TensorrtExecutionProvider": cls._generate_profile("trt_profile", transform_shapes),
            "NvTensorRtRtxExecutionProvider": cls._generate_profile("nv_profile", transform_shapes),
        },
    )

get_provider_names staticmethod

get_provider_names() -> list[str]

Get TensorRT provider names.

Source code in src/onnx_asr/onnx.py
@staticmethod
def get_provider_names() -> list[str]:
    """Get TensorRT provider names."""
    return ["TensorrtExecutionProvider", "NvTensorRtRtxExecutionProvider"]

is_fp16_enabled staticmethod

is_fp16_enabled(onnx_options: OnnxSessionOptions) -> bool

Check whether the TensorRT provider uses fp16 precision.

Source code in src/onnx_asr/onnx.py
@staticmethod
def is_fp16_enabled(onnx_options: OnnxSessionOptions) -> bool:
    """Check if TensorRT provider use fp16 precision."""
    return bool(
        _merge_onnx_provider_options(onnx_options)
        .get("TensorrtExecutionProvider", {})
        .get("trt_fp16_enable", False)
    )
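
A sketch of checking the flag (assuming TensorRtOptions is importable from onnx_asr.onnx and that (name, options) provider tuples are merged by the helper):

from onnx_asr.onnx import TensorRtOptions

opts = {"providers": [("TensorrtExecutionProvider", {"trt_fp16_enable": True})]}
print(TensorRtOptions.is_fp16_enabled(opts))  # True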

ModelLoadingError

Bases: Exception

Model loading error.


adapters

ASR adapter classes.

Classes:

AsrAdapter: Base ASR adapter class.
RecognizeOptions: Options for ASR recognition.
SegmentResultsAsrAdapter: ASR with VAD adapter (text results).
TextResultsAsrAdapter: ASR adapter (text results).
TimestampedResultsAsrAdapter: ASR adapter (timestamped results).
TimestampedSegmentResultsAsrAdapter: ASR with VAD adapter (timestamped results).
VadOptions: Options for VAD.

AsrAdapter

AsrAdapter(asr: Asr, resampler: Resampler)

Bases: ABC, Generic[R]

Base ASR adapter class.

Create ASR adapter.

Methods:

recognize: Recognize speech (single or batch).
with_vad: Create ASR adapter with VAD.

Source code in src/onnx_asr/adapters.py
def __init__(self, asr: Asr, resampler: Resampler):
    """Create ASR adapter."""
    self.asr = asr
    self.resampler = resampler

recognize

recognize(waveform: str | Path | NDArray[float32], *, sample_rate: SampleRates = 16000, **kwargs: Unpack[RecognizeOptions]) -> R
recognize(waveform: list[str | Path | NDArray[float32]], *, sample_rate: SampleRates = 16000, **kwargs: Unpack[RecognizeOptions]) -> list[R]
recognize(waveform: str | Path | NDArray[float32] | list[str | Path | NDArray[float32]], *, sample_rate: SampleRates = 16000, language: str | None = ..., target_language: str | None = ..., pnc: Literal['pnc', 'nopnc'] | bool = ...) -> R | list[R]

Recognize speech (single or batch).

Parameters:

waveform (str | Path | NDArray[float32] | list[str | Path | NDArray[float32]], required): Path to a wav file (only PCM_U8, PCM_16, PCM_24 and PCM_32 formats are supported) or a Numpy array with a PCM waveform. A list of file paths or Numpy arrays is also supported for batch recognition.
sample_rate (SampleRates, default 16000): Sample rate for Numpy arrays in waveform.
language (str | None): Speech language (Whisper and Canary models only).
target_language (str | None): Output language (Canary models only).
pnc (Literal['pnc', 'nopnc'] | bool): Output punctuation and capitalization (Canary models only).

Returns:

R | list[R]: Speech recognition results (a single result, or a list for batch recognition).

Raises:

AudioLoadingError: Audio loading error (onnx-asr specific).
FileNotFoundError: File not found error.
wave.Error: WAV file reading error.
OSError: Other IO errors.

Source code in src/onnx_asr/adapters.py
def recognize(
    self,
    waveform: str | Path | npt.NDArray[np.float32] | list[str | Path | npt.NDArray[np.float32]],
    *,
    sample_rate: SampleRates = 16_000,
    **kwargs: Unpack[RecognizeOptions],
) -> R | list[R]:
    """Recognize speech (single or batch).

    Args:
        waveform: Path to wav file (only PCM_U8, PCM_16, PCM_24 and PCM_32 formats are supported)
                  or Numpy array with PCM waveform.
                  A list of file paths or numpy arrays is also supported for batch recognition.
        sample_rate: Sample rate for Numpy arrays in waveform.
        **kwargs: ASR options.

    Returns:
        Speech recognition results (single or list for batch recognition).

    Raises:
        utils.AudioLoadingError: Audio loading error (onnx-asr specific).
        FileNotFoundError: File not found error.
        wave.Error: WAV file reading error.
        OSError: Other IO errors.

    """
    if isinstance(waveform, list) and not waveform:
        return []

    waveform_batch = waveform if isinstance(waveform, list) else [waveform]
    result = self._recognize_batch(*self.resampler(*read_wav_files(waveform_batch, sample_rate)), **kwargs)

    if isinstance(waveform, list):
        return list(result)
    return next(result)
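
A sketch of single vs. batch calls (file names are illustrative; a list in gives a list out):

import numpy as np
import onnx_asr

model = onnx_asr.load_model("gigaam-v2-ctc")

# Single file -> single result
text = model.recognize("a.wav")

# Batch of files -> list of results
texts = model.recognize(["a.wav", "b.wav"])

# Numpy waveform at 8 kHz (resampled internally); here one second of silence
waveform = np.zeros(8000, dtype=np.float32)
text = model.recognize(waveform, sample_rate=8000)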

with_vad

with_vad(vad: Vad, **kwargs: Unpack[VadOptions]) -> SegmentResultsAsrAdapter

Create ASR adapter with VAD.

Parameters:

vad (Vad, required): VAD model.
batch_size (int): Number of parallel processed segments.
threshold (float): Speech detection threshold.
neg_threshold (float): Non-speech detection threshold.
min_speech_duration_ms (float): Minimum speech segment duration in milliseconds.
max_speech_duration_s (float): Maximum speech segment duration in seconds.
min_silence_duration_ms (float): Minimum silence duration in milliseconds to split speech segments.
speech_pad_ms (float): Padding for speech segments in milliseconds.

Returns:

SegmentResultsAsrAdapter: ASR with VAD adapter (text results).

Source code in src/onnx_asr/adapters.py
def with_vad(self, vad: Vad, **kwargs: Unpack[VadOptions]) -> SegmentResultsAsrAdapter:
    """Create ASR adapter with VAD.

    Args:
        vad: VAD model.
        **kwargs: VAD options.

    Returns:
        ASR with VAD adapter (text results).

    """
    return SegmentResultsAsrAdapter(self.asr, vad, self.resampler, **kwargs)

RecognizeOptions typed-dict

RecognizeOptions(*, language: str | None = ..., target_language: str | None = ..., pnc: Literal['pnc', 'nopnc'] | bool = ...)

Bases: TypedDict

Options for ASR recognition.

Parameters:

language (str | None): Speech language (Whisper and Canary models only).
target_language (str | None): Output language (Canary models only).
pnc (Literal['pnc', 'nopnc'] | bool): Output punctuation and capitalization (Canary models only).
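
These options are passed as keyword arguments to recognize. A sketch with a Canary model (language codes and path are illustrative):

import onnx_asr

model = onnx_asr.load_model("nemo-canary-1b-v2")
text = model.recognize("audio.wav", language="en", target_language="de", pnc=True)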

SegmentResultsAsrAdapter

SegmentResultsAsrAdapter(asr: Asr, vad: Vad, resampler: Resampler, *, batch_size: int = ..., threshold: float = ..., neg_threshold: float = ..., min_speech_duration_ms: float = ..., max_speech_duration_s: float = ..., min_silence_duration_ms: float = ..., speech_pad_ms: float = ...)

Bases: AsrAdapter[Iterator[SegmentResult]]

ASR with VAD adapter (text results).

Create ASR adapter.

Parameters:

batch_size (int): Number of parallel processed segments.
threshold (float): Speech detection threshold.
neg_threshold (float): Non-speech detection threshold.
min_speech_duration_ms (float): Minimum speech segment duration in milliseconds.
max_speech_duration_s (float): Maximum speech segment duration in seconds.
min_silence_duration_ms (float): Minimum silence duration in milliseconds to split speech segments.
speech_pad_ms (float): Padding for speech segments in milliseconds.

Methods:

recognize: Recognize speech (single or batch).
with_timestamps: ASR with VAD adapter (timestamped results).
with_vad: Create ASR adapter with VAD.

Source code in src/onnx_asr/adapters.py
def __init__(self, asr: Asr, vad: Vad, resampler: Resampler, **kwargs: Unpack[VadOptions]):
    """Create ASR adapter."""
    super().__init__(asr, resampler)
    self.vad = vad
    self._vadargs = kwargs

recognize

Recognize speech (single or batch). Inherited from AsrAdapter; see AsrAdapter.recognize above for the full signature, parameters, and source.

with_timestamps

with_timestamps() -> TimestampedSegmentResultsAsrAdapter

ASR with VAD adapter (timestamped results).

Source code in src/onnx_asr/adapters.py
def with_timestamps(self) -> TimestampedSegmentResultsAsrAdapter:
    """ASR with VAD adapter (timestamped results)."""
    return TimestampedSegmentResultsAsrAdapter(self.asr, self.vad, self.resampler, **self._vadargs)

with_vad

Create ASR adapter with VAD. Inherited from AsrAdapter; see AsrAdapter.with_vad above for the full signature, parameters, and source.

TextResultsAsrAdapter

TextResultsAsrAdapter(asr: Asr, resampler: Resampler)

Bases: AsrAdapter[str]

ASR adapter (text results).

Create ASR adapter.

Methods:

recognize: Recognize speech (single or batch).
with_timestamps: ASR adapter (timestamped results).
with_vad: Create ASR adapter with VAD.

Source code in src/onnx_asr/adapters.py
def __init__(self, asr: Asr, resampler: Resampler):
    """Create ASR adapter."""
    self.asr = asr
    self.resampler = resampler

recognize

Recognize speech (single or batch). Inherited from AsrAdapter; see AsrAdapter.recognize above for the full signature, parameters, and source.

with_timestamps

with_timestamps() -> TimestampedResultsAsrAdapter

ASR adapter (timestamped results).

Source code in src/onnx_asr/adapters.py
def with_timestamps(self) -> TimestampedResultsAsrAdapter:
    """ASR adapter (timestamped results)."""
    return TimestampedResultsAsrAdapter(self.asr, self.resampler)
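
A sketch of getting timestamped results ("audio.wav" is an illustrative path):

import onnx_asr

model = onnx_asr.load_model("gigaam-v2-ctc")
result = model.with_timestamps().recognize("audio.wav")
print(result.text)
print(result.tokens, result.timestamps)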

with_vad

Create ASR adapter with VAD. Inherited from AsrAdapter; see AsrAdapter.with_vad above for the full signature, parameters, and source.

TimestampedResultsAsrAdapter

TimestampedResultsAsrAdapter(asr: Asr, resampler: Resampler)

Bases: AsrAdapter[TimestampedResult]

ASR adapter (timestamped results).

Create ASR adapter.

Methods:

recognize: Recognize speech (single or batch).
with_vad: Create ASR adapter with VAD.

Source code in src/onnx_asr/adapters.py
def __init__(self, asr: Asr, resampler: Resampler):
    """Create ASR adapter."""
    self.asr = asr
    self.resampler = resampler

recognize

Recognize speech (single or batch). Inherited from AsrAdapter; see AsrAdapter.recognize above for the full signature, parameters, and source.

with_vad

Create ASR adapter with VAD. Inherited from AsrAdapter; see AsrAdapter.with_vad above for the full signature, parameters, and source.

TimestampedSegmentResultsAsrAdapter

TimestampedSegmentResultsAsrAdapter(asr: Asr, vad: Vad, resampler: Resampler, *, batch_size: int = ..., threshold: float = ..., neg_threshold: float = ..., min_speech_duration_ms: float = ..., max_speech_duration_s: float = ..., min_silence_duration_ms: float = ..., speech_pad_ms: float = ...)

Bases: AsrAdapter[Iterator[TimestampedSegmentResult]]

ASR with VAD adapter (timestamped results).

Create ASR adapter.

Parameters:

batch_size (int): Number of parallel processed segments.
threshold (float): Speech detection threshold.
neg_threshold (float): Non-speech detection threshold.
min_speech_duration_ms (float): Minimum speech segment duration in milliseconds.
max_speech_duration_s (float): Maximum speech segment duration in seconds.
min_silence_duration_ms (float): Minimum silence duration in milliseconds to split speech segments.
speech_pad_ms (float): Padding for speech segments in milliseconds.

Methods:

recognize: Recognize speech (single or batch).
with_vad: Create ASR adapter with VAD.

Source code in src/onnx_asr/adapters.py
def __init__(self, asr: Asr, vad: Vad, resampler: Resampler, **kwargs: Unpack[VadOptions]):
    """Create ASR adapter."""
    super().__init__(asr, resampler)
    self.vad = vad
    self._vadargs = kwargs

recognize

Recognize speech (single or batch). Inherited from AsrAdapter; see AsrAdapter.recognize above for the full signature, parameters, and source.

with_vad

Create ASR adapter with VAD. Inherited from AsrAdapter; see AsrAdapter.with_vad above for the full signature, parameters, and source.

VadOptions typed-dict

Bases: TypedDict

Options for VAD.

Parameters:

batch_size (int): Number of parallel processed segments.
threshold (float): Speech detection threshold.
neg_threshold (float): Non-speech detection threshold.
min_speech_duration_ms (float): Minimum speech segment duration in milliseconds.
max_speech_duration_s (float): Maximum speech segment duration in seconds.
min_silence_duration_ms (float): Minimum silence duration in milliseconds to split speech segments.
speech_pad_ms (float): Padding for speech segments in milliseconds.
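
These options are passed as keyword arguments to with_vad. A sketch with illustrative values:

import onnx_asr

model = onnx_asr.load_model("gigaam-v2-ctc")
vad = onnx_asr.load_vad("silero")
adapter = model.with_vad(vad, threshold=0.5, min_silence_duration_ms=300, speech_pad_ms=100)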

TimestampedResult dataclass

TimestampedResult(text: str, timestamps: list[float] | None = None, tokens: list[str] | None = None, logprobs: list[float] | None = None)

Timestamped recognition result.

Attributes:

logprobs (list[float] | None): Tokens logprob list.
text (str): Recognized text.
timestamps (list[float] | None): Tokens timestamp list.
tokens (list[str] | None): Tokens list.

logprobs class-attribute instance-attribute

logprobs: list[float] | None = None

Tokens logprob list.

text instance-attribute

text: str

Recognized text.

timestamps class-attribute instance-attribute

timestamps: list[float] | None = None

Tokens timestamp list.

tokens class-attribute instance-attribute

tokens: list[str] | None = None

Tokens list.

SegmentResult dataclass

SegmentResult(start: float, end: float, text: str)

Segment recognition result.

Attributes:

end (float): Segment end time.
start (float): Segment start time.
text (str): Segment recognized text.

end instance-attribute

end: float

Segment end time.

start instance-attribute

start: float

Segment start time.

text instance-attribute

text: str

Segment recognized text.

TimestampedSegmentResult dataclass

TimestampedSegmentResult(start: float, end: float, text: str, timestamps: list[float] | None = None, tokens: list[str] | None = None, logprobs: list[float] | None = None)

Bases: TimestampedResult, SegmentResult

Timestamped segment recognition result.

Attributes:

end (float): Segment end time.
logprobs (list[float] | None): Tokens logprob list.
start (float): Segment start time.
text (str): Recognized text.
timestamps (list[float] | None): Tokens timestamp list.
tokens (list[str] | None): Tokens list.

end instance-attribute

end: float

Segment end time.

logprobs class-attribute instance-attribute

logprobs: list[float] | None = None

Tokens logprob list.

start instance-attribute

start: float

Segment start time.

text instance-attribute

text: str

Recognized text.

timestamps class-attribute instance-attribute

timestamps: list[float] | None = None

Tokens timestamp list.

tokens class-attribute instance-attribute

tokens: list[str] | None = None

Tokens list.

SampleRates module-attribute

SampleRates = Literal[8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000]

Supported sample rates.

AudioLoadingError

Bases: ValueError

Audio loading error.