Awesome Open Source

Search results for speech processing

192 search results found

Speechbrain ⭐ 7,166

A PyTorch-based Speech Toolkit

Awesome Multimodal Ml ⭐ 5,399

Reading list for research topics in multimodal machine learning

Pyannote Audio ⭐ 4,460

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Torchscale ⭐ 2,804

Foundation Architecture for (M)LLMs

Deepvoice3_pytorch ⭐ 1,906

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Wavenet_vocoder ⭐ 1,617

WaveNet vocoder

Awesome Diarization ⭐ 1,384

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Whisper Timestamped ⭐ 1,217

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Open source audio annotation tool for humans

Parselmouth ⭐ 961

Praat in Python, the Pythonic way

Open Speech Corpora ⭐ 830

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Speechpy ⭐ 828

💬 SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/

Sincnet ⭐ 764

SincNet is a neural architecture for efficiently processing raw audio samples.

Voicefixer ⭐ 735

General Speech Restoration

Speechalgorithms ⭐ 625

Speech Algorithms

TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX, and real-time audio processing support.

Uspeech ⭐ 452

Speech recognition toolkit for the arduino

Fullsubnet ⭐ 443

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Resemble Enhance ⭐ 438

AI powered speech denoising and enhancement

Speech Backbones ⭐ 429

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Ims Toucan ⭐ 426

Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.

Speech Denoising Wavenet ⭐ 414

A neural network for end-to-end speech denoising

A python wrapper for Speech Signal Processing Toolkit (SPTK).

Speech Resources ⭐ 388

Speech-related labs, companies, resources, internships, and more. Recommendations and self-nominations are welcome.

Neural Voice Cloning With Few Samples ⭐ 379

This repository contains an implementation of "Neural Voice Cloning With Few Samples".

Nnmnkwii ⭐ 375

Library to build speech synthesis systems designed for easy and fast prototyping.

Surfboard ⭐ 369

Novoic's audio feature extraction library

Multibench ⭐ 356

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

🔉 spafe: Simplified Python Audio Features Extraction

Pyannote Video ⭐ 328

Face detection, tracking and clustering in videos

Unispeech ⭐ 328

UniSpeech - Large Scale Self-Supervised Learning for Speech

Nonautoreggenprogress ⭐ 290

Tracking the progress in non-autoregressive generation (translation, transcription, etc.)

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Problem Agnostic Speech Encoder

Voicefixer_main ⭐ 244

General Speech Restoration

Collection of EM algorithms for blind source separation of audio signals

Cleanunet ⭐ 231

Official PyTorch Implementation of CleanUNet (ICASSP 2022)

Speechtransprogress ⭐ 218

Tracking the progress in end-to-end speech translation

Neural Voice Cloning With Few Samples ⭐ 211

Implementation of the Baidu research paper "Neural Voice Cloning with Few Samples"

Speech_signal_processing_and_classification ⭐ 203

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step for any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of th

A suite of speech signal processing tools

Ttslearn ⭐ 197

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

Wave U Net For Speech Enhancement ⭐ 184

Implements Wave-U-Net in PyTorch and adapts it to speech enhancement.

Gcc Nmf ⭐ 179

Real-time GCC-NMF Blind Speech Separation and Enhancement

Audio Development Tools ⭐ 165

This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.

React Native Dialogflow ⭐ 164

A React-Native Bridge for the Google Dialogflow (API.AI) SDK

Runtimespeechrecognizer ⭐ 153

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine, based on OpenAI's Whisper via whisper.cpp.

Awesome Speech Enhancement ⭐ 151

A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.

Zzz Retired__openstt ⭐ 146

RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:

Speech Enhancement ⭐ 145

Deep neural network based speech enhancement toolkit

Vq Vae Speech ⭐ 145

PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]

Soundsourceseparation ⭐ 134

The code for multi-channel source separation and dereverberation such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.

Voicelab ⭐ 127

Automated Reproducible Acoustical Analysis

Elevateaijavasdk ⭐ 121

Java SDK for ElevateAI

Tutorial_separation ⭐ 117

This repo summarizes the tutorials, datasets, papers, code, and tools for speech separation and speaker extraction tasks. Pull requests are welcome.

Elevateaidotnetsdk ⭐ 115

.NET Core 6 SDK for ElevateAI

Mevonai Speech Emotion Recognition ⭐ 112

Identify the emotions of multiple speakers in an audio segment

Elevateaipythonsdk ⭐ 111

ElevateAI - Speech-to-text API Python SDK

Tfg Voice Conversion ⭐ 109

Deep Learning-based Voice Conversion system

Awesome Keyword Spotting ⭐ 107

This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).

Awesome Speech Translation ⭐ 98

Whisper Auto Transcribe ⭐ 91

Auto transcribe tool based on whisper

Uhv Ots Speech ⭐ 90

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

Vokaturi Android ⭐ 83

Speech-based emotion recognition on Android.

Speechclip ⭐ 80

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022

Speechprompt ⭐ 80

Interspeech 2022: "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks". Speech processing with the prompting paradigm

Soundstorm Pytorch ⭐ 79

Google's SoundStorm: Efficient Parallel Audio Generation

A Convolutional Recurrent Neural Network For Real Time Speech Enhancement ⭐ 79

A minimal, unofficial PyTorch implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN).

A modified version of Speech Signal Processing Toolkit (SPTK)

Quantumspeech Qcnn ⭐ 75

IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition

Awesome Spoken Language Identification ⭐ 74

An awesome spoken language identification (LID) repository. (Work in progress)

Discriminative Neural Clustering for Speaker Diarisation

Time delay neural network (TDNN) implementation in PyTorch using the unfold method

A neural network framework for researchers studying acoustic communication

Voice Activity Detector

Nlp Guide ⭐ 61

Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

Speechprompt V2 ⭐ 59

"SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks". Speech processing with the prompting paradigm

Discordearsbot ⭐ 56

A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. It can also be useful for hearing-impaired and deaf people.

Voice2series Reprogramming ⭐ 55

ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification

maracas is a library for corrupting audio files with additive and convolutive noise.

Gcommandspytorch ⭐ 54

ConvNets for Audio Recognition using Google Commands Dataset

a Wide Shelf for AI and Data Science | Resources 🍔

Torchsubband ⭐ 51

PyTorch implementation of subband decomposition

Keras Sincnet ⭐ 49

Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)

Voice Privacy Challenge 2020 ⭐ 47

Baseline Recipe for VoicePrivacy Challenge 2020: https://www.voiceprivacychallenge.org/vp2020/docs/

React Native Spokestack ⭐ 46

Spokestack: give your React Native app a voice interface!

Voice Privacy Challenge 2022 ⭐ 45

Baseline Recipe for VoicePrivacy Challenge 2022: anonymization systems and evaluation software

Formant Analyzer ⭐ 45

iOS application for finding formants in spoken sounds

Simpleder ⭐ 44

A lightweight library to compute Diarization Error Rate (DER).

Awesome Asr Contextualization ⭐ 42

A curated list of awesome papers on contextualizing E2E ASR outputs

Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob

Clarity_cc ⭐ 38

Clarity Enhancement and Prediction Challenges

Awesome Speech Emotion Recognition ⭐ 36

😎 Awesome lists about Speech Emotion Recognition

Wavencoder ⭐ 36

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.

Paderbox ⭐ 35

Paderbox: A collection of utilities for audio / speech processing

Speech2affective_gestures ⭐ 35

This is the official implementation of the paper "Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning".

Asr_course ⭐ 34

ASR course at Chula 2018

A framework for automatic speech recognition

An implementation of Power-Normalized Cepstral Coefficients (PNCC)

1-100 of 192 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.