Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Stable Diffusion Webui | 51,738 | 19 hours ago | 1,667 | agpl-3.0 | Python | |||||
Stable Diffusion web UI | ||||||||||
Faceswap | 43,824 | 22 days ago | 15 | gpl-3.0 | Python | |||||
Deepfakes Software For All | ||||||||||
Mockingbird | 26,942 | 2 | 16 days ago | 9 | February 28, 2022 | 379 | other | Python | ||
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time | ||||||||||
Machine Learning For Software Engineers | 26,596 | a month ago | 22 | cc-by-sa-4.0 | ||||||
A complete daily plan for studying to become a machine learning engineer. | ||||||||||
Spacy | 25,566 | 1,533 | 842 | 20 hours ago | 196 | April 05, 2022 | 109 | mit | Python | |
💫 Industrial-strength Natural Language Processing (NLP) in Python | ||||||||||
Ai Expert Roadmap | 24,033 | 25 days ago | 13 | mit | JavaScript | |||||
Roadmap to becoming an Artificial Intelligence Expert in 2022 | ||||||||||
Lightning | 22,033 | 7 | 389 | 19 hours ago | 221 | June 01, 2022 | 664 | apache-2.0 | Python | |
Deep learning framework to train, deploy, and ship AI products Lightning fast. | ||||||||||
Netron | 21,696 | 4 | 63 | a day ago | 489 | July 04, 2022 | 22 | mit | JavaScript | |
Visualizer for neural network, deep learning, and machine learning models | ||||||||||
Mediapipe | 20,999 | 94 | 18 hours ago | 24 | June 28, 2022 | 498 | apache-2.0 | C++ | ||
Cross-platform, customizable ML solutions for live and streaming media. | ||||||||||
Colossalai | 19,195 | 20 hours ago | 301 | apache-2.0 | Python | |||||
Making large AI models cheaper, faster and more accessible |
Chinese: supports Mandarin and has been tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, etc.
PyTorch: works with PyTorch, tested with version 1.9.0 (the latest in August 2021) on Tesla T4 and GTX 2060 GPUs
Windows + Linux: runs on both Windows and Linux (and even on M1 macOS)
Easy & Awesome: good results with only a newly trained synthesizer, by reusing the pretrained encoder/vocoder
Webserver Ready: serve your results through remote calls
./mkgui and tech design
[X] Add demo part of Voice Cloning and Conversion
[X] Add preprocessing and training for Voice Conversion
[ ] Add preprocessing and training for Encoder/Synthesizer/Vocoder
Follow the original repo to test whether your environment is ready. **Python 3.7 or higher** is needed to run the toolbox.
If you get an error like the following while installing PyTorch:
ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu102 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2 )
it is probably caused by a Python version that is too old. Try Python 3.9, which should install successfully.
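As a quick check before installing, a minimal sketch like the following reports whether the running interpreter meets the version requirement. The helper name python_version_ok is an illustration and not part of the project; the default minimum of 3.7 is taken from the requirement stated above.

```python
import sys


def python_version_ok(version_info=None, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least `minimum`.

    `version_info` defaults to the running interpreter's sys.version_info.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "meets the minimum:", python_version_ok())
```

If the check fails, switching to a newer interpreter (the text above suggests 3.9) is likely simpler than looking for an older torch build.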
Run
pip install -r requirements.txt
to install the remaining necessary packages.
Install webrtcvad if you need it:
pip install webrtcvad-wheels
The following steps are a workaround for using the original demo_toolbox.py directly, without changing any code.
The main issue is that the PyQt5 package used by demo_toolbox.py is not compatible with M1 chips. If you want to train models on an M1 chip, you can either skip demo_toolbox.py or try web.py in the project.
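Install PyQt5
Use the system Python to create and activate a virtual environment for the project:
/usr/bin/python3 -m venv /PathToMockingBird/venv
source /PathToMockingBird/venv/bin/activate
Upgrade pip and install PyQt5:
pip install --upgrade pip
pip install pyqt5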
Install pyworld and ctc-segmentation
Both packages seem to be unique to this project and do not appear in the original Real-Time Voice Cloning project. When installing with pip install, neither package has a prebuilt wheel, so pip tries to compile them directly from C code and fails because it cannot find Python.h.
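To see whether the active interpreter ships its C headers at all, a small sketch such as the one below can help. It uses the standard-library sysconfig module to look for Python.h in the interpreter's include directory; the function name find_python_header is made up for this example.

```python
import os
import sysconfig


def find_python_header():
    """Return the path to Python.h for the running interpreter, or None."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header if os.path.isfile(header) else None


if __name__ == "__main__":
    header = find_python_header()
    if header:
        print("Found:", header)
    else:
        print("Python.h not found; compiling C extensions will likely fail.")
```

A None result suggests the headers are missing for this interpreter, which matches the compile failure described above.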
Install pyworld
brew install python
Python.h comes with the Python installed by brew.
export CPLUS_INCLUDE_PATH=/opt/homebrew/Frameworks/Python.framework/Headers
The path of the brew-installed Python.h is specific to M1 macOS and is listed above. You need to add it to the environment variables manually.
pip install pyworld
That should do it.
Install ctc-segmentation
The same method does not work for ctc-segmentation; it needs to be compiled from the source code on GitHub.
git clone https://github.com/lumaku/ctc-segmentation.git
cd ctc-segmentation
source /PathToMockingBird/venv/bin/activate
Activate the virtual environment if it is not already active.
cythonize -3 ctc_segmentation/ctc_segmentation_dyn.pyx
/usr/bin/arch -x86_64 python setup.py build
Build with x86 architecture.
/usr/bin/arch -x86_64 python setup.py install --optimize=1 --skip-build
Install with x86 architecture.
/usr/bin/arch -x86_64 pip install torch torchvision torchaudio
Install PyTorch with pip, as an example; the arch prefix makes it explicit that it is installed for the x86 architecture.
pip install ffmpeg
Install ffmpeg.
pip install -r requirements.txt
Install the other requirements.
To run the project on the x86 architecture:
vim /PathToMockingBird/venv/bin/pythonM1
Create an executable file pythonM1 that wraps the Python interpreter at /PathToMockingBird/venv/bin, with the following content:
#!/usr/bin/env zsh
mydir=${0:a:h}
/usr/bin/arch -x86_64 $mydir/python "$@"
chmod +x pythonM1
Set the file as executable.
If you use an IDE (such as PyCharm), configure the project interpreter to pythonM1. If you use Python from the command line, run /PathToMockingBird/venv/bin/pythonM1 demo_toolbox.py
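To confirm that the wrapper really starts Python under the x86 architecture, one option is to ask the interpreter which machine type it reports. This is a sketch only: is_x86_64 is a hypothetical helper, and it assumes that platform.machine() returns "x86_64" under Rosetta and "arm64" when running natively on an M1.

```python
import platform


def is_x86_64(machine=None):
    """Return True if the reported machine type is x86-64.

    `machine` defaults to platform.machine() for the running interpreter.
    """
    if machine is None:
        machine = platform.machine()
    return machine.lower() in ("x86_64", "amd64")


if __name__ == "__main__":
    print("machine:", platform.machine(), "| x86-64:", is_x86_64())
```

Saving this as a script and running it once with pythonM1 and once with the plain venv python should show the difference between the two.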
Note that we use the pretrained encoder/vocoder but not the pretrained synthesizer, since the original model is incompatible with Chinese symbols. This means demo_cli does not work at the moment, so additional synthesizer models are required.
You can either train your models or use existing ones:
Preprocess the audio and the mel spectrograms:
python encoder_preprocess.py <datasets_root>
Pass the parameter --dataset {dataset} to select the datasets you want to preprocess. Only the train set of these datasets will be used. Possible names: librispeech_other, voxceleb1, voxceleb2. Use commas to separate multiple datasets.
Train the encoder: python encoder_train.py my_run <datasets_root>/SV2TTS/encoder
For training, the encoder uses visdom. You can disable it with --no_visdom, but it's nice to have. Run "visdom" in a separate CLI/process to start your visdom server.
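The comma-separated --dataset value described above can be checked with a few lines of Python before a long preprocessing run. This sketch is not the project's own argument parser: parse_datasets and KNOWN_DATASETS are illustrative names, and the list of known names is copied from the text above.

```python
# Dataset names listed in the text above; the project may accept others.
KNOWN_DATASETS = ("librispeech_other", "voxceleb1", "voxceleb2")


def parse_datasets(value, known=KNOWN_DATASETS):
    """Split a comma-separated dataset string and reject unknown names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError("Unknown dataset name(s): " + ", ".join(unknown))
    return names


if __name__ == "__main__":
    print(parse_datasets("librispeech_other,voxceleb1"))
```

Validating the names up front catches a typo such as a missing underscore before any audio is read.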
Download the dataset and unzip it; make sure you can access all the .wav files in the folder.
Preprocess the audio and the mel spectrograms:
python pre.py <datasets_root>
Pass the parameter --dataset {dataset} to select aidatatang_200zh, magicdata, aishell3, data_aishell, etc. If this parameter is not passed, the default dataset is aidatatang_200zh.
Train the synthesizer:
python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer
Go to the next step once the attention line appears and the loss meets your needs; check the training folder synthesizer/saved_models/.
Thanks to the community, some models will be shared:
Author | Download link | Preview Video | Info |
---|---|---|---|
@author | https://pan.baidu.com/s/1iONvRxmkI-t1nHqxKytY3g Baidu, code: 4j5d | | 75k steps, trained on multiple datasets |
@author | https://pan.baidu.com/s/1fMh9IlgKJlL2PIiRTYDUvw Baidu, code: om7f | | 25k steps, trained on multiple datasets, only works under version 0.0.1 |
@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing https://u.teknik.io/AYxWf.pt | input output | 200k steps with a Taiwanese accent, only works under version 0.0.1 |
@miven | https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code: 2021 https://www.aliyundrive.com/s/AwPsbo8mcSP code: z2m0 | https://www.bilibili.com/video/BV1uh411B7AD/ | only works under version 0.0.1 |
Note: the vocoder makes little difference to the result, so you may not need to train a new one.
python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>
Replace <datasets_root> with your dataset root and <synthesizer_model_path> with the directory of your best trained synthesizer model, e.g. synthesizer\saved_models\xxx
Train the wavernn vocoder:
python vocoder_train.py mandarin <datasets_root>
Train the hifigan vocoder
python vocoder_train.py mandarin <datasets_root> hifigan
You can then try to run:
python web.py
and open it in a browser; the default address is http://localhost:8080
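If the page does not load, it can help to check whether anything is listening on the port at all. The sketch below does this with the standard-library socket module; is_port_open is a hypothetical helper, and the default host and port mirror the address given above.

```python
import socket


def is_port_open(host="localhost", port=8080, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


if __name__ == "__main__":
    print("web.py reachable on localhost:8080:", is_port_open())
```

A False result while web.py is running may mean the server started on a different port or is still loading models.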
You can then try the toolbox:
python demo_toolbox.py -d <datasets_root>
You can then try the command:
python gen_voice.py <text_file.txt> your_wav_file.wav
You may need to install cn2an ("pip install cn2an") for better results with numbers.
This repository is forked from Real-Time-Voice-Cloning, which only supports English.
URL | Designation | Title | Implementation source |
---|---|---|---|
1803.09017 | GlobalStyleToken (synthesizer) | Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | This repo |
2010.05646 | HiFi-GAN (vocoder) | Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | This repo |
2106.02297 | Fre-GAN (vocoder) | Fre-GAN: Adversarial Frequency-consistent Audio Synthesis | This repo |
1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
1703.10135 | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | fatchord/WaveRNN |
1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |
Dataset | Original Source | Alternative Sources |
---|---|---|
aidatatang_200zh | OpenSLR | Google Drive |
magicdata | OpenSLR | Google Drive (Dev set) |
aishell3 | OpenSLR | Google Drive |
data_aishell | OpenSLR |
After unzipping aidatatang_200zh, you also need to unzip all the files under aidatatang_200zh\corpus\train
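Unpacking many archives by hand is tedious, so a short script can do it in one pass. The sketch below assumes the files under corpus\train are .tar.gz archives; that extension, the function name extract_all_archives and the example path are assumptions to adjust to your download. Only run it on archives you trust.

```python
import tarfile
from pathlib import Path


def extract_all_archives(folder, pattern="*.tar.gz"):
    """Extract every archive matching `pattern` into `folder`.

    Returns the names of the archives that were extracted.
    """
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob(pattern)):
        with tarfile.open(archive) as tar:
            tar.extractall(path=folder)
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    # Hypothetical location; replace with your own dataset path.
    done = extract_all_archives(Path("aidatatang_200zh") / "corpus" / "train")
    print("Extracted", len(done), "archive(s)")
```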
What is <datasets_root>?
If the dataset path is D:\data\aidatatang_200zh, then <datasets_root> is D:\data
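In other words, <datasets_root> is the parent folder of the dataset folder. A minimal sketch of that rule, using pathlib's PureWindowsPath so that it handles the Windows-style path from the example on any operating system (datasets_root_for is an illustrative name):

```python
from pathlib import PureWindowsPath


def datasets_root_for(dataset_path):
    """Return the parent folder of a dataset folder as a string."""
    return str(PureWindowsPath(dataset_path).parent)


if __name__ == "__main__":
    print(datasets_root_for(r"D:\data\aidatatang_200zh"))
```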
Train the synthesizer: adjust the batch_size in synthesizer/hparams.py
# Before
tts_schedule = [(2, 1e-3,  20_000, 12),   # Progressive training schedule
                (2, 5e-4,  40_000, 12),   # (r, lr, step, batch_size)
                (2, 2e-4,  80_000, 12),   #
                (2, 1e-4, 160_000, 12),   # r = reduction factor (# of mel frames
                (2, 3e-5, 320_000, 12),   #     synthesized for each decoder iteration)
                (2, 1e-5, 640_000, 12)],  # lr = learning rate
# After
tts_schedule = [(2, 1e-3,  20_000, 8),    # Progressive training schedule
                (2, 5e-4,  40_000, 8),    # (r, lr, step, batch_size)
                (2, 2e-4,  80_000, 8),    #
                (2, 1e-4, 160_000, 8),    # r = reduction factor (# of mel frames
                (2, 3e-5, 320_000, 8),    #     synthesized for each decoder iteration)
                (2, 1e-5, 640_000, 8)],   # lr = learning rate
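Since only the last element of each tuple changes, the same edit can be expressed programmatically, which is handy when trying several batch sizes. The sketch below is a standalone illustration, not code from synthesizer/hparams.py; with_batch_size is a hypothetical helper, and the schedule values are copied from the "Before" listing.

```python
def with_batch_size(schedule, batch_size):
    """Return a copy of a (r, lr, step, batch_size) schedule with a new batch size."""
    return [(r, lr, step, batch_size) for r, lr, step, _ in schedule]


# (r, lr, step, batch_size), as in the "Before" listing.
tts_schedule = [
    (2, 1e-3, 20_000, 12),
    (2, 5e-4, 40_000, 12),
    (2, 2e-4, 80_000, 12),
    (2, 1e-4, 160_000, 12),
    (2, 3e-5, 320_000, 12),
    (2, 1e-5, 640_000, 12),
]

smaller = with_batch_size(tts_schedule, 8)

if __name__ == "__main__":
    for entry in smaller:
        print(entry)
```

The original list is left untouched, so the two schedules can be compared side by side.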
Train Vocoder, preprocess the data: adjust the synthesis_batch_size in synthesizer/hparams.py
# Before
### Data Preprocessing
max_mel_frames = 900,
rescale = True,
rescaling_max = 0.9,
synthesis_batch_size = 16, # For vocoder preprocessing and inference.
# After
### Data Preprocessing
max_mel_frames = 900,
rescale = True,
rescaling_max = 0.9,
synthesis_batch_size = 8, # For vocoder preprocessing and inference.
Train Vocoder, train the vocoder: adjust the voc_batch_size in vocoder/wavernn/hparams.py
# Before
# Training
voc_batch_size = 100
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad = 2
# After
# Training
voc_batch_size = 6
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad = 2
RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).
Please refer to issue #37
Adjust the batch_size as appropriate to improve CPU and GPU utilization.
If you see the error "the page file is too small to complete the operation":
Please refer to this video and change the virtual memory to 100G (102400). For example, if the files are on drive D, change the virtual memory of drive D.
FYI, my attention came after 18k steps and loss became lower than 0.4 after 50k steps.
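A rule of thumb like "loss below 0.4" can be turned into a small check over logged values. The sketch below assumes you have collected (step, loss) pairs yourself; the function first_step_below, the moving-average window and the sample numbers are all illustrative and do not come from the project or from a real training run.

```python
def first_step_below(losses, threshold, window=3):
    """Return the first step at which the mean of the last `window`
    losses falls below `threshold`, or None if that never happens.

    `losses` is a list of (step, loss) pairs in training order.
    """
    recent = []
    for step, loss in losses:
        recent.append(loss)
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and sum(recent) / window < threshold:
            return step
    return None


if __name__ == "__main__":
    # Made-up values for demonstration only.
    history = [(10_000, 1.0), (20_000, 0.5), (30_000, 0.39),
               (40_000, 0.35), (50_000, 0.30)]
    print(first_step_below(history, threshold=0.4))
```

Averaging over a few recent values avoids stopping on a single lucky batch; the loss alone does not replace checking that the attention line has appeared.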