Metadata-Version: 2.1
Name: crank-vc
Version: 0.4.1
Summary: Non-parallel Voice Conversion called crank
Home-page: https://github.com/k2kobayashi/crank
Author: K. KOBAYASHI
License: MIT
Description: # crank
        
        Non-parallel voice conversion based on vector-quantized variational autoencoder with adversarial learning
        
        ## Setup
        
        - Install Python dependencies
        
        ```sh
        $ git clone https://github.com/k2kobayashi/crank.git
        $ cd crank/tools
        $ make
        ```
        
        - Install the dependency for MOSNet
        
        ```sh
        $ sudo apt install ffmpeg   # mosnet dependency
        ```
        
        ## Recipes
        - English
            - VCC2020
            - VCC2018 (Thanks to [@unilight](https://github.com/unilight))
        - Japanese
            - jvs_ver1
        
        ### Conversion samples
        You can listen to several converted audio samples of the VCC 2018 dataset at [this page](https://k2kobayashi.github.io/crankSamples/).
        - [vcc2020v1](https://drive.google.com/file/d/1uInvCwggpBYmpplYxuIOidvJkPmav8kE/view?usp=sharing)
        - [vcc2018v1](https://drive.google.com/file/d/1-Z_Y9pahPQcKR0rqdhu4elI6Hz686qX6/view?usp=sharing)
        
        ## Run VCC2020 recipe
        
        crank provides a recipe for the Voice Conversion Challenge 2020 (VCC2020).
        The recipe implements non-parallel voice conversion in eight stages (0 through 7).
        
        - stage 0
            - download dataset
        - stage 1
            - initialization
                - generate scp files and figures used to determine speaker-dependent parameters
        - stage 2
            - feature extraction
                - extract mlfb and mcep features
        - stage 3
            - training
        - stage 4
            - reconstruction
                - generate reconstructed features for fine-tuning a neural vocoder
        - stage 5
            - evaluation
                - convert the evaluation waveforms
        - stage 6
            - synthesis
                - synthesize waveforms with the pre-trained [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN) vocoder
                - synthesize waveforms with Griffin-Lim
        - stage 7
            - objective evaluation
                - mel-cepstrum distortion
                - mosnet
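        Each stage can be selected with the ```--stage``` and ```--stop_stage``` options of ```run.sh``` (used below). The sketch here only illustrates how such Kaldi-style stage gating typically works; it is not the actual ```run.sh```.
        
        ```sh
        # Illustration only: Kaldi-style stage gating, not the real run.sh.
        # stage/stop_stage mirror the --stage/--stop_stage command-line flags.
        stage=2
        stop_stage=5
        ran=""
        for s in 0 1 2 3 4 5 6 7; do
            if [ "$s" -ge "$stage" ] && [ "$s" -le "$stop_stage" ]; then
                ran="$ran $s"
                echo "running stage $s"
            fi
        done
        ```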
        
        ### Put the dataset in downloads
        
        Note that the dataset was released only to the challenge participants (as of 2020/05/26).
        ```sh
        $ cd egs/vaevc/vcc2020v1
        $ mkdir downloads && cd downloads
        $ mv <path_to_zip>/vcc2020_{training,evaluation}.zip .
        $ unzip vcc2020_training.zip
        $ unzip vcc2020_evaluation.zip
        ```
        
        ### Run feature extraction and model training
        
        Because the challenge defines the training and evaluation sets, the configuration files are already prepared.
        Therefore, you can start from stage 2.
        
        ```sh
        $ ./run.sh --n_jobs 10 --stage 2 --stop_stage 5
        ```
        
        where ```n_jobs``` specifies the number of CPU cores used in training.
        
        
        ## Configuration
        Configurations are defined in ```conf/mlfb_vqvae.yml```.
        The representative parameters are explained below.
        
        - feature
        
        When you create your own recipe, be careful to set the feature extraction parameters such as ```fs```, ```fftl```, ```hop_size```, ```framems```, ```shiftms```, and ```mcep_alpha```. These parameters depend on the sampling frequency.
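        For example, a 22.05 kHz setup might look like the fragment below. The grouping under a ```feature``` key and the concrete values are illustrative assumptions, not the shipped defaults; check ```conf/mlfb_vqvae.yml``` for the actual structure.
        
        ```yaml
        # Illustrative fragment; see conf/mlfb_vqvae.yml for the actual defaults.
        feature:
          fs: 22050          # sampling frequency [Hz]
          fftl: 1024         # FFT length in samples
          hop_size: 256      # hop size in samples
          framems: 25        # analysis frame length [ms]
          shiftms: 5         # frame shift [ms]
          mcep_alpha: 0.455  # all-pass constant for mel-cepstrum; depends on fs
        ```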
        
        - feat_type
        
        You can set ```feat_type``` to either ```mlfb``` or ```mcep```.
        If you choose ```mlfb```, the converted waveforms are generated by either the Griffin-Lim vocoder or the ParallelWaveGAN vocoder.
        If you choose ```mcep```, the converted waveforms are generated by the WORLD vocoder (i.e., excitation generation and MLSA filtering).
        
        - trainer_type
        
        We support training with the ```vqvae```, ```lsgan```, ```cyclegan```, and ```stargan``` trainers, all using the same generator network.
          - ```vqvae```: default VQ-VAE setting
          - ```lsgan```: VQ-VAE with adversarial learning
          - ```cyclegan```: VQ-VAE with adversarial learning and cyclic constraints
          - ```stargan```: VQ-VAE with adversarial learning similar to CycleGAN
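        Putting these options together, a configuration enabling adversarial training on mel filterbank features would contain lines like the following. The top-level placement of the keys is an illustrative assumption; see ```conf/mlfb_vqvae.yml``` for the real layout.
        
        ```yaml
        # Illustrative fragment; check conf/mlfb_vqvae.yml for the real layout.
        feat_type: mlfb        # mlfb or mcep
        trainer_type: lsgan    # vqvae, lsgan, cyclegan, or stargan
        ```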
        
        ## Create your recipe
        
        ### Copy recipe template
        
        Copy the template directory to start creating your recipe.
        
        ```sh
        $ cp -r egs/vaevc/template egs/vaevc/<new_recipe>
        $ cd egs/vaevc/<new_recipe>
        ```
        
        ### Put .wav files
        
        You need to put the wav files in the appropriate directory.
        You can either modify ```download.sh``` or place the wav files manually.
        In either case, the wav files should be organized per speaker as follows:
        ```<new_recipe>/downloads/wav/{spkr1, spkr2, ..., spkrN}/*.wav```.
        
        If you modify ```download.sh```,
        
        ```sh
        $ vim local/download.sh
        ```
        
        If you place the wav files manually,
        
        ```sh
        $ mkdir downloads
        $ mv <path_to_your_wav_directory> downloads/wav
        $ touch downloads/.done
        ```
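        The manual layout can be sketched as follows. The speaker names are placeholders; copy your own corpus into the per-speaker directories.
        
        ```sh
        # Sketch: build the expected per-speaker layout with placeholder names.
        mkdir -p downloads/wav/spkr1 downloads/wav/spkr2
        # Copy your corpus here, e.g.:
        #   cp /path/to/corpus/spkr1/*.wav downloads/wav/spkr1/
        touch downloads/.done   # marks the download stage as completed
        find downloads -type d  # inspect the resulting directory tree
        ```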
        
        ### Run initialization
        
        The initialization process generates kaldi-like scp files.
        
        ```sh
        $ ./run.sh --stage 0 --stop_stage 1
        ```
        
        Then modify the speaker-dependent parameters in ```conf/spkr.yml``` using the generated figures.
        Pages 20-22 of the [slides](https://www.slideshare.net/NU_I_TODALAB/hands-on-voice-conversion) explain how to set these parameters.
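        As an illustration, a per-speaker entry in ```conf/spkr.yml``` might look like the fragment below. The key names (```minf0```, ```maxf0```, ```npow```) are typical F0-range and power-threshold parameters and are assumptions here; the generated ```conf/spkr.yml``` template is the authoritative reference.
        
        ```yaml
        # Illustrative entry; the generated conf/spkr.yml is authoritative.
        spkr1:
          minf0: 120   # lower F0 bound [Hz], read off the generated F0 histogram
          maxf0: 350   # upper F0 bound [Hz]
          npow: -20    # power threshold [dB] separating speech from silence
        ```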
        
        
        ### Run feature extraction, train, reconstruction, and evaluation
        
        After preparing the configuration, run the remaining stages.
        
        ```sh
        $ ./run.sh --stage 2 --stop_stage 7
        ```
        
        ## Citation
        
        Please cite this paper when you use crank.
        
        ```
        K. Kobayashi, W-C. Huang, Y-C. Wu, P.L. Tobing, T. Hayashi, T. Toda,
        "crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder",
        Proc. ICASSP, 2021. (accepted)
        ```
        
        ## Acknowledgements
        
        We thank [@kan-bayashi](https://github.com/kan-bayashi) for many contributions and constant encouragement.
        
        ## Who we are
        
        - Kazuhiro Kobayashi [@k2kobayashi](https://github.com/k2kobayashi) [maintainer, design and development]
        
        - Wen-Chin Huang [@unilight](https://github.com/unilight) [maintainer, design and development]
        
        - [Tomoki Toda](https://sites.google.com/site/tomokitoda/) [advisor]
        
        
Platform: UNKNOWN
Description-Content-Type: text/markdown
