Metadata-Version: 2.1
Name: zga
Version: 0.0.7b2
Summary: Prokaryotic genome assembly and annotation pipeline
Home-page: https://github.com/laxeye/zga
Author: Aleksei Korzhenkov
Author-email: oscypek@ya.ru
License: UNKNOWN
Description: # ZGA - prokaryotic genome assembly and annotation pipeline
        
        [![version status](https://img.shields.io/pypi/v/zga.svg)](https://pypi.python.org/pypi/zga)
        [![Anaconda Cloud](https://anaconda.org/laxeye/zga/badges/installer/conda.svg)](https://anaconda.org/laxeye/zga/)
        
        ## Installation
        
        ZGA is written in Python and tested with Python 3.6 and Python 3.7. ZGA relies on several external tools and libraries, including:
        
        * [fastp](https://github.com/OpenGene/fastp)
        * [BBmap](https://sourceforge.net/projects/bbmap/)
        * [NxTrim](https://github.com/sequencing/NxTrim)
        * [mash](https://mash.readthedocs.io/en/latest/)
        * [SPAdes](http://cab.spbu.ru/software/spades/) (>= 3.12 to support merged paired-end reads, >= 3.5.0 to support Nanopore reads)
        * [Unicycler](https://github.com/rrwick/Unicycler/)
        * [Flye](https://github.com/fenderglass/Flye) >= 2.6
        * [racon](https://github.com/lbcb-sci/racon)
        * [CheckM](https://github.com/Ecogenomics/CheckM) >= 1.1.0
        * [BioPython](https://biopython.org/)
        * [NCBI BLAST+](https://blast.ncbi.nlm.nih.gov/Blast.cgi)
        * [DFAST](https://github.com/nigyta/dfast_core)
        
        ### Install with conda
        
        The simplest way to install ZGA and all its dependencies is with conda:
        
        1. Install conda, e.g. [**miniconda**](https://conda.io/en/latest/miniconda.html). Python 3.7 is preferred.
        
        2. After installation, add the channels (conda's software sources):  
        `conda config --add channels defaults`  
        `conda config --add channels bioconda`  
        `conda config --add channels conda-forge`
        
        3. Finally, install ZGA into an existing active environment (Python 3.6 or 3.7):  
        `conda install -c laxeye zga`  
        or create a fresh environment and activate it:  
        `conda create -n zga -c laxeye zga`  
        `conda activate zga`
        
        [![Anaconda latest release](https://anaconda.org/laxeye/zga/badges/latest_release_date.svg)](https://anaconda.org/laxeye/zga/)
        
        ### Installing dependencies
        
        All dependencies can be installed using **conda**. Creating a new conda environment is highly recommended:
        
        `conda create -n zga "python>=3.6" fastp "spades>=3.12" unicycler checkm-genome dfast bbmap blast biopython nxtrim "mash>=2" flye racon "samtools>=1.9"`
        
        and activate it:
        
        `conda activate zga`
        
        Alternatively, you can install the dependencies into an existing conda environment:
        
        `conda install "python>=3.6" fastp "spades>=3.12" unicycler checkm-genome dfast bbmap blast biopython nxtrim "mash>=2" flye racon "samtools>=1.9"`
        
        Of course, you can install the tools in other ways, even by compiling them all from source. In that case, make sure the binaries are on your `$PATH`.
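        
        A manual installation can be checked with a short script. The following is a minimal sketch, not part of ZGA. The executable names in `REQUIRED_TOOLS` are assumptions based on the dependency list above, and they may differ on your system (for example, depending on how a tool was packaged):
        
        ```python
        import shutil
        
        # Assumed executable names, not taken from ZGA itself; adjust as needed.
        REQUIRED_TOOLS = [
            "fastp", "bbmerge.sh", "nxtrim", "mash", "spades.py", "unicycler",
            "flye", "racon", "checkm", "blastn", "dfast",
        ]
        
        
        def find_missing(tools):
            """Return the tools that shutil.which cannot locate on $PATH."""
            return [tool for tool in tools if shutil.which(tool) is None]
        
        
        if __name__ == "__main__":
            missing = find_missing(REQUIRED_TOOLS)
            if missing:
                print("Not found on $PATH:", ", ".join(missing))
            else:
                print("All listed tools were found on $PATH.")
        ```
        
        The script only reports whether an executable is reachable. It does not check tool versions.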
        
        ### Install from PyPI
        
        Run `pip install zga`. Biopython is the only dependency installed from PyPI. You should install all other dependencies manually or using **conda**, as described above. CheckM is available on **PyPI**, but it's easier to install it using **conda**.
        
        ### Get source from GitHub
        
        You can get ZGA by cloning from the repository with `git clone https://github.com/laxeye/zga.git` or by downloading an archive.
        After downloading, enter the directory and run `python3 setup.py build && python3 setup.py install`.
        
        ### Operating system requirements
        
        ZGA was tested on Ubuntu 18.04 and 19.10. Any modern 64-bit Linux distribution will most probably work.
        
        Feedback on other operating systems is welcome!
        
        ## Usage
        
        Run `zga -h` to get a help message.
        
        Examples:
        
        Perform all steps (read QC, read trimming and merging, assembly, CheckM assessment with the default bacterial marker set, and DFAST annotation), using 4 CPU threads where possible:
        
        `zga -1 R1.fastq.gz -2 R2.fastq.gz --threads 4 -o my_assembly`
        
        Assemble an archaeal genome with SPAdes from paired-end and Nanopore reads (CheckM will use archaeal markers), with the memory limit set to 16 GB:
        
        `zga -1 R1.fastq.gz -2 R2.fastq.gz --nanopore MiniION.fastq.gz -a spades --threads 4 --memory-limit 16 --domain archaea -o my_assembly`
        
        Assemble long reads with Flye, skipping long-read polishing, and perform short-read polishing with racon:
        
        `zga -1 R1.fastq.gz -2 R2.fastq.gz --nanopore MiniION.fastq.gz -a flye --threads 4 --domain archaea -o my_assembly --flye-short-polish --skip-flye-long-polish`
        
        Assemble from Nanopore reads using Unicycler:
        
        `zga -a unicycler --nanopore MiniION.fastq -o nanopore_assembly`
        
        Perform assessment and annotation of a genome assembly with the 'Pectobacterium' CheckM marker set:
        
        `zga --first-step check_genome -g pectobacterium_sp.fasta --checkm_rank genus --checkm_taxon Pectobacterium -o my_output_dir`
        
        Let CheckM infer the right marker set:
        
        `zga --first-step check_genome -g my_genome.fa --checkm_mode lineage -o my_output_dir`
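        
        When many samples are processed, the command line can be built in a script. The following is a minimal sketch that uses only options shown in the examples above. The helper `build_zga_command` is hypothetical and is not part of ZGA:
        
        ```python
        import shlex
        
        
        def build_zga_command(r1, r2, outdir, threads=4, assembler=None):
            """Build an argument list for a paired-end zga run.
        
            Hypothetical helper; it uses only options shown in the examples above.
            """
            cmd = ["zga", "-1", r1, "-2", r2, "--threads", str(threads), "-o", outdir]
            if assembler is not None:
                cmd += ["-a", assembler]
            return cmd
        
        
        if __name__ == "__main__":
            cmd = build_zga_command("R1.fastq.gz", "R2.fastq.gz", "my_assembly")
            # Print the command for review before running it.
            print(" ".join(shlex.quote(part) for part in cmd))
        ```
        
        To run the pipeline, pass the list to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting problems with unusual file names.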
        
        ## Known issues and limitations
        
        ZGA is under active development. Currently known issues and limitations:
        
        * It's not possible to provide multiple read libraries, i.e. two sets of PE reads or two Nanopore runs.
        * Unicycler doesn't use mate-pair reads.
        * It's not possible to install all dependencies with Python 3.8 via conda; please use 3.7 or 3.6.
        
        Don't hesitate to report bugs or request features!
        
        ## Cite
        
        It's a great pleasure to know that your software is useful. Please cite ZGA:
        
        Korzhenkov A. (2020). ZGA: prokaryotic genome assembly and annotation pipeline.
        
        And, of course, the tools it uses:
        
        Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884-i890. https://doi.org/10.1093/bioinformatics/bty560
        
        Bushnell, B., Rood, J., & Singer, E. (2017). BBMerge – accurate paired shotgun read merging via overlap. PLoS ONE, 12(10).
        
        Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., ... & Pyshkin, A. V. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5), 455-477.
        
        Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS computational biology, 13(6), e1005595.
        
        Vaser, R., Sović, I., Nagarajan, N., & Šikić, M. (2017). Fast and accurate de novo genome assembly from long uncorrected reads. Genome research, 27(5), 737-746.
        
        Kolmogorov, M., Yuan, J., Lin, Y., & Pevzner, P. A. (2019). Assembly of long, error-prone reads using repeat graphs. Nature biotechnology, 37(5), 540-546.
        
        Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome research, 25(7), 1043-1055.
        
        Tanizawa, Y., Fujisawa, T., & Nakamura, Y. (2018). DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Bioinformatics, 34(6), 1037-1039.
        
        Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410.
        
        Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., ... & De Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423.
        
        O’Connell, J., et al. (2015). NxTrim: optimized trimming of Illumina mate pair reads. Bioinformatics, 31(12), 2035-2037.
        
        Ondov, B. D., Treangen, T. J., Melsted, P., et al. (2016). Mash: fast genome and metagenome distance estimation using MinHash. Genome biology, 17, 132. https://doi.org/10.1186/s13059-016-0997-x
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.6
Description-Content-Type: text/markdown
