Metadata-Version: 2.1
Name: zga
Version: 0.0.5.post1
Summary: Prokaryotic genome assembly and annotation pipeline
Home-page: https://github.com/laxeye/zga
Author: Aleksei Korzhenkov
Author-email: oscypek@ya.ru
License: UNKNOWN
Description: # ZGA - prokaryotic genome assembly and annotation pipeline
        
        [![version status](https://img.shields.io/pypi/v/zga.svg)](https://pypi.python.org/pypi/zga)
        
        ## Installation
        
        
        ### Installing dependencies
        
        ZGA is written in Python and tested with Python 3.6 and Python 3.7. ZGA uses several software and libs including:
        
        * fastqc
        * ea-utils
        * BBmap
        * SPAdes (>= 3.12 to support merged paired-end reads, >= 3.5.0 to support Nanopore reads)
        * unicycler
        * CheckM
        * DFast
        * BioPython
        * NCBI BLASTn
        * NxTrim
        * mash
        
        All of them may be installed using **conda**:
        
        It's highly recommended to create a new conda environment:
        
        `conda create -n zga "python>=3.6" fastqc ea-utils "spades>=3.12" unicycler checkm-genome dfast bbmap blast biopython nxtrim mash`
        
        and activate it
        
        `conda activate zga`
        
        
        Otherwise you may install dependencies to existing conda environment:
        
        `conda install python>=3.6 fastqc ea-utils spades unicycler checkm-genome dfast bbmap blast biopython nxtrim mash`
        
        
        Of course, it's possible to use *another ways* even compile all tools from source code. In this case you should check if binaries are in your '$PATH' variable.
        
        
        ### Get source from Github
        
        You can get ZGA by cloning from the repository with `git clone https://github.com/laxeye/zga.git` or by downloading an archive.
        
        
        ### Install from PyPi
        
        Run `pip install zga` it will check if You have Biopython and istall it if not. But all other dependencies You should install manually or using **conda**. CheckM is available on **PyPi**, but it's easier to install it using **conda**.
        
        
        ### Operating systems requirements
        
        ZGA was tested on Ubuntu 18.04 and 19.10. Most probably any modern 64-bit Linux distribuition is enough.
        
        Your feedback on other OS is welcome!
        
        
        ## Usage
        
        You should run `zga.py` if You cloned the source code or `zga` otherwise.
        
        Run 'zga.py -h' to get a help message.
        
        Examples:
        
        Perform all steps: read qc, read trimming and merging, assembly, CheckM assesment with default (bacterial) marker set, DFAST annotation and use 4 CPU threads where possible:
        
        `zga.py -1 R1.fastq.gz -2 R2.fastq.gz --threads 4 -o my_assembly`
        
        or use SPAdes and provide it with paired-end and nanopore reads of archaeal genome (CheckM will use archaeal markers)
        
        `zga.py -1 R1.fastq.gz -2 R2.fastq.gz --nanopore MiniION.fastq.gz -a spades --threads 4 --domain archaea -o my_assembly`
        
        or from Nanopore reads using only unicycler
        
        `zga.py --nanopore MiniION.fastq.gz -o nanopore_assembly`
        
        Perform genome assesment and annotation:
        
        With 'Pectobacterium' CheckM marker set: 
        
        `zga.py --step check -g pectobacterium_sp.fasta --checkm_rank genus --checkm_taxon Pectobacterium -o my_output_dir`
        
        Let CheckM to infer the right marker set: 
        
        `zga.py --first-step check -g my_genome.fa --checkm_mode lineage -o my_output_dir`
        
        
        ## Know issues and limitations
        
        Don't forget: ZGA is in the early testing...
        
        I hope to fix next issues **ASAP**:
        
        * It's not posible to provide multiple read libraries i.e. tow sets of PE reads or two nanopore runs. 
        * It's not possible to install all dependencies with Python 3.8 via conda, please use 3.7 or 3.6.
        * There is no conda package
        
        Limitations of unicycler:
        
        * Unicycler doesn't use mate-pair reads.
        
        Don't hesitate to report bug or feature!
        
        
        ## Cite
        
        It's a great pleasure to know, that your software is useful. Please cite ZAG: 
        
        Korzhenkov A. (2020). ZGA: prokaryotic genome assembly and annotation pipeline.
        
        And of course tools it's using:
        
        Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data.
        
        Aronesty, E. (2013). Comparison of sequencing utility programs. The open bioinformatics journal, 7(1).
        
        Bushnell, B., Rood, J., & Singer, E. (2017). BBMerge–accurate paired shotgun read merging via overlap. PloS one, 12(10).
        
        Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., ... & Pyshkin, A. V. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5), 455-477.
        
        Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS computational biology, 13(6), e1005595.
        
        Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome research, 25(7), 1043-1055.
        
        Tanizawa, Y., Fujisawa, T., & Nakamura, Y. (2018). DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Bioinformatics, 34(6), 1037-1039.
        
        Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410.
        
        Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., ... & De Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423.
        
        O’Connell, J., et al. (2015) NxTrim: optimized trimming of Illumina mate pair reads. Bioinformatics 31(12), 2035-2037.
        
        Ondov, B.D., Treangen, T.J., Melsted, P. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016). doi: 10.1186/s13059-016-0997-x
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.6
Description-Content-Type: text/markdown
