Metadata-Version: 2.1
Name: NanoPlot
Version: 1.32.1
Summary: Plotting suite for Oxford Nanopore sequencing data and alignments
Home-page: https://github.com/wdecoster/NanoPlot
Author: Wouter De Coster
Author-email: decosterwouter@gmail.com
License: GPLv3
Description: # NanoPlot
        Plotting tool for long read sequencing data and alignments.   
        
        [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/wouter_decoster.svg?style=social&label=Follow%20%40wouter_decoster)](https://twitter.com/wouter_decoster)
        [![conda badge](https://anaconda.org/bioconda/nanoplot/badges/installer/conda.svg)](https://anaconda.org/bioconda/nanoplot)
        [![Build Status](https://travis-ci.org/wdecoster/NanoPlot.svg?branch=master)](https://travis-ci.org/wdecoster/NanoPlot)
        
        ### NanoPlot is also available as a [web service](http://nanoplot.bioinf.be).
        
        ![Example plot](https://github.com/wdecoster/NanoPlot/blob/master/examples/scaled_Log_Downsampled_LengthvsQualityScatterPlot_kde.png)
        
        The example plot above shows a bivariate plot comparing log transformed read length with average basecall Phred quality score. More examples can be found in the [gallery on my blog 'Gigabase Or Gigabyte'.](https://gigabaseorgigabyte.wordpress.com/2017/06/01/example-gallery-of-nanoplot/)
        
        In addition to the plots, NanoPlot writes a NanoStats file summarizing key features of the dataset.
        
        This script performs data extraction from Oxford Nanopore sequencing data in the following formats:  
        - fastq files  
        (can be bgzip, bzip2 or gzip compressed)  
        - fastq files generated by albacore, guppy or MinKNOW containing additional information  
        (can be bgzip, bzip2 or gzip compressed)  
        - sorted bam files  
        - sequencing_summary.txt output table generated by albacore, guppy or MinKNOW basecalling  
        (can be gzip, bz2, zip or xz compressed)  
        - fasta files  
        (can be bgzip, bzip2 or gzip compressed)  
        
        Multiple files of the same type can be supplied simultaneously.
        
        ### INSTALLATION
        
        `pip install NanoPlot`  
        
        Upgrade to a newer version using:  
        `pip install NanoPlot --upgrade`
        
        or
        
        [![conda badge](https://anaconda.org/bioconda/nanoplot/badges/installer/conda.svg)](https://anaconda.org/bioconda/nanoplot)   
        `conda install -c bioconda nanoplot`
        
        The script is written for Python 3.
        
        ### OUTPUT
        NanoPlot creates:
        - a statistical summary
        - a number of plots
        - an HTML summary file
        
        
        
        ### USAGE
        ```
        NanoPlot [-h] [-v] [-t THREADS] [--verbose] [--store] [--raw]
                        [-o OUTDIR] [-p PREFIX] [--maxlength N] [--minlength N]
                        [--drop_outliers] [--downsample N] [--loglength]
                        [--percentqual] [--alength] [--minqual N]
                        [--readtype {1D,2D,1D2}] [--barcoded] [--runtime_until N]
                        [-c COLOR]
                        [-f {eps,jpeg,jpg,pdf,pgf,png,ps,raw,rgba,svg,svgz,tif,tiff}]
                        [--plots [{kde,hex,dot,pauvre} [{kde,hex,dot,pauvre} ...]]]
                        [--listcolors] [--no-N50] [--N50] [--title TITLE]
                        (--fastq file [file ...] | --fasta file [file ...] | --fastq_rich file [file ...] | --fastq_minimal file [file ...] | --summary file [file ...] | --bam file [file ...] | --cram file [file ...] | --pickle pickle)
        
        
        General options:
          -h, --help            show the help and exit
          -v, --version         Print version and exit.
          -t, --threads THREADS Set the allowed number of threads to be used by the script
          --verbose             Write log messages also to terminal.
          --store               Store the extracted data in a pickle file for future plotting.
          --raw                 Store the extracted data in tab separated file.
          -o, --outdir OUTDIR   Specify directory in which output has to be created.
          -p, --prefix PREFIX   Specify an optional prefix to be used for the output files.
        
        Options for filtering or transforming input prior to plotting:
          --maxlength N         Hide reads longer than length specified.
          --minlength N         Hide reads shorter than length specified.
          --drop_outliers       Drop outlier reads with extremely long length.
          --downsample N        Reduce dataset to N reads by random sampling.
          --loglength           Logarithmic scaling of lengths in plots.
          --percentqual         Use qualities as theoretical percent identities.
          --alength             Use aligned read lengths rather than sequenced length (bam mode)
          --minqual N           Drop reads with an average quality lower than specified.
          --runtime_until N     Only take the N first hours of a run
          --readtype            Which read type to extract information about from a summary file.
                                One of 1D (default), 2D, 1D2
          --barcoded            Use if you want to split the summary file by barcode
        
        Options for customizing the plots created:
          -c, --color COLOR     Specify a color for the plots, must be a valid matplotlib color
          -f, --format          Specify the output format of the plots.
                                One of png [default], eps,jpeg,jpg,pdf,pgf,ps,raw,rgba,svg,svgz,tif,tiff
          --plots               Specify which bivariate plots have to be made.
                                One or more of 'dot' (default), 'kde' (default), 'hex' and 'pauvre'
          --listcolors          List the colors which are available for plotting and exit.
          --no-N50              Hide the N50 mark in the read length histogram
          --N50                 Show the N50 mark in the read length histogram
          --title TITLE         Add a title to all plots, requires quoting if using spaces
        
        Input data sources, one of these is required:
          --fastq file [file ...]
                                Data is in one or more default fastq file(s).
          --fasta file [file ...]
                                Data is in one or more default fasta file(s).
          --fastq_rich file [file ...]
                                Data is in one or more fastq file(s) generated by albacore or MinKNOW with
                                additional information concerning channel and time.
          --fastq_minimal file [file ...]
                                Data is in one or more fastq file(s) generated by albacore or MinKNOW with
                                additional information concerning channel and time. Minimal data is extracted
                                swiftly without elaborate checks.
          --summary file [file ...]
                                Data is in one or more summary file(s) generated by albacore or guppy.
          --bam file [file ...]
                                Data is in one or more sorted bam file(s).
          --cram file [file ...]
                                Data is in one or more sorted cram file(s).
          --pickle pickle       Data is a pickle file stored earlier.
        ```
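        
        As a rough illustration of what `--percentqual` refers to: by the standard Phred definition, a quality score Q corresponds to an error probability of 10^(-Q/10), so the theoretical percent identity is 100 * (1 - 10^(-Q/10)). The sketch below only shows that arithmetic. It is not taken from NanoPlot's source, and the function names are hypothetical.
        
        ```python
        def phred_to_error_prob(q):
            """Convert a Phred quality score to an error probability."""
            return 10 ** (-q / 10)
        
        
        def phred_to_percent_identity(q):
            """Theoretical percent identity implied by a Phred quality score."""
            return 100 * (1 - phred_to_error_prob(q))
        
        
        if __name__ == "__main__":
            # Print the conversion for a few typical average read qualities.
            for q in (7, 10, 20):
                print(q, round(phred_to_percent_identity(q), 2))
        ```
        
        Under this definition a read with an average quality of 10 corresponds to roughly 90% identity, and a quality of 20 to roughly 99%.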
        
        ### NOTES
         - `--downsample` saves little time, because downsampling happens only after all data has been collected; it probably only makes a difference for a huge amount of data. To save time, downsample your data before running NanoPlot.
         - Extracting information from a summary file is faster than from the other formats. You can extract from multiple files simultaneously, in which case they are processed in parallel.
         - Some plot types (especially kde) are slower than others. Use `--plots` to select only the plots you need and speed things up (the default is to make both kde and dot plots).
         - If you are only interested in, say, the read length histogram, you can write a script that produces just that and avoid wasting time on the rest. Let me know if you need any help here.
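        
        As a minimal sketch of such a standalone script, the following collects read lengths from a fastq file and computes the N50 using only the Python standard library. It is an illustration, not NanoPlot's own implementation: the function names are hypothetical, and it assumes a plain or gzip compressed fastq file with exactly four lines per record.
        
        ```python
        import gzip
        
        
        def read_lengths(path):
            """Yield the sequence length of every record in a four-line-per-record fastq file."""
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as handle:
                for i, line in enumerate(handle):
                    if i % 4 == 1:  # the sequence is the second line of each record
                        yield len(line.strip())
        
        
        def n50(lengths):
            """Return the length L such that reads of length >= L hold at least half of all bases."""
            ordered = sorted(lengths, reverse=True)
            half = sum(ordered) / 2
            running = 0
            for length in ordered:
                running += length
                if running >= half:
                    return length
            return 0  # no reads at all
        ```
        
        For example, `n50(read_lengths("reads1.fastq.gz"))` returns the N50 of that file, and the same list of lengths can be passed to any plotting library to draw a histogram.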
        
        
        ### EXAMPLE USAGE
        ```bash
        NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed  
        NanoPlot -t 2 --fastq reads1.fastq.gz reads2.fastq.gz --maxlength 40000 --plots hex dot
        NanoPlot -t 12 --color yellow --bam alignment1.bam alignment2.bam alignment3.bam --downsample 10000 -o bamplots_downsampled
        ```
        This script now also provides read length vs mean quality plots in the '[pauvre](https://github.com/conchoecia/pauvre)'-style from [@conchoecia](https://github.com/conchoecia).
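        
        The command-line examples above can also be launched from a Python pipeline. The sketch below builds an argument list from flags documented in the usage text (`-t`, `--summary`, `-o`, `--loglength`) and runs it with `subprocess`. The helper names are hypothetical, and it assumes the `NanoPlot` executable is on your `PATH`.
        
        ```python
        import subprocess
        
        
        def nanoplot_command(summary_files, outdir, threads=2, loglength=False):
            """Build a NanoPlot argument list for one or more summary files."""
            cmd = ["NanoPlot", "-t", str(threads), "--summary", *summary_files, "-o", outdir]
            if loglength:
                cmd.append("--loglength")
            return cmd
        
        
        def run_nanoplot(summary_files, outdir, **kwargs):
            """Run NanoPlot; raises CalledProcessError if it exits with a non-zero status."""
            return subprocess.run(nanoplot_command(summary_files, outdir, **kwargs), check=True)
        ```
        
        Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.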
        
        
        ## ACKNOWLEDGMENTS/CONTRIBUTORS
        - Andreas Sjödin for building and maintaining conda recipes
        - Darrin Schultz [@conchoecia](https://github.com/conchoecia) for Pauvre code
        - [@alexomics](https://github.com/alexomics) for fixing the indentation of the printed stats
        - Botond Sipos [@bsipos](https://github.com/bsipos) for speeding up the calculation of average quality scores
        
        
        ## CONTRIBUTING
        I welcome all suggestions, bug reports, feature requests and contributions. Please leave an [issue](https://github.com/wdecoster/NanoPlot/issues) or open a pull request. I will usually respond within a day, or rarely within a few days.
        
        ## PLOTS GENERATED
        Plot|Fastq|Fastq_rich|Fastq_minimal|Bam|Summary|Options|Style
        ----|----|----|----|----|----|----|----
        Histogram of read length|x|x|x|x|x|N50|
        Histogram of (log transformed) read length|x|x|x|x|x|N50|
        Bivariate plot of length against base call quality|x|x||x|x|log transformation|dot, hex, kde, pauvre
        Heatmap of reads per channel||x|||x||
        Cumulative yield plot||x|x||x||
        Violin plot of read length over time||x|x||x||
        Violin plot of base call quality over time||x|||x||
        Bivariate plot of aligned read length against sequenced read length||||x|||dot, hex, kde
        Bivariate plot of percent reference identity against read length||||x||log transformation|dot, hex, kde
        Bivariate plot of percent reference identity against base call quality||||x|||dot, hex, kde
        Bivariate plot of mapping quality against read length||||x||log transformation|dot, hex, kde
        Bivariate plot of mapping quality against basecall quality||||x|||dot, hex, kde
        
        
        ## COMPANION SCRIPTS
        - [NanoComp](https://github.com/wdecoster/nanocomp): comparing multiple runs  
        - [NanoStat](https://github.com/wdecoster/nanostat): statistic summary report of reads or alignments  
        - [NanoFilt](https://github.com/wdecoster/nanofilt): filtering and trimming of reads  
        - [NanoLyse](https://github.com/wdecoster/nanolyse): removing contaminant reads (e.g. lambda control DNA) from fastq
        
        
        ## CITATION
        If you use this tool, please consider citing our [publication](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty149/4934939).
        
        Copyright: 2016-2020 Wouter De Coster <decosterwouter@gmail.com>
        
Keywords: nanopore sequencing plotting quality control
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Requires-Python: >=3
Description-Content-Type: text/markdown
