Metadata-Version: 1.2
Name: NGSpeciesID
Version: 0.0.9
Summary: Reconstructs viral consensus sequences from a set of ONT reads.
Home-page: https://github.com/ksahlin/NGSpeciesID
Author: Kristoffer Sahlin
Author-email: ksahlin@math.su.se
License: UNKNOWN
Description: NGSpeciesID
        ===========
        
        NGSpeciesID is a tool for clustering and consensus forming of targeted ONT reads. This repository is a modified version of [isONclust](https://github.com/ksahlin/isONclust), to which consensus and polishing features have been added.
        
        NGSpeciesID is distributed as a Python package and is supported on Linux and OSX with Python v3.6. [![Build Status](https://travis-ci.org/ksahlin/NGSpeciesID.svg?branch=master)](https://travis-ci.org/ksahlin/NGSpeciesID)
        
        Table of Contents
        =================
        
          * [INSTALLATION](#INSTALLATION)
            * [Using conda](#Using-conda)
            * [Testing installation](#testing-installation)
          * [USAGE](#USAGE)
            * [Output](#Output)
            * [Parameters](#Parameters)
          * [CREDITS](#CREDITS)
          * [LICENCE](#LICENCE)
        
        
        
        INSTALLATION
        ----------------
        
        **NOTE**: If you experience issues (e.g. [this one](https://github.com/rvaser/spoa/issues/26)) with the third-party tools [spoa](https://github.com/rvaser/spoa) or [medaka](https://github.com/nanoporetech/medaka) when following the all-in-one installation instructions below, please install these tools manually using their respective installation instructions, [here](https://github.com/rvaser/spoa#installation) and [here](https://github.com/nanoporetech/medaka#installation).
        
        ### Using conda
        Conda is the preferred way to install NGSpeciesID.
        
        1. Create and activate a new environment called NGSpeciesID
        
        ```
        conda create -n NGSpeciesID python=3.6 pip 
        conda activate NGSpeciesID
        ```
        
        2. Install NGSpeciesID 
        
        ```
        pip install NGSpeciesID
        conda install --yes -c conda-forge -c bioconda medaka==0.11.5 openblas==0.3.3 spoa racon minimap2
        ```
        3. You should now have 'NGSpeciesID' installed; try it:
        ```
        NGSpeciesID --help
        ```
        
        Each time you start or log in to your server or computer, you need to activate the conda environment "NGSpeciesID" before running NGSpeciesID:
        ```
        conda activate NGSpeciesID
        ```
        
        
        
        ### Testing installation
        
        Assuming you are in the NGSpeciesID directory, you can test the installation with
        
        ``` 
        python NGSpeciesID --ont --fastq test/sample_h1.fastq --outfolder ~/tmp/sample_h1 --consensus --medaka
        ```
        
        
        USAGE
        -------
        
        NGSpeciesID needs a fastq file generated by an Oxford Nanopore basecaller.
        
        ```
        NGSpeciesID --ont --consensus --medaka --fastq [reads.fastq] --outfolder [/path/to/output] 
        ```
        The `--ont` flag is shorthand for `--k 13 --w 20`; these values can also be set manually without the `--ont` flag. Specify the number of cores with `--t`.
        
        
        NGSpeciesID can also run with racon as the polisher. For example,
        
        ```
        NGSpeciesID --ont --consensus --racon --racon_iter 3 --fastq [reads.fastq] --outfolder [/path/to/output] 
        ```
        will polish the consensus sequences with racon three times.
        
        ### Output
        
        The output consists of the polished consensus sequences along with some information about clustering.
        
        * Polished consensus sequence(s). A folder named `medaka_cl_id_X` (or `racon_cl_id_X`) is created for each predicted consensus. Each such folder contains a file `consensus.fasta`, which is the final output of NGSpeciesID.
        * Draft spoa consensus sequences for each of the clusters are given as `consensus_reference_X.fasta` (where X is a number).
        * The final cluster information is given in a tsv file `final_clusters.tsv` present in the specified output folder.
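        
        As a minimal sketch of collecting the polished consensus files described above, the Python function below scans an output folder for `medaka_cl_id_X` and `racon_cl_id_X` subfolders and returns the path of each `consensus.fasta`. The function name `find_consensus_files` is hypothetical and not part of NGSpeciesID, and the sketch assumes that the per-cluster folders sit directly inside the folder given to `--outfolder`.
        
        ```python
        from pathlib import Path
        
        
        def find_consensus_files(outfolder):
            """Map each cluster label X to its polished consensus.fasta path.
        
            Assumption: folders named medaka_cl_id_X or racon_cl_id_X are
            located directly inside `outfolder`.
            """
            found = {}
            for prefix in ("medaka_cl_id_", "racon_cl_id_"):
                for folder in sorted(Path(outfolder).glob(prefix + "*")):
                    fasta = folder / "consensus.fasta"
                    # Skip folders that lack a final consensus file.
                    if folder.is_dir() and fasta.is_file():
                        cluster_label = folder.name[len(prefix):]
                        found[cluster_label] = fasta
            return found
        ```
        
        Calling `find_consensus_files("/path/to/output")` returns a dictionary keyed by the `X` part of each folder name, which can then be passed to any FASTA reader.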
        
        
        In the cluster TSV file, the first column is the cluster ID and the second column is the read accession. For example:
        
        ```
        0 read_X_acc
        0 read_Y_acc
        ...
        n read_Z_acc
        ```
        The file has one row per read. Some reads might be singletons, i.e. the only member of their cluster. The rows are ordered by cluster size (largest first).
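        
        The two-column layout above can be loaded with a few lines of Python. The following is a sketch only: the function name `read_clusters` is hypothetical and not part of NGSpeciesID, and it assumes the two columns are separated by whitespace (a tab in a TSV file).
        
        ```python
        from collections import defaultdict
        
        
        def read_clusters(tsv_path):
            """Group read accessions by cluster ID from a two-column file.
        
            Assumption: column 1 is the cluster ID, column 2 is the read
            accession, and the columns are separated by whitespace.
            """
            clusters = defaultdict(list)
            with open(tsv_path) as handle:
                for line in handle:
                    fields = line.split(maxsplit=1)
                    if len(fields) < 2:
                        continue  # skip blank or malformed rows
                    cluster_id, read_acc = fields[0], fields[1].strip()
                    clusters[cluster_id].append(read_acc)
            return dict(clusters)
        ```
        
        With the result in hand, singletons are the clusters whose list has length one, for example `[cid for cid, reads in read_clusters(path).items() if len(reads) == 1]`.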
        
        
        
        CREDITS
        ----------------
        
        Please cite [1] when using NGSpeciesID.
        
        1. TBA
        
        
        
        LICENCE
        ----------------
        
        GPL v3.0, see [LICENSE.txt](https://github.com/ksahlin/NGSpeciesID/blob/master/LICENCE.txt).
        
        
        
Keywords: viral sequences ONT Oxford Nanopore Technologies long reads
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4
