Metadata-Version: 2.1
Name: genbank_to
Version: 0.3
Summary: Convert GenBank format files to a swath of other formats
Home-page: https://github.com/linsalrob/genbank_to
Author: Rob Edwards
Author-email: raedwards@gmail.com
License: The MIT License (MIT)
Description: # genbank_to
        
        [![Edwards Lab](https://img.shields.io/badge/Bioinformatics-EdwardsLab-03A9F4)](https://edwards.flinders.edu.au/)
        [![DOI](https://www.zenodo.org/badge/481464683.svg)](https://www.zenodo.org/badge/latestdoi/481464683)
        [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
        ![GitHub language count](https://img.shields.io/github/languages/count/linsalrob/genbank_to)
        [![PyPi](https://img.shields.io/pypi/pyversions/genbank_to.svg?style=flat-square&label=PyPi%20Versions)](https://pypi.org/project/genbank_to/)
        
        A straightforward application to convert NCBI GenBank format files to a swath of other formats. Hopefully we have the 
        format you need. If not, either post [an issue](https://github.com/linsalrob/genbank_to/issues) using our template or,
        if you have already got it working, post [a PR](https://github.com/linsalrob/genbank_to/pulls) so we can add it and
        add you to the project.
        
        # What it does
        
        Read an NCBI GenBank format file (like our [test data](test/NC_001417.gbk)) and convert it to one of many
        different formats.
        
        # Input formats
        
        At the moment we only support NCBI GenBank format. If you want us to read other common formats, 
        [let us know](https://github.com/linsalrob/genbank_to/issues) and we'll add them.
        
        # Output formats
        
        Here are the output formats you can request. You can request as many of these at once as you like!
        
        These outputs assume you provide a file (for example, a genome file) that contains the genome sequence, its ORFs, and their proteins.
        
        ## Nucleotide output
        
         - `-n` or `--nucleotide` outputs the whole DNA sequence (e.g. the genome)
         - `-o` or `--orfs` outputs the DNA sequence of the open reading frames
        
        ## Protein output
        
         - `-a` or `--aminoacids` outputs the protein sequence for each of the open reading frames
        
        ## Complex formats
        
         - `-p` or `--ptt` outputs an NCBI ptt protein table. This is a somewhat deprecated NCBI format from their genomes downloads
         - `-f` or `--functions` outputs tab separated data of `protein ID` and `protein function` (also called the `product`)
         - `--gff3` outputs GFF3 format
         - `--amr` outputs a GFF file and an amino acid fasta file as required by [AMRFinderPlus](https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus#examples). Note that this output checks for validity problems that often crash AMRFinderPlus
         - `--phage_finder` outputs a unique format required by [phage_finder](http://phage-finder.sourceforge.net/)
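        
        If you want to use the `-f`/`--functions` output in your own scripts, a minimal sketch for reading it could look
        like the following. It assumes the file has exactly two tab-separated columns (`protein ID`, then `product`) and
        no header line. The function name `read_functions_table` and the example IDs are hypothetical and are not part
        of `genbank_to`.
        
        ```python
        def read_functions_table(text):
            """Map protein ID -> product from two-column, tab-separated text.
        
            Assumption: one record per line, no header line.
            """
            functions = {}
            for line in text.splitlines():
                if not line.strip():
                    continue  # skip blank lines
                # Split on the first tab only, so the product may contain anything else
                protein_id, _, product = line.partition("\t")
                functions[protein_id] = product
            return functions
        
        
        # Made-up example data, not taken from a real GenBank file
        example = "prot_0001\thypothetical protein\nprot_0002\tcoat protein\n"
        functions = read_functions_table(example)
        print(functions["prot_0002"])
        ```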
        
        ## Output options
        
         - `-i` or `--seqid` only output this sequence, or these sequences if you specify more than one `-i`/`--seqid`
         - `-z` or `--zip` compress some of the outputs
         - `--log` write logs to a different file
        
        ## Separate multi-GenBank files
        
        If your GenBank file contains multiple sequence records (separated with `//`), you can provide the `--separate` flag. 
        This will write each entry into its own file. This is compatible with `-n`/`--nucleotide`, `-o`/`--orfs`, and
        `-a`/`--aminoacids`. However, if you provide the `--separate` flag on its own, it will write each entry in your 
        multi-GenBank file to its own GenBank file.
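        
        To illustrate what "separated with `//`" means, here is a minimal Python sketch that splits multi-GenBank text
        into individual records at each `//` terminator line. It is only an illustration of the record structure, not
        the code `genbank_to` uses, and the function name `split_genbank_records` is hypothetical.
        
        ```python
        def split_genbank_records(text):
            """Return a list of record strings, each ending with its '//' line."""
            records = []
            current = []
            for line in text.splitlines(keepends=True):
                current.append(line)
                # A line containing only '//' terminates a GenBank record
                if line.strip() == "//":
                    records.append("".join(current))
                    current = []
            # Keep trailing text without a terminator, unless it is only whitespace
            leftover = "".join(current)
            if leftover.strip():
                records.append(leftover)
            return records
        
        
        # Heavily abbreviated, made-up records for demonstration only
        example = "LOCUS       A\nORIGIN\n//\nLOCUS       B\nORIGIN\n//\n"
        records = split_genbank_records(example)
        print(len(records))
        ```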
        
        ## Examples
        
        All of these examples use our [test data](test/NC_001417.gbk)
        
        1. Extract a `fasta` of the genome:
        
        ```bash
        genbank_to -g test/NC_001417.gbk -n test/NC_001417.fna
        ```
        
        2. Extract the DNA sequences of the ORFs to a single file:
        
        ```bash
        genbank_to -g test/NC_001417.gbk -o test/NC_001417.orfs
        ```
        
        3. Extract the protein (amino acid) sequences of the ORFs to a file:
        
        ```bash
        genbank_to -g test/NC_001417.gbk -a test/NC_001417.faa
        ```
        
        4. Do all of these at once:
        
        ```bash
        genbank_to -g test/NC_001417.gbk -n test/NC_001417.fna -o test/NC_001417.orfs -a test/NC_001417.faa
        ```
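        
        If you drive `genbank_to` from a Python script, the commands above can be assembled as an argument list. The
        sketch below only builds and prints the command. It uses the `-g`, `-n`, `-o`, and `-a` flags shown in the
        examples, and the helper name `build_genbank_to_command` is hypothetical.
        
        ```python
        import shlex
        
        
        def build_genbank_to_command(genbank, nucleotide=None, orfs=None, aminoacids=None):
            """Build a genbank_to argument list, adding only the requested outputs."""
            command = ["genbank_to", "-g", genbank]
            for flag, path in (("-n", nucleotide), ("-o", orfs), ("-a", aminoacids)):
                if path is not None:
                    command.extend([flag, path])
            return command
        
        
        command = build_genbank_to_command(
            "test/NC_001417.gbk",
            nucleotide="test/NC_001417.fna",
            aminoacids="test/NC_001417.faa",
        )
        # Print the command as a single shell-quoted string
        print(shlex.join(command))
        ```
        
        With `genbank_to` installed, the list could be passed to `subprocess.run(command, check=True)` to run it.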
Keywords: genbank bioinformatics microbiology genome genomics
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Description-Content-Type: text/markdown
