Metadata-Version: 2.1
Name: xlavir
Version: 0.6.2
Summary: Excel report from viral sequencing analysis output
Home-page: https://github.com/peterk87/xlavir
Author: Peter Kruczkiewicz
Author-email: peter.kruczkiewicz@gmail.com
License: MIT license
Keywords: xlavir
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.5
License-File: LICENSE
License-File: AUTHORS.rst

======
xlavir
======


.. image:: https://img.shields.io/pypi/v/xlavir.svg
        :target: https://pypi.python.org/pypi/xlavir

.. image:: https://github.com/peterk87/xlavir/workflows/CI/badge.svg?branch=master
        :target: https://github.com/peterk87/xlavir/actions

.. image:: https://readthedocs.org/projects/xlavir/badge/?version=latest
        :target: https://xlavir.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status


Generate an Excel report from the viral sequencing analysis output of the `nf-core/viralrecon`_ or `peterk87/nf-virontus`_ Nextflow pipelines.


* Free software: MIT license
* Documentation: https://xlavir.readthedocs.io.


Features
--------

* Collect sample results from a `nf-core/viralrecon`_ or `peterk87/nf-virontus`_ run into an Excel report

  * Samtools_ read mapping stats (``flagstat``)
  * Mosdepth_ read mapping coverage information
  * Variant calling information (SnpEff_ and SnpSift_ results, VCF file information)
  * Consensus sequences

* QA/QC of sample analysis results (basic PASS/FAIL based on minimum genome coverage and depth)
* Nextflow workflow execution information
* Prepend worksheets from other Excel documents into the report (e.g. cover page/sheet, sample sheet, lab results)
* Add custom images into worksheets with custom names and descriptions (e.g. phylogenetic tree figure PNG)
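
The basic PASS/FAIL QC described above can be pictured with a minimal sketch. The function name ``qc_status`` and the default thresholds are hypothetical placeholders chosen for illustration; they are not taken from xlavir, whose actual implementation may differ.

```python
def qc_status(genome_coverage, depth, min_coverage=0.95, min_depth=10):
    """Return "PASS" or "FAIL" for one sample.

    genome_coverage: fraction of the reference covered (0.0 to 1.0).
    depth: a summary read depth for the sample.
    min_coverage / min_depth: illustrative thresholds only (assumed values).
    """
    passed = genome_coverage >= min_coverage and depth >= min_depth
    return "PASS" if passed else "FAIL"


# A sample must satisfy both thresholds to pass.
print(qc_status(0.99, 250))
print(qc_status(0.60, 250))
```

Both conditions must hold for a PASS, so a sample with deep but patchy coverage still fails.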

Roadmap
-------

* Bcftools_ variant calling stats sheet
* Sample metadata table to merge with certain stats?
* YAML config to info sheet?
* coverage chart with controls?

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
.. _nf-core/viralrecon: https://github.com/nf-core/viralrecon
.. _peterk87/nf-virontus: https://github.com/peterk87/nf-virontus/
.. _Bcftools: https://www.htslib.org/doc/bcftools.html
.. _Samtools: https://samtools.github.io/
.. _SnpEff: https://pcingola.github.io/SnpEff/se_introduction/
.. _SnpSift: https://pcingola.github.io/SnpEff/ss_introduction/
.. _Mosdepth: https://github.com/brentp/mosdepth


=======
History
=======

0.6.1 (2022-02-01)
------------------

* Added more checks for Medaka VCFs from low coverage samples, which could raise ``ValueError`` and ``ZeroDivisionError``


0.6.0 (2022-01-05)
------------------

* Add support for reading annotated Medaka VCF files (``medaka_variant`` VCF annotated with ``medaka tools annotate``)
* Changed mutation string format to ``{gene}:{AA change} ({NT change}{extra})`` if there is an AA change
* Added low coverage filtering of variants for Medaka VCF
* "Variants Summary" table now sorted by nucleotide position
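
As an illustration of the mutation string format in this release, here is a minimal sketch. The function name ``mutation_string`` and the fallback used when there is no amino acid change (nucleotide change plus any extra text) are assumptions, not xlavir's confirmed behaviour.

```python
def mutation_string(nt_change, gene=None, aa_change=None, extra=""):
    """Build a mutation label.

    With an amino acid change: "{gene}:{AA change} ({NT change}{extra})".
    Without one (assumed fallback): "{NT change}{extra}".
    """
    if gene and aa_change:
        return f"{gene}:{aa_change} ({nt_change}{extra})"
    return f"{nt_change}{extra}"


print(mutation_string("A23063T", gene="S", aa_change="N501Y"))
print(mutation_string("C241T"))
```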

0.5.3 (2021-11-09)
------------------

* Fixed shorter consensus sequences not being written to report
* Improve nf-virontus VCF compatibility

0.5.2 (2021-11-08)
------------------

Fixes and changes from PR `#15 <https://github.com/peterk87/xlavir/issues/15>`_

Fixes:

* low coverage coordinate output off by one (``xlavir.tools.mosdepth.get_interval_coords_bed``)
* error on no Pangolin reports found (e.g. non-SARS-CoV-2 report) (``xlavir.tools.pangolin.get_info``)
* user QC thresholds not being used (``xlavir.xlavir.run``)
* not showing all QC fail comments (``xlavir.qc.create_qc_stats_dataframe``)
* consensus sequences exceeding the Excel cell character limit (32,767 characters); longer sequences are now chunked into 80 character segments with one segment per line in the consensus sheet (``xlavir.tools.consensus.read_fasta``)
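
The consensus sequence chunking mentioned in the last fix can be sketched as follows. The helper names ``chunk_sequence`` and ``consensus_rows`` are hypothetical, and the rule of only chunking sequences longer than the cell limit is an interpretation of the note above, not a copy of ``xlavir.tools.consensus.read_fasta``.

```python
EXCEL_CELL_CHAR_LIMIT = 32767  # maximum number of characters in one Excel cell


def chunk_sequence(seq, width=80):
    """Split a sequence into consecutive segments of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def consensus_rows(seq, width=80):
    """Return the cell values used for one sequence, one value per row.

    Assumption: sequences that fit into a single cell are kept whole and
    only longer ones are split into `width`-character segments.
    """
    if len(seq) <= EXCEL_CELL_CHAR_LIMIT:
        return [seq]
    return chunk_sequence(seq, width)


rows = consensus_rows("ACGT" * 10000)  # 40,000 characters, over the limit
print(len(rows), len(rows[0]))
```

Joining the rows back together reproduces the original sequence, so no bases are lost by the split.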

Changes:

* Ignore and skip unsupported VCFs instead of throwing NotImplementedError (``xlavir.tools.variants.get_info``)
* In consensus sheet, only add QC comments on FASTA header rows if necessary (``xlavir.io.xl.add_comments``)


0.5.1 (2021-08-04)
------------------

* Fixed issue (`#12 <https://github.com/peterk87/xlavir/issues/12>`_) where iVar ref allele depth corresponds to depth of base before deletion. For indels, ref allele depth is taken from the total depth minus the alt allele depth.
* Fixed issue (`#14 <https://github.com/peterk87/xlavir/issues/14>`_) where the total number of reads from ``samtools flagstat`` may not be the true number of reads. The unmapped reads may be excluded from the BAM file so the ``samtools flagstat`` total number of reads may be equal to the number of mapped reads. There is now a search for fastp JSON files to get the true total number of reads.
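
The indel handling from the first fix in this release can be sketched as below. The function name and arguments are hypothetical, and detecting an indel by comparing VCF-style REF and ALT lengths is an assumption made for this example.

```python
def ref_allele_depth(ref, alt, total_depth, alt_depth, reported_ref_depth):
    """Pick the reference allele depth for one variant.

    For indels the reported reference depth may describe the base before
    the deletion, so it is derived as total depth minus alt allele depth.
    """
    is_indel = len(ref) != len(alt)  # assumes VCF-style REF/ALT strings
    if is_indel:
        return total_depth - alt_depth
    return reported_ref_depth


# Deletion: derived depth replaces the reported value.
print(ref_allele_depth("ATG", "A", total_depth=100, alt_depth=80, reported_ref_depth=98))
```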

0.5.0 (2021-07-30)
------------------

* Added support for Nanopolish VCF parsing as generated by the ARTIC pipeline
* Added deduplication of VCF and SnpSift entries since the ARTIC pipeline may produce VCF files with duplicate variant calls due to overlap between amplicons.
* Added VCF and SnpSift test data for CLI test to generate Excel report.
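
Deduplicating variant calls from overlapping amplicons can be sketched like this. Records are plain dictionaries here, and using ``(CHROM, POS, REF, ALT)`` as the identity of a variant is an assumption for illustration; xlavir's own deduplication may use different fields.

```python
def deduplicate_variants(records):
    """Drop repeated variant calls, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for rec in records:
        # Assumed identity of a variant call: position plus alleles.
        key = (rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(rec)
    return unique


calls = [
    {"CHROM": "MN908947.3", "POS": 241, "REF": "C", "ALT": "T"},
    {"CHROM": "MN908947.3", "POS": 241, "REF": "C", "ALT": "T"},  # duplicate call
    {"CHROM": "MN908947.3", "POS": 3037, "REF": "C", "ALT": "T"},
]
print(len(deduplicate_variants(calls)))
```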

0.4.3 (2021-07-29)
------------------

* Fixed an issue where single base positions were reported as 0-based while all other ranges were 1-based when reporting low/no coverage regions from Mosdepth per-base BED files (`#10 <https://github.com/peterk87/xlavir/pull/10>`_).
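
For context, BED intervals are 0-based and half-open, so converting them to 1-based inclusive coordinates means adding one to the start and keeping the end. A minimal sketch, with a hypothetical function name and an assumed ``start-end`` output format:

```python
def bed_interval_to_1based(start, end):
    """Convert a 0-based, half-open BED interval to a 1-based label.

    A single-base interval such as (9, 10) becomes "10"; longer intervals
    become "first-last" with both ends inclusive.
    """
    first = start + 1  # shift the 0-based start to 1-based
    last = end         # the half-open end equals the 1-based inclusive end
    return str(first) if first == last else f"{first}-{last}"


print(bed_interval_to_1based(9, 10))
print(bed_interval_to_1based(9, 20))
```

Single-base and multi-base intervals go through the same arithmetic, which keeps the two cases from drifting apart.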


0.4.2 (2021-05-21)
------------------

* Add support for nf-core/viralrecon version 2.0 (requires Mosdepth ``bed.gz`` files be output; needs custom ``modules.config`` like `this one <https://gist.github.com/peterk87/495621349c1161d12047c1c8f97935af>`_)
* `Nextclade CLI <https://github.com/nextstrain/nextclade/blob/master/packages/cli/README.md>`_ per sample results parsed into sheet showing useful info like Nextstrain clade, # of mutations, # of PCR primer changes
* Added check that input directory exists and is a directory
* Added sheet with xlavir info
* Added Gene, Variant Effect, Variant Impact, Amino Acid Change to Variant Summary table


0.4.1 (2021-05-14)
------------------

* Add reference sequence length to QC stats table. Get ref seq length from max mosdepth per base BED coverage value.
* Add more conditional formatting
* Fix finding of ``execution_report.html``
* Fix version printing; add to help
* Add epilog with usage info


0.4.0 (2021-04-23)
------------------

* Adds "Variants Summary" sheet summarizing variant information across all samples
* Adds comments to AF values in "Variant Matrix" sheet
* Fixes width/height of cell comments to be based on length of comment text

0.3.0 (2021-04-23)
------------------

* Adds support for adding Ct values from a Ct values table (tab-delimited, CSV, ODS, XLSX format) into an xlavir report.

0.2.4 (2021-04-19)
------------------

* Fixes issue with SnpSift table file parsing and variable naming in variants.py (#4, #5)

0.2.3 (2021-04-19)
------------------

* Fixes issue with SnpSift table file parsing. Adds check to see if SnpSift column is dtype object/str before using .str Series methods (#4)

0.2.2 (2021-03-30)
------------------

* Fixes issue with SnpEff/SnpSift AA change parsing.

0.2.1 (2021-03-29)
------------------

* Fix division by zero error due to variants with DP values of 0
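
A guard of the following kind avoids dividing by a depth of zero when computing an allele fraction. The function name and the choice to return ``0.0`` for variants with no depth are assumptions for this sketch, not xlavir's confirmed behaviour.

```python
def allele_fraction(alt_depth, total_depth):
    """Return alt_depth / total_depth, or 0.0 when there is no depth."""
    if total_depth <= 0:
        # Assumed fallback for variants with DP == 0.
        return 0.0
    return alt_depth / total_depth


print(allele_fraction(80, 100))
print(allele_fraction(0, 0))
```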

0.2.0 (2021-03-04)
------------------

* Added header comments with descriptions of field content
* Added comment to Variant Matrix sheet A1 cell describing what is shown in the matrix
* Added highlighting of samples failing QC in other sheets
* Fixed image scaling by determining image size with imageio
* Added Medaka_ / Longshot_ VCF parsing

0.1.1 (2021-02-16)
------------------

* Collect sample results from a `nf-core/viralrecon`_ or `peterk87/nf-virontus`_ run into an Excel report

  * Samtools_ read mapping stats (``flagstat``)
  * Mosdepth_ read mapping coverage information
  * Variant calling information (SnpEff_ and SnpSift_ results, VCF file information)
  * Consensus sequences

* iVar VCF parsing
* QA/QC of sample analysis results (basic PASS/FAIL based on minimum genome coverage and depth)
* Nextflow workflow execution information
* Prepend worksheets from other Excel documents into the report (e.g. cover page/sheet, sample sheet, lab results)
* Add custom images into worksheets with custom names and descriptions (e.g. phylogenetic tree figure PNG)

.. _Longshot: https://github.com/pjedge/longshot
.. _Medaka: https://github.com/nanoporetech/medaka


