# H.E.L.E.N.
H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)


[![Build Status](https://travis-ci.com/kishwarshafin/helen.svg?branch=master)](https://travis-ci.com/kishwarshafin/helen)
___________________________________________________________
A pre-print describing the methods and giving an overview of a suggested `de novo assembly` pipeline is now available:
#### [Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit](https://www.biorxiv.org/content/10.1101/715722v1)
__________________________________________________________

## Overview
`HELEN` uses a Recurrent-Neural-Network (RNN) based Multi-Task Learning (MTL) model that can predict a base and a run-length for each genomic position using the weights generated by `MarginPolish`.
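
To make "a base and a run-length for each genomic position" concrete, the sketch below run-length encodes a short sequence in plain Bash, collapsing each homopolymer run into one base plus a count. The function name `rle_encode` and the output format are illustrative assumptions; this is not code from `HELEN` or `MarginPolish`.

```bash
#!/usr/bin/env bash
# Illustrative only: run-length encode a DNA string so that every
# homopolymer run becomes "<base><length>". Not part of HELEN or MarginPolish.
rle_encode() {
    local seq="$1" out="" prev="" count=0 base i
    for (( i = 0; i < ${#seq}; i++ )); do
        base="${seq:i:1}"
        if [[ "$base" == "$prev" ]]; then
            # Same base as before: extend the current run.
            count=$(( count + 1 ))
        else
            # New base: flush the previous run (if any) and start a new one.
            if [[ -n "$prev" ]]; then
                out+="${prev}${count} "
            fi
            prev="$base"
            count=1
        fi
    done
    # Flush the final run.
    if [[ -n "$prev" ]]; then
        out+="${prev}${count}"
    fi
    printf '%s\n' "$out"
}

rle_encode "AAACCGTTTT"   # prints: A3 C2 G1 T4
```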

© 2020 Kishwar Shafin, Trevor Pesout, Benedict Paten. <br/>
Computational Genomics Lab (CGL), University of California, Santa Cruz.

## Why MarginPolish-HELEN?
* `MarginPolish-HELEN` outperforms other graph-based and Neural-Network based polishing pipelines.
* Simple installation steps.
* `HELEN` can use multiple GPUs at the same time.
* Highly optimized pipeline that is faster than any other available polishing tool.
* We have <b>sequenced, assembled, and polished 11 samples</b> to ensure robustness, runtime consistency, and cost efficiency.
* We tested GPU usage on `Amazon Web Services (AWS)` and `Google Cloud Platform (GCP)` to ensure scalability.
* Open source [(MIT License)](LICENSE).


## Installation
`MarginPolish-HELEN` is supported on <b>`Ubuntu 16.10/18.04`</b> or any other Linux-based system.

##### Install prerequisites
```bash
sudo apt-get -y install git cmake make gcc g++ autoconf bzip2 lzma-dev zlib1g-dev \
libcurl4-openssl-dev libpthread-stubs0-dev libbz2-dev liblzma-dev libhdf5-dev \
python3-pip python3-virtualenv virtualenv
```

##### Method 1: Install MarginPolish-HELEN from GitHub
```bash
git clone https://github.com/kishwarshafin/helen.git
cd helen
make install
. ./venv/bin/activate

marginPolish --version
helen --version
helen --help
marginPolish --help
```
Each time you want to use it, activate the virtualenv:
```bash
source <path/to/helen/venv/bin/activate>
```

##### Method 2: Install using PyPI
```bash
python3 -m pip install helen --user
echo 'export PATH="$(python3 -m site --user-base)/bin":"$(python3 -m site --user-site)/bin":$PATH' >> ~/.bashrc
source ~/.bashrc

marginPolish --version
helen --version
helen --help
marginPolish --help
```
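
If either `--version` call fails with "command not found", the executables are probably not on your `PATH`. The following is a minimal sketch of a check for that, using only the shell built-in `command -v`; the helper name `missing_tools` is hypothetical and not something `HELEN` provides.

```bash
#!/usr/bin/env bash
# missing_tools is a hypothetical helper (not part of HELEN): it prints
# every command from its argument list that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing="$(missing_tools marginPolish helen)"
if [[ -n "$missing" ]]; then
    # Join the newline-separated names with spaces for a one-line message.
    echo "Not found on PATH: ${missing//$'\n'/ }"
else
    echo "marginPolish and helen are both on PATH"
fi
```

If something is reported missing, re-activate the virtualenv (Method 1) or re-check the `PATH` export in `~/.bashrc` (Method 2).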

## Usage
`MarginPolish` requires a draft assembly and a mapping of reads to the draft assembly. We recommend using `Shasta` as the initial assembler and `MiniMap2` for the mapping.

#### Step 1: Generate an initial assembly
Generate an assembly using one of the ONT assemblers:
* [Shasta long read assembler](https://github.com/chanzuckerberg/shasta)
* [Flye assembler](https://github.com/fenderglass/Flye)
* [Canu assembler](https://github.com/marbl/canu)
* [WTDBG2 assembler](https://github.com/ruanjue/wtdbg2)

#### Step 2: Create an alignment between the reads and the draft assembly
We recommend using `MiniMap2` to generate the mapping between the reads and the assembly.
```bash
# We recommend FASTQ input because marginPolish uses the base quality values.
# This command runs MiniMap2 with 32 threads; change the -t and -@ values to suit your machine.
minimap2 -ax map-ont -t 32 shasta_assembly.fa reads.fq | samtools sort -@ 32 | samtools view -hb -F 0x104 > reads_2_assembly.bam
samtools index -@32 reads_2_assembly.bam

# The -F 0x104 flag filters out unmapped reads and secondary alignments.
```
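
The `0x104` mask used above is the bitwise OR of two SAM flag bits: `0x4` (read unmapped) and `0x100` (secondary alignment). The short sketch below only does that arithmetic in Bash so the value is easy to verify; the variable names are illustrative.

```bash
#!/usr/bin/env bash
# SAM flag bits, as defined in the SAM specification.
UNMAPPED=0x4      # the read is unmapped
SECONDARY=0x100   # the record is a secondary alignment

# Combine both bits into the mask passed to `samtools view -F`.
mask=$(( UNMAPPED | SECONDARY ))
printf '0x%X (decimal %d)\n' "$mask" "$mask"   # prints: 0x104 (decimal 260)
```

Note that the supplementary-alignment bit (`0x800`) is not part of this mask, so supplementary records are kept.
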
#### Step 3: Generate images using MarginPolish
You can generate images using MarginPolish by running:
```bash
marginPolish reads_2_assembly.bam \
shasta_assembly.fa \
</path/to/model_name.json> \
-t <number_of_threads> \
-o <path/to/marginpolish_images> \
-f
```

The JSON parameter file passed as the third argument is available in `path/to/marginpolish/params/`.
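
Because this step can run for a long time, it may be worth confirming that the inputs exist before launching it. The sketch below is one way to do that; `check_inputs` is a hypothetical helper, and it assumes the BAM index sits next to the BAM as `<bam>.bai`, which is where `samtools index` writes it by default.

```bash
#!/usr/bin/env bash
# check_inputs is a hypothetical helper (not part of MarginPolish): it
# reports which of the expected input files are missing or empty and
# returns non-zero if any are.
check_inputs() {
    local bam="$1" assembly="$2" params="$3" status=0 f
    # Assumption: the index created by `samtools index` is at "<bam>.bai".
    for f in "$bam" "${bam}.bai" "$assembly" "$params"; do
        if [[ ! -s "$f" ]]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call; BAM, ASSEMBLY, PARAMS, THREADS and OUT_DIR are placeholders
# that you would set yourself before running:
# check_inputs "$BAM" "$ASSEMBLY" "$PARAMS" \
#     && marginPolish "$BAM" "$ASSEMBLY" "$PARAMS" -t "$THREADS" -o "$OUT_DIR" -f
```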

#### Step 4: Run HELEN
##### Download Model
```bash
helen download_models \
--output_dir <path/to/helen_models/>
```

##### Run HELEN
```bash
helen polish \
--image_dir </path/to/marginpolish_images/> \
--model_path </path/to/model.pkl> \
--batch_size 256 \
--num_workers 4 \
--threads <num_of_threads> \
--output_dir </path/to/output_dir> \
--output_prefix <output_filename.fa> \
--gpu
```

If you are running on CPUs, remove the `--gpu` argument.
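
If the same script has to run on both GPU and CPU machines, the flag can be chosen at run time. The sketch below is one possible approach, assuming NVIDIA GPUs and that `nvidia-smi` is installed wherever a GPU is present; `gpu_flag` is a hypothetical helper. It only checks that the driver can see a device, not that PyTorch was installed with CUDA support.

```bash
#!/usr/bin/env bash
# gpu_flag is a hypothetical helper (not part of HELEN): it prints "--gpu"
# when nvidia-smi is installed and lists at least one device, and prints
# nothing otherwise.
gpu_flag() {
    if command -v nvidia-smi > /dev/null 2>&1 \
        && nvidia-smi -L 2> /dev/null | grep -q '^GPU '; then
        echo "--gpu"
    fi
}

flag="$(gpu_flag)"
echo "device flag: '${flag}'"
```

Pass `$flag` unquoted as the last argument of `helen polish` so that an empty value expands to nothing on CPU-only machines.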

## Help
Please open a GitHub issue if you face any difficulties.

## Acknowledgement
We are thankful to [Sergey Koren](https://github.com/skoren) and [Karen Miga](https://github.com/khmiga) for their help with `CHM13` data and evaluation.

We downloaded our data from the [Telomere-to-telomere consortium](https://github.com/nanopore-wgs-consortium/CHM13) to evaluate our pipeline against `CHM13`.

We acknowledge the work of the developers of these packages: <br/>
* [Shasta](https://github.com/chanzuckerberg/shasta/commits?author=paoloczi)
* [pytorch](https://pytorch.org/)
* [ssw library](https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library)
* [hdf5 python (h5py)](https://www.h5py.org/)
* [pybind](https://github.com/pybind/pybind11)
* [hyperband](https://github.com/zygmuntz/hyperband)

## Fun Fact
<img src="https://vignette.wikia.nocookie.net/marveldatabase/images/e/eb/Iron_Man_Armor_Model_45_from_Iron_Man_Vol_5_8_002.jpg/revision/latest?cb=20130420194800" alt="Iron Man Armor Model 45" width="240"> <img src="https://vignette.wikia.nocookie.net/marveldatabase/images/c/c0/H.E.L.E.N._%28Earth-616%29_from_Iron_Man_Vol_5_19_002.jpg/revision/latest?cb=20140110025158" alt="H.E.L.E.N. (Earth-616)" width="120"> <br/>

The name "HELEN" is inspired by the A.I. created by Tony Stark in Marvel Comics (Earth-616). HELEN was created to control "Troy", the city Tony was building, making the A.I. "HELEN of Troy".

READ MORE: [HELEN](https://marvel.fandom.com/wiki/H.E.L.E.N._(Earth-616))



