Metadata-Version: 2.1
Name: atligator
Version: 0.0.6
Summary: Atlas based editing of protein interactions
Home-page: https://github.com/Hoecker-Lab/atligator
Author: Josef Kynast
Author-email: josef.kynast@uni-bayreuth.de
License: UNKNOWN
Project-URL: Bug Tracker, https://github.com/Hoecker-Lab/atligator/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# ATLIGATOR

ATLIGATOR is created to analyse protein-protein or protein-peptide interactions.

# Installation Notes

You can find the atligator in the Python Packaging Index.
Thus, to install atligator in your python 3.7+ environment just type:

```pip install atligator```

#### Notes
For the compilation of scipy and numpy you might need to install the compilers first 
(or try to use only wheel files, see below! This might depend on your OS environment):
```
sudo apt-get install gfortran libopenblas-dev liblapack-dev
```
Additionally, if installation fails because of the build process of dependencies use the '--only-binary=:all:' flag:
```
python -m pip install --only-binary=:all: atligator
```

# Usage

You can use ATLIGATOR either with the predefined scripts or the API. 
Most use cases can be accessed via both options.
Note: Most scripts will also give you a help option using the `-h` flag.

### Why?

Why you should use ATLIGATOR?

#### Atlases & Alarms
With ATLIGATOR you can build up interaction databases (called atlases) which store the interactions between pairs of
amino acid residues. By combining many of these pairwise interactions you can get a better understanding of how amino 
acids tend to interact with each other.

All the input structures can be selected manually or by using fold-specific SCOPe-identifiers to only use protein 
structures with similar architecture. Additionally, input structures can be preprocessed to only contain relevant 
protein chains (a central binder chain and ligand chains in close distance) and filtered for a certain secondary
structure content.

#### Pockets
Atlas information can be exploited further by using the pocket mining functionality. Pockets are frequent groups of
interactions which can be detected within an atlas. With Pockets you can directly see potential binding pockets for
designing amino acid binders in a structural context.

#### Design
Pockets can also be grafted onto scaffold proteins to get starting points for new protein or peptide binding abilities 
in existing proteins.

## Structures and Preprocessing

To be able to build a new atlas database prepare a directory with input pdb structures.

#### Downloading Structures
If you want to use ATLIGATORS SCOPe-matching functionality, either use the predefined script:

*to download all files matched with SCOPe id a.118.11 to the directory ./structures*
```shell script
python3 download_pdbs_by_scop_query.py -o ./structures a.118.11
```
or write a python script/use the interpreter:

```python
from atligator.pdb_util import download_pdbs_by_scop_query

download_pdbs_by_scop_query(query="a.118.11",
                            dest_dir="./structures")
# If you have a local copy of the pdb database you can try setting the parameter: pdb_db_path
# If you have a local copy of the SCOPe database you can try setting the parameter: scop_dir_cla_file
# Both will prevent downloading those files from the web.
```

or for single PDB files:

```python
from atligator.pdb_util import get_pdb_structure

get_pdb_structure("2GDN", dest_dir="./structures/")
```

#### Preprocessing Structures
To preprocess the input structure files you can use:

```shell script
python3 process_chains.py -o ./structures/processed -pro -d 8.0 -mb 30 -ml 3 structures/*.pdb
```

*The -d option defines the maximum distance two atoms of a residue pair can have to still be included. It's recommended
to use at least the maximum distance you use for atlas generation lateron.*

*The options -mb and -ml define the length of the polypeptide chains to be considered for binder (central chain) or 
ligand (any other chain it interacts with).*

or write a python script/use the interpreter:

```python
from alarms.chain_processing import MultiProcessChainProcessing
from glob import iglob

pdbs = [x for x in iglob("./structures/*.pdb")]
proc = MultiProcessChainProcessing(pdbs=pdbs, output_path="./structures/processed/", min_binder_len=30, 
                                   min_ligand_len=3, max_distance=8.0, n_workers=2, progress=True, quiet=False)
proc.main()
```

To process single files: 

```python
from alarms.chain_processing import process_pdb

process_pdb(pdb="./structures/7ev7.pdb", output_path="./structures/processed/", min_binder_len=30, 
            min_ligand_len=3, max_distance=8.0, quiet=False)
```

#### Filtering Structures

You can filter structures by secondary structure content:
1. total alpha helix
2. total beta sheet
3. alpha helix/beta sheet ratio
4. beta sheet/alpha-helix ratio

*E.g. at least 60% alpha helical residues*

```shell script
python3 select_by_sec_struc.py -o ./structures/processed/selected -at 0.6 -cp structures/processed/*.pdb
```

*the `-cp` flag enables copying the filtered files into the output directory. Otherwise there will be 
only a text file 'filtered_objects.txt' including all selected residues.*

Within python:

```python
from alarms.construct_selection import SelectionProcess
from glob import iglob

pdbs = list(x for x in iglob("./structures/processed/*.pdb"))
select = SelectionProcess(pdbs=pdbs)

# You can retrieve the secondary structure content:
print(select.check_pdbs_for_2s_content())

# Or you can directly filter the files
filtered = select.filter_pdbs(alphatotal=0.8)
print("These fall under the criteria:", filtered)
```

Or to only retrieve the secondary structure content:

```python
from alarms.construct_selection import check_pdbs_for_2s_content
pdbs = list(x for x in iglob("./structures/processed/*.pdb"))
check_pdbs_for_2s_content(pdbs=pdbs, quiet=False)
```


## Atlas

### Atlas Generation

To generate an atlas from pdb input files, just:

```shell script
python3 generate_atlas.py ./atlas.atlas ./structures/processed/*.pdb -b
```

*The `-b` flag excludes interactions of ligand backbone atoms and is recommended to strenghen the influence of the 
corresponding ligand residue side chain.*

Within python:

```python
from atligator.atlas import generate_atlas
from glob import iglob

pdbs = list(x for x in iglob("./structures/processed/*.pdb"))
atlas = generate_atlas(filenames=pdbs, skip_bb_atoms=True)
```

### Atlas Objects

Apart from visualisation or further processing atlas object can be already inspected in python:

```python
# Find a list of AtlasDatapoints in:
atlas.datapoints
# These datapoints contain all atoms of the residue pair (ligand and binder) as well as their residue type and their
# origin

# You can also filter an atlas to only contain specific residue type in ligand and binder position.
filtered_atlas = atlas.filter(ligand_restype = None, binder_restype = None)
```

### Atlas Visualisation

Visualisation of an atlas can be of two types: structural or statistical. 

##### Statistics

The statistical visualization contains a plot of the atlas content, where the ligand residue type is plotted against
the binder residue type and vice versa. 

To generate an atlas from pdb input files, just:


```shell script
python3 visualize_atlas_stats.py ./atlas.atlas -m plotly
```

Within python:
```python
from atligator.visualization import visualize_atlas_stats

visualize_atlas_stats(atlas, method="plotly", ligand_per_binder=False)
```

*There is two ways of plotting atlas statistics: plotly and matplotlib. Plotly is using the browser,
matplotlib creates a new window.*
*To transform the plot to see which ligand residue types interact with the binder residue types, set ligand_per_binder 
to True or use the `-i` flag with the script*.


##### Structural Visualization: Atlas or Atlas subset

The structural representation delivers a good overview about what an atlas is composed of.
The default representation will always draw a 3D plot where only the Calpha and Cbeta atoms of the ligand residue
and the binder residues are present as bigger (Calpha) and smaller (Cbeta) bubbles. 

The ligand residue is typically positioned in the center of the plot to demonstrate where the binder residues are
relative to the ligand residue.


```shell script
python3 visualize_atlas.py ./atlas.atlas -m plotly -l ALA -b TYR
```

Within python:

```python
from atligator.visualization import visualize_atlas

# You can define the ligand residue type
visualize_atlas(atlas, method="plotly", ligand_restype="ALA")

# Or both residue types
visualize_atlas(atlas, method="plotly", ligand_restype="ALA", binder_restype="TYR")
```

## Pockets

### Pocket Mining

To mine pockets from an existing atlas:

```shell script
python3 apply_pocket_miner.py ./atlas.atlas -o ./
```

Within python:
```python
from atligator.pocket_miner import mine_pockets

pockets = mine_pockets(atlas)
```
*Additional parameters can be used for example to define how important the size (cardinality) of the pocket is, or 
how many pockets per ligand residue type are found at maximum.*

### Pocket Handling

Apart from visualization or grafting Pockets can be already inspected in python:

```python
# pockets is a dictionary of ligand residue types. To access the underlying list of pockets just:
pockets['TYR']
# or print all content:
for residue_type, pocket_list in pockets.items():
    print(residue_type, pocket_list)
```

### Pocket Visualization

To visualize pockets you can choose between plotting the pocket atlas or a single pocket.
- The pocket atlas is a filtered version of the original atlas. It only includes datapoints that 
can be found as a part of the pocket.   
For example: You mined a Ile to REST pocket - including Ile as the ligand and Arg, Glu, Ser and Thr as binder residues. 
The corresponding pocket atlas will only contain atlas datapoints that include Ile as a ligand and one of the four 
binder residues as a binder residue. Additionally, only those datapoints that are found in combination with the three 
other datapoints (the other three binder residue types) are taken into account (and filtered based on the clustering 
parameters during pocket mining).

- A single pocket is a multi-residue structure including one ligand residue and all binder residues surrounding it. 
It will only contain those residues that natively interact in one of the input structures. Thus, it is just a strictly
filtered representation of one input structure.

**A pocket atlas** is plotted as an atlas, including Calpha and Cbeta bubbles for the central ligand residue
as well as binder residues around.

```shell script
python3 visualize_pocket.py ./pockets.pockets -l TYR -p RLD
```
*The `-l` flag defines the  ligand residue type and the `-p` flag defines the itemset of the desired pocket 
in one-letter-code (In this case Arg, Leu and Asp).*

Within python:

```python
from atligator.visualization import visualize_pocket_atlas

# First we could check which Tyr pockets we have
print(pockets["TYR"])

tyr_pocket = pockets["TYR"][0]
# You can define the ligand residue type
visualize_pocket_atlas(tyr_pocket, method="plotly")
```



**A single pocket** is plotted with all atoms to enable viewing the interaction in all details.

Visualizing a single pocket is not yet available as a script.

Within python:

```python
from atligator.visualization import visualize_single_pocket

tyr_pocket = pockets["TYR"][0]
# First we have to pick one datapoint of this pocket. We could for example determine the most representative member of 
# one cluster (in this case the first cluster):
# Let's first see which binder residue type is present in this cluster
print(tyr_pocket.clusters[0].binder_restype)
# Let's pick the most representative datapoint
most_representative_member = tyr_pocket.clusters[0].get_most_representative_member(2.0, 1.0, 1.0)

# we could also just pick any member of any cluster
any_member = tyr_pocket.clusters[0].members[0]

# Let's visualize this single pocket with plotly
visualize_single_pocket(pocket=tyr_pocket, datapoint=most_representative_member)
```

