Metadata-Version: 2.1
Name: orion-ml
Version: 0.4.0
Summary: Orion is a machine learning library built for unsupervised time series anomaly detection.
Home-page: https://github.com/sintel-dev/Orion
Author: MIT Data To AI Lab
Author-email: dailabmit@gmail.com
License: MIT license
Keywords: orion
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6,<3.9
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: dev
License-File: LICENSE
License-File: AUTHORS.rst

<p align="left">
<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
<i>An open source project from Data to AI Lab at MIT.</i>
</p>

<p align="left">
<img width=20% src="https://dai.lids.mit.edu/wp-content/uploads/2018/08/orion.png" alt="Orion" />
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![Python](https://img.shields.io/badge/Python-3.6%20%7C%203.7%20%7C%203.8-blue)](https://badge.fury.io/py/orion-ml) 
[![PyPi Shield](https://img.shields.io/pypi/v/orion-ml.svg)](https://pypi.python.org/pypi/orion-ml)
[![Tests](https://github.com/sintel-dev/Orion/workflows/Run%20Tests/badge.svg)](https://github.com/sintel-dev/Orion/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![Downloads](https://pepy.tech/badge/orion-ml)](https://pepy.tech/project/orion-ml)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sintel-dev/Orion/master?filepath=tutorials)

# Orion

A machine learning library for unsupervised time series anomaly detection.

| Important Links                     |                                                                      |
| ----------------------------------- | -------------------------------------------------------------------- |
| :computer: **[Website]**            | Check out the Sintel Website for more information about the project. |
| :book: **[Documentation]**          | Quickstarts, User and Development Guides, and API Reference.         |
| :star: **[Tutorials]**              | Check out our notebooks.                                             |
| :octocat: **[Repository]**          | The link to the GitHub Repository of this library.                   |
| :scroll: **[License]**              | The repository is published under the MIT License.                   |
| :keyboard: **[Development Status]** | This software is in its Pre-Alpha stage.                             |
| [![][Slack Logo] **Community**][Community]    | Join our Slack Workspace for announcements and discussions.          |

[Website]: https://sintel.dev/
[Documentation]: https://sintel-dev.github.io/Orion
[Tutorials]: https://github.com/sintel-dev/Orion/tree/master/tutorials
[Repository]: https://github.com/sintel-dev/Orion
[License]: https://github.com/sintel-dev/Orion/blob/master/LICENSE
[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha
[Community]: https://join.slack.com/t/sintel-space/shared_invite/zt-q147oimb-4HcphcxPfDAM0O9_4PaUtw
[Slack Logo]: https://github.com/sintel-dev/Orion/blob/master/docs/images/slack.png

# Overview

Orion is a machine learning library built for *unsupervised time series anomaly detection*. Given time series data, we provide a number of "verified" ML pipelines (a.k.a. Orion pipelines) that identify rare patterns and flag them for expert review.

The library makes use of a number of **automated machine learning** tools developed at the [Data to AI Lab at MIT](https://dai.lids.mit.edu/).

Read about using an Orion pipeline on the NYC taxi dataset in a blog series:

[Part 1: Learn about unsupervised time series anomaly detection](https://t.co/yIFVM1oRwQ?amp=1) | [Part 2: Learn how we use GANs to solve the problem](https://link.medium.com/cGsBD0Fevbb) | [Part 3: How does one evaluate anomaly detection pipelines?](https://link.medium.com/FqCrFXMevbb)
:--------------------------------------:|:---------------------------------------------:|:--------------------------------------------:
![](docs/images/tulog-part-1.png)       |  ![](docs/images/tulog-part-2.png)            | ![](docs/images/tulog-part-3.png)

**Notebooks:** Discover *Orion* through Colab by launching our [notebooks](https://drive.google.com/drive/folders/1FAcCEiE1JDsqaMjGcmiw5a5XuGh13c9Q?usp=sharing)!

# Quickstart

## Install with pip

The easiest and recommended way to install **Orion** is using [pip](https://pip.pypa.io/en/stable/):

```bash
pip install orion-ml
```

This will pull and install the latest stable release from [PyPI](https://pypi.org/).


In the following example we show how to use one of the **Orion Pipelines**.

## Fit an Orion pipeline

We will load a demo signal for this example:

```python3
from orion.data import load_signal

train_data = load_signal('S-1-train')
train_data.head()
```

This should display a signal with `timestamp` and `value` columns:

```
    timestamp     value
0  1222819200 -0.366359
1  1222840800 -0.394108
2  1222862400  0.403625
3  1222884000 -0.362759
4  1222905600 -0.370746
```
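
Before fitting, it can help to check how the signal is sampled. The following is a minimal sketch using only the standard library and the five rows shown above; it assumes the `timestamp` column holds Unix epoch seconds in UTC, which the output suggests but this README does not state explicitly.

```python
from datetime import datetime, timezone

# First rows of the demo signal shown above: (timestamp, value) pairs.
rows = [
    (1222819200, -0.366359),
    (1222840800, -0.394108),
    (1222862400, 0.403625),
    (1222884000, -0.362759),
    (1222905600, -0.370746),
]

timestamps = [ts for ts, _ in rows]

# Assumption: timestamps are Unix epoch seconds (UTC).
first_reading = datetime.fromtimestamp(timestamps[0], tz=timezone.utc)

# Gaps between consecutive readings; a single distinct gap means regular sampling.
gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}

print(first_reading.isoformat())  # 2008-10-01T00:00:00+00:00
print(gaps)  # {21600}, i.e. one reading every 6 hours
```

Under that assumption, the demo signal starts on 2008-10-01 and has one reading every six hours.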

In this example, we use the `lstm_dynamic_threshold` pipeline and set some hyperparameters (in this case, 5 training epochs).

```python3
from orion import Orion

hyperparameters = {
    'keras.Sequential.LSTMTimeSeriesRegressor#1': {
        'epochs': 5,
        'verbose': True
    }
}

orion = Orion(
    pipeline='lstm_dynamic_threshold',
    hyperparameters=hyperparameters
)

orion.fit(train_data)
```

## Detect anomalies using the fitted pipeline
Once the pipeline is fitted, we can use it to detect anomalies in incoming time series:

```python3
new_data = load_signal('S-1-new')
anomalies = orion.detect(new_data)
```
> :warning: Depending on your system and the exact versions that you have installed, some *WARNINGS* may be printed. These can be safely ignored as they do not interfere with the proper behavior of the pipeline.

The output of the previous command will be a ``pandas.DataFrame`` containing a table of detected anomalies:

```
        start         end  severity
0  1394323200  1399701600  0.673494
```
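
The `start` and `end` values are easier to review as calendar dates. The sketch below formats the interval from the table above using only the standard library; the `describe` helper is hypothetical (not part of Orion), and it assumes the timestamps are Unix epoch seconds in UTC.

```python
from datetime import datetime, timezone

# The detected anomaly from the table above: (start, end, severity).
anomalies = [(1394323200, 1399701600, 0.673494)]


def describe(start, end, severity):
    """Format one anomaly interval, assuming Unix epoch seconds in UTC."""
    start_dt = datetime.fromtimestamp(start, tz=timezone.utc)
    end_dt = datetime.fromtimestamp(end, tz=timezone.utc)
    days = (end_dt - start_dt).total_seconds() / 86400
    return (
        f"{start_dt:%Y-%m-%d %H:%M} to {end_dt:%Y-%m-%d %H:%M} "
        f"({days:.2f} days), severity {severity:.3f}"
    )


for anomaly in anomalies:
    print(describe(*anomaly))
# 2014-03-09 00:00 to 2014-05-10 06:00 (62.25 days), severity 0.673
```

When working with the `pandas.DataFrame` itself, the same tuples could be obtained with `anomalies.itertuples(index=False)`.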

# Leaderboard
With every release, we run the Orion benchmark and maintain an up-to-date leaderboard with the current scores of the verified pipelines according to the benchmarking procedure.

We run the benchmark on **11** datasets with known ground truth and record the score of each pipeline on each dataset. The leaderboard table reports the number of datasets on which each pipeline outperforms the ARIMA pipeline.

| Pipeline                  |  Outperforms ARIMA |
|---------------------------|--------------------|
| AER                       |         11         |
| TadGAN                    |          8         |
| LSTM Dynamic Thresholding |          7         |
| LSTM Autoencoder          |          7         |
| Dense Autoencoder         |          6         |
| Azure                     |          0         |


You can find the scores of each pipeline on every signal recorded in the [details Google Sheets document](https://docs.google.com/spreadsheets/d/1HaYDjY-BEXEObbi65fwG0om5d8kbRarhpK4mvOZVmqU/edit?usp=sharing). The summarized results can also be browsed in the following [summary Google Sheets document](https://docs.google.com/spreadsheets/d/1ZPUwYH8LhDovVeuJhKYGXYny7472HXVCzhX6D6PObmg/edit?usp=sharing).
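
The win count reported in the leaderboard can be sketched as follows. This is an illustration only: the pipeline names, dataset names, and scores are made up and are not benchmark results, and the sketch assumes that higher scores are better and that a tie does not count as a win. The `wins_over_baseline` helper is hypothetical and is not part of Orion.

```python
# Hypothetical per-dataset scores (higher is better); these numbers are
# made up for illustration and are not benchmark results.
scores = {
    "arima":      {"dataset_1": 0.50, "dataset_2": 0.40, "dataset_3": 0.70},
    "pipeline_a": {"dataset_1": 0.60, "dataset_2": 0.45, "dataset_3": 0.65},
    "pipeline_b": {"dataset_1": 0.40, "dataset_2": 0.40, "dataset_3": 0.80},
}


def wins_over_baseline(scores, baseline="arima"):
    """Count, per pipeline, the datasets where it scores strictly above the baseline."""
    reference = scores[baseline]
    return {
        name: sum(per_dataset[d] > reference[d] for d in reference)
        for name, per_dataset in scores.items()
        if name != baseline
    }


print(wins_over_baseline(scores))  # {'pipeline_a': 2, 'pipeline_b': 1}
```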

# Resources

Additional resources that might be of interest:
* Learn about [benchmarking pipelines](BENCHMARK.md).
* Read about [pipeline evaluation](orion/evaluation/README.md).
* Find out more about [TadGAN](https://arxiv.org/pdf/2009.07769v3.pdf).

# Citation

If you use **Orion**, which is part of the **Sintel** ecosystem, for your research, please consider citing the following paper:

Sarah Alnegheimish, Dongyu Liu, Carles Sala, Laure Berti-Equille, Kalyan Veeramachaneni. [Sintel: A Machine Learning Framework to Extract Insights from Signals](https://dl.acm.org/doi/pdf/10.1145/3514221.3517910).
```
@inproceedings{alnegheimish2022sintel,
  title={Sintel: A Machine Learning Framework to Extract Insights from Signals},
  author={Alnegheimish, Sarah and Liu, Dongyu and Sala, Carles and Berti-Equille, Laure and Veeramachaneni, Kalyan},  
  booktitle={Proceedings of the 2022 International Conference on Management of Data},
  pages = {1855--1865},
  numpages = {11},
  publisher={Association for Computing Machinery},
  doi = {10.1145/3514221.3517910},
  series = {SIGMOD '22},
  year={2022}
}
```


If you use **TadGAN** for your research, please consider citing the following paper:

Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. [TadGAN - Time Series Anomaly Detection Using Generative Adversarial Networks](https://arxiv.org/pdf/2009.07769v3.pdf).

```
@inproceedings{geiger2020tadgan,
  title={TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks},
  author={Geiger, Alexander and Liu, Dongyu and Alnegheimish, Sarah and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},
  booktitle={2020 IEEE International Conference on Big Data (IEEE BigData)},
  pages={33-43},
  doi={10.1109/BigData50022.2020.9378139},
  organization={IEEE},
  year={2020}
}
```


History
=======

## 0.4.0 - 2022-11-08

This version introduces several new enhancements:

* Support for Python 3.8
* Migration to TensorFlow 2.0
* New pipeline, namely ``VAE``, a Variational AutoEncoder model.

### Issues resolved

* Add python 3.8 – [Issue #342](https://github.com/signals-dev/Orion/issues/342) by @sarahmish
* VAE (Variational Autoencoders) pipeline implementation – [Issue #349](https://github.com/signals-dev/Orion/issues/349) by @dyuliu
* Add masking option for ``regression_errors`` – [Issue #352](https://github.com/signals-dev/Orion/issues/352) by @dyuliu
* Changes in TadGAN for tensorflow 2.0 – [Issue #161](https://github.com/signals-dev/Orion/issues/161) by @lcwong0928
* Add an automatic dependency checker – [Issue #320](https://github.com/signals-dev/Orion/issues/320) by @sarahmish
* TadGAN ``batch_size`` cannot be changed – [Issue #313](https://github.com/signals-dev/Orion/issues/313) by @sarahmish


## 0.3.2 - 2022-07-04

This version fixes some of the issues in ``aer``, ``ae``, and ``tadgan`` pipelines.

### Issues resolved

* Fix AER model predict error after loading – [Issue #304](https://github.com/signals-dev/Orion/issues/304) by @lcwong0928
* Update AE to work with any `window_size` – [Issue #300](https://github.com/signals-dev/Orion/issues/300) by @sarahmish
* Updated tadgan_viz.json – [Issue #292](https://github.com/signals-dev/Orion/issues/292) by @Hramir


## 0.3.1 - 2022-04-26

This version introduces a new pipeline, namely ``AER``, an AutoEncoder Regressor model.

### Issues resolved
* Add AER Model - [Issue #286](https://github.com/signals-dev/Orion/issues/286) by @lcwong0928


## 0.3.0 - 2022-03-31

This version deprecates support for ``OrionDBExplorer``, which has been migrated to
[sintel](https://github.com/signals-dev/Orion). As a result, ``Orion`` no longer requires
MongoDB as a dependency.

### Issues resolved
* Update dependency  - [Issue #283](https://github.com/signals-dev/Orion/issues/283) by @sarahmish
* General housekeeping  - [Issue #278](https://github.com/signals-dev/Orion/issues/278) by @sarahmish
* Fix tutorial testing issue - [Issue #276](https://github.com/signals-dev/Orion/issues/276) by @sarahmish
* Migrate OrionExplorer to Sintel - [Issue #275](https://github.com/signals-dev/Orion/issues/275) by @dyuliu
* LSTM viz JSON pipeline added - [Issue #271](https://github.com/signals-dev/Orion/issues/271) by @Hramir


## 0.2.1 - 2022-02-18

This version introduces improvements and more testing.

### Issues resolved
* Adjusting builds for TadGAN - [Issue #261](https://github.com/signals-dev/Orion/issues/261) by @sarahmish
* Testing tutorials, dependencies, and OS - [Issue #251](https://github.com/signals-dev/Orion/issues/251) by @sarahmish


## 0.2.0 - 2021-10-11

This version supports multivariate time series as input, in addition to minor improvements
and maintenance.

### Issues resolved
* `setuptools` no longer supports `lib2to3` breaking `mongoengine` - [Issue #252](https://github.com/signals-dev/Orion/issues/252) by @sarahmish
* Supporting multivariate input - [Issue #248](https://github.com/signals-dev/Orion/issues/248) by @sarahmish
* TadGAN pipeline with visualization option - [Issue #240](https://github.com/signals-dev/Orion/issues/240) by @sarahmish
* Support saving absolute path for add_signals and add_signal when using dbExplorer - [Issue #202](https://github.com/signals-dev/Orion/issues/202) by @sarahmish
* dynamic scalability of TadGAN primitive based on `window_size` - [Issue #87](https://github.com/signals-dev/Orion/issues/87) by @sarahmish


## 0.1.7 - 2021-05-04

This version adds new features to the benchmark function: users can now save pipelines, view results as they are being calculated, and have a single evaluation compared multiple times.

### Issues resolved
* Dask issues in benchmark function & improvements - [Issue #225](https://github.com/signals-dev/Orion/issues/225) by @sarahmish
* Numerical overflow when using contextual metrics - [Issue #212](https://github.com/signals-dev/Orion/issues/212) by @kronerte


## 0.1.6 - 2021-03-08

This version introduces two new pipelines: LSTM AE and Dense AE.
In addition to minor improvements, some code was refactored to introduce
a new primitive: ``reconstruction_errors``.

### Issues resolved
* Comparison of DTW library performance - [Issue #205](https://github.com/signals-dev/Orion/issues/205) by @sarahmish
* Not able to pickle dump tadgan pipeline - [Issue #200](https://github.com/signals-dev/Orion/issues/200) by @sarahmish
* New pipeline LSTM and Dense autoencoders - [Issue #194](https://github.com/signals-dev/Orion/issues/194) by @sarahmish
* Readme - [Issue #192](https://github.com/signals-dev/Orion/issues/192) by @pvk-developer
* Unable to launch cli - [Issue #186](https://github.com/signals-dev/Orion/issues/186) by @sarahmish
* bullet points not formatted correctly in index.rst - [Issue #178](https://github.com/signals-dev/Orion/issues/178) by @micahjsmith
* Update notebooks - [Issue #176](https://github.com/signals-dev/Orion/issues/176) by @sarahmish
* Inaccuracy in README.md file in orion/evaluation/ - [Issue #157](https://github.com/signals-dev/Orion/issues/157) by @sarahmish
* Dockerfile -- docker does not find orion primitives automatically - [Issue #155](https://github.com/signals-dev/Orion/issues/155) by @sarahmish
* Primitive documentation - [Issue #151](https://github.com/signals-dev/Orion/issues/151) by @sarahmish
* Variable name inconsistency in tadgan - [Issue #150](https://github.com/signals-dev/Orion/issues/150) by @sarahmish
* Sync leaderboard tables between `BENCHMARK.md` and the docs - [Issue #148](https://github.com/signals-dev/Orion/issues/148) by @sarahmish


## 0.1.5 - 2020-12-25

This version includes the new style of documentation and a revamp of the `README.md`, in addition to some minor improvements
in the benchmark code and primitives. This release also moves the `tadgan` pipeline to `verified`.

### Issues resolved
* Link with google colab - [Issue #144](https://github.com/signals-dev/Orion/issues/144) by @sarahmish
* Add `timeseries_anomalies` unittests - [Issue #136](https://github.com/signals-dev/Orion/issues/136) by @sarahmish
* Update `find_sequences` in converting series to arrays - [Issue #135](https://github.com/signals-dev/Orion/issues/135) by @sarahmish
* Definition of error/critic smooth window in score anomalies primitive - [Issue #132](https://github.com/signals-dev/Orion/issues/132) by @sarahmish
* Train-test split in benchmark enhancement - [Issue #130](https://github.com/signals-dev/Orion/issues/130) by @sarahmish


## 0.1.4 - 2020-10-16

Minor enhancements to the benchmark.

* Load ground truth before try-catch - [Issue #124](https://github.com/signals-dev/Orion/issues/124) by @sarahmish
* Converting timestamp to datetime in Azure primitive - [Issue #123](https://github.com/signals-dev/Orion/issues/123) by @sarahmish
* Benchmark exceptions - [Issue #120](https://github.com/signals-dev/Orion/issues/120) by @sarahmish


## 0.1.3 - 2020-09-29

New benchmark and Azure primitive.

* Implement a benchmarking function new feature - [Issue #94](https://github.com/signals-dev/Orion/issues/94) by @sarahmish
* Add azure anomaly detection as primitive new feature - [Issue #97](https://github.com/signals-dev/Orion/issues/97) by @sarahmish
* Critic and reconstruction error combination - [Issue #99](https://github.com/signals-dev/Orion/issues/99) by @sarahmish
* Fixed threshold for `find_anomalies` - [Issue #101](https://github.com/signals-dev/Orion/issues/101) by @sarahmish
* Add an option to have window size and window step size as percentages of error size - [Issue #102](https://github.com/signals-dev/Orion/issues/102) by @sarahmish
* Organize pipelines into verified and sandbox - [Issue #105](https://github.com/signals-dev/Orion/issues/105) by @sarahmish
* Ground truth parameter name enhancement - [Issue #114](https://github.com/signals-dev/Orion/issues/114) by @sarahmish
* Add benchmark dataset list and parameters to s3 bucket enhancement - [Issue #118](https://github.com/signals-dev/Orion/issues/118) by @sarahmish

## 0.1.2 - 2020-07-03

New Evaluation sub-package and refactor TadGAN.

* Two bugs when saving signalrun if there is no event detected - [Issue #92](https://github.com/signals-dev/Orion/issues/92) by @dyuliu 
* File encoding/decoding issues about `README.md` and `HISTORY.md` - [Issue #88](https://github.com/signals-dev/Orion/issues/88) by @dyuliu
* Fix bottle neck of `score_anomaly` in Cyclegan primitive - [Issue #86](https://github.com/signals-dev/Orion/issues/86) by @dyuliu
* Adjust `epoch` meaning in Cyclegan primitive - [Issue #85](https://github.com/signals-dev/Orion/issues/85) by @sarahmish
* Rename evaluation to benchmark and metrics to evaluation - [Issue #83](https://github.com/signals-dev/Orion/issues/83) by @sarahmish
* Scoring function for intervals of size one - [Issue #76](https://github.com/signals-dev/Orion/issues/76) by @sarahmish

## 0.1.1 - 2020-05-11

New class and function based interfaces.

* Implement the Orion Class - [Issue #79](https://github.com/D3-AI/Orion/issues/79) by @csala
* Implement new functional interface - [Issue #80](https://github.com/D3-AI/Orion/issues/80) by @csala

## 0.1.0 - 2020-04-23

First Orion release to PyPI: https://pypi.org/project/orion-ml/
