Metadata-Version: 2.1
Name: zephyr-ml
Version: 0.0.0
Summary: Prediction engineering methods for Draco.
Home-page: https://github.com/sintel-dev/gpe
Author: MIT Data To AI Lab
Author-email: dai-lab@mit.edu
Keywords: zephyr Draco Prediction Engineering
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.7,<3.9
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: dev
License-File: AUTHORS.rst

<p align="left">
<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
<i>A project from Data to AI Lab at MIT.</i>
</p>

<!-- Uncomment these lines after releasing the package to PyPI for version and downloads badges -->
<!--[![PyPI Shield](https://img.shields.io/pypi/v/gpe.svg)](https://pypi.python.org/pypi/gpe)-->
<!--[![Downloads](https://pepy.tech/badge/gpe)](https://pepy.tech/project/gpe)-->
<!--[![Travis CI Shield](https://travis-ci.org/signals-dev/gpe.svg?branch=master)](https://travis-ci.org/signals-dev/gpe)-->
<!--[![Coverage Status](https://codecov.io/gh/signals-dev/gpe/branch/master/graph/badge.svg)](https://codecov.io/gh/signals-dev/gpe)-->

# GreenGuard Prediction Engineering

Prediction engineering methods for GreenGuard.

- Homepage: https://github.com/signals-dev/gpe

# Overview

The **GreenGuard Prediction Engineering** library is a framework designed to assist in the
generation of machine learning problems for wind farm operations data by analyzing past
occurrences of events.

The main features of **GPE** are:

* **EntitySet creation**: tools designed to represent wind farm data and the relationship
between different tables. We have functions to create EntitySets for datasets with PI data
and datasets using SCADA data.
* **Labeling Functions**: a collection of functions, as well as tools to create custom versions
of them, ready to be used to search past operations data for occurrences of specific types
of events.
* **Prediction Engineering**: a flexible framework designed to apply labeling functions on
wind turbine operations data in a number of different ways to create labels for custom
Machine Learning problems.
* **Feature Engineering**: a guide to using Featuretools to apply automated feature engineering
to wind farm data.

# Install

## Requirements

**GPE** requires Python 3.7 or 3.8 (`>=3.7,<3.9`).

Also, although it is not strictly required, the usage of a [virtualenv](
https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid interfering
with other software installed in the system where you are trying to run **GPE**.

## Download and Install

**GPE** can be installed locally using [pip](https://pip.pypa.io/en/stable/) with
the following command:

```bash
pip install --extra-index-url https://pypi.dailab.ml/ gpe
```

This will pull and install the latest stable release from the [DAI-Lab private PyPI instance](
https://pypi.dailab.ml/).

If you want to install from source or contribute to the project please read the
[Contributing Guide](CONTRIBUTING.rst).

## Docker usage

Alternatively, **GPE** is prepared to be run inside a docker environment. Please check the
[docker documentation](docker/README.md) for details about how to run **GPE** using docker.

# Quickstart

In this short tutorial we will guide you through a series of steps that will help you
get started with **GPE**.

## 1. Loading the data

The first step will be to use preprocessed data to create an EntitySet. Depending on the
type of data, we will use either the `gpe.create_pidata_entityset` or the
`gpe.create_scada_entityset` function.

**NOTE**: if you cloned the **GPE** repository, you will find some demo data inside the
`notebooks/data` folder which has been preprocessed to fit the `create_entityset` data
requirements.

```python3
import os
import pandas as pd
from gpe import create_scada_entityset

data_path = 'notebooks/data'

data = {
  'turbines': pd.read_csv(os.path.join(data_path, 'turbines.csv')),
  'alarms': pd.read_csv(os.path.join(data_path, 'alarms.csv')),
  'work_orders': pd.read_csv(os.path.join(data_path, 'work_orders.csv')),
  'stoppages': pd.read_csv(os.path.join(data_path, 'stoppages.csv')),
  'notifications': pd.read_csv(os.path.join(data_path, 'notifications.csv')),
  'scada': pd.read_csv(os.path.join(data_path, 'scada.csv'))
}

scada_es = create_scada_entityset(data)
```

This will load the turbine, alarms, stoppages, work order, notifications, and SCADA data, and return it
as an EntitySet.

```
Entityset: SCADA data
  DataFrames:
    turbines [Rows: 1, Columns: 10]
    alarms [Rows: 2, Columns: 9]
    work_orders [Rows: 2, Columns: 20]
    stoppages [Rows: 2, Columns: 16]
    notifications [Rows: 2, Columns: 15]
    scada [Rows: 2, Columns: 5]
  Relationships:
    alarms.COD_ELEMENT -> turbines.COD_ELEMENT
    stoppages.COD_ELEMENT -> turbines.COD_ELEMENT
    work_orders.COD_ELEMENT -> turbines.COD_ELEMENT
    scada.COD_ELEMENT -> turbines.COD_ELEMENT
    notifications.COD_ORDER -> work_orders.COD_ORDER
```
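The dictionary of DataFrames passed to `create_scada_entityset` can also be built with a small loop. The following is only a sketch using pandas and the standard library: the `load_tables` helper is hypothetical (it is not part of **GPE**), and it assumes one `<table name>.csv` file per table, as in the demo data. It writes two tiny CSV files to a temporary folder so that it runs without the repository data.

```python3
import tempfile
from pathlib import Path

import pandas as pd


def load_tables(data_path, table_names):
    """Read one CSV per table name into a dict of DataFrames (hypothetical helper)."""
    data_path = Path(data_path)
    return {name: pd.read_csv(data_path / f"{name}.csv") for name in table_names}


# Demo on a throwaway folder so the sketch runs without the demo data.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"COD_ELEMENT": [0]}).to_csv(Path(tmp) / "turbines.csv", index=False)
    pd.DataFrame({"COD_ELEMENT": [0, 0]}).to_csv(Path(tmp) / "alarms.csv", index=False)
    data = load_tables(tmp, ["turbines", "alarms"])

# Show the shape of every table that was loaded.
print({name: df.shape for name, df in data.items()})
```

With the demo data, the same helper would be called with `'notebooks/data'` and the six table names used above.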

## 2. Selecting a Labeling Function

The second step will be to choose an adequate **Labeling Function**.

We can see the list of available labeling functions using the `gpe.labeling.get_labeling_functions`
function.

```python3
from gpe import labeling

labeling.get_labeling_functions()
```

This will return a dictionary with the name and a short description of each available
function.

```
{'brake_pad_presence': 'Calculates the total power loss over the data slice.',
 'converter_replacement_presence': 'Calculates the converter replacement presence.',
 'total_power_loss': 'Calculates the total power loss over the data slice.'}
```

In this case, we will choose the `total_power_loss` function, which calculates the total
amount of power lost over a slice of time.
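
To make the idea of "total power lost over a slice of time" concrete, here is a minimal pandas sketch. It does not reproduce the **GPE** implementation: the `total_loss_in_slice` helper and the `start_time` and `power_loss` column names are assumptions made for illustration only.

```python3
import pandas as pd


def total_loss_in_slice(stoppages, start, end):
    """Sum the hypothetical power_loss column for rows inside [start, end)."""
    in_slice = (stoppages["start_time"] >= start) & (stoppages["start_time"] < end)
    return float(stoppages.loc[in_slice, "power_loss"].sum())


# Hypothetical stoppage records; column names are illustrative, not GPE's.
stoppages = pd.DataFrame({
    "start_time": pd.to_datetime(["2022-01-01", "2022-01-15", "2022-02-01"]),
    "power_loss": [100.0, 250.0, 75.0],
})

# Only the two January rows fall inside the half-open slice.
january_loss = total_loss_in_slice(
    stoppages, pd.Timestamp("2022-01-01"), pd.Timestamp("2022-02-01")
)
print(january_loss)  # 350.0
```

The slice is half-open, so an event that starts exactly at `end` belongs to the next slice rather than being counted twice.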

## 3. Generate Target Times

Once we have loaded the data and chosen the Labeling Function, we are ready to use
the `gpe.DataLabeler` class to generate a Target Times table.

```python3
from gpe import DataLabeler

data_labeler = DataLabeler(labeling.labeling_functions.total_power_loss)
target_times, metadata = data_labeler.generate_label_times(scada_es)
```

This will return, along with some metadata, a `compose.LabelTimes` containing the three columns
required to start working on a Machine Learning problem: the turbine ID (COD_ELEMENT), the cutoff
time (time) and the label.

```
   COD_ELEMENT       time    label
0            0 2022-01-01  45801.0
```
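
The cutoff time separates the data that may be used to predict a label from the data that may not. As a rough illustration using only pandas, the sketch below pairs a target times table with some signal readings and keeps the readings recorded at or before the cutoff. The `readings` table and its `timestamp` and `value` columns are hypothetical, and whether the cutoff instant itself is included is a choice made here, not something taken from **GPE**.

```python3
import pandas as pd

# Target times table with the three columns shown above.
target_times = pd.DataFrame({
    "COD_ELEMENT": [0],
    "time": pd.to_datetime(["2022-01-01"]),
    "label": [45801.0],
})

# Hypothetical signal readings; 'timestamp' and 'value' are illustrative names.
readings = pd.DataFrame({
    "COD_ELEMENT": [0, 0, 0],
    "timestamp": pd.to_datetime(["2021-12-30", "2021-12-31", "2022-01-02"]),
    "value": [1.0, 2.0, 3.0],
})

# Pair every reading with the target row of the same turbine,
# then keep only readings recorded at or before the cutoff time.
paired = readings.merge(target_times, on="COD_ELEMENT")
usable = paired[paired["timestamp"] <= paired["time"]]

print(usable[["COD_ELEMENT", "timestamp", "value", "label"]])
```

The reading from 2022-01-02 is dropped because it was recorded after the cutoff, so using it would leak future information into the features.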

# What's Next?

If you want to continue learning about **GreenGuard Prediction Engineering** and all its
features please have a look at the tutorials found inside the [notebooks folder](
https://github.com/signals-dev/gpe/tree/master/notebooks).


# History

## 0.2.3 - 2020-09-20

- Update test environment and make test commands.

## 0.2.2 - 2020-02-12

- Add github actions and perform tests over the readme example.

## 0.2.1 - 2020-02-12

- Slight user API improvements.
- Removal of unused code.
- Improved documentation and tutorials.
- Setup to run GPE on a Docker container.

## 0.2.0 - 2020-02-06

First full release

- Data Preprocessing
- Prediction Engineering Framework
- First Labeling functions

## 0.1.0 - 2019-10-31

First Pre-Release
