Metadata-Version: 2.1
Name: airflow-metrics-gbq
Version: 0.0.4a0
Summary: Airflow metrics to Google BigQuery
Home-page: https://github.com/abyssnlp/airflow-metrics-gbq
License: BSD2
Author: Shaurya Rawat
Author-email: rawatshaurya1994@gmail.com
Requires-Python: >=3.8.1,<4.0.0
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Framework :: Apache Airflow
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: google-api-python-client (>=2.85.0,<3.0.0)
Requires-Dist: google-cloud-bigquery[pandas] (>=3.10.0,<4.0.0)
Requires-Dist: google-cloud-logging (>=3.5.0,<4.0.0)
Requires-Dist: pandas (>=2.0.0,<3.0.0)
Project-URL: Repository, https://github.com/abyssnlp/airflow-metrics-gbq
Description-Content-Type: text/markdown

Airflow Metrics to BigQuery
===

<p align="center">
    <a href="https://github.com/abyssnlp/airflow-metrics-gbq/actions/workflows/ci.yaml"><img alt="build" src="https://github.com/abyssnlp/airflow-metrics-gbq/actions/workflows/ci.yaml/badge.svg"/></a>
    <a href="https://github.com/abyssnlp/airflow-metrics-gbq/actions/workflows/release.yaml"><img alt="release" src="https://github.com/abyssnlp/airflow-metrics-gbq/actions/workflows/release.yaml/badge.svg"/></a>
    <a href="https://pypi.org/project/airflow-metrics-gbq"><img alt="PyPI" src="https://img.shields.io/pypi/v/airflow-metrics-gbq?style=plastic"></a>
    <img alt="PyPI - License" src="https://img.shields.io/pypi/l/airflow-metrics-gbq?color=blue&style=plastic">
</p>

Sends Airflow metrics to Google BigQuery.

---

### Installation
```bash
pip install airflow-metrics-gbq
```

### Usage
1. Enable StatsD metrics in `airflow.cfg`:
```ini
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```
2. Restart the Airflow webserver and scheduler:
```bash
systemctl restart airflow-webserver.service
systemctl restart airflow-scheduler.service
```
3. Check that Airflow is emitting metrics:
```bash
nc -l -u localhost 8125
```
4. Install this package
5. Create the required tables (counters, gauges, and timers); an example schema is provided [here](./scripts/sql/create_monitoring_tables.sql)
6. Create materialized views that refresh when the base tables change, as described [here](./scripts/sql/mat_views.sql)
7. Create a simple Python script `monitor.py` to configure and run the monitor:
```python
from airflow_metrics_gbq.metrics import AirflowMonitor

if __name__ == '__main__':
    monitor = AirflowMonitor(
        host="localhost",  # StatsD host (statsd_host in airflow.cfg)
        port=8125,  # StatsD port (statsd_port in airflow.cfg)
        gcp_credentials="path/to/service/account.json",  # GCP service account key
        dataset_id="monitoring",  # BigQuery dataset holding the monitoring tables
        counts_table="counts",  # counters table
        last_table="last",  # gauges table
        timers_table="timers",  # timers table
    )
    monitor.run()
```
8. Run the program, ideally in the background, to start sending metrics to BigQuery:
```bash
python monitor.py &
```
9. Logs can be viewed in Google Cloud Logging in the GCP console, under the `airflow_monitoring` `app_name`.
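For production, the background `&` in step 8 can be replaced by a service manager so the monitor restarts on failure. A hypothetical systemd unit (the paths, user, and unit names are placeholders for your deployment) might look like:

```ini
[Unit]
Description=Airflow metrics to BigQuery forwarder
After=network.target airflow-scheduler.service

[Service]
User=airflow
ExecStart=/usr/bin/python3 /opt/airflow/monitor.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```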
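The metrics that `nc` shows in step 3 arrive as plain-text StatsD datagrams, one metric per packet, in the form `name:value|type` (`c` for counters, `g` for gauges, `ms` for timers) — the same three kinds the tables in step 5 mirror. The sketch below is not the package's internal API, just an illustration of that wire format (optional StatsD sample-rate suffixes are ignored):

```python
import socket


def parse_statsd(packet: str):
    """Split a StatsD line like 'airflow.ti_successes:1|c' into (name, value, kind)."""
    name, rest = packet.split(":", 1)
    value, mtype = rest.split("|", 1)
    kinds = {"c": "counter", "g": "gauge", "ms": "timer"}
    return name, float(value), kinds.get(mtype, mtype)


if __name__ == "__main__":
    # Bind a UDP socket on an ephemeral port (so a live StatsD on 8125 is untouched),
    # send one Airflow-style metric to it, and parse what comes back.
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.bind(("127.0.0.1", 0))
    port = recv.getsockname()[1]

    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send.sendto(b"airflow.ti_successes:1|c", ("127.0.0.1", port))

    data, _ = recv.recvfrom(1024)
    print(parse_statsd(data.decode()))  # ('airflow.ti_successes', 1.0, 'counter')

    send.close()
    recv.close()
```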


**Future releases**
- [ ] Add a buffer (pyzmq or a multiprocessing queue)
- [ ] Move sending metrics to BigQuery into a separate process
- [ ] Add full type annotations and mypy checks
- [ ] Expose more configuration options
- [ ] Improve documentation

