Metadata-Version: 2.1
Name: peakina
Version: 0.6.0a0
Summary: pandas readers on steroids (remote files, glob patterns, cache, etc.)
Home-page: https://github.com/ToucanToco/peakina
License: BSD-3-Clause
Author: Toucan Toco
Author-email: dev@toucantoco.com
Requires-Python: >=3.8,<4.0
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Typing :: Typed
Requires-Dist: certifi (>=2021.10.8,<2022.0.0)
Requires-Dist: chardet (>=4.0.0,<5.0.0)
Requires-Dist: fastparquet (>=0.8.0,<0.9.0)
Requires-Dist: jq (>=1.2.1,<2.0.0)
Requires-Dist: pandas (>=1.4.0,<2.0.0)
Requires-Dist: paramiko (>=2.9.2,<3.0.0)
Requires-Dist: pydantic (>=1.9.0,<2.0.0)
Requires-Dist: python-slugify (>=5.0.2,<6.0.0)
Requires-Dist: python-snappy (>=0.6.0,<0.7.0)
Requires-Dist: s3fs (>=2022.1.0,<2023.0.0)
Requires-Dist: tables (>=3.7.0,<4.0.0)
Requires-Dist: urllib3 (>=1.26.8,<2.0.0)
Requires-Dist: xlrd (>=2.0.1,<3.0.0)
Requires-Dist: xmltodict (>=0.12.0,<0.13.0)
Project-URL: Documentation, https://toucantoco.github.io/peakina
Project-URL: Repository, https://github.com/ToucanToco/peakina
Description-Content-Type: text/markdown

[![Pypi-v](https://img.shields.io/pypi/v/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![Pypi-pyversions](https://img.shields.io/pypi/pyversions/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![Pypi-l](https://img.shields.io/pypi/l/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![Pypi-wheel](https://img.shields.io/pypi/wheel/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![GitHub Actions](https://github.com/ToucanToco/peakina/workflows/CI/badge.svg)](https://github.com/ToucanToco/peakina/actions?query=workflow%3ACI)
[![codecov](https://codecov.io/gh/ToucanToco/peakina/branch/main/graph/badge.svg)](https://codecov.io/gh/ToucanToco/peakina)

# Pea Kina _aka 'Giant Panda'_

A wrapper around the `pandas` library that detects the separator, encoding
and type of a file. It can also read a group of files whose names match a pattern (Python regex or glob).
It reads both local and remote files (HTTP/HTTPS, FTP/FTPS/SFTP or S3/S3N/S3A).

The supported file types are `csv`, `excel`, `json`, `parquet` and `xml`.

:information_source: If the type you need is not yet supported, feel free to open an issue or to open a PR with the code directly!

Please read the [documentation](https://doc-peakina.toucantoco.com) for more information.

# Installation

`pip install peakina`

# Usage
Consider a file `file.csv`:
```
a;b
0;0
0;1
```

Just type:
```python
>>> import peakina as pk
>>> pk.read_pandas('file.csv')
   a  b
0  0  0
1  0  1
```
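
To give an intuition for what separator detection involves, here is a minimal sketch that uses only the standard library's `csv.Sniffer`. It illustrates the idea and is not necessarily how peakina implements it; the helper name `guess_separator` and the list of candidate separators are assumptions made for this example.

```python
import csv
import io


def guess_separator(sample: str, candidates: str = ";,\t|") -> str:
    """Guess the column separator of a CSV sample (hypothetical helper)."""
    # csv.Sniffer counts how often each candidate appears on every line and
    # keeps the one whose count is consistent across lines.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


content = "a;b\n0;0\n0;1\n"  # same content as file.csv above
sep = guess_separator(content)

# Parse the sample with the detected separator.
rows = list(csv.reader(io.StringIO(content), delimiter=sep))
print(repr(sep), rows)
```

Once the separator is known, it can be passed to any CSV reader, which is roughly the convenience `pk.read_pandas` gives you without the extra step.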

Or, given these files on an FTPS server:
- my_data_2015.csv
- my_data_2016.csv
- my_data_2017.csv
- my_data_2018.csv

You can just type:

```python
>>> pk.read_pandas('ftps://<path>/my_data_\\d{4}\\.csv$', match='regex', dtype={'a': 'str'})
    a   b     __filename__
0  '0'  0  'my_data_2015.csv'
1  '0'  1  'my_data_2015.csv'
2  '1'  0  'my_data_2016.csv'
3  '1'  1  'my_data_2016.csv'
4  '3'  0  'my_data_2017.csv'
5  '3'  1  'my_data_2017.csv'
6  '4'  0  'my_data_2018.csv'
7  '4'  1  'my_data_2018.csv'
```
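
The pattern in the call above can be tried out locally before pointing it at a server. The sketch below applies the same regex to a hypothetical directory listing (the two extra file names are invented for illustration) and compares it with a glob pattern from the standard library's `fnmatch`. It only shows how the patterns select names, not how peakina lists remote files.

```python
import re
from fnmatch import fnmatch

# Hypothetical directory listing; the last two names should not be selected.
names = [
    "my_data_2015.csv",
    "my_data_2016.csv",
    "my_data_2017.csv",
    "my_data_2018.csv",
    "my_data_latest.csv",
    "my_data_2018.csv.bak",
]

# Regex: exactly four digits, a literal dot, and nothing after "csv".
# r"\d" in a raw string is the same as "\\d" in a regular string.
pattern = re.compile(r"my_data_\d{4}\.csv$")
regex_matches = [n for n in names if pattern.match(n)]

# Glob: "?" matches any single character, so it is looser than "\d".
glob_matches = [n for n in names if fnmatch(n, "my_data_????.csv")]

print(regex_matches)
print(glob_matches)
```

Both patterns select the four yearly files here, but the glob would also accept a name such as `my_data_abcd.csv`, so the regex is the stricter choice when the year must be numeric.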

## Using cache

You may want to keep the last result in a cache, to avoid downloading and parsing the file again when it has not changed:

```python
>>> from peakina.cache import Cache
>>> cache = Cache.get_cache('memory')  # in-memory cache
>>> df = pk.read_pandas('file.csv', expire=3600, cache=cache)
```

In this example, the resulting dataframe is fetched from the cache unless the modification time of `file.csv` has changed on disk or the cached entry is older than 1 hour.

For persistent caching, use: `cache = Cache.get_cache('hdf', cache_dir='/tmp')`
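
The invalidation rule described above (a cached entry is reused only if the file's modification time is unchanged and the entry is younger than `expire` seconds) can be sketched as follows. `TinyCache` and `Entry` are hypothetical names written for this illustration; they are not peakina's `Cache` class, and the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Entry:
    value: Any
    mtime: float       # modification time of the source when it was cached
    created_at: float  # time at which the entry was stored


class TinyCache:
    """Hypothetical in-memory cache illustrating the two invalidation rules."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def set(self, key: str, value: Any, mtime: float, now: float) -> None:
        self._entries[key] = Entry(value, mtime, now)

    def get(self, key: str, mtime: float, now: float,
            expire: Optional[float] = None) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None  # nothing cached yet
        if entry.mtime != mtime:
            return None  # the source file changed since it was cached
        if expire is not None and now - entry.created_at > expire:
            return None  # the entry is older than the allowed age
        return entry.value


cache = TinyCache()
cache.set("file.csv", "dataframe", mtime=100.0, now=0.0)

fresh = cache.get("file.csv", mtime=100.0, now=1800.0, expire=3600)
changed = cache.get("file.csv", mtime=200.0, now=1800.0, expire=3600)
expired = cache.get("file.csv", mtime=100.0, now=7200.0, expire=3600)
print(fresh, changed, expired)
```

Times are passed in explicitly so that the behaviour is easy to follow; a real cache would read them from the file system and the clock.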


## Using only the download feature

If you just want to download a file, without converting it to a pandas dataframe:

```python
>>> uri = 'https://i.imgur.com/V9x88.jpg'
>>> f = pk.fetch(uri)
>>> f.get_str_mtime()
'2012-11-04T17:27:14Z'
>>> with f.open() as stream:
...     print('Image size:', len(stream.read()), 'bytes')
...
Image size: 60284 bytes
```

## Installation on macOS M1 chipset

### install everything
```console
brew install hdf5 snappy
HDF5_DIR="/opt/homebrew/Cellar/hdf5/1.12.1/" CPPFLAGS="-I/opt/homebrew/Cellar/snappy/1.1.9/include -L/opt/homebrew/Cellar/snappy/1.1.9/lib" poetry install
```

In more detail, here is what is needed (adjust the version numbers in the paths to match what Homebrew installed):

### install pytables
```console
brew install hdf5
HDF5_DIR="/opt/homebrew/Cellar/hdf5/1.12.1/" poetry run pip install tables
```

### install python-snappy
```console
brew install snappy
CPPFLAGS="-I/opt/homebrew/Cellar/snappy/1.1.9/include -L/opt/homebrew/Cellar/snappy/1.1.9/lib" poetry run pip install python-snappy
```

