Metadata-Version: 2.1
Name: light-text-prepro
Version: 0.3.1
Summary: Light Text Pre-processing permits to apply a chain of built-in regex rules to a input string.
Home-page: https://github.com/Arfius/light-text-prepro
License: MIT
Keywords: pre processing,regex,nlp
Author: Alfonso Farruggia
Requires-Python: >=3.6,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: PyYAML (>=5.4.1,<6.0.0)
Requires-Dist: flake8 (>=3.9.1,<4.0.0)
Project-URL: Bug Tracker, https://github.com/Arfius/light-text-prepro/issues
Project-URL: Repository, https://github.com/Arfius/light-text-prepro
Description-Content-Type: text/markdown

# Light Text Pre-processing

`Light Text Pre-processing` is an easy-to-use Python module that applies a chain of built-in regex rules to an input string. The regex rules are stored in a separate YML file and compiled at run-time. The compiling mechanism and how to add a custom regex are described below.

![ci/cd](https://github.com/Arfius/light-text-prepro/actions/workflows/light-text-prepro.yml/badge.svg)

## How it works

The package reads a list of regexes from `light_text_prepro/rules/regex.yml`. Each row in `regex.yml` defines one regex rule, such as `user_tag: '"@[0-9a-z](\.?[0-9a-z])*"'`. In this row, `user_tag` is the `key` of the regex, and `'"@[0-9a-z](\.?[0-9a-z])*"'` is its `value`.

At run-time, the package reads `regex.yml` and compiles one method for each regex; each method is named after the `key` of its row. For example, at the end of the process you can call the `user_tag()` method, which matches user tags. Each method has an optional parameter, `replace_with`, that lets you replace the text matched by the regex rule with an arbitrary string.
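The mechanism described above can be illustrated with a minimal sketch. This is not the package's actual code: the class name `RuleChain`, the helper `_make_method`, the empty-string default for `replace_with`, and the use of a plain dict in place of `regex.yml` are all assumptions made for illustration.

```python
import re


class RuleChain:
    """Hypothetical stand-in for the package's class; not its actual code."""

    def __init__(self, rules):
        self._text = ""
        # Compile each pattern once and expose it as a method named after its key.
        for key, pattern in rules.items():
            setattr(self, key, self._make_method(re.compile(pattern)))

    def _make_method(self, compiled):
        # Assumption: an omitted replace_with removes the matched text.
        def method(replace_with=""):
            self._text = compiled.sub(replace_with, self._text)
            return self  # returning self is what makes chaining possible
        return method

    def set_text(self, text):
        self._text = text
        return self

    def get_text(self):
        return self._text


# The package loads its rules from regex.yml; a plain dict stands in for it here.
rules = {
    "user_tag": r"@[0-9a-z](\.?[0-9a-z])*",
    "multiple_space": r"[ ]+",
}

chain = RuleChain(rules)
result = (
    chain.set_text("Hello   @john.doe")
    .multiple_space(replace_with=" ")
    .user_tag(replace_with="[user]")
    .get_text()
)
print(result)  # Hello [user]
```

Because every generated method returns the object itself, rules can be applied in any order and the text is only read back at the end with `get_text()`.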

## Package installation

### List of Regex 
```yaml
user_tag: '"@[0-9a-z](\.?[0-9a-z])*"'
email: '"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$"'
url: '"(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"'
special_chars: '"[-!$%^&*()_+|~=`{}<>?,.\"\[\]:;/\\]"'
ip_address: '"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"'
html_tag: '"^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$"'
tab_new_line: '"(\n|\t|\r)"'
multiple_space: '"[ ]+"'
```

If you are happy with the list above, you can install the package via pip.

```
pip install light-text-prepro
```

## How to use

```python
from light_text_prepro.lprepro import LPrePro
...
obj = LPrePro()
...
result = obj.set_text('Hey @username, this is my email my@email.com') \
    .user_tag(replace_with='[user]') \
    .email(replace_with='[email]') \
    .get_text()
# result -> Hey [user], this is my email [email]
```


Otherwise, if you want to contribute to the package by adding your own regex rule, please follow the section below.

## How to add a regex rule

### Setup project

````
$> git clone https://github.com/Arfius/light-text-prepro.git
$> cd light-text-prepro
$> pip install poetry flake8
$> poetry install
````

### Add new regex

1. Open `light_text_prepro/rules/regex.yml` and add a new row. Make sure to use a unique key for the rule. If you have trouble adding the regex rule, use any online regex validation tool and export the regex rule for Python (e.g. https://regex101.com/ => FLAVOR python => Copy to clipboard).
2. Add unit tests under the `tests` folder and make sure all tests pass. Use `$> poetry run pytest` to run the unit tests.
3. Update the `List of Regex` section in this file.
4. Create a Pull Request.
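
Before adding a row to `regex.yml`, it can help to check the candidate pattern with Python's `re` module directly. The sketch below is only an illustration: the `hashtag` key, its pattern, and the `apply_rule` helper are hypothetical and not part of the package, and the tests exercise the regex itself rather than the package's API.

```python
import re

# Hypothetical candidate rule; "hashtag" is not in the built-in list.
candidate_key = "hashtag"
candidate_pattern = r"#[0-9a-z_]+"

# re.compile raises re.error if the pattern is not valid Python regex syntax.
compiled = re.compile(candidate_pattern)


def apply_rule(text, replace_with=""):
    """Replace every match of the candidate pattern in text."""
    return compiled.sub(replace_with, text)


def test_hashtag_is_replaced():
    assert apply_rule("release day #python3 #nlp", "[tag]") == "release day [tag] [tag]"


def test_text_without_hashtag_is_unchanged():
    assert apply_rule("no tags here") == "no tags here"
```

Once the pattern behaves as expected, the same cases can be rewritten as unit tests against the method the package generates for the new key.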



