Metadata-Version: 2.1
Name: loomchild-segment
Version: 2.0.4.2
Summary: Python wrapper for Loomchild segmenter
Home-page: https://github.com/bitextor/loomchild-segment-py
Author: Prompsit Language Engineering
Author-email: info@prompsit.com
Maintainer: Marta Bañon, Elsa Sarrías
Maintainer-email: mbanon@prompsit.com, esarrias@dlsi.ua.es
License: GNU General Public License v3.0
Project-URL: loomchild-segment-py on GitHub, https://github.com/bitextor/loomchild-segment-py
Project-URL: Loomchild segment on GitHub, https://github.com/mbanon/segment
Project-URL: Bifixer on GitHub, https://github.com/bitextor/bifixer
Project-URL: Prompsit Language Engineering, http://www.prompsit.com
Project-URL: Paracrawl, https://paracrawl.eu/
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
License-File: LICENSE

# loomchild-segment

A Python module for interfacing with the Java sentence splitter [Loomchild](https://github.com/mbanon/segment). This package is intended for use in [Bifixer](https://github.com/bitextor/bifixer) and/or [Bitextor](https://github.com/bitextor/bitextor).

Building and using this package requires `Maven` and `Java` to be installed on the system.

## Installation

This package can be installed with `pip` from PyPI:

```bash
pip install loomchild-segment
```

## Usage

Splitting a text into sentences:

```python
from loomchild.segmenter import LoomchildSegmenter

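# Example inputs; "en" assumes the segmenter accepts a language code of this form.
lang = "en"
input_line = "This is a sentence. This is another one."
input_text = "First line. Still the first line.\nSecond line."

segmenter = LoomchildSegmenter(lang)

# Segment a single line:
segments = segmenter.get_segmentation(input_line)
print("\n".join(segments))

# Segment a document (i.e. input containing multiple line breaks):
segments = segmenter.get_document_segmentation(input_text)
print("\n".join(segments))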
```

A command-line tool, `py-segment`, is provided to work with base64-encoded documents; the `-l` option selects the language:

```bash
cat b64encoded_input | py-segment -l $LANG > b64encoded_output
```
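
Input for the command-line tool can be prepared with Python's standard `base64` module. The following is a minimal sketch, assuming the tool expects one base64-encoded UTF-8 document per line (an assumption based on the format commonly used in Bitextor pipelines, not something stated above). The helper names `encode_document` and `decode_document` are hypothetical and not part of this package.

```python
import base64


def encode_document(text: str) -> str:
    """Encode a UTF-8 document as a single-line base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_document(b64_line: str) -> str:
    """Decode one base64 line back into the original UTF-8 text."""
    return base64.b64decode(b64_line.strip()).decode("utf-8")


# A document with internal line breaks becomes a single base64 line
# (assumed input format: one encoded document per line).
document = "First sentence. Second sentence.\nA new paragraph."
encoded = encode_document(document)
print(encoded)
```

Because `base64.b64encode` emits no line breaks, each encoded document stays on one line, so several documents can be written to the same file one per line and decoded again with `decode_document`.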
