Metadata-Version: 2.1
Name: corpuscula
Version: 1.0.39
Summary: Toolkit that simplifies corpus processing
Home-page: https://github.com/fostroll/corpuscula
Author: Sergei Ternovykh
Author-email: fostroll@gmail.com
License: BSD
Description: <div align="right"><strong>RuMor: Russian Morphology project</strong></div>
        <h2 align="center">Corpuscula: a python NLP library for corpus processing</h2>
        
        [![PyPI Version](https://img.shields.io/pypi/v/corpuscula?color=blue)](https://pypi.org/project/corpuscula/)
        [![Python Version](https://img.shields.io/pypi/pyversions/corpuscula?color=blue)](https://www.python.org/)
        [![License: BSD-3](https://img.shields.io/badge/License-BSD-brightgreen.svg)](https://opensource.org/licenses/BSD-3-Clause)
        
        ***Corpuscula*** is part of the ***RuMor*** project. It provides tools that
        simplify corpus processing. Highlights:
        
        * full [*CoNLL-U*](https://universaldependencies.org/format.html) support
        (includes [*CoNLL-U Plus*](https://universaldependencies.org/ext-format.html))
        * wrappers for well-known corpora of the Russian language
        * parser and wrapper for the Russian part of *Wikipedia*
        * *Corpus Dictionary* that can be used for further morphology processing
        * simple database to keep named entities
        
        ## Installation
        
        ### pip
        
        ***Corpuscula*** supports *Python 3.5* or later. To install it via *pip*, run:
        ```sh
        $ pip install corpuscula
        ```
        
        If you currently have a previous version of ***Corpuscula*** installed, use:
        ```sh
        $ pip install corpuscula -U
        ```
        
        ### From Source
        
        Alternatively, you can install ***Corpuscula*** from the source of this *git
        repository*:
        ```sh
        $ git clone https://github.com/fostroll/corpuscula.git
        $ cd corpuscula
        $ pip install -e .
        ```
        This gives you access to examples and data that are not included in the
        *PyPI* package.
        
        ## Setup
        
        After installation, specify the directory where you want to store
        downloaded corpora:
        ```python
        >>> import corpuscula.corpus_utils as cu
        >>> cu.set_root_dir(<path>)  # We will keep corpora here
        ```
        **NB:** this call creates or updates the config file `.rumor` in your home
        directory.
        
        If you don't set the root directory, ***Corpuscula*** will keep corpora
        in the directory where it's installed.
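        
        The setup step above can be wrapped in a small helper. This is a minimal
        sketch, not part of ***Corpuscula***: the names `prepare_corpora_dir` and
        `configure_corpuscula` are hypothetical, and it assumes that
        `set_root_dir()` accepts an absolute path given as a string and that
        creating the directory beforehand is harmless.
        ```python
        import os
        
        
        def prepare_corpora_dir(path):
            """Create the directory (with parents) if missing; return its absolute path."""
            root = os.path.abspath(os.path.expanduser(path))
            os.makedirs(root, exist_ok=True)
            return root
        
        
        def configure_corpuscula(path):
            """Prepare the directory and register it via set_root_dir().
        
            Assumption: set_root_dir() takes the path as a string. If Corpuscula
            is not installed, only the directory is prepared.
            """
            root = prepare_corpora_dir(path)
            try:
                import corpuscula.corpus_utils as cu
            except ImportError:
                return root  # Corpuscula is not installed; nothing to configure
            cu.set_root_dir(root)  # creates/updates `.rumor` in the home directory
            return root
        ```
        
        Calling `configure_corpuscula('~/corpora')` once (the path is only an
        example) should be enough, since the setting is kept in the `.rumor`
        config file.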
        
        ## Usage
        
        [*CoNLL-U* Support](https://github.com/fostroll/corpuscula/blob/master/doc/README_CONLLU.md)
        
        [Management of Corpora](https://github.com/fostroll/corpuscula/blob/master/doc/README_CORPORA.md)
        
        [Wrapper for *Wikipedia*](https://github.com/fostroll/corpuscula/blob/master/doc/README_WIKIPEDIA.md)
        
        [*Corpus Dictionary*](https://github.com/fostroll/corpuscula/blob/master/doc/README_CDICT.md)
        
        [Utilities](https://github.com/fostroll/corpuscula/blob/master/doc/README_UTILS.md)
        
        [*Items* database](https://github.com/fostroll/corpuscula/blob/master/doc/README_ITEMS.md)
        
        ## Examples
        
        You can find examples in the `examples` directory of the ***Corpuscula***
        GitHub repository.
        
        ## License
        
        ***Corpuscula*** is released under the BSD License. See the
        [LICENSE](https://github.com/fostroll/corpuscula/blob/master/LICENSE) file for
        more details.
        
Keywords: natural-language-processing nlp conllu corpora
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.5
Description-Content-Type: text/markdown
