Metadata-Version: 2.3
Name: sparv-word-prediction-kb-bert-plugin
Version: 0.5.2
Summary: A sparv plugin for computing word neighbours using a BERT model.
Project-URL: Homepage, https://spraakbanken.gu.se
Project-URL: Repository, https://github.com/spraakbanken/sparv-word-prediction-plugin
Project-URL: Bug Tracker, https://github.com/spraakbanken/sparv-word-prediction-plugin/labels/project%3Aword-prediction--kb-bert
Author-email: Kristoffer Andersson <kristoffer.andersson@gu.se>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Requires-Dist: sparv-pipeline>=5.2.0
Requires-Dist: tabulate>=0.9.0
Requires-Dist: transformers>=4.34.1
Description-Content-Type: text/markdown

# sparv-word-prediction--kb-bert-plugin

[![PyPI version](https://badge.fury.io/py/sparv-word-prediction-kb-bert-plugin.svg)](https://pypi.org/project/sparv-word-prediction-kb-bert-plugin)

Plugin for applying BERT masking as a [Sparv](https://github.com/spraakbanken/sparv-pipeline) annotation.

## Install

First, install Sparv, as suggested:

```bash
pipx install sparv-pipeline
```

Then install `sparv-word-prediction-kb-bert-plugin` with:

```bash
pipx inject sparv-pipeline sparv-word-prediction-kb-bert-plugin
```

## Usage

If this is the only annotation you want to export, add it as the sole entry
under `xml_export`:

```yaml
xml_export:
    annotations:
        - <token>:word_prediction_kb_bert.word-prediction--kb-bert
```

To use it together with other annotations, add it under `export`:

```yaml
export:
    annotations:
        - <token>:word_prediction_kb_bert.word-prediction--kb-bert
        ...
```

### Configuration

You can configure the number of neighbours to generate and the number of decimals used for the scores.

#### Number of Neighbours

The number of neighbours defaults to `5` but can be configured in `config.yaml`:

```yaml
word_prediction_kb_bert:
    num_neighbours: 5
```

#### Number of Decimals

The number of decimals defaults to `3` but can be configured in `config.yaml`:

```yaml
word_prediction_kb_bert:
    num_decimals: 3
```

> [!NOTE]
> This setting also controls the cut-off: every prediction whose score rounds to zero at the configured number of decimals (`0.000` by default) is discarded.
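
The interaction between the two settings can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function name `format_predictions`, the `(word, score)` input shape, the `word:score` output format, and the choice to apply the top-`num_neighbours` limit before the cut-off are all assumptions made for illustration.

```python
def format_predictions(predictions, num_neighbours=5, num_decimals=3):
    """Hypothetical helper: keep top predictions whose rounded score is non-zero.

    `predictions` is assumed to be a list of (word, score) tuples.
    """
    # Keep only the `num_neighbours` highest-scoring candidates
    # (assumption: the limit is applied before the cut-off).
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)[:num_neighbours]

    kept = []
    for word, score in ranked:
        rounded = round(score, num_decimals)
        # Cut-off: a score that rounds to zero carries no information, drop it.
        if rounded == 0:
            continue
        # The "word:score" format is an assumption, not the plugin's output format.
        kept.append(f"{word}:{rounded:.{num_decimals}f}")
    return kept


candidates = [("katt", 0.91234), ("hund", 0.0301), ("bil", 0.0004)]
print(format_predictions(candidates))                  # "bil" is discarded
print(format_predictions(candidates, num_decimals=4))  # "bil" is kept
```

With the default of three decimals the score `0.0004` rounds to zero and is dropped; raising `num_decimals` to `4` keeps it.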

### Metadata

#### Model

Type | HuggingFace Model | Revision
--- | --- | ---
Model | [`KBLab/bert-base-swedish-cased`](https://huggingface.co/KBLab/bert-base-swedish-cased) | c710fb8dff81abb11d704cd46a8a1e010b2b022c
Tokenizer | same as Model  | same as Model
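
To try the same model and revision outside Sparv, a sketch along the following lines may be useful. It assumes `transformers` and a backend such as PyTorch are installed; the helper names `load_unmasker` and `predict_masked` and the `{mask}` placeholder are hypothetical, and the plugin itself may load the model differently.

```python
MODEL_NAME = "KBLab/bert-base-swedish-cased"
MODEL_REVISION = "c710fb8dff81abb11d704cd46a8a1e010b2b022c"


def load_unmasker():
    """Hypothetical helper: build a fill-mask pipeline pinned to the revision above."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_NAME, revision=MODEL_REVISION)


def predict_masked(unmasker, template, top_k=5):
    """Return (word, score) pairs for the `{mask}` placeholder in `template`."""
    # Use the tokenizer's own mask token instead of hard-coding it.
    text = template.replace("{mask}", unmasker.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in unmasker(text, top_k=top_k)]


if __name__ == "__main__":
    # Downloads the model on first use.
    unmasker = load_unmasker()
    for word, score in predict_masked(unmasker, "Jag bor i {mask}."):
        print(word, round(score, 3))
```

Pinning `revision` to the commit hash keeps results reproducible even if the model repository is updated later.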

## Changelog

This project keeps a [changelog](./CHANGELOG.md).
