Metadata-Version: 2.1
Name: kaldialign
Version: 0.2
Summary: Kaldi alignment methods wrapped into Python
Home-page: UNKNOWN
Author: Piotr Żelasko
Author-email: pzelasko@jhu.edu
License: Apache 2.0
Description: # kaldialign
        
        A small package that exposes edit distance computation functions from [Kaldi](https://github.com/kaldi-asr/kaldi). It uses the original Kaldi code and wraps it using Cython.
        
        ## Examples
        
        - `align(seq1, seq2, epsilon)` - returns the alignment between two sequences of strings. `epsilon` should be a null symbol (marking an insertion or deletion) that does not occur in either sequence.
        
        ```python
        from kaldialign import align
        
        EPS = '*'
        a = ['a', 'b', 'c']
        b = ['a', 's', 'x', 'c']
        ali = align(a, b, EPS)
        assert ali == [('a', 'a'), (EPS, 's'), ('b', 'x'), ('c', 'c')]
        ```
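        
        As a rough sketch of how such an alignment can be read, the snippet below labels each aligned pair as correct, substituted, inserted or deleted. It does not call `kaldialign`; the alignment is copied from the example above. The `classify` helper is hypothetical (not part of the package), and it assumes that the first element of each pair comes from the reference sequence.
        
        ```python
        from collections import Counter
        
        EPS = '*'
        # Alignment copied from the example above (no call into kaldialign here).
        ali = [('a', 'a'), (EPS, 's'), ('b', 'x'), ('c', 'c')]
        
        def classify(ref, hyp, eps=EPS):
            """Hypothetical helper: label one aligned pair.
        
            Assumes the first element of the pair is the reference symbol.
            """
            if ref == eps:
                return 'ins'  # symbol present only in the second sequence
            if hyp == eps:
                return 'del'  # symbol present only in the first sequence
            return 'cor' if ref == hyp else 'sub'
        
        counts = Counter(classify(ref, hyp) for ref, hyp in ali)
        assert dict(counts) == {'cor': 2, 'ins': 1, 'sub': 1}
        ```
        
        Under that assumption, the insertion and substitution counts agree with the `edit_distance` example for the same pair of sequences.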
        
        - `edit_distance(seq1, seq2)` - returns the total edit distance, as well as the number of insertions, deletions and substitutions.
        
        ```python
        from kaldialign import edit_distance
        
        a = ['a', 'b', 'c']
        b = ['a', 's', 'x', 'c']
        results = edit_distance(a, b)
        assert results == {
            'ins': 1,
            'del': 0,
            'sub': 1,
            'total': 2
        }
        ```
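        
        One possible use of this result is an error rate, i.e. the total number of edits divided by the reference length. The sketch below works on a dictionary copied from the example above rather than calling `kaldialign`; the `error_rate` helper is hypothetical, and treating `a` as the reference sequence is an assumption.
        
        ```python
        # Result dictionary copied from the example above.
        results = {'ins': 1, 'del': 0, 'sub': 1, 'total': 2}
        ref_len = 3  # len(a), assuming `a` is the reference sequence
        
        def error_rate(results, ref_len):
            """Hypothetical helper: total edits divided by reference length."""
            if ref_len == 0:
                raise ValueError('reference sequence is empty')
            return results['total'] / ref_len
        
        # The total is the sum of the three edit types.
        assert results['total'] == results['ins'] + results['del'] + results['sub']
        assert abs(error_rate(results, ref_len) - 2 / 3) < 1e-12
        ```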
        
        ## Motivation
        
        The need for this arose from the fact that practically all implementations of the Levenshtein distance differ slightly, which makes it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies code from Kaldi directly and wraps it using Cython, avoiding the issue altogether.
Keywords: natural language processing,speech recognition,machine learning
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Description-Content-Type: text/markdown
Provides-Extra: dev
