Metadata-Version: 2.1
Name: anyks-lm
Version: 2.1.5
Summary: Smart language model
Home-page: https://github.com/anyks/alm
Author: Yuriy Lobarev
Author-email: forman@anyks.com
License: UNKNOWN
Download-URL: https://github.com/anyks/alm/archive/release.tar.gz
Description: # ANYKS Language Model (ALM)
        
        ## Requirements
        
        - [Zlib](http://www.zlib.net)
        - [OpenSSL](https://www.openssl.org)
        - [Python3](https://www.python.org/download/releases/3.0)
        - [NLohmann::json](https://github.com/nlohmann/json)
        - [BigInteger](http://mattmccutchen.net/bigint)
        
        ## Install PyBind11
        
        ```bash
        $ python3 -m pip install pybind11
        ```
        
        ## Description of Methods
        
        ### Methods:
        - **idw** - Word ID retrieval method
        - **idt** - Token ID retrieval method
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.idt("1424")
        2
        >>> alm.idt("hello")
        0
        >>> alm.idw("hello")
        1794085167
        >>> alm.idw("<s>")
        1
        >>> alm.idw("</s>")
        19
        >>> alm.idw("<unk>")
        3
        ```
        
        ### Description
        | Name    | Description                                                                            |
        |---------|----------------------------------------------------------------------------------------|
        |〈s〉     | Sentence beginning token                                                               |
        |〈/s〉    | Sentence end token                                                                     |
        |〈url〉   | URL-address token                                                                      |
        |〈num〉   | Number (Arabic or Roman) token                                                         |
        |〈unk〉   | Unknown word token                                                                     |
        |〈date〉  | Date token (18.07.2004 ¦ 07/18/2004)                                                   |
        |〈time〉  | Time token (15:44:56)                                                                  |
        |〈abbr〉  | Abbreviation token (1-й ¦ 2-е ¦ 20-я ¦ p.s ¦ p.s.)                                     |
        |〈anum〉  | Pseudo-number token (combination of numbers and other symbols) (T34 ¦ 895-M-86 ¦ 39km) |
        |〈math〉  | Mathematical operation token (+ ¦ - ¦ = ¦ / ¦ * ¦ ^)                                   |
        |〈range〉 | Range of numbers token (1-2 ¦ 100-200 ¦ 300-400)                                       |
        |〈aprox〉 | Approximate number token (~93 ¦ ~95.86 ¦ 10~20)                                        |
        |〈score〉 | Score count token (4:3 ¦ 01:04)                                                        |
        |〈dimen〉 | Dimensions token (200x300 ¦ 1920x1080)                                                 |
        |〈fract〉 | Fraction token (5/20 ¦ 192/864)                                                        |
        |〈punct〉 | Punctuation token (. ¦ ... ¦ , ¦ ! ¦ ? ¦ : ¦ ;)                                        |
        |〈specl〉 | Special character token (~ ¦ @ ¦ # ¦ № ¦ % ¦ & ¦ $ ¦ § ¦ © )                           |
        |〈isolat〉| Isolation/quotation token (" ¦ ' ¦ « ¦ » ¦ „ ¦ “ ¦ ` ¦ ( ¦ ) ¦ [ ¦ ] ¦ { ¦ })          |
        
        ---
        
        ### Methods:
        - **setZone** - Method for setting a user zone (for example, a domain zone such as `com`)
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setZone("com")
        ```
        
        ---
        
        ### Methods:
        - **clear** - Method for clearing all data
        - **setAlphabet** - Method for setting the alphabet
        - **getAlphabet** - Method for getting the alphabet
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.getAlphabet()
        'abcdefghijklmnopqrstuvwxyz'
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.getAlphabet()
        'abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя'
        >>> alm.clear()
        >>> alm.getAlphabet()
        'abcdefghijklmnopqrstuvwxyz'
        ```
        
        ---
        
        ### Methods:
        - **setUnknown** - Method for setting the unknown word
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setUnknown("word")
        ```
        
        ---
        
        ### Methods:
        - **getUnknown** - Method for retrieving the unknown word
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setUnknown("word")
        >>> alm.getUnknown()
        'word'
        ```
        
        ---
        
        ### Methods:
        - **sentences** - Sentence generation method
        - **readLM** - Method for reading data from an ARPA file
        - **sentencesToFile** - Method for assembling a specified number of sentences and writing to a file
        
        ### Example:
        ```python
        >>> import alm
        >>> def sentencesFn(text):
        ...     print("Sentences:", text)
        ...     return True
        ...
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.sentences(sentencesFn)
        Sentences: <s> В общем </s>
        Sentences: <s> С лязгом выкатился и остановился возле мальчика </s>
        Sentences: <s> У меня нет </s>
        Sentences: <s> Я вообще не хочу </s>
        Sentences: <s> Да и в общем </s>
        Sentences: <s> Не могу </s>
        Sentences: <s> Ну в общем </s>
        Sentences: <s> Так что я вообще не хочу </s>
        Sentences: <s> Потому что я вообще не хочу </s>
        Sentences: <s> Продолжение следует </s>
        Sentences: <s> Неожиданно из подворотни в олега ударил яркий прожектор патрульный трактор </s>
        >>> alm.sentencesToFile(5, "./result.txt")
        ```
        
        ---
        
        ### Methods:
        - **findNgram** - Method for searching N-grams in text
        
        ### Example:
        ```python
        >>> import alm
        >>> def callbackFn(text):
        ...     print(text)
        ... 
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.findNgram("Особое место занимает чудотворная икона Лобзание Христа Иудою", callbackFn)
        <s> Особое
        Особое место
        место занимает
        занимает чудотворная
        чудотворная икона
        икона Лобзание
        Лобзание Христа
        Христа Иудою
        Иудою </s>
        
        
        >>>
        ```
        
        ---
        
        ### Methods:
        - **setOption** - Method for setting module options
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.debug)
        >>> alm.setOption(alm.options_t.mixdicts)
        >>> alm.setOption(alm.options_t.onlyGood)
        >>> alm.setOption(alm.options_t.confidence)
        ```
        
        ---
        
        ### Methods:
        - **unsetOption** - Method for disabling module options
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.unsetOption(alm.options_t.debug)
        >>> alm.unsetOption(alm.options_t.mixdicts)
        >>> alm.unsetOption(alm.options_t.onlyGood)
        >>> alm.unsetOption(alm.options_t.confidence)
        ```
        
        ### Description
        | Name       | Description                                                     |
        |------------|-----------------------------------------------------------------|
        | debug      | Flag enabling debug mode                                        |
        | mixdicts   | Flag allowing the use of words consisting of mixed dictionaries |
        | onlyGood   | Flag restricting processing to words from the whitelist only    |
        | confidence | Flag for loading the ARPA file without preprocessing the words  |
        
        ---
        
        ### Methods:
        - **size** - Method for obtaining the N-gram size of the language model
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.size()
        3
        ```
        
        ---
        
        ### Methods:
        - **textToJson** - Method to convert text to JSON
        - **isAllowApostrophe** - Apostrophe permission check method
        - **switchAllowApostrophe** - Method for permitting or denying an apostrophe as part of a word
        
        ### Example:
        ```python
        >>> import alm
        >>> def callbackFn(text):
        ...     print(text)
        ... 
        >>> alm.isAllowApostrophe()
        False
        >>> alm.switchAllowApostrophe()
        >>> alm.isAllowApostrophe()
        True
        >>> alm.textToJson("«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie", callbackFn)
        [["«","On","nous","dit","qu'aujourd'hui","c'est","le","cas",",","encore","faudra-t-il","l'évaluer","»","l'astronomie"]]
        ```
        
        ---
        
        ### Methods:
        - **jsonToText** - Method to convert JSON to text
        
        ### Example:
        ```python
        >>> import alm
        >>> def callbackFn(text):
        ...     print(text)
        ... 
        >>> alm.jsonToText('[["«","On","nous","dit","qu\'aujourd\'hui","c\'est","le","cas",",","encore","faudra-t-il","l\'évaluer","»","l\'astronomie"]]', callbackFn)
        «On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie
        ```
        
        ---
        
        ### Methods:
        - **restore** - Method for restoring text from context
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.restore(["«","On","nous","dit","qu\'aujourd\'hui","c\'est","le","cas",",","encore","faudra-t-il","l\'évaluer","»","l\'astronomie"])
        "«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie"
        ```
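        
        The spacing visible in the `restore` output above can be approximated with two rules: no space after an opening quote or bracket, and no space before closing punctuation. The following is a minimal pure-Python sketch of that idea. `join_tokens`, `OPENERS` and `CLOSERS` are hypothetical names, not part of `alm`, and the library itself likely handles many more cases.
        
        ```python
        # Hypothetical helper, not part of alm: joins tokens using two spacing rules.
        OPENERS = ("«", "(", "[", "{", "„")
        CLOSERS = {"»", ")", "]", "}", ",", ".", "!", "?", ":", ";"}
        
        def join_tokens(tokens):
            """Join tokens, omitting the space after openers and before closers."""
            text = ""
            for token in tokens:
                # Add a separating space unless one of the two rules forbids it.
                if text and not text.endswith(OPENERS) and token not in CLOSERS:
                    text += " "
                text += token
            return text
        
        tokens = ["«", "On", "nous", "dit", "qu'aujourd'hui", "c'est", "le", "cas", ",",
                  "encore", "faudra-t-il", "l'évaluer", "»", "l'astronomie"]
        print(join_tokens(tokens))
        ```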
        
        ---
        
        ### Methods:
        - **addBadword** - Method for adding a word to the blacklist
        - **setBadwords** - Method for setting the blacklist words
        - **getBadwords** - Method for getting the blacklist words
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setBadwords(["hello", "world", "test"])
        >>> alm.getBadwords()
        {24227504, 1219922507, 1794085167}
        >>> alm.addBadword("test2")
        >>> alm.getBadwords()
        {24227504, 5035487504, 1219922507, 1794085167}
        ```
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setBadwords({24227504, 1219922507, 1794085167})
        >>> alm.getBadwords()
        {24227504, 1219922507, 1794085167}
        ```
        
        ---
        
        ### Methods:
        - **addGoodword** - Method for adding a word to the whitelist
        - **setGoodwords** - Method for setting the whitelist words
        - **getGoodwords** - Method for getting the whitelist words
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setGoodwords(["hello", "world", "test"])
        >>> alm.getGoodwords()
        {24227504, 1219922507, 1794085167}
        >>> alm.addGoodword("test2")
        >>> alm.getGoodwords()
        {24227504, 5035487504, 1219922507, 1794085167}
        ```
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setGoodwords({24227504, 1219922507, 1794085167})
        >>> alm.getGoodwords()
        {24227504, 1219922507, 1794085167}
        ```
        
        ---
        
        ### Methods:
        - **setUserToken** - Method for adding a user token
        - **getUserTokens** - Method for retrieving the list of user tokens
        - **getUserTokenId** - Method for obtaining a user token identifier
        - **getUserTokenWord** - Method for obtaining a user token by its identifier
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setUserToken("usa")
        >>> alm.setUserToken("russia")
        >>> alm.getUserTokenId("usa")
        4188610529
        >>> alm.getUserTokenId("russia")
        47207634939
        >>> alm.getUserTokens()
        ['usa', 'russia']
        >>> alm.getUserTokenWord(4188610529)
        'usa'
        >>> alm.getUserTokenWord(47207634939)
        'russia'
        ```
        
        ---
        
        ### Methods:
        - **setUserTokenMethod** - Method for setting a custom token processing function
        
        ### Example:
        ```python
        >>> import alm
        >>> def fn(token, word):
        ...     if token and (token == "<usa>"):
        ...         if word and (word.lower() == "usa"):
        ...             return True
        ...     elif token and (token == "<russia>"):
        ...         if word and (word.lower() == "russia"):
        ...             return True
        ...     return False
        ... 
        >>> alm.setUserToken("usa")
        >>> alm.setUserToken("russia")
        >>> alm.setUserTokenMethod("usa", fn)
        >>> alm.setUserTokenMethod("russia", fn)
        >>> alm.idw("usa")
        346562990
        >>> alm.idw("russia")
        3602214519
        >>> alm.getUserTokenWord(346562990)
        'usa'
        >>> alm.getUserTokenWord(3602214519)
        'russia'
        ```
        
        ---
        
        ### Methods:
        - **setWordPreprocessingMethod** - Method for setting the word preprocessing function
        
        ### Example:
        ```python
        >>> import alm
        >>> def run(word, context):
        ...     if word == "возле": word = "около"
        ...     return word
        ... 
        >>> alm.setOption(alm.options_t.debug)
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.setWordPreprocessingMethod(run)
        >>> a = alm.perplexity("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
        info: <s> Неожиданно из подворотни в олега ударил яркий прожектор патрульный трактор <punct> <punct> <punct> </s>
        
        info: p( неожиданно | <s> ) 	= [2gram] 0.00250617 [ -2.60098900 ] / 0.99999999
        info: p( из | неожиданно ...) 	= [3gram] 0.84584931 [ -0.07270700 ] / 1.00000081
        info: p( подворотни | из ...) 	= [3gram] 0.73518561 [ -0.13360300 ] / 0.99999924
        info: p( в | подворотни ...) 	= [3gram] 0.93193581 [ -0.03061400 ] / 0.99999960
        info: p( олега | в ...) 	= [3gram] 0.72047846 [ -0.14237900 ] / 1.00000026
        info: p( ударил | олега ...) 	= [3gram] 0.89971301 [ -0.04589600 ] / 1.00000043
        info: p( яркий | ударил ...) 	= [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
        info: p( прожектор | яркий ...) 	= [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
        info: p( патрульный | прожектор ...) 	= [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
        info: p( трактор | патрульный ...) 	= [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
        info: p( <punct> | трактор ...) 	= [OOV] 0.00000000 [ -inf ] / 0.99999999
        info: p( <punct> | <punct> ...) 	= [OOV] 0.00000000 [ -inf ] / 1.00000011
        info: p( <punct> | <punct> ...) 	= [OOV] 0.00000000 [ -inf ] / 1.00000011
        info: p( </s> | <punct> ...) 	= [1gram] 0.07816800 [ -1.10697100 ] / 1.00000011
        
        info: 1 sentences, 13 words, 0 OOVs
        info: 3 zeroprobs, logprob= -4.25945900 ppl= 2.01487019 ppl1= 2.12642805
        
        info: <s> С лязгом выкатился и остановился около мальчика <punct> <punct> <punct> <punct> </s>
        
        info: p( с | <s> ) 	= [2gram] 0.01301973 [ -1.88539800 ] / 0.99999999
        info: p( лязгом | с ...) 	= [3gram] 0.21850984 [ -0.66052900 ] / 1.00000061
        info: p( выкатился | лязгом ...) 	= [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
        info: p( и | выкатился ...) 	= [3gram] 0.93211608 [ -0.03053000 ] / 0.99999926
        info: p( остановился | и ...) 	= [3gram] 0.72065433 [ -0.14227300 ] / 0.99999975
        info: p( около | остановился ...) 	= [1gram] 0.00003415 [ -4.46662200 ] / 1.00000027
        info: p( мальчика | около ...) 	= [1gram] 0.00023364 [ -3.63146100 ] / 0.99999938
        info: p( <punct> | мальчика ...) 	= [OOV] 0.00000000 [ -inf ] / 0.99999965
        info: p( <punct> | <punct> ...) 	= [OOV] 0.00000000 [ -inf ] / 1.00000011
        info: p( <punct> | <punct> ...) 	= [OOV] 0.00000000 [ -inf ] / 1.00000011
        info: p( <punct> | <punct> ...) 	= [OOV] 0.00000000 [ -inf ] / 1.00000011
        info: p( </s> | <punct> ...) 	= [1gram] 0.07816800 [ -1.10697100 ] / 1.00000011
        
        info: 1 sentences, 11 words, 0 OOVs
        info: 4 zeroprobs, logprob= -11.95535900 ppl= 9.91470774 ppl1= 12.21380039
        >>> print(a.logprob)
        -16.214818
        ```
        
        ---
        
        ### Methods:
        - **initScripts** - Python script initialization method
        - **setWordScript** - Method for setting the word processing script
        - **getWordScript** - Method for getting the word processing script
        - **setUserTokenScript** - Method for setting the user token processing script
        - **getUserTokenScript** - Method for getting the user token processing script
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setUserToken("usa")
        >>> alm.setUserToken("russia")
        >>> alm.setUserTokenScript("./script1.py")
        >>> alm.getUserTokenScript()
        './script1.py'
        >>> alm.setWordScript("./script2.py")
        >>> alm.getWordScript()
        './script2.py'
        >>> alm.initScripts()
        ```
        
        ### Python script format for preprocessing the received words
        ```python
        # -*- coding: utf-8 -*-
        
        def init():
            """
            Initialization Method: Runs only once at application startup
            """
        
        def run(word, context):
            """
            Processing start method: starts when a word is extracted from text
            @word    word for processing
            @context sequence of previous words as an array
            """
            return word
        ```
        
        ### Python script format for defining word features
        ```python
        # -*- coding: utf-8 -*-
        
        def init():
            """
            Initialization Method: Runs only once at application startup
            """
        
        def run(token, word):
            """
            Processing start method: starts when a word is extracted from text
            @token word token name
            @word  word for processing
            """
            if token and (token == "<usa>"):
                if word and (word.lower() == "usa"): return "ok"
            elif token and (token == "<russia>"):
                if word and (word.lower() == "russia"): return "ok"
            return "no"
        ```
        
        ---
        
        ### Methods:
        - **setLogfile** - Method for setting the log output file
        - **setOOvFile** - Method for setting the file in which OOV words are saved
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setLogfile("./log.txt")
        >>> alm.setOOvFile("./oov.txt")
        ```
        
        ---
        
        ### Methods:
        - **perplexity** - Perplexity calculation method
        - **pplConcatenate** - Method for combining perplexities
        - **pplByFiles** - Method for calculating perplexity over a file or group of files
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> a = alm.perplexity("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
        >>> print(a.logprob)
        -8.238353
        >>> print(a.oovs)
        0
        >>> print(a.words)
        24
        >>> print(a.sentences)
        2
        >>> print(a.zeroprobs)
        7
        >>> print(a.ppl)
        2.135669866658319
        >>> print(a.ppl1)
        2.204269585673276
        >>> b = alm.pplByFiles("./text.txt")
        >>> c = alm.pplConcatenate(a, b)
        >>> print(c.ppl)
        7.384123548831112
        ```
        
        ### Description
        | Name      | Description                                                                 |
        |-----------|-----------------------------------------------------------------------------|
        | ppl       | Perplexity value without considering the beginning of the sentence          |
        | ppl1      | Perplexity value taking into account the beginning of the sentence          |
        | oovs      | Number of out-of-vocabulary (OOV) words                                     |
        | words     | Number of words                                                             |
        | logprob   | Base-10 log probability of the word sequence                                |
        | sentences | Number of sentences                                                         |
        | zeroprobs | Number of zero-probability words                                            |
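        
        The per-sentence debug summaries printed in the `setWordPreprocessingMethod` example (13 words, logprob -4.259459 and 11 words, logprob -11.955359, both with 0 OOVs) are consistent with the relation sketched below. This is inferred from those printed numbers only, not taken from the library source. `perplexities` is a hypothetical helper, and results aggregated over several sentences or containing OOV words may be computed differently.
        
        ```python
        def perplexities(logprob, words, sentences):
            """Return (ppl, ppl1) from a base-10 log probability.
        
            Assumption inferred from the debug output: ppl normalizes by
            words + sentences, ppl1 normalizes by words only.
            """
            ppl = 10 ** (-logprob / (words + sentences))
            ppl1 = 10 ** (-logprob / words)
            return ppl, ppl1
        
        # Values copied from the two debug summaries in this document.
        print(perplexities(-4.259459, 13, 1))
        print(perplexities(-11.955359, 11, 1))
        ```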
        
        ---
        
        ### Methods:
        - **tokenization** - Method for breaking text into tokens
        
        ### Example:
        ```python
        >>> import alm
        >>> def tokensFn(word, context, reset, stop):
        ...     print(word, " => ", context)
        ...     return True
        ...
        >>> alm.switchAllowApostrophe()
        >>> alm.tokenization("«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie", tokensFn)
        «  =>  []
        On  =>  ['«']
        nous  =>  ['«', 'On']
        dit  =>  ['«', 'On', 'nous']
        qu'aujourd'hui  =>  ['«', 'On', 'nous', 'dit']
        c'est  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui"]
        le  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est"]
        cas  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le']
        ,  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas']
        encore  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',']
        faudra-t-il  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore']
        l'évaluer  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore', 'faudra-t-il']
        »  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore', 'faudra-t-il', "l'évaluer"]
        l'astronomie  =>  ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore', 'faudra-t-il', "l'évaluer", '»']
        ```
        
        ---
        
        ### Methods:
        - **fixUppers** - Method for correcting letter case in text
        - **fixUppersByFiles** - Method for correcting letter case in a text file
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.fixUppers("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
        'Неожиданно из подворотни в олега ударил яркий прожектор патрульный трактор??? С лязгом выкатился и остановился возле мальчика....'
        >>> alm.fixUppersByFiles("./corpus", "./result.txt", "txt")
        ```
        
        ---
        
        ### Methods:
        - **checkHypLat** - Method for detecting hyphens and Latin characters in a word
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.checkHypLat("Hello-World")
        (True, True)
        >>> alm.checkHypLat("Hello")
        (False, True)
        >>> alm.checkHypLat("Привет")
        (False, False)
        >>> alm.checkHypLat("так-как")
        (True, False)
        ```
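        
        The four results above read as a pair `(has_hyphen, has_latin)`. A minimal pure-Python sketch that reproduces them is shown here. `check_hyp_lat` is a hypothetical stand-in, and the exact rules used by `alm.checkHypLat` may differ (for example, in how other dash characters are treated).
        
        ```python
        import string
        
        def check_hyp_lat(word):
            """Return (has_hyphen, has_latin) for a single word."""
            has_hyphen = "-" in word
            # Only ASCII letters are treated as Latin in this sketch.
            has_latin = any(ch in string.ascii_letters for ch in word)
            return has_hyphen, has_latin
        
        for word in ["Hello-World", "Hello", "Привет", "так-как"]:
            print(word, check_hyp_lat(word))
        ```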
        
        ---
        
        ### Methods:
        - **getUppers** - Method for extracting the letter case of each word
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.readLM('./lm.arpa')
        >>> alm.idw("Living")
        48384019276
        >>> alm.idw("in")
        2833
        >>> alm.idw("the")
        175734
        >>> alm.idw("USA")
        147770
        >>> alm.getUppers([48384019276, 2833, 175734, 147770])
        [1, 0, 0, 7]
        ```
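        
        The result `[1, 0, 0, 7]` is consistent with a bitmask in which bit `i` is set when the letter at position `i` is uppercase ("Living" gives 1, "USA" gives 7). This reading is inferred from the example, not documented behavior. The sketch below works on strings instead of word identifiers, and `upper_mask` is a hypothetical name.
        
        ```python
        def upper_mask(word):
            """Encode uppercase positions of a word as a bitmask."""
            mask = 0
            for position, ch in enumerate(word):
                if ch.isupper():
                    mask |= 1 << position  # set the bit for this letter position
            return mask
        
        print([upper_mask(w) for w in ["Living", "in", "the", "USA"]])  # [1, 0, 0, 7]
        ```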
        
        ---
        
        ### Methods:
        - **urls** - Method for extracting the positions (start and end offsets) of URLs in a string
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.urls("This website: example.com was designed with ...")
        {14: 25}
        >>> alm.urls("This website: https://a.b.c.example.net?id=52#test-1 was designed with ...")
        {14: 52}
        >>> alm.urls("This website: https://a.b.c.example.net?id=52#test-1 and 127.0.0.1 was designed with ...")
        {14: 52, 57: 66}
        ```
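        
        The returned dictionary maps the start offset of each URL to its end offset. A rough pure-Python sketch with a simplified regular expression reproduces the three results above. `URL_RE` and `url_spans` are hypothetical, the pattern is far less thorough than real URL detection, and it ignores any zone configured through `setZone`.
        
        ```python
        import re
        
        # Simplified pattern: optional scheme, then an IPv4 address or a dotted
        # host name, then an optional path, query or fragment.
        URL_RE = re.compile(
            r"(?:https?://)?"
            r"(?:\d{1,3}(?:\.\d{1,3}){3}"
            r"|(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,})"
            r"(?:[/?#]\S*)?"
        )
        
        def url_spans(text):
            """Map the start offset of every match to its end offset."""
            return {m.start(): m.end() for m in URL_RE.finditer(text)}
        
        print(url_spans("This website: example.com was designed with ..."))
        ```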
        
        ---
        
        ### Methods:
        - **roman2Arabic** - Method for translating Roman numerals to Arabic
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.roman2Arabic("XVI")
        16
        ```
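        
        For reference, the conversion itself can be sketched in a few lines of pure Python: scan the numeral from right to left and subtract a symbol whenever it is smaller than the one to its right. `roman_to_arabic` is a hypothetical helper, not the library's implementation, and it does not validate malformed numerals.
        
        ```python
        ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
        
        def roman_to_arabic(numeral):
            """Convert a well-formed Roman numeral to an integer."""
            total = 0
            previous = 0
            for symbol in reversed(numeral.upper()):
                value = ROMAN_VALUES[symbol]
                # A smaller symbol before a larger one is subtracted (IV, IX, XC).
                total += -value if value < previous else value
                previous = value
            return total
        
        print(roman_to_arabic("XVI"))  # 16
        ```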
        
        ---
        
        ### Methods:
        - **rest** - Method for detecting and correcting words with mixed alphabets
        - **setSubstitutes** - Method for setting the letters used to correct words with mixed alphabets
        - **getSubstitutes** - Method for getting the letters used to correct words with mixed alphabets
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.setSubstitutes({'p':'р','c':'с','o':'о','t':'т','k':'к','e':'е','a':'а','h':'н','x':'х','b':'в','m':'м'})
        >>> alm.getSubstitutes()
        {'a': 'а', 'b': 'в', 'c': 'с', 'e': 'е', 'h': 'н', 'k': 'к', 'm': 'м', 'o': 'о', 'p': 'р', 't': 'т', 'x': 'х'}
        >>> text = "ПPИBETИК"
        >>> text.lower()
        'пpиbetик'
        >>> alm.rest(text)
        'приветик'
        ```
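        
        The idea behind the substitution table can be sketched in pure Python: lowercase the word and, if it contains Cyrillic letters, replace look-alike Latin letters with their Cyrillic counterparts. `SUBSTITUTES` repeats the mapping from the example above. `has_cyrillic` and `fix_mixed_word` are hypothetical names, and the detection rule is an assumption, not the logic of `alm.rest`.
        
        ```python
        # Latin look-alike letters mapped to Cyrillic, as in the example above.
        SUBSTITUTES = {"p": "р", "c": "с", "o": "о", "t": "т", "k": "к", "e": "е",
                       "a": "а", "h": "н", "x": "х", "b": "в", "m": "м"}
        
        def has_cyrillic(word):
            """True if the word has a lowercase Russian letter (U+0430..U+044F or U+0451)."""
            return any("\u0430" <= ch <= "\u044f" or ch == "\u0451" for ch in word)
        
        def fix_mixed_word(word, substitutes=SUBSTITUTES):
            """Lowercase the word and replace Latin look-alikes in Cyrillic words."""
            word = word.lower()
            if not has_cyrillic(word):
                return word  # purely Latin words are left untouched
            return "".join(substitutes.get(ch, ch) for ch in word)
        
        print(fix_mixed_word("ПPИBETИК"))
        ```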
        
        ---
        
        ### Methods:
        - **setTokensDisable** - Method for setting the list of forbidden tokens
        - **setTokensUnknown** - Method for setting the list of tokens cast to 〈unk〉
        - **setTokenDisable** - Method for setting the list of unidentifiable tokens
        - **setTokenUnknown** - Method for setting the list of tokens that need to be identified as 〈unk〉
        - **getTokensDisable** - Method for retrieving the list of forbidden tokens
        - **getTokensUnknown** - Method for retrieving the list of tokens cast to 〈unk〉
        - **setAllTokenDisable** - Method for setting all tokens as unidentifiable
        - **setAllTokenUnknown** - Method for setting all tokens to be identified as 〈unk〉
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.idw("<date>")
        6
        >>> alm.idw("<time>")
        7
        >>> alm.idw("<abbr>")
        5
        >>> alm.idw("<math>")
        9
        >>> alm.setTokenDisable("date|time|abbr|math")
        >>> alm.getTokensDisable()
        {9, 5, 6, 7}
        >>> alm.setTokensDisable({6, 7, 5, 9})
        >>> alm.setTokenUnknown("date|time|abbr|math")
        >>> alm.getTokensUnknown()
        {9, 5, 6, 7}
        >>> alm.setTokensUnknown({6, 7, 5, 9})
        >>> alm.setAllTokenDisable()
        >>> alm.getTokensDisable()
        {2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}
        >>> alm.setAllTokenUnknown()
        >>> alm.getTokensUnknown()
        {2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}
        ```
        
        ---
        
        ### Methods:
        - **countAlphabet** - Method for obtaining the number of letters in the alphabet
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.getAlphabet()
        'abcdefghijklmnopqrstuvwxyz'
        >>> alm.countAlphabet()
        26
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.countAlphabet()
        59
        ```
        
        ---
        
        ### Methods:
        - **countBigrams** - Method for counting bigrams
        - **countTrigrams** - Method for counting trigrams
        - **countGrams** - Method for counting N-grams of the language model size
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.countBigrams("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
        12
        >>> alm.countTrigrams("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
        10
        >>> alm.size()
        3
        >>> alm.countGrams("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
        10
        >>> alm.idw("неожиданно")
        30444893210
        >>> alm.idw("из")
        4645
        >>> alm.idw("подворотни")
        7494072262
        >>> alm.idw("в")
        48
        >>> alm.idw("Олега")
        2431694341
        >>> alm.idw("ударил")
        54100711961
        >>> alm.countBigrams([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
        5
        >>> alm.countTrigrams([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
        4
        >>> alm.countGrams([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
        4
        ```
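        
        For the identifier sequence above, the counts match the number of sliding windows: a sequence of `m` items contains `m - n + 1` windows of size `n`. The pure-Python sketch below shows only this windowing. `ngrams` is a hypothetical helper, and whether the library additionally checks each N-gram against the loaded model is not shown in this document.
        
        ```python
        def ngrams(sequence, n):
            """Return every window of n consecutive items as a tuple."""
            return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]
        
        ids = [30444893210, 4645, 7494072262, 48, 2431694341, 54100711961]
        print(len(ngrams(ids, 2)))  # 5
        print(len(ngrams(ids, 3)))  # 4
        ```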
        
        ---
        
        ### Methods:
        - **arabic2Roman** - Method for converting an Arabic number to a Roman numeral
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.arabic2Roman(23)
        'XXIII'
        >>> alm.arabic2Roman("33")
        'XXXIII'
        ```
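        
        A common way to perform this conversion is a greedy walk over value/symbol pairs, sketched below in pure Python. `arabic_to_roman` is a hypothetical helper, not the library's code. Accepting both `int` and `str` mirrors the example above, while the 1..3999 range limit is an assumption of this sketch.
        
        ```python
        PAIRS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
                 (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
                 (5, "V"), (4, "IV"), (1, "I")]
        
        def arabic_to_roman(number):
            """Convert an integer (or numeric string) in 1..3999 to a Roman numeral."""
            number = int(number)  # accepts 23 as well as "33"
            if not 0 < number < 4000:
                raise ValueError("number must be between 1 and 3999")
            parts = []
            for value, symbol in PAIRS:
                count, number = divmod(number, value)
                parts.append(symbol * count)  # repeat the symbol as often as it fits
            return "".join(parts)
        
        print(arabic_to_roman(23))    # XXIII
        print(arabic_to_roman("33"))  # XXXIII
        ```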
        
        ---
        
        ### Methods:
        - **setLocale** - Method for setting the locale (default: en_US.UTF-8)
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setLocale("ru_RU.UTF-8")
        ```
        
        ---
        
        ### Methods:
        - **setThreads** - Method for setting the number of threads (0 - all threads)
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.setThreads(3)
        >>> a = alm.pplByFiles("./text.txt")
        >>> print(a.logprob)
        -48201.29481399994
        ```
        
        ---
        
        ### Methods:
        - **fti** - Method for converting a fractional number to an integer by shifting the decimal point by a given number of digits
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.fti(5892.4892)
        5892489200000
        >>> alm.fti(5892.4892, 4)
        58924892
        ```
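        
        Both results above equal the input multiplied by a power of ten: `10**9` when no second argument is given and `10**4` when it is 4. A minimal pure-Python sketch of that reading is shown below. `float_to_int` is a hypothetical name, and the default of 9 digits is inferred from the example output, not from the library documentation.
        
        ```python
        def float_to_int(value, digits=9):
            """Shift the decimal point right by `digits` places and round to an int."""
            return round(value * 10 ** digits)
        
        print(float_to_int(5892.4892))     # 5892489200000
        print(float_to_int(5892.4892, 4))  # 58924892
        ```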
        
        ---
        
        ### Methods:
        - **context** - Method for assembling text context from a sequence
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.idw("неожиданно")
        30444893210
        >>> alm.idw("из")
        4645
        >>> alm.idw("подворотни")
        7494072262
        >>> alm.idw("в")
        48
        >>> alm.idw("Олега")
        2431694341
        >>> alm.idw("ударил")
        54100711961
        >>> alm.context([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
        'Неожиданно из подворотни в олега ударил'
        ```
        
        ---
        
        ### Methods:
        - **findByFiles** - Method for searching for n-grams in a text file
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.debug)
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.findByFiles("./text.txt", "./result.txt")
        info: <s> Кукай
        сари кукай
        сари японские
        японские каллиграфы
        каллиграфы я
        я постоянно
        постоянно навещал
        навещал их
        их тайно
        тайно от
        от людей
        людей </s>
        
        
        info: <s> Неожиданно из
        Неожиданно из подворотни
        из подворотни в
        подворотни в Олега
        в Олега ударил
        Олега ударил яркий
        ударил яркий прожектор
        яркий прожектор патрульный
        прожектор патрульный трактор
        патрульный трактор
        
        <s> С лязгом
        С лязгом выкатился
        лязгом выкатился и
        выкатился и остановился
        и остановился возле
        остановился возле мальчика
        возле мальчика
        ```
        
        ---
        
        ### Methods:
        - **checkSequence** - Sequence Existence Method
        - **checkByFiles** - Method for checking if a sequence exists in a text file
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.debug)
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.checkSequence("Неожиданно из подворотни в олега ударил")
        (True, 0)
        >>> alm.checkSequence("<s> Сегодня сыграл и в Олега ударил яркий прожектор патрульный трактор с корпоративным сектором </s>")
        (True, 0)
        >>> alm.checkSequence("<s> Сегодня сыграл и в Олега ударил яркий прожектор патрульный трактор с корпоративным сектором </s>", True)
        (False, 0)
        >>> alm.checkSequence("<s> в Олега ударил яркий </s>")
        (True, 0)
        >>> alm.checkSequence("<s> в Олега ударил яркий </s>", True)
        (True, 0)
        >>> alm.checkSequence("от госсекретаря США")
        (True, 7)
        >>> alm.checkSequence("от госсекретаря США", True)
        (False, 0)
        >>> alm.idw("от")
        5586
        >>> alm.idw("госсекретаря")
        10074609004
        >>> alm.idw("США")
        338449
        >>> alm.checkSequence([5586, 10074609004, 338449])
        (True, 7)
        >>> alm.checkSequence([5586, 10074609004, 338449], True)
        (False, 0)
        >>> alm.checkByFiles("./text.txt", "./result.txt")
        info: 1999 | YES | Какой-то период времени мы вообще не общались
        
        info: 2000 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.С лязгом выкатился и остановился возле мальчика.
        
        info: 2001 | YES | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
        
        info: 2002 | NO | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
        
        info: 2003 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
        
        info: 2004 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
        
        info: 2005 | YES | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
        
        info: 2006 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
        
        info: 2007 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
        
        All texts: 2007
        Exists texts: 1359
        Not exists texts: 648
        >>> alm.checkByFiles("./corpus", "./result.txt", False, "txt")
        info: 1999 | YES | Какой-то период времени мы вообще не общались
        
        info: 2000 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.С лязгом выкатился и остановился возле мальчика.
        
        info: 2001 | YES | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
        
        info: 2002 | NO | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
        
        info: 2003 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
        
        info: 2004 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
        
        info: 2005 | YES | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
        
        info: 2006 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
        
        info: 2007 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
        
        All texts: 2007
        Exists texts: 1359
        Not exists texts: 648
        >>> alm.checkByFiles("./corpus", "./result.txt", True, "txt")
        info: 2000 | NO | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
        
        info: 2001 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.С лязгом выкатился и остановился возле мальчика.
        
        info: 2002 | NO | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
        
        info: 2003 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
        
        info: 2004 | NO | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
        
        info: 2005 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
        
        info: 2006 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
        
        info: 2007 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
        
        All texts: 2007
        Exists texts: 0
        Not exists texts: 2007
        ```
        
        ---
        
        ### Methods:
        - **check** - String Check Method
        - **match** - String Matching Method
        - **setAbbr** - Method for setting an abbreviation
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.setSubstitutes({'p':'р','c':'с','o':'о','t':'т','k':'к','e':'е','a':'а','h':'н','x':'х','b':'в','m':'м'})
        >>> alm.check("Дом-2", alm.check_t.home2)
        True
        >>> alm.check("Дом2", alm.check_t.home2)
        False
        >>> alm.check("Дом-2", alm.check_t.latian)
        False
        >>> alm.check("Hello", alm.check_t.latian)
        True
        >>> alm.check("прiвет", alm.check_t.latian)
        True
        >>> alm.check("Дом-2", alm.check_t.hyphen)
        True
        >>> alm.check("Дом2", alm.check_t.hyphen)
        False
        >>> alm.check("Д", alm.check_t.letter)
        True
        >>> alm.check("$", alm.check_t.letter)
        False
        >>> alm.check("-", alm.check_t.letter)
        False
        >>> alm.check("просtоквaшино", alm.check_t.similars)
        True
        >>> alm.match("my site http://example.ru, it's true", alm.match_t.url)
        True
        >>> alm.match("по вашему ip адресу 46.40.123.12 проводится проверка", alm.match_t.url)
        True
        >>> alm.match("мой адрес в формате IPv6: http://[2001:0db8:11a3:09d7:1f34:8a2e:07a0:765d]/", alm.match_t.url)
        True
        >>> alm.match("13-я", alm.match_t.abbr)
        True
        >>> alm.match("13-я-й", alm.match_t.abbr)
        False
        >>> alm.match("т.д", alm.match_t.abbr)
        True
        >>> alm.match("т.п.", alm.match_t.abbr)
        True
        >>> alm.match("С.Ш.А.", alm.match_t.abbr)
        True
        >>> alm.setAbbr("сша")
        >>> alm.match("США", alm.match_t.abbr)
        True
        >>> alm.match("Hello", alm.match_t.latian)
        True
        >>> alm.match("прiвет", alm.match_t.latian)
        False
        >>> alm.match("23424", alm.match_t.number)
        True
        >>> alm.match("hello", alm.match_t.number)
        False
        >>> alm.match("23424.55", alm.match_t.number)
        False
        >>> alm.match("23424", alm.match_t.decimal)
        False
        >>> alm.match("23424.55", alm.match_t.decimal)
        True
        >>> alm.match("23424,55", alm.match_t.decimal)
        True
        >>> alm.match("-23424.55", alm.match_t.decimal)
        True
        >>> alm.match("+23424.55", alm.match_t.decimal)
        True
        >>> alm.match("+23424.55", alm.match_t.anumber)
        True
        >>> alm.match("15T-34", alm.match_t.anumber)
        True
        >>> alm.match("hello", alm.match_t.anumber)
        False
        >>> alm.match("hello", alm.match_t.allowed)
        True
        >>> alm.match("évaluer", alm.match_t.allowed)
        False
        >>> alm.match("13", alm.match_t.allowed)
        True
        >>> alm.match("Hello-World", alm.match_t.allowed)
        True
        >>> alm.match("Hello", alm.match_t.math)
        False
        >>> alm.match("+", alm.match_t.math)
        True
        >>> alm.match("=", alm.match_t.math)
        True
        >>> alm.match("Hello", alm.match_t.upper)
        True
        >>> alm.match("hello", alm.match_t.upper)
        False
        >>> alm.match("hellO", alm.match_t.upper)
        False
        >>> alm.match("a", alm.match_t.punct)
        False
        >>> alm.match(",", alm.match_t.punct)
        True
        >>> alm.match(" ", alm.match_t.space)
        True
        >>> alm.match("a", alm.match_t.space)
        False
        >>> alm.match("a", alm.match_t.special)
        False
        >>> alm.match("±", alm.match_t.special)
        True
        >>> alm.match("[", alm.match_t.isolation)
        True
        >>> alm.match("a", alm.match_t.isolation)
        False
        ```
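        
        The `number` and `decimal` results above can be reproduced with two small regular expressions. This is only a sketch of the behaviour shown in the example: `is_number` and `is_decimal` are hypothetical helpers, the patterns are assumptions fitted to those results, and the real rules inside `alm` may differ (for instance for signed integers, which the example does not cover).
        
        ```python
        import re
        
        # Digits only, no sign and no separator.
        NUMBER_RE = re.compile(r"\d+")
        # Optional sign, then digits on both sides of "." or ",".
        DECIMAL_RE = re.compile(r"[+-]?\d+[.,]\d+")
        
        
        def is_number(text):
            return NUMBER_RE.fullmatch(text) is not None
        
        
        def is_decimal(text):
            return DECIMAL_RE.fullmatch(text) is not None
        
        
        print(is_number("23424"), is_number("23424.55"))     # True False
        print(is_decimal("23424"), is_decimal("-23424.55"))  # False True
        ```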
        
        ---
        
        ### Methods:
        - **delInWord** - Method for deleting characters of a given kind (punctuation, hyphens, broken letters) from a word
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.delInWord("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор??? с лязгом выкатился и остановился возле мальчика....", alm.wdel_t.punct)
        'неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор с лязгом выкатился и остановился возле мальчика'
        >>> alm.delInWord("hello-world-hello-world", alm.wdel_t.hyphen)
        'helloworldhelloworld'
        >>> alm.delInWord("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор??? с лязгом выкатился и остановился возле мальчика....", alm.wdel_t.broken)
        'неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор с лязгом выкатился и остановился возле мальчика'
        >>> alm.delInWord("«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie", alm.wdel_t.broken)
        'On nous dit quaujourdhui cest le cas encore faudra-t-il lvaluer lastronomie'
        ```
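        
        The `punct` and `hyphen` modes behave like a simple character filter. The sketch below shows the idea in pure Python. `remove_chars` and `PUNCTUATION` are hypothetical names, and the punctuation set is an assumption; `alm` may treat more symbols as punctuation.
        
        ```python
        # Assumed punctuation set, chosen to cover the example above.
        PUNCTUATION = set(".,!?;:")
        
        
        def remove_chars(text, unwanted):
            """Return text with every character from `unwanted` removed."""
            return "".join(ch for ch in text if ch not in unwanted)
        
        
        print(remove_chars("hello-world-hello-world", {"-"}))
        # helloworldhelloworld
        print(remove_chars("трактор??? с лязгом....", PUNCTUATION))
        # трактор с лязгом
        ```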
        
        ---
        
        ### Methods:
        - **countsByFiles** - Method for counting the number of n-grams in a text file
        
        ### Example:
        ```python
        >>> import alm
        >>> alm.setOption(alm.options_t.debug)
        >>> alm.setOption(alm.options_t.confidence)
        >>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
        >>> alm.readLM('./lm.arpa')
        >>> alm.countsByFiles("./text.txt", "./result.txt", 3)
        info: 0 | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
        
        info: 10 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
        
        info: 10 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
        
        info: 0 | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
        
        info: 10 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
        
        Counts 3grams: 471
        >>> alm.countsByFiles("./corpus", "./result.txt", 2, "txt")
        info: 19 | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
        
        info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
        
        info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
        
        info: 10 | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
        
        info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
        
        info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
        
        info: 27 | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
        
        Counts 2grams: 20270
        ```
        
        ### Description
        | N-gram size | Description         |
        |-------------|---------------------|
        | 1           | language model size |
        | 2           | bigram              |
        | 3           | trigram             |
        
Keywords: nlp,lm,alm,language-model
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: POSIX :: BSD :: FreeBSD
Requires: pybind11
Requires-Python: >=3.6
Description-Content-Type: text/markdown
