Metadata-Version: 2.1
Name: rebulk
Version: 3.0.1
Summary: Rebulk - Define simple search patterns in bulk to perform advanced matching on any string.
Home-page: https://github.com/Toilal/rebulk/
Author: Rémi Alvergnat
Author-email: toilal.dev@gmail.com
License: MIT
Download-URL: https://pypi.python.org/packages/source/r/rebulk/rebulk-3.0.1.tar.gz
Description: ReBulk
        ======
        
        [![Latest Version](http://img.shields.io/pypi/v/rebulk.svg)](https://pypi.python.org/pypi/rebulk)
        [![MIT License](http://img.shields.io/badge/license-MIT-blue.svg)](https://pypi.python.org/pypi/rebulk)
        [![Build Status](https://img.shields.io/github/workflow/status/Toilal/rebulk/ci)](https://github.com/Toilal/rebulk/actions?query=workflow%3Aci)
        [![Coveralls](http://img.shields.io/coveralls/Toilal/rebulk.svg)](https://coveralls.io/r/Toilal/rebulk?branch=master)
        [![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/relekang/python-semantic-release)
        
        
        ReBulk is a Python library that performs advanced searches in strings
        that would be hard to implement using the [re
        module](https://docs.python.org/3/library/re.html) or [string
        methods](https://docs.python.org/3/library/stdtypes.html#str) alone.
        
        It includes features such as `Patterns`, `Match` and `Rule` that allow
        developers to build a custom and complex string matcher using a readable
        and extensible API.
        
        This project is hosted on GitHub: <https://github.com/Toilal/rebulk>
        
        Install
        =======
        
        ```sh
        $ pip install rebulk
        ```
        
        Usage
        =====
        
        Regular expression, string and function based patterns are declared in a
        `Rebulk` object. It uses a fluent API to chain `string`, `regex`, and
        `functional` methods to define various pattern types.
        
        ```python
        >>> from rebulk import Rebulk
        >>> bulk = Rebulk().string('brown').regex(r'qu\w+').functional(lambda s: (20, 25))
        ```
        
        When the `Rebulk` object is fully configured, you can call the `matches`
        method with an input string to retrieve all `Match` objects found by the
        registered patterns.
        
        ```python
        >>> bulk.matches("The quick brown fox jumps over the lazy dog")
        [<brown:(10, 15)>, <quick:(4, 9)>, <jumps:(20, 25)>]
        ```
        
        If multiple `Match` objects are found at the same position, only the
        longest one is kept.
        
        ```python
        >>> bulk = Rebulk().string('lakers').string('la')
        >>> bulk.matches("the lakers are from la")
        [<lakers:(4, 10)>, <la:(20, 22)>]
        ```
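        
        To make the "longer match wins" behavior concrete, here is a minimal
        pure-Python sketch that works on plain `(start, end)` tuples. It is not
        Rebulk's actual conflict solver; the `keep_longest` helper is a
        hypothetical name used for illustration only.
        
        ```python
        def keep_longest(spans):
            """Drop every span that overlaps a longer span already kept."""
            kept = []
            # Visit longer spans first so they win any conflict.
            for start, end in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
                if all(end <= k_start or start >= k_end for k_start, k_end in kept):
                    kept.append((start, end))
            return sorted(kept)
        
        # Spans of 'lakers' and both 'la' hits in "the lakers are from la".
        result = keep_longest([(4, 10), (4, 6), (20, 22)])
        ```
        
        The `(4, 6)` span is discarded because it overlaps the longer `(4, 10)`
        span, which mirrors the output of the example above.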
        
        String Patterns
        ===============
        
        String patterns are based on the
        [str.find](https://docs.python.org/3/library/stdtypes.html#str.find)
        method, but return all matches in the string rather than only the first.
        Set `ignore_case=True` to ignore case.
        
        ```python
        >>> Rebulk().string('la').matches("lalalilala")
        [<la:(0, 2)>, <la:(2, 4)>, <la:(6, 8)>, <la:(8, 10)>]
        
        >>> Rebulk().string('la').matches("LalAlilAla")
        [<la:(8, 10)>]
        
        >>> Rebulk().string('la', ignore_case=True).matches("LalAlilAla")
        [<La:(0, 2)>, <lA:(2, 4)>, <lA:(6, 8)>, <la:(8, 10)>]
        ```
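        
        The idea of calling `str.find` repeatedly to collect every occurrence
        can be sketched in plain Python as follows. This is an illustration under
        the assumption that matches do not overlap, not Rebulk's implementation;
        `find_all` is a hypothetical helper name.
        
        ```python
        def find_all(pattern, text, ignore_case=False):
            """Return the (start, end) span of every non-overlapping occurrence."""
            if not pattern:
                return []  # an empty pattern would never advance the search
            if ignore_case:
                pattern, text = pattern.lower(), text.lower()
            spans = []
            start = text.find(pattern)
            while start != -1:
                end = start + len(pattern)
                spans.append((start, end))
                start = text.find(pattern, end)  # resume after the current hit
            return spans
        
        spans = find_all('la', "LalAlilAla", ignore_case=True)
        ```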
        
        You can define several patterns with a single `string` method call.
        
        ```python
        >>> Rebulk().string('Winter', 'coming').matches("Winter is coming...")
        [<Winter:(0, 6)>, <coming:(10, 16)>]
        ```
        
        Regular Expression Patterns
        ===========================
        
        Regular Expression patterns are based on a compiled regular expression.
        [re.finditer](https://docs.python.org/3/library/re.html#re.finditer)
        method is used to find matches.
        
        If the [regex module](https://pypi.python.org/pypi/regex) is available,
        rebulk can use it instead of the default [re
        module](https://docs.python.org/3/library/re.html). Enable it with the
        `REBULK_REGEX_ENABLED=1` environment variable.
        
        ```python
        >>> Rebulk().regex(r'l\w').matches("lolita")
        [<lo:(0, 2)>, <li:(2, 4)>]
        ```
        
        You can define several patterns with a single `regex` method call.
        
        ```python
        >>> Rebulk().regex(r'Wint\wr', r'com\w{3}').matches("Winter is coming...")
        [<Winter:(0, 6)>, <coming:(10, 16)>]
        ```
        
        All keyword arguments from
        [re.compile](https://docs.python.org/3/library/re.html#re.compile) are
        supported.
        
        ```python
        >>> import re  # import required for flags constant
        >>> Rebulk().regex('L[A-Z]KERS', flags=re.IGNORECASE) \
        ...         .matches("The LaKeRs are from La")
        [<LaKeRs:(4, 10)>]
        
        >>> Rebulk().regex('L[A-Z]', 'L[A-Z]KERS', flags=re.IGNORECASE) \
        ...         .matches("The LaKeRs are from La")
        [<La:(20, 22)>, <LaKeRs:(4, 10)>]
        
        >>> Rebulk().regex(('L[A-Z]', re.IGNORECASE), ('L[a-z]KeRs')) \
        ...         .matches("The LaKeRs are from La")
        [<La:(20, 22)>, <LaKeRs:(4, 10)>]
        ```
        
        If the [regex module](https://pypi.python.org/pypi/regex) is available
        and enabled, repeated captures are supported automatically.
        
        ```python
        >>> # If regex module is available, repeated_captures is True by default.
        >>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+').matches("01-02-03-04")
        >>> matches[0].children # doctest:+SKIP
        [<01:(0, 2)>, <02:(3, 5)>, <03:(6, 8)>, <04:(9, 11)>]
        
        >>> # If regex module is not available, or if repeated_captures is forced to False.
        >>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+', repeated_captures=False) \
        ...                   .matches("01-02-03-04")
        >>> matches[0].children
        [<01:(0, 2)+initiator=01-02-03-04>, <04:(9, 11)+initiator=01-02-03-04>]
        ```
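        
        The second result above follows from how the standard `re` module
        handles a group that matches several times: only the last capture is
        remembered. The short sketch below uses `re` alone (no Rebulk) to show
        this.
        
        ```python
        import re
        
        match = re.match(r'(\d+)(?:-(\d+))+', "01-02-03-04")
        # Group 2 matches three times, but `re` only keeps the final capture.
        groups = match.groups()
        spans = [match.span(1), match.span(2)]
        ```
        
        Here `groups` holds `'01'` and `'04'` only, at spans `(0, 2)` and
        `(9, 11)`, which are the two children listed when repeated captures are
        not available.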
        
        -   `abbreviations`
        
            Defined as a list of 2-tuples, each tuple being an abbreviation. It
            simply replaces `tuple[0]` with `tuple[1]` in the expression.
        
                >>> Rebulk().regex(r'Custom-separators', abbreviations=[("-", r"[\W_]+")])\
                ...         .matches("Custom_separators using-abbreviations")
                [<Custom_separators:(0, 17)>]
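        
        The substitution performed by `abbreviations` can be pictured with a
        plain string replacement applied before the expression is compiled. This
        is a sketch of the idea using `re` only; `apply_abbreviations` is a
        hypothetical helper, and the `[\W_]+` expansion is an assumed example
        value.
        
        ```python
        import re
        
        def apply_abbreviations(expression, abbreviations):
            """Replace each abbreviation with its expansion in the expression."""
            for short, expansion in abbreviations:
                expression = expression.replace(short, expansion)
            return expression
        
        expanded = apply_abbreviations(r'Custom-separators', [("-", r"[\W_]+")])
        match = re.search(expanded, "Custom_separators using-abbreviations")
        ```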
        
        Functional Patterns
        ===================
        
        Functional Patterns are based on the evaluation of a function.
        
        The function should have the same parameters as `Rebulk.matches` method,
        that is the input string, and must return at least start index and end
        index of the `Match` object.
        
        ```python
        >>> def func(string):
        ...     index = string.find('?')
        ...     if index > -1:
        ...         return 0, index - 11
        >>> Rebulk().functional(func).matches("Why do simple ? Forget about it ...")
        [<Why:(0, 3)>]
        ```
        
        You can also return a dict of keyword arguments for the `Match` object.
        
        You can define several patterns with a single `functional` method call,
        and each function can return multiple matches.
        
        Chain Patterns
        ==============
        
        Chain Patterns are an ordered composition of string, functional and regex
        patterns. A repeater can be set on each chain part to define how many
        times it repeats.
        
        ```python
        >>> r = Rebulk().regex_defaults(flags=re.IGNORECASE)\
        ...             .defaults(children=True, formatter={'episode': int, 'version': int})\
        ...             .chain()\
        ...             .regex(r'e(?P<episode>\d{1,4})').repeater(1)\
        ...             .regex(r'v(?P<version>\d+)').repeater('?')\
        ...             .regex(r'[ex-](?P<episode>\d{1,4})').repeater('*')\
        ...             .close() # .repeater(1) could be omitted as it's the default behavior
        >>> r.matches("This is E14v2-15-16-17").to_dict()  # converts matches to dict
        MatchesDict([('episode', [14, 15, 16, 17]), ('version', 2)])
        ```
        
        Patterns parameters
        ===================
        
        All patterns have options that can be given as keyword arguments.
        
        -   `validator`
        
            Function to validate the `Match` value given by the pattern. Can also
            be a `dict` keyed by name, to use each `validator` with the pattern
            of that name.
        
            ```python
            >>> def check_leap_year(match):
            ...     return int(match.value) in [1980, 1984, 1988]
            >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
            ...                   .matches("In year 1982 ...")
            >>> len(matches)
            0
            >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
            ...                   .matches("In year 1984 ...")
            >>> len(matches)
            1
            ```
        
        Some base validator functions are available in the `rebulk.validators`
        module. Most of those functions have to be configured using
        `functools.partial` to map them to a function accepting a single `match`
        argument.
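        
        As a sketch of the `functools.partial` technique, the example below
        binds the configuration arguments of a validator so that only `match`
        remains. The `value_between` function is hypothetical and is not part of
        `rebulk.validators`; a `SimpleNamespace` stands in for a `Match` object.
        
        ```python
        from functools import partial
        from types import SimpleNamespace
        
        def value_between(low, high, match):
            """Hypothetical validator: accept integer values in [low, high]."""
            return low <= int(match.value) <= high
        
        # Bind `low` and `high`; the result accepts a single `match` argument.
        valid_year = partial(value_between, 1900, 2100)
        
        fake_match = SimpleNamespace(value="1984")  # stand-in for a Match object
        is_valid = valid_year(fake_match)
        ```
        
        A callable built this way has the single-argument shape expected by the
        `validator` parameter.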
        
        -   `formatter`
        
            Function to convert the `Match` value given by the pattern. Can also
            be a `dict` keyed by name, to use each `formatter` with the matches
            of that name.
        
            ```python
            >>> def year_formatter(value):
            ...     return int(value)
            >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
            ...                   .matches("In year 1982 ...")
            >>> isinstance(matches[0].value, int)
            True
            ```
        
        -   `pre_match_processor` / `post_match_processor`
        
            Function to mutate or invalidate a match generated by a pattern.
        
            The function takes a single parameter, the `Match` object. If it
            returns `False`, the match is considered invalid. If it returns a
            `Match` instance, that instance replaces the original match in the
            process.
        
        -   `post_processor`
        
            Function to change the default output of the pattern. It receives
            the list of matches and the `Pattern` object.
        
        -   `name`
        
            The name of the pattern. It is automatically passed to `Match`
            objects generated by this pattern.
        
        -   `tags`
        
            A list of strings that qualify this pattern.
        
        -   `value`
        
            Override value property for generated `Match` objects. Can also be a
            `dict`, to use `value` with pattern named with key.
        
        -   `validate_all`
        
            By default, validator is called for returned `Match` objects only.
            Enable this option to validate them all, parent and children
            included.
        
        -   `format_all`
        
            By default, formatter is called for returned `Match` values only.
            Enable this option to format them all, parent and children included.
        
        -   `disabled`
        
            A `function(context)` that disables the pattern when it returns `True`.
        
        -   `children`
        
            If `True`, all children `Match` objects will be retrieved instead of
            a single parent `Match` object.
        
        -   `private`
        
            If `True`, `Match` objects generated from this pattern are available
            internally only. They will be removed at the end of `Rebulk.matches`
            method call.
        
        -   `private_parent`
        
            Force parent matches to be returned and flag them as private.
        
        -   `private_children`
        
            Force children matches to be returned and flag them as private.
        
        -   `private_names`
        
            Match names that will be declared as private.
        
        -   `ignore_names`
        
            Match names that will be ignored from the pattern output, after
            validation.
        
        -   `marker`
        
            If `True`, `Match` objects generated from this pattern will be
            marker matches instead of standard matches. They won't be included
            in the `Matches` sequence, but will be available in the
            `Matches.markers` sequence (see the `Markers` section).
        
        Match
        =====
        
        A `Match` object is the result created by a registered pattern.
        
        It has a `value` property defined, and position indices are available
        through `start`, `end` and `span` properties.
        
        In some cases, it contains children `Match` objects in its `children`
        property, and each child `Match` object references its parent in its
        `parent` property. A `name` property can also be defined for the match.
        
        If groups are defined in a Regular Expression pattern, each group match
        will be converted to a single `Match` object. If a group has a name
        defined (`(?P<name>group)`), it is set as `name` property in a child
        `Match` object. The whole regexp match (`re.group(0)`) will be converted
        to the main `Match` object, and all subgroups (1, 2, ... n) will be
        converted to `children` matches of the main `Match` object.
        
        ```python
        >>> matches = Rebulk() \
        ...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)") \
        ...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
        >>> matches
        [<One, 1, Two, 2, Three, 3:(9, 33)>]
        >>> for child in matches[0].children:
        ...     '%s = %s' % (child.name, child.value)
        'one = 1'
        'two = 2'
        'three = 3'
        ```
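        
        The parent and child spans come directly from the underlying regular
        expression match. The sketch below uses the standard `re` module alone to
        show where those values originate; it does not build Rebulk `Match`
        objects.
        
        ```python
        import re
        
        pattern = r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)"
        m = re.search(pattern, "Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
        
        # Group 0 is the whole match: it corresponds to the parent.
        parent = (m.group(0), m.span(0))
        # Each named group corresponds to one child.
        children = [(name, m.group(name), m.span(name)) for name in ("one", "two", "three")]
        ```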
        
        It's possible to retrieve only children by using the `children`
        parameter. You can also customize the way the structure is generated
        with the `every`, `private_parent` and `private_children` parameters.
        
        ```python
        >>> matches = Rebulk() \
        ...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)", children=True) \
        ...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
        >>> matches
        [<1:(14, 15)+name=one+initiator=One, 1, Two, 2, Three, 3>, <2:(22, 23)+name=two+initiator=One, 1, Two, 2, Three, 3>, <3:(32, 33)+name=three+initiator=One, 1, Two, 2, Three, 3>]
        ```
        
        A `Match` object has the following properties, which can be given to
        `Pattern` objects:
        
        -   `formatter`
        
            Function to convert the `Match` value given by the pattern. Can also
            be a `dict` keyed by name, to use each `formatter` with the matches
            of that name.
        
            ```python
            >>> def year_formatter(value):
            ...     return int(value)
            >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
            ...                   .matches("In year 1982 ...")
            >>> isinstance(matches[0].value, int)
            True
            ```
        
        -   `format_all`
        
            By default, formatter is called for returned `Match` values only.
            Enable this option to format them all, parent and children included.
        
        -   `conflict_solver`
        
            A `function(match, conflicting_match)` used to resolve conflicts.
            The returned object will be removed from matches by the
            `ConflictSolver` default rule. If the `__default__` string is
            returned, it falls back to the default behavior of keeping the
            longer match.
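        
        A conflict solver following the contract described above might look
        like the sketch below: it returns the match to remove, or the
        `'__default__'` string to defer to the default behavior. The
        `prefer_named` function and its policy are hypothetical, and
        `SimpleNamespace` objects stand in for `Match` objects.
        
        ```python
        from types import SimpleNamespace
        
        def prefer_named(match, conflicting_match):
            """Return the match to remove, or '__default__' to keep the longer one."""
            if match.name and not conflicting_match.name:
                return conflicting_match
            if conflicting_match.name and not match.name:
                return match
            return '__default__'
        
        # Stand-ins for Match objects, for illustration only.
        named = SimpleNamespace(name="year", value="1984")
        unnamed = SimpleNamespace(name=None, value="19")
        removed = prefer_named(named, unnamed)
        ```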
        
        Matches
        =======
        
        A `Matches` object holds the result of `Rebulk.matches` method call.
        It's a sequence of `Match` objects and it behaves like a list.
        
        All methods accept a `predicate` function to filter `Match` objects
        using a callable, and an `index` int to retrieve a single element from
        the matches returned by default.
        
        It has the following additional methods and properties on it.
        
        -   `starting(index, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects that start at given index.
        
        -   `ending(index, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects that end at given index.
        
        -   `previous(match, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects that are previous and nearest to
            match.
        
        -   `next(match, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects that are next and nearest to
            match.
        
        -   `tagged(tag, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects that have the given tag defined.
        
        -   `named(name, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects that have the given name.
        
        -   `range(start=0, end=None, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects for given range, sorted from
            start to end.
        
        -   `holes(start=0, end=None, formatter=None, ignore=None, predicate=None, index=None)`
        
            Retrieves a list of *hole* `Match` objects for given range. A hole
            match is created for each range where no match is available.
        
        -   `conflicting(match, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects that conflict with given match.
        
        -   `chain_before(position, seps, start=0, predicate=None, index=None)`
        
            Retrieves a list of chained matches, before position, matching
            predicate and separated by characters from seps only.
        
        -   `chain_after(position, seps, end=None, predicate=None, index=None)`
        
            Retrieves a list of chained matches, after position, matching
            predicate and separated by characters from seps only.
        
        -   `at_match(match, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects at the same position as match.
        
        -   `at_span(span, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects from given (start, end) tuple.
        
        -   `at_index(pos, predicate=None, index=None)`
        
            Retrieves a list of `Match` objects from given position.
        
        -   `names`
        
            Retrieves a sequence of all `Match.name` properties.
        
        -   `tags`
        
            Retrieves a sequence of all `Match.tags` properties.
        
        -   `to_dict(details=False, first_value=False, enforce_list=False)`
        
            Convert to an ordered dict, with `Match.name` as key and
            `Match.value` as value.
        
            It's a subclass of
            [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict)
            that contains a `matches` property, which is a dict with `Match.name`
            as key and a list of `Match` objects as value.
        
            If `first_value` is `True`, only the first value is kept for each
            name. If `False` and distinct values are found for the same name,
            the values are wrapped in a list. Value lists can also be retrieved
            with `values_list`, which is a dict with `Match.name` as key and a
            list of `Match.value` as value.
        
            If `enforce_list` is `True`, all values are wrapped in a list, even
            if a single value is found.
        
            If `details` is `True`, `Match.value` objects are replaced with the
            complete `Match` object.
        
        -   `markers`
        
            A custom `Matches` sequence specialized for marker matches (see
            below).
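        
        The notion of a *hole* used by the `holes` method above can be
        illustrated on plain `(start, end)` tuples. The sketch below is a
        simplified stand-in, not Rebulk's implementation: `find_holes` is a
        hypothetical helper, and it ignores the `formatter`, `ignore`,
        `predicate` and `index` parameters.
        
        ```python
        def find_holes(spans, start, end):
            """Return the gaps in [start, end) that no span covers."""
            holes = []
            cursor = start
            for span_start, span_end in sorted(spans):
                if span_start > cursor:
                    holes.append((cursor, min(span_start, end)))
                cursor = max(cursor, span_end)
                if cursor >= end:
                    break
            if cursor < end:
                holes.append((cursor, end))
            return holes
        
        # Spans of 'quick', 'brown' and 'jumps' in a 43-character sentence.
        holes = find_holes([(4, 9), (10, 15), (20, 25)], 0, 43)
        ```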
        
        Markers
        =======
        
        If you have defined some patterns with the `marker` property, then
        `Matches.markers` points to a special `Matches` sequence that contains
        only marker matches. This sequence supports all methods from
        `Matches`.
        
        Marker matches are not intended to be used in the final result, but can
        be used to implement a `Rule`.
        
        Rules
        =====
        
        Rules are a convenient and readable way to implement advanced
        conditional logic involving several `Match` objects. When a rule is
        triggered, it can perform an action on `Matches` object, like filtering
        out, adding additional tags or renaming.
        
        Rules are implemented by extending the abstract `Rule` class. They are
        registered using the `Rebulk.rules` method by giving either a `Rule`
        instance, a `Rule` class or a module containing `Rule` classes only.
        
        For a rule to be triggered, `Rule.when` method must return `True`, or a
        non empty list of `Match` objects, or any other truthy object. When
        triggered, `Rule.then` method is called to perform the action with
        `when_response` parameter defined as the response of `Rule.when` call.
        
        Instead of implementing the `Rule.then` method, you can define the
        `consequence` class property with a consequence class or instance, such
        as `RemoveMatch`, `RenameMatch` or `AppendMatch`. You can also use a list
        of consequences when required: `when_response` must then be iterable,
        and its elements will be given to each consequence in the same order.
        
        When many rules are registered, it can be useful to set the `priority`
        class variable to an integer that orders rule execution (higher
        priorities are executed first). You can also define `dependency` to
        declare another `Rule` class as a dependency of the current rule,
        meaning that the dependency will be executed before it.
        
        For all rules with the same `priority` value, `when` is called on every
        rule first, and `then` is called only after all of those calls.
        
        ```python
        >>> from rebulk import Rule, RemoveMatch
        
        >>> class FirstOnlyRule(Rule):
        ...     consequence = RemoveMatch
        ...
        ...     def when(self, matches, context):
        ...         grabbed = matches.named("grabbed", 0)
        ...         if grabbed and matches.previous(grabbed):
        ...             return grabbed
        
        >>> rebulk = Rebulk()
        
        >>> rebulk.regex("This match(.*?)grabbed", name="grabbed")
        <...Rebulk object ...>
        >>> rebulk.regex("if it's(.*?)first match", private=True)
        <...Rebulk object at ...>
        >>> rebulk.rules(FirstOnlyRule)
        <...Rebulk object at ...>
        
        >>> rebulk.matches("This match is grabbed only if it's the first match")
        [<This match is grabbed:(0, 21)+name=grabbed>]
        >>> rebulk.matches("if it's NOT the first match, This match is NOT grabbed")
        []
        ```
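        
        The execution order described in this section (higher priorities first,
        and within one priority every `when` before any `then`) can be sketched
        in plain Python. This is not Rebulk's rule engine: `run_rules` and
        `LoggingRule` are hypothetical names, the `then(matches, when_response,
        context)` signature is an assumption, and `dependency` handling is left
        out.
        
        ```python
        from itertools import groupby
        
        def run_rules(rules, matches, context):
            """Run rules by descending priority, calling every `when` before any `then`."""
            ordered = sorted(rules, key=lambda rule: rule.priority, reverse=True)
            for _, group in groupby(ordered, key=lambda rule: rule.priority):
                # First pass: evaluate `when` for every rule sharing this priority.
                responses = [(rule, rule.when(matches, context)) for rule in group]
                # Second pass: call `then` only for rules whose response is truthy.
                for rule, response in responses:
                    if response:
                        rule.then(matches, response, context)
        
        class LoggingRule:
            """Hypothetical stand-in for a Rule subclass that records each call."""
            def __init__(self, name, priority, log):
                self.name, self.priority, self.log = name, priority, log
            def when(self, matches, context):
                self.log.append(("when", self.name))
                return True
            def then(self, matches, when_response, context):
                self.log.append(("then", self.name))
        
        log = []
        rules = [LoggingRule("a", 0, log), LoggingRule("b", 10, log), LoggingRule("c", 0, log)]
        run_rules(rules, [], {})
        ```
        
        After the call, `log` shows rule `b` running completely first, followed
        by both `when` calls for `a` and `c`, and only then their `then` calls.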
        
        
        Changelog
        =========
        
        <!--next-version-placeholder-->
        
        ## v3.0.1 (2020-12-25)
        ### Fix
        * **package:** Fix broken package `No such file or directory: 'CHANGELOG.md'` ([#24](https://github.com/Toilal/rebulk/issues/24)) ([`33895ff`](https://github.com/Toilal/rebulk/commit/33895ff358ff5051768fb98d4e840691e7af9bdf))
        
        ### Documentation
        * **readme:** Add semantic release badge ([`78baca0`](https://github.com/Toilal/rebulk/commit/78baca0c529083d7f583ffec58aeb23734d67ce5))
        * **readme:** Fix title ([`d5d4db5`](https://github.com/Toilal/rebulk/commit/d5d4db5cd7f6e2cb1308acd26bfb98838815fad4))
        
        ## v3.0.0 (2020-12-23)
        ### Feature
        * **regex:** Replace REGEX_DISABLED environment variable with REBULK_REGEX_ENABLED ([`d5a8cad`](https://github.com/Toilal/rebulk/commit/d5a8cad6281533ee549a46ca70e1a25e5777eda3))
        * Add python 3.8/3.9 support, drop python 2.7/3.4 support ([`048a15f`](https://github.com/Toilal/rebulk/commit/048a15f90833ba8d33ea84d56e9955d31b514dc3))
        
        ### Breaking
        * regex module is now disabled by default, even if it's available in the python interpreter. You have to set REBULK_REGEX_ENABLED=1 in your environment to enable it, as this module may cause some issues.  ([`d5a8cad`](https://github.com/Toilal/rebulk/commit/d5a8cad6281533ee549a46ca70e1a25e5777eda3))
        * Python 2.7 and 3.4 support have been dropped  ([`048a15f`](https://github.com/Toilal/rebulk/commit/048a15f90833ba8d33ea84d56e9955d31b514dc3))
        
Keywords: re regexp regular expression search pattern string match
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: dev
Provides-Extra: native
