Metadata-Version: 1.1
Name: fuzzysearch
Version: 0.7.1
Summary: fuzzysearch is useful for finding approximate subsequence matches
Home-page: https://github.com/taleinat/fuzzysearch
Author: Tal Einat
Author-email: taleinat@gmail.com
License: MIT
Description: ===========
        fuzzysearch
        ===========
        
        .. image:: https://img.shields.io/pypi/v/fuzzysearch.svg?style=flat
            :target: https://pypi.python.org/pypi/fuzzysearch
            :alt: Latest Version
        
        .. image:: https://img.shields.io/travis/taleinat/fuzzysearch.svg?branch=master
            :target: https://travis-ci.org/taleinat/fuzzysearch/branches
            :alt: Build & Tests Status
        
        .. image:: https://img.shields.io/coveralls/taleinat/fuzzysearch.svg?branch=master
            :target: https://coveralls.io/r/taleinat/fuzzysearch?branch=master
            :alt: Test Coverage
        
        .. image:: https://img.shields.io/pypi/wheel/fuzzysearch.svg?style=flat
            :target: https://pypi.python.org/pypi/fuzzysearch
            :alt: Wheels
        
        .. image:: https://img.shields.io/pypi/pyversions/fuzzysearch.svg?style=flat
            :target: https://pypi.python.org/pypi/fuzzysearch
            :alt: Supported Python versions
        
        .. image:: https://img.shields.io/pypi/implementation/fuzzysearch.svg?style=flat
            :target: https://pypi.python.org/pypi/fuzzysearch
            :alt: Supported Python implementations
        
        .. image:: https://img.shields.io/pypi/l/fuzzysearch.svg?style=flat
            :target: https://pypi.python.org/pypi/fuzzysearch/
            :alt: License
        
        Fuzzy search: Find parts of long text or data, allowing for some
        changes/typos.
        
        **Easy, fast, and just works!**
        
        .. code:: python
        
            >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
            [Match(start=3, end=9, dist=1, matched="PATERN")]
        
        * Two simple functions to use: one for in-memory data and one for files
        
          * Fastest search algorithm is chosen automatically
        
        * Levenshtein Distance metric with configurable parameters
        
          * Separately configure the max. allowed distance, substitutions, deletions
            and/or insertions
        
        * Advanced algorithms with optional C and Cython optimizations
        
        * Properly handles Unicode; special optimizations for binary data
        
        * Simple installation:
           * ``pip install fuzzysearch`` just works
           * pure-Python fallbacks for compiled modules
           * only one dependency (``attrs``)
        
        * Extensively tested
        
        * Free software: `MIT license <LICENSE>`_
        
        For more info, see the `documentation <http://fuzzysearch.rtfd.org>`_.
        
        
        Installation
        ------------
        
        ``fuzzysearch`` supports Python versions 2.7 and 3.5+.
        
        .. code::
        
            $ pip install fuzzysearch
        
        This will work even if installing the C and Cython extensions fails, using
        pure-Python fallbacks.
        
        
        Usage
        -----
        Just call ``find_near_matches()`` with the sub-sequence you're looking for,
        the sequence to search, and the matching parameters:
        
        .. code:: python
        
            >>> from fuzzysearch import find_near_matches
            # search for 'PATTERN' with a maximum Levenshtein Distance of 1
            >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
            [Match(start=3, end=9, dist=1, matched="PATERN")]
        
        To search in a file, use ``find_near_matches_in_file()`` similarly:
        
        .. code:: python
        
            >>> from fuzzysearch import find_near_matches_in_file
            >>> with open('data_file', 'rb') as f:
            ...     find_near_matches_in_file(b'PATTERN', f, max_l_dist=1)
            [Match(start=3, end=9, dist=1, matched="PATERN")]
        
        
        Examples
        --------
        
        *fuzzysearch* is great for ad-hoc searches of genetic data, such as DNA or
        protein sequences, before reaching for "heavier", domain-specific tools like
        BioPython:
        
        .. code:: python
        
            >>> sequence = '''\
            GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
            TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
            CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
            GGGATAGG'''
            >>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
            >>> find_near_matches(subsequence, sequence, max_l_dist=2)
            [Match(start=3, end=24, dist=1, matched="TAGCACTGTAGGGATAACAAT")]
        
        BioPython sequences are also supported:
        
        .. code:: python
        
            >>> from Bio.Seq import Seq
            >>> from Bio.Alphabet import IUPAC
            >>> sequence = Seq('''\
            GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
            TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
            CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
            GGGATAGG''', IUPAC.unambiguous_dna)
            >>> subsequence = Seq('TGCACTGTAGGGATAACAAT', IUPAC.unambiguous_dna)
            >>> find_near_matches(subsequence, sequence, max_l_dist=2)
            [Match(start=3, end=24, dist=1, matched="TAGCACTGTAGGGATAACAAT")]
        
        
        Matching Criteria
        -----------------
        The search function supports four possible match criteria, which may be
        supplied in any combination:
        
        * maximum Levenshtein distance (``max_l_dist``)
        
        * maximum # of subsitutions
        
        * maximum # of deletions ("delete" = skip a character in the sub-sequence)
        
        * maximum # of insertions ("insert" = skip a character in the sequence)
        
        Not supplying a criterion means that there is no limit for it. For this reason,
        one must always supply ``max_l_dist`` and/or all other criteria.
        
        .. code:: python
        
            >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
            [Match(start=3, end=9, dist=1, matched="PATERN")]
        
            # this will not match since max-deletions is set to zero
            >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
            []
        
            # note that a deletion + insertion may be combined to match a substution
            >>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
            [Match(start=3, end=10, dist=1, matched="PAT-ERN")] # the Levenshtein distance is still 1
        
            # ... but deletion + insertion may also match other, non-substitution differences
            >>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
            [Match(start=3, end=10, dist=2, matched="PATERRN")]
        
        
        When to Use Other Tools
        -----------------------
        
        * Use case: Search through a list of strings for almost-exactly matching
          strings. For example, searching through a list of names for possible slight
          variations of a certain name.
        
          Suggestion: Consider using `fuzzywuzzy <https://github.com/seatgeek/fuzzywuzzy>`_.
        
        
        
        
        History
        -------
        
        0.7.1 (2020-04-05)
        ++++++++++++++++++
        * Dropped support for Python 3.4.
        * Removed deprecation warning with Python 3.8.
        * Fixed a couple of nasty bugs.
        
        0.7.0 (2020-01-14)
        ++++++++++++++++++
        
        * Added ``matched`` attribue to ``Match`` objects containing the matched part
          of the sequence.
        * Added support for CPython 3.8. Now supporting CPython 2.7 and 3.4-3.8.
        
        0.6.2 (2019-04-22)
        ++++++++++++++++++
        
        * Fix calling ``search_exact()`` without passing ``end_index``.
        * Fix edge case: max. dist >= sub-sequence length.
        
        0.6.1 (2018-12-08)
        ++++++++++++++++++
        
        * Fixed some C compiler warnings for the C and Cython modules
        
        0.6.0 (2018-12-07)
        ++++++++++++++++++
        
        * Dropped support for Python versions 2.6, 3.2 and 3.3
        * Added support and testing for Python 3.7
        * Optimized the n-grams Levenshtein search for long sub-sequences
        * Further optimized the n-grams Levenshtein search
        * Cython versions of the optimized parts of the n-grams Levenshtein search
        
        0.5.0 (2017-09-05)
        ++++++++++++++++++
        
        * Fixed ``search_exact_byteslike()`` to support supplying start and end indexes
        * Added support for lists, tuples and other Sequence types to ``search_exact()``
        * Fixed a bug where ``find_near_matches()`` could return a wrong ``Match.end``
          with ``max_l_dist=0``
        * Added more tests and improved some existing ones.
        
        0.4.0 (2017-07-06)
        ++++++++++++++++++
        
        * Added support and testing for Python 3.5 and 3.6
        * Many small improvements to README, setup.py and CI testing
        
        0.3.0 (2015-02-12)
        ++++++++++++++++++
        
        * Added C extensions for several search functions as well as internal functions
        * Use C extensions if available, or pure-Python implementations otherwise
        * setup.py attempts to build C extensions, but installs without if build fails
        * Added ``--noexts`` setup.py option to avoid trying to build the C extensions
        * Greatly improved testing and coverage
        
        0.2.2 (2014-03-27)
        ++++++++++++++++++
        
        * Added support for searching through BioPython Seq objects
        * Added specialized search function allowing only subsitutions and insertions
        * Fixed several bugs
        
        0.2.1 (2014-03-14)
        ++++++++++++++++++
        
        * Fixed major match grouping bug
        
        0.2.0 (2013-03-13)
        ++++++++++++++++++
        
        * New utility function ``find_near_matches()`` for easier use
        * Additional documentation
        
        0.1.0 (2013-11-12)
        ++++++++++++++++++
        
        * Two working implementations
        * Extensive test suite; all tests passing
        * Full support for Python 2.6-2.7 and 3.1-3.3
        * Bumped status from Pre-Alpha to Alpha
        
        0.0.1 (2013-11-01)
        ++++++++++++++++++
        
        * First release on PyPI.
Keywords: fuzzysearch
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
