Metadata-Version: 2.1
Name: lshashpy3
Version: 0.0.6
Summary: A fast Python 3 implementation of locality sensitive hashing with persistence support.
Home-page: UNKNOWN
Author: Kay Zhu
Author-email: me@kayzhu.com
Maintainer: Loreto Parisi
Maintainer-email: loretoparisi@gmail.com
License: MIT License
Description: LSHash
        ======
        
        :Version: 0.0.6
        :Python: 3.7.7
        
        A fast Python implementation of locality sensitive hashing with persistence
        support.
        
        Based on original source code https://github.com/kayzhu/LSHash
        
        Highlights
        ==========
        
        - Python 3 support
        - Load & save hash tables to local disk
        - Fast hash calculation for large amounts of high-dimensional data through the use of ``numpy`` arrays.
        - Built-in support for persistence through Redis.
        - Support for multiple hash indexes.
        - Built-in support for common distance/objective functions for ranking outputs.
        
        Installation
        ============
        ``LSHash`` depends on the following libraries:
        
        - numpy
        - bitarray (if hamming distance is used as distance function)
        
        Optional:
        
        - redis (if persistence through Redis is needed)
        
        To install from sources:
        
        .. code-block:: bash
        
            $ git clone https://github.com/loretoparisi/lshash.git
            $ cd lshash
            $ python setup.py install
            
        To install from PyPI:
        
        .. code-block:: bash
        
            $ pip install lshashpy3
            $ python -c "import lshashpy3 as lshash; print(lshash.__version__);"
        
        Quickstart
        ==========
        To create 6-bit hashes for input data of 8 dimensions:
        
        .. code-block:: python
        
         from lshashpy3 import LSHash
        
         # create 6-bit hashes for input data of 8 dimensions:
         lsh = LSHash(6, 8)
        
         # index vector
         lsh.index([2,3,4,5,6,7,8,9])
        
         # index vector and extra data
         lsh.index([10,12,99,1,5,31,2,3], extra_data="vec1")
         lsh.index([10,11,94,1,4,31,2,3], extra_data="vec2")
        
         # query a data point
         top_n = 1
         nn = lsh.query([1,2,3,4,5,6,7,7], num_results=top_n, distance_func="euclidean")
         print(nn)
        
         # unpack vector, extra data and vectorial distance
         top_n = 3
         nn = lsh.query([10,12,99,1,5,30,1,1], num_results=top_n, distance_func="euclidean")
         for ((vec, extra_data), distance) in nn:
             print(vec, extra_data, distance)
                
                
        To save the hash table to disk:
        
        .. code-block:: python
        
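         from lshashpy3 import LSHash
        
         k = 6  # hash size
         L = 5  # number of tables
         d = 8  # dimension of the feature vector
        
         lsh = LSHash(hash_size=k, input_dim=d, num_hashtables=L,
             storage_config={ 'dict': None },
             matrices_filename='weights.npz',
             hashtable_filename='hash.npz',
             overwrite=True)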
        
         lsh.index([10,12,99,1,5,31,2,3], extra_data="vec1")
         lsh.index([10,11,94,1,4,31,2,3], extra_data="vec2")
         lsh.save()
        
        To load the hash table from disk and perform a query:
        
        .. code-block:: python
        
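         # k, d and L must match the values used when the tables were saved
         lsh = LSHash(hash_size=k, input_dim=d, num_hashtables=L,
             storage_config={ 'dict': None },
             matrices_filename='weights.npz',
             hashtable_filename='hash.npz',
             overwrite=False)  # keep the saved files instead of regenerating them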
        
         top_n = 3
         nn = lsh.query([10,12,99,1,5,30,1,1], num_results=top_n, distance_func="euclidean")
         print(nn)
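        
        The matrices file matters because stored hash keys are only meaningful
        together with the random projections that produced them. A minimal numpy
        sketch of that round trip, independent of ``LSHash`` itself (the ``planes``
        key and the temporary path are illustrative assumptions, not the library's
        actual file layout):
        
        .. code-block:: python
        
            import os
            import tempfile
        
            import numpy as np
        
            # Random projection matrix: 6 bits, 8 input dimensions (illustrative values).
            rng = np.random.default_rng(0)
            planes = rng.standard_normal((6, 8))
        
            # Write the matrix to an .npz archive, as a first session would.
            path = os.path.join(tempfile.mkdtemp(), "weights.npz")
            np.savez(path, planes=planes)
        
            # Read it back, as a later session would.
            with np.load(path) as data:
                restored = data["planes"]
        
            # The restored matrix assigns the same bits to a point as the original one.
            point = np.array([10, 12, 99, 1, 5, 31, 2, 3], dtype=float)
            same_bits = np.array_equal(planes @ point > 0, restored @ point > 0)
            print(same_bits)
        
        Regenerating the matrix instead of reloading it would generally produce
        different bits, so keys stored earlier could no longer be matched.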
        
        API
        ==============
        
        - To initialize an ``LSHash`` instance:
        
        .. code-block:: python
        
         k = 6 # hash size
         L = 5  # number of tables
         d = 8 # Dimension of Feature vector
         LSHash(hash_size=k, input_dim=d, num_hashtables=L,
            storage_config={ 'dict': None },
            matrices_filename='weights.npz', 
            hashtable_filename='hash.npz', 
            overwrite=True)
        
        parameters:
        
        ``hash_size``:
            The length of the resulting binary hash.
        ``input_dim``:
            The dimension of the input vector.
        ``num_hashtables = 1``:
            (optional) The number of hash tables used for multiple lookups.
        ``storage_config = None``:
            (optional) A dictionary selecting the storage backend for the index,
            e.g. ``{ 'dict': None }`` for in-memory storage. Options include "redis".
        ``matrices_filename = None``:
            (optional) The path to the .npz file where the random matrices are
            stored, or where they will be stored if the file does not exist yet.
        ``hashtable_filename = None``:
            (optional) The path to the .npz file where the hash tables are stored,
            or where they will be stored if the file does not exist yet.
        ``overwrite = False``:
            (optional) Whether to overwrite the matrices file if it already exists.
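        
        As a rough illustration of what ``hash_size``, ``input_dim`` and
        ``num_hashtables`` control, the following sketch builds binary hashes from
        random hyperplanes with numpy. It is a simplified stand-in rather than the
        library's implementation, and ``make_planes`` and ``binary_hash`` are
        hypothetical helper names:
        
        .. code-block:: python
        
            import numpy as np
        
            def make_planes(hash_size, input_dim, seed=0):
                # Hypothetical helper: one random hyperplane (row) per output bit.
                rng = np.random.default_rng(seed)
                return rng.standard_normal((hash_size, input_dim))
        
            def binary_hash(planes, point):
                # Bit i is "1" when the point lies on the positive side of hyperplane i.
                projections = planes @ np.asarray(point, dtype=float)
                return "".join("1" if p > 0 else "0" for p in projections)
        
            # A single table: 6-bit hashes for 8-dimensional input.
            planes = make_planes(hash_size=6, input_dim=8)
            h1 = binary_hash(planes, [10, 12, 99, 1, 5, 31, 2, 3])
            h2 = binary_hash(planes, [10, 11, 94, 1, 4, 31, 2, 3])
            print(h1, h2)  # nearby vectors tend to agree on many bits
        
            # Several tables: each uses its own independent set of hyperplanes.
            tables = [make_planes(6, 8, seed=s) for s in range(5)]
            keys = [binary_hash(p, [10, 12, 99, 1, 5, 31, 2, 3]) for p in tables]
            print(keys)
        
        Using more tables gives a query more chances to land in the same bucket as
        a true neighbour, at the cost of extra memory and hashing work.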
        
        - To index a data point of a given ``LSHash`` instance, e.g., ``lsh``:
        
        .. code-block:: python
        
            lsh.index(input_point, extra_data=None)
        
        parameters:
        
        ``input_point``:
            The input data point: an array or tuple of numbers of length ``input_dim``.
        ``extra_data = None``:
            (optional) Extra data to be added along with the input_point.
        
        - To query a data point against a given ``LSHash`` instance, e.g., ``lsh``:
        
        .. code-block:: python
        
            lsh.query(query_point, num_results=None, distance_func="euclidean")
        
        parameters:
        
        ``query_point``:
            The query data point: an array or tuple of numbers of length ``input_dim``.
        ``num_results = None``:
            (optional) The number of query results to return in ranked order. By
            default all results will be returned.
        ``distance_func = "euclidean"``:
            (optional) The distance function used to rank the candidates. One of
            "euclidean" (the default), "true_euclidean", "centred_euclidean",
            "cosine" or "l1norm".
            
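        For orientation, the sketch below ranks candidates by distance to a query
        point using the conventional definitions of three of these measures. It is
        an assumption that the names carry their textbook meaning; the library's own
        functions may differ in detail, and ``rank`` is a hypothetical helper:
        
        .. code-block:: python
        
            import numpy as np
        
            def euclidean(x, y):
                # Straight-line (L2) distance.
                return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))
        
            def l1norm(x, y):
                # Sum of absolute coordinate differences (L1 distance).
                return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float))))
        
            def cosine(x, y):
                # One minus the cosine similarity of the two vectors.
                x, y = np.asarray(x, float), np.asarray(y, float)
                return float(1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
        
            def rank(candidates, query, distance, num_results=None):
                # Hypothetical helper: sort candidates by increasing distance to the query.
                scored = [(c, distance(c, query)) for c in candidates]
                scored.sort(key=lambda pair: pair[1])
                return scored[:num_results]  # num_results=None keeps every result
        
            candidates = [
                [2, 3, 4, 5, 6, 7, 8, 9],
                [10, 12, 99, 1, 5, 31, 2, 3],
                [10, 11, 94, 1, 4, 31, 2, 3],
            ]
            query = [10, 12, 99, 1, 5, 30, 1, 1]
        
            ranked = rank(candidates, query, euclidean)
            for vec, distance in ranked:
                print(vec, distance)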
        
        - To save the hash table currently indexed:
        
        .. code-block:: python
        
            lsh.save()
        
        
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Requires: numpy
Requires: bitarray
Description-Content-Type: text/x-rst
