Metadata-Version: 1.2
Name: hkmeans_minibatch
Version: 1.0.2
Summary: An implementation of hierarchical minibatch kmeans
Home-page: https://github.com/ammesatyajit/heirarchical-minibatch-kmeans
Author: Satyajit Kumar
Author-email: ammesatyajit@gmail.com
License: UNKNOWN
Description: Hierarchical Minibatch Kmeans
        =============================
        
        An implementation of hierarchical k-means that uses mini-batches to
        improve efficiency on large datasets.
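        
        The idea can be sketched in NumPy. This is a minimal illustration, not
        the package's implementation: it assumes Sculley-style mini-batch
        k-means (a per-centroid learning rate of ``1/count``) and assumes that
        each level of the hierarchy re-clusters every cluster from the previous
        level into ``k`` sub-clusters. The names ``nearest``,
        ``minibatch_kmeans`` and ``hierarchical_kmeans`` are hypothetical.
        
        ```python
        import numpy as np


        def nearest(points, centroids):
            """Index of the closest centroid (squared Euclidean distance) for each point."""
            dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            return dists.argmin(axis=1)


        def minibatch_kmeans(points, k, batch_size, epochs, seed=0):
            """Mini-batch k-means in the style of Sculley (2010); an assumption, not the package's code."""
            rng = np.random.default_rng(seed)
            # Start from k distinct data points.
            centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
            counts = np.zeros(k)
            for _ in range(epochs):
                order = rng.permutation(len(points))
                for start in range(0, len(points), batch_size):
                    batch = points[order[start:start + batch_size]]
                    # Assignments are computed once per batch, before any update.
                    labels = nearest(batch, centroids)
                    for vec, c in zip(batch, labels):
                        counts[c] += 1
                        eta = 1.0 / counts[c]  # per-centroid learning rate
                        centroids[c] = (1.0 - eta) * centroids[c] + eta * vec
            return centroids


        def hierarchical_kmeans(points, k, hierarchies, batch_size, epochs):
            """Return {path: centroids}; () is the top level, (i,) is the split of cluster i, etc."""
            tree = {}
            frontier = [((), points)]
            for _ in range(hierarchies):
                next_frontier = []
                for path, members in frontier:
                    if len(members) < k:
                        continue  # too few vectors to split into k clusters
                    centroids = minibatch_kmeans(members, k, batch_size, epochs)
                    tree[path] = centroids
                    labels = nearest(members, centroids)
                    for c in range(k):
                        next_frontier.append((path + (c,), members[labels == c]))
                frontier = next_frontier
            return tree
        ```
        
        Under these assumptions, ``k`` clusters per node and ``hierarchies``
        levels give at most ``k ** hierarchies`` leaf clusters, while each
        mini-batch k-means call only ever touches ``batch_size`` vectors at a
        time.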
        
        Install
        =======
        
        ::
        
            pip3 install hkmeans-minibatch
        
        Usage
        =====
        
        ::
        
            $ python3 -m hkmeans_minibatch -h
            usage: __main__.py [-h] -r ROOT_FEATURE_PATH -p FEATURES_PREFIX [-b BATCH_SIZE] -s SAVE_DIR -c CENTROID_DIR -hr HIERARCHIES -k CLUSTERS [-e EPOCHS]
            optional arguments:
              -h, --help            show this help message and exit
              -r ROOT_FEATURE_PATH, --root-feature_path ROOT_FEATURE_PATH
                                    path to folder containing all the feature files
              -p FEATURES_PREFIX, --features-prefix FEATURES_PREFIX
                                    prefix that contains the desired files to read
              -b BATCH_SIZE, --batch-size BATCH_SIZE
                                    batch_size to use for the minibatch kmeans
              -s SAVE_DIR, --save-dir SAVE_DIR
                                    save directory for sorted hierarchical kmeans vectors
              -c CENTROID_DIR, --centroid-dir CENTROID_DIR
                                    directory to save the centroids in
              -hr HIERARCHIES, --hierarchies HIERARCHIES
                                    number of hierarchies to run the kmeans on
              -k CLUSTERS, --clusters CLUSTERS
                                    number of clusters for each part of the hierarchy
              -e EPOCHS, --epochs EPOCHS
                                    number of epochs to run the kmeans for each hierarchy
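        
        To drive the command line from a Python script, the flags listed in the
        help text can be assembled into an argument list. This is a sketch:
        ``build_hkmeans_command`` and ``run_hkmeans`` are hypothetical helpers,
        and the optional ``-b``/``-e`` flags are omitted when no value is given,
        so the CLI falls back to whatever defaults it defines.
        
        ```python
        import subprocess
        import sys


        def build_hkmeans_command(root_feature_path, features_prefix, save_dir,
                                  centroid_dir, hierarchies, clusters,
                                  batch_size=None, epochs=None):
            """Assemble argv for `python -m hkmeans_minibatch` using the documented flags."""
            cmd = [
                sys.executable, "-m", "hkmeans_minibatch",  # current interpreter
                "-r", str(root_feature_path),
                "-p", str(features_prefix),
                "-s", str(save_dir),
                "-c", str(centroid_dir),
                "-hr", str(hierarchies),
                "-k", str(clusters),
            ]
            # -b and -e are optional; pass them only when a value is given.
            if batch_size is not None:
                cmd += ["-b", str(batch_size)]
            if epochs is not None:
                cmd += ["-e", str(epochs)]
            return cmd


        def run_hkmeans(**kwargs):
            """Run the CLI; raises CalledProcessError on a non-zero exit status."""
            subprocess.run(build_hkmeans_command(**kwargs), check=True)
        ```
        
        For example, ``run_hkmeans(root_feature_path="features",
        features_prefix="clip_", save_dir="tmp_sorted", centroid_dir="centroids",
        hierarchies=2, clusters=64, batch_size=2048, epochs=5)``, where every
        path, prefix and number is a placeholder.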
        
        Put all the .npy feature files to cluster under a single root feature
        directory; they may be nested in subdirectories. For best results, use a
        batch size larger than the number of vectors in any single .npy file.
        The features prefix is the common prefix shared by the .npy files to
        cluster. The save directory should be empty: the program fills it with
        sorted vectors during the run and cleans it up when it is finished. The
        centroid directory should also be empty; all the centroids are written
        to it, stored in separate files.
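        
        These expectations can be checked before a run. The helper below is a
        sketch, not part of the package: ``find_feature_files`` and
        ``check_inputs`` are hypothetical names, and it assumes the prefix is
        matched against each file name (the package may match it differently).
        
        ```python
        from pathlib import Path

        import numpy as np


        def find_feature_files(root, prefix):
            """Recursively collect .npy files under root whose file name starts with prefix."""
            return sorted(p for p in Path(root).rglob("*.npy") if p.name.startswith(prefix))


        def check_inputs(root, prefix, batch_size, save_dir, centroid_dir):
            """Validate the layout described above; return (files, largest vector count)."""
            files = find_feature_files(root, prefix)
            if not files:
                raise ValueError(f"no .npy files starting with {prefix!r} under {root}")
            for d in (Path(save_dir), Path(centroid_dir)):
                if d.exists() and any(d.iterdir()):
                    raise ValueError(f"{d} should be empty")
            # mmap_mode="r" maps each file instead of reading its data into memory;
            # shape[0] is the number of vectors stored in that file.
            largest = max(np.load(str(f), mmap_mode="r").shape[0] for f in files)
            if batch_size <= largest:
                print(f"warning: batch size {batch_size} is not larger than "
                      f"{largest}, the vector count of the largest file")
            return files, largest
        ```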
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
