Metadata-Version: 2.1
Name: fswalk
Version: 1.3.4
Summary: An efficient multiprocessing directory walk and search tool
Home-page: https://github.com/bzizou/fs_walk
Author: Bruno Bzeznik
Author-email: Bruno.Bzeznik@univ-grenoble-alpes.fr
License: GNU GPL v3
Description: # fswalk
        An efficient multiprocessing directory walk and search tool
        
        ## Introduction
        fswalk is a simple Python script that recursively walks through a filesystem
        directory to gather file metadata and store it in a **JSON file** or
        an **Elasticsearch** database.
        It runs several processes, each responsible for listing the files
        contained in one subdirectory.
        The collected metadata are `filename, path, uid, gid, size` and `atime`.
        The output is either a JSON file sent on the fly to stdout, or an Elasticsearch
        index. A simple search option is provided to retrieve files by their owner,
        group or a part of the name.
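        
        The idea of giving each process one subdirectory to list can be sketched as
        follows. This is a minimal illustration, not fswalk's actual code: the
        `list_dir` and `parallel_walk` names and the level-by-level scheduling are
        assumptions made for the example, and only the standard library is used.
        
        ```python
        import os
        from multiprocessing import Pool
        
        
        def list_dir(path):
            """List one directory: return (file records, subdirectories to visit)."""
            records, subdirs = [], []
            try:
                with os.scandir(path) as entries:
                    for entry in entries:
                        if entry.is_dir(follow_symlinks=False):
                            subdirs.append(entry.path)
                            continue
                        st = entry.stat(follow_symlinks=False)
                        records.append({
                            "filename": entry.name,
                            "path": entry.path,
                            "uid": st.st_uid,
                            "gid": st.st_gid,
                            "size": st.st_size,
                            "atime": st.st_atime,
                        })
            except OSError:
                # Unreadable or vanished directory: keep what was collected so far.
                pass
            return records, subdirs
        
        
        def parallel_walk(root, nproc=4):
            """Walk `root` one directory level at a time with `nproc` worker processes.
        
            Hypothetical scheduling: every directory of the current level is handed
            to the pool, and the subdirectories found form the next level.
            Call this from under `if __name__ == "__main__":` on platforms that
            start worker processes by spawning a new interpreter.
            """
            records, level = [], [root]
            with Pool(nproc) as pool:
                while level:
                    next_level = []
                    for recs, subdirs in pool.map(list_dir, level):
                        records.extend(recs)
                        next_level.extend(subdirs)
                    level = next_level
            return records
        ```
        
        Symbolic links are recorded but not followed here, which avoids loops; whether
        fswalk makes the same choice is not stated in this document.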
        
        The script also provides an option to run a quick analysis of the resulting
        output file.
        
        *Warning*: when the results are sent to stdout, the JSON is printed with an
        extra `,` sign that might break JSON compatibility. This is a side effect of
        multiprocessing and avoids slowing down the walk.
        The `pyjson5` Python library can read such non-standard JSON files.
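        
        Where `pyjson5` is not available, a standard-library workaround can be
        sketched as below. It is not the route documented above, and it rests on an
        assumption this document does not confirm: that the extra `,` sits right
        before the final closing bracket. The `loads_tolerant` name is hypothetical.
        
        ```python
        import json
        import re
        
        
        def loads_tolerant(text):
            """Parse JSON, tolerating one comma before the final closing bracket."""
            try:
                return json.loads(text)
            except json.JSONDecodeError:
                # Assumption: the only defect is a comma just before the last ] or }.
                repaired = re.sub(r",\s*([\]}])\s*$", r"\1", text)
                return json.loads(repaired)
        ```
        
        If the extra comma appears anywhere else, this function still raises
        `json.JSONDecodeError`, and a JSON5 parser remains the safer choice.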
        
        ## Installation
        
        Requirements:
          - python >= 3.5
          - python packages: requests, pyjson5, elasticsearch
        
        Installing the current stable release:
        
        ```
        $ pip install fswalk
        ```
        
        Installing the latest devel snapshot:
        
        ```
        $ pip install git+https://github.com/bzizou/fs_walk.git
        ```
        
        ## Example
        
        Start a walk in the `/home/bzizou` directory with 8 processes, excluding
        the `.snapshot` subdirectory and getting the result as a gzipped JSON file:
        
        ```
        bzizou@f-dahu:~/git/fs_walk$ fswalk -p /home/bzizou -x '^/home/bzizou/\.snapshot/' -n 8 | gzip > /tmp/out.gz
        ```
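        
        To see which paths an exclusion expression such as the one above would catch,
        it can be tried with Python's `re` module first. This is only a sketch: the
        `is_excluded` helper is hypothetical, and whether fswalk applies the
        expression with `search` or `match` is an assumption (with a leading `^` both
        behave the same).
        
        ```python
        import re
        
        # Same expression as passed to -x in the example command.
        EXCLUDE = re.compile(r"^/home/bzizou/\.snapshot/")
        
        
        def is_excluded(path, pattern=EXCLUDE):
            """Return True when `path` matches the exclusion expression."""
            return pattern.search(path) is not None
        
        
        for candidate in (
            "/home/bzizou/.snapshot/hourly.0/notes.txt",
            "/home/bzizou/xsnapshot/notes.txt",
            "/home/bzizou/.snapshot",
        ):
            print(candidate, is_excluded(candidate))
        ```
        
        The escaped dot keeps `xsnapshot` from matching, and the trailing `/` means
        the bare directory path `/home/bzizou/.snapshot` is not matched by this
        expression on its own.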
        
        Analyze the output from the resulting file:
        
        ```
        bzizou@f-dahu:~/git/fs_walk$ fswalk -a /tmp/out.gz
        User                                       Size            Count
        =================================================================
        bzizou                               2749804131            11125
        root                                 1030651826             1351
        1000                                  390705282              476
        11610                                    726417                7
        
        Group                                      Size            Count
        =================================================================
        realuser                             2749795275            11119
        root                                 1030660332             1356
        1000                                  390705282              476
        2222                                     726417                7
        staff                                       350                1
        
        TOTAL SIZE: 4171887656
        TOTAL FILES: 12959
        ```
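        
        A per-owner summary like the one above can be approximated in a few lines. The
        sketch below is not fswalk's implementation: the `summarize` name is
        hypothetical, and it assumes each record is a dictionary whose keys are the
        metadata field names listed in the introduction (`uid`, `gid`, `size`).
        
        ```python
        from collections import defaultdict
        
        
        def summarize(records, key="uid"):
            """Return (owner, total size, file count) rows, largest total first."""
            totals = defaultdict(lambda: [0, 0])
            for rec in records:
                totals[rec[key]][0] += rec["size"]  # accumulated size
                totals[rec[key]][1] += 1            # number of files
            rows = [(owner, size, count) for owner, (size, count) in totals.items()]
            return sorted(rows, key=lambda row: row[1], reverse=True)
        ```
        
        Passing `key="gid"` gives the group table instead of the user table.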
        
        Same directory scan, but indexing the results into an Elasticsearch database:
        
        ```
        bzizou@f-dahu:~/git/fs_walk$ fswalk -p /home/bzizou -x '^/home/bzizou/\.snapshot/' -n 8 --elastic-host=http://localhost:9200 --elastic-index=fs_walk_home -g
        ```
        
        Search for all files with the "povray" string in their path name that belong to the user whose uid is 10000:
        
        ```
        bzizou@f-dahu:~/git/fs_walk$ fswalk --elastic-host=http://localhost:9200 --elastic-index=fs_walk_home --search="10000:*:povray:*"
        /home/bzizou/povray/OAR.cigri.14068.1251218.stderr
        /home/bzizou/povray/OAR.cigri.14068.1251220.stderr
        /home/bzizou/povray/OAR.cigri.14068.1251224.stderr
        /home/bzizou/povray/OAR.cigri.14068.1251231.stderr
        /home/bzizou/povray/OAR.cigri.14068.1251231.stdout
        /home/bzizou/povray/OAR.cigri.14068.1251233.stderr
        /home/bzizou/povray/OAR.cigri.14068.1251233.stdout
        /home/bzizou/povray/OAR.cigri.14068.1251234.stderr
        /home/bzizou/povray/OAR.cigri.14068.1251234.stdout
        /home/bzizou/povray/OAR.cigri.14068.1251237.stderr
        /home/bzizou/povray/OAR.cigri.14068.1251237.stdout
        /home/bzizou/povray/OAR.cigri.14068.1251238.stderr
        ```
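        
        The four-field search string used above (`uid:gid:path_glob:hostname`) can be
        illustrated with a small parser. This is a sketch under assumptions, not the
        matching logic fswalk or Elasticsearch actually applies: the `Query`,
        `parse_search` and `matches` names are hypothetical, an empty field is treated
        like `*`, the path field is matched as a substring glob, and records are
        assumed to carry a `hostname` key.
        
        ```python
        from fnmatch import fnmatchcase
        from typing import NamedTuple
        
        
        class Query(NamedTuple):
            uid: str
            gid: str
            path_glob: str
            hostname: str
        
        
        def parse_search(search):
            """Split 'uid:gid:path_glob:hostname' into a Query; empty fields become '*'."""
            parts = search.split(":")
            if len(parts) != 4:
                raise ValueError("expected 4 colon-separated fields, got %d" % len(parts))
            return Query(*(part or "*" for part in parts))
        
        
        def matches(query, record):
            """Check one record (a dict) against every field of the query."""
            return (
                fnmatchcase(str(record["uid"]), query.uid)
                and fnmatchcase(str(record["gid"]), query.gid)
                # Assumption: the path pattern may match anywhere in the path.
                and fnmatchcase(record["path"], "*" + query.path_glob + "*")
                and fnmatchcase(record.get("hostname", ""), query.hostname)
            )
        ```
        
        With this sketch, `parse_search("10000:*:povray:*")` constrains only the uid
        and the path, which mirrors the example search.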
        
        ## Usage
        ```
        Usage: fswalk [options]
        
        Options:
          -h, --help            show this help message and exit
          -p PATH, --path=PATH  Path to scan
          -n NPROC, --nproc=NPROC
                                Number of process to launch
          -x EXCLUDE_EXPR, --exclude=EXCLUDE_EXPR
                                Regular expression for path exclusion
          -a ANALYZE_FILE, --analyze=ANALYZE_FILE
                                Creates a summary based on a previously generated json
                                file
          -s SEARCH_STRING, --search=SEARCH_STRING
                                Search a subset of files with syntax:
                                [uid]:[gid]:[path_glob]:[hostname] (--analyze or
                                --elastic-host needed)
          --numeric             Output numeric uid/gid instead of names
          --hostname=HOSTNAME   Overwrite the value of the hostname string. Defaults
                                to local hostname.
          -e ELASTIC_HOST, --elastic-host=ELASTIC_HOST
                                Use an elasticsearch server for output. 'Ex:
                                http://localhost:9200'
          --elastic-index=ELASTIC_INDEX
                                Name of the elasticsearch index
          --elastic-bulk-size=MAX_BULK_SIZE
                                Size of the elastic indexing bulks
          -g, --elastic-purge-index
                                Purge the elasticsearch index before indexing
        ```
        
        The `ANALYZE_FILE` parameter may be a gzip-compressed JSON file or a plain-text JSON file.
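        
        A script that post-processes such a file can accept both forms in the same
        way. The sketch below uses a hypothetical `open_text` helper and detects gzip
        by its two-byte magic number; how fswalk itself tells the two apart is not
        stated here.
        
        ```python
        import gzip
        
        
        def open_text(path):
            """Open `path` for text reading, whether it is gzip-compressed or plain."""
            with open(path, "rb") as probe:
                magic = probe.read(2)
            if magic == b"\x1f\x8b":  # gzip magic number
                return gzip.open(path, "rt", encoding="utf-8")
            return open(path, "r", encoding="utf-8")
        ```
        
        The returned object works as a context manager in both cases, for example
        `with open_text("/tmp/out.gz") as handle: text = handle.read()`.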
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Clustering
Classifier: Topic :: System :: Filesystems
Classifier: Topic :: Utilities
Requires-Python: >=3.5
Description-Content-Type: text/markdown
