Metadata-Version: 2.1
Name: libhxl
Version: 4.21
Summary: Python support library for the Humanitarian Exchange Language (HXL). See http://hxlstandard.org and https://github.com/HXLStandard/libhxl-python
Home-page: http://hxlproject.org
Author: David Megginson
Author-email: megginson@un.org
License: UNKNOWN
Description: libhxl-python
        =============
        
        Python support library for the Humanitarian Exchange Language (HXL)
        data standard. The library requires Python 3 (library versions prior
        to 4.6 also supported Python 2.7).
        
        About HXL: http://hxlstandard.org
        
        More-detailed documentation is available in the wiki: https://github.com/HXLStandard/libhxl-python/wiki
        
        
        # Usage
        
        The _hxl.data()_ function reads HXL from a file object, filename, URL,
        or list of arrays and makes it available for processing, much like
        ``$()`` in jQuery:
        
        ```
        import sys
        import hxl
        
        dataset = hxl.data(sys.stdin)
        ```
        
        You can chain additional methods to process the data.  This example
        shows an identity transformation in a pipeline (see "Generators",
        below):
        
        ```
        for line in hxl.data(sys.stdin).gen_csv():
            print(line)
        ```
        
        This is the same transformation, but loading the entire dataset into
        memory as an intermediate step (see "Filters", below):
        
        ```
        for line in hxl.data(sys.stdin).cache().gen_csv():
            print(line)
        ```
        
        
        ## Filters
        
        There are a number of filters that you can chain after a HXL dataset
        to process it as a stream.  This example uses the _with_rows()_ filter to find every
        row that has a #sector of "WASH" and print the organisation mentioned
        in the row:
        
        ```
        for row in hxl.data(sys.stdin).with_rows('#sector=WASH'):
            print('The organisation is {}'.format(row.get('#org')))
        ```
        
        This example removes the WASH sector from the results, then counts the
        number of times each organisation appears in the remaining rows:
        
        ```
        url = 'http://example.org/data.csv'
        result = hxl.data(url).with_rows('#sector!=WASH').count('#org')
        ```
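
        Conceptually, _count()_ groups rows by the values matching the given
        pattern(s) and tallies them. The following stand-alone sketch shows the
        same aggregation in plain Python (the sample rows are invented for
        illustration; this is not the library's implementation):

        ```
        from collections import Counter

        # Invented sample rows as (sector, org) pairs
        rows = [
            ('WASH', 'UNICEF'),
            ('Health', 'WHO'),
            ('Health', 'WHO'),
            ('Protection', 'UNHCR'),
        ]

        # Keep rows whose sector is not WASH, then tally each org
        counts = Counter(org for sector, org in rows if sector != 'WASH')
        print(counts['WHO'])    # 2
        print(counts['UNHCR'])  # 1
        ```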
        
        The following filters are available:
        
        <table>
          <thead>
            <th>Filter method</th>
            <th>Description</th>
          </thead>
          <tbody>
            <tr>
              <td><code>.append(append_sources, add_columns=True, queries=[])</code></td>
              <td>Append a second HXL dataset to the current one, lining up columns.</td>
            </tr>
            <tr>
              <td><code>.cache()</code></td>
              <td>Cache an in-memory version of the dataset (for processing multiple times).</td>
            </tr>
            <tr>
              <td><code>.dedup(patterns=[], queries=[])</code></td>
              <td>Deduplicate the rows in a dataset, optionally looking only at specific columns.</td>
            </tr>
            <tr>
              <td><code>.with_columns(whitelist)</code></td>
              <td>Include only columns that match the whitelisted tag pattern(s), e.g. "#org+impl".</td>
            </tr>
            <tr>
              <td><code>.without_columns(blacklist)</code></td>
              <td>Include all columns <em>except</em> those that match the blacklisted tag patterns.</td>
            </tr>
            <tr>
              <td><code>.with_rows(queries, mask=[])</code></td>
              <td>Include only rows that match at least one of the queries, e.g. "#sector=WASH". Optionally ignore rows that don't match a mask pattern.</td>
            </tr>
            <tr>
              <td><code>.without_rows(queries, mask=[])</code></td>
              <td>Exclude rows that match at least one of the queries, e.g. "#sector=WASH". Optionally ignore rows that don't match a mask pattern.</td>
            </tr>
            <tr>
              <td><code>.sort(keys=None, reverse=False)</code></td>
              <td>Sort the rows, optionally using the pattern(s) provided as sort keys. Set _reverse_ to True for a descending sort.</td>
            </tr>
            <tr>
              <td><code>.count(patterns=[], aggregators=None, queries=[])</code></td>
              <td>Count the number of value combinations that appear for the pattern(s), e.g. ['#sector', '#org']. Optionally perform other aggregations, such as sums or averages.</td>
            </tr>
            <tr>
              <td><code>.replace_data(original, replacement, pattern=None, use_regex=False, queries=[])</code></td>
              <td>Replace values in a HXL dataset.</td>
            </tr>
            <tr>
              <td><code>.replace_data_map(map_source, queries=[])</code></td>
              <td>Replace values in a HXL dataset using a replacement map in another HXL dataset.</td>
            </tr>
            <tr>
              <td><code>.add_columns(specs, before=False)</code></td>
              <td>Add columns with fixed values to the dataset, e.g. "Country#country=Kenya" to add a new column #country with the text header "Country" and the value "Kenya" in every row.</td>
            </tr>
            <tr>
              <td><code>.rename_columns(specs)</code></td>
              <td>Change the header text and HXL hashtags for one or more columns.</td>
            </tr>
            <tr>
              <td><code>.clean_data(whitespace=[], upper=[], lower=[], date=[], number=[], queries=[])</code></td>
              <td>Clean whitespace, normalise dates and numbers, etc., optionally limited to specific columns.</td>
            </tr>
            <tr>
              <td><code>.merge_data(merge_source, keys, tags, replace=False, overwrite=False, queries=[])</code></td>
              <td>Merge values horizontally from a second dataset, based on shared keys (like a SQL join).</td>
            </tr>
            <tr>
              <td><code>.explode(header_attribute='header', value_attribute='value')</code></td>
              <td>Explode a "wide" dataset into a "long" dataset, using the HXL +label attribute.</td>
            </tr>
          </tbody>
        </table>
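
        Tag patterns such as "#org+impl" match a column's hashtag together with
        any attributes the pattern requires. A much-simplified sketch of that
        matching logic in plain Python (not the library's actual parser, which
        also supports features such as attribute exclusions):

        ```
        def matches(pattern, column):
            """Check whether a column spec like '#org+impl+funder' satisfies a
            simplified pattern like '#org+impl': the hashtags must be equal,
            and every attribute in the pattern must appear on the column."""
            p_parts = pattern.split('+')
            c_parts = column.split('+')
            return p_parts[0] == c_parts[0] and set(p_parts[1:]) <= set(c_parts[1:])

        print(matches('#org+impl', '#org+impl+funder'))  # True
        print(matches('#org+impl', '#org'))              # False
        print(matches('#org', '#adm1'))                  # False
        ```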
        
        ## Sinks
        
        Sinks take a HXL stream and convert it into something that's not HXL.
        
        ### Validation
        
        To validate a HXL dataset against a schema (also in HXL), use the ``validate`` sink:
        
        ```
        is_valid = hxl.data(url).validate('my-schema.csv')
        ```
        
        If you don't specify a schema, the library will use a simple, built-in schema:
        
        ```
        is_valid = hxl.data(url).validate()
        ```
        
        If you include a callback, you can collect details about the errors and warnings:
        
        ```
        def my_callback(error_info):
            # error_info is a HXLValidationException
            sys.stderr.write(str(error_info) + '\n')
        
        is_valid = hxl.data(url).validate(schema='my-schema.csv', callback=my_callback)
        ```
        
        ### Generators
        
        Generators allow the re-serialising of HXL data, returning something that works like an iterator.  Example:
        
        ```
        for line in hxl.data(url).gen_csv():
            print(line)
        ```
        
        The following generators are available (you can use the parameters to turn the text headers and HXL tags on or off):
        
        <table>
          <thead>
            <th>Generator method</th>
            <th>Description</th>
          </thead>
          <tbody>
            <tr>
              <td><code>Dataset.gen_raw(show_headers=True, show_tags=True)</code></td>
              <td>Generate arrays of strings, one row at a time.</td>
            </tr>
            <tr>
              <td><code>Dataset.gen_csv(show_headers=True, show_tags=True)</code></td>
              <td>Generate encoded CSV rows, one row at a time.</td>
            </tr>
            <tr>
              <td><code>Dataset.gen_json(show_headers=True, show_tags=True)</code></td>
              <td>Generate encoded JSON rows, one row at a time.</td>
            </tr>
          </tbody>
        </table>
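
        Conceptually, a generator such as _gen_csv()_ yields one serialised row
        at a time instead of building the whole document in memory. A minimal
        stand-alone sketch of that pattern using the standard csv module
        (illustrative only; the real generators can also emit the text-header
        and hashtag rows):

        ```
        import csv
        import io

        def gen_csv(rows):
            """Yield each row as an encoded CSV line, one row at a time."""
            for row in rows:
                buf = io.StringIO()
                csv.writer(buf).writerow(row)
                yield buf.getvalue()

        rows = [['#org', '#sector'], ['UNICEF', 'WASH']]
        for line in gen_csv(rows):
            print(line, end='')  # each yielded line already ends with \r\n
        ```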
        
        
        ## Caching
        
        libhxl uses the Python
        [requests](http://docs.python-requests.org/en/master/) library for
        opening URLs. If you want to enable caching (for example, to avoid
        beating up on your source with repeated requests), your code can use
        the [requests_cache](https://pypi.python.org/pypi/requests-cache)
        plugin, like this:
        
            import requests_cache
            requests_cache.install_cache('demo_cache', expire_after=3600)
        
        The default caching backend is a sqlite database at the location specified.
        
        
        # Installation
        
        This repository includes a standard Python `setup.py` script for
        installing the library and scripts (applications) on your system. In a
        Unix-like operating system, you can install using the following
        command:
        
        ```
        python setup.py install
        ```
        
        If you don't need to install from source, try simply
        
        ```
        pip install libhxl
        ```
        
        Once you've installed, you will be able to include the HXL libraries
        from any Python application, and will be able to call scripts like
        _hxlvalidate_ from the command line.
        
        
        # Makefile
        
        There is also a generic Makefile that automates many tasks, including
        setting up a Python virtual environment for testing. The Python 3
        venv module is required for most of the targets.
        
        
        ```
        make build-venv
        ```
        
        Set up a local Python virtual environment for testing, if it doesn't
        already exist. The target recreates the virtual environment if
        setup.py has changed.
        
        ```
        make test
        ```
        
        Set up a virtual environment (if missing) and run all the unit tests
        
        ```
        make test-install
        ```
        
        Test a clean installation to verify there are no missing dependencies,
        etc.
        
        ```
        make close-issue
        ```
        
        Merge the current git issue branch into the dev branch and delete the
        issue branch.
        
        ```
        make push-dev
        ```
        
        Push the git dev branch to upstream.
        
        ```
        make merge-test
        ```
        
        Merge the git dev branch into the test branch and push to upstream.
        
        ```
        make merge-master
        ```
        
        Merge the git test branch into the master branch and push to upstream.
        
        ```
        make etags
        ```
        
        (Re)build the TAGS file that Emacs uses for global search and replace.
        
Platform: UNKNOWN
Description-Content-Type: text/markdown
