Metadata-Version: 2.1
Name: fastwlk
Version: 0.2.3
Summary: fastwlk is a Python package that implements a fast version of the Weisfeiler-Lehman kernel.
License: BSD-3 Clause License
Author: Philip Hartout
Author-email: philip.hartout@protonmail.com
Requires-Python: >=3.9,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: networkx (>=2.6.3,<3.0.0)
Requires-Dist: numpy (>=1.22.1,<2.0.0)
Requires-Dist: pandas (>=1.4.0,<2.0.0)
Requires-Dist: tqdm (>=4.62.3,<5.0.0)
Description-Content-Type: text/x-rst

=============================
FastWLK
=============================

.. image:: https://github.com/pjhartout/fastwlk/actions/workflows/main.yml/badge.svg
        :target: https://github.com/pjhartout/fastwlk/


.. image:: https://img.shields.io/pypi/v/fastwlk.svg
        :target: https://pypi.python.org/pypi/fastwlk


.. image:: https://codecov.io/gh/pjhartout/fastwlk/branch/main/graph/badge.svg?token=U054MJONED
      :target: https://codecov.io/gh/pjhartout/fastwlk

.. image:: https://img.shields.io/website-up-down-green-red/http/shields.io.svg
   :target: https://pjhartout.github.io/fastwlk/


Quick Links
-------------------------
- `Documentation`_
- `Installation`_
- `Usage`_
- `Contributing`_


What does ``fastwlk`` do?
-------------------------


``fastwlk`` is a Python package that implements a fast version of the
Weisfeiler-Lehman kernel. On sparse graphs it outperforms current
state-of-the-art implementations thanks to three improvements over vanilla
implementations:

1. It parallelizes the execution of Weisfeiler-Lehman hash computations since
   each graph's hash can be computed independently prior to computing the
   kernel.

2. It parallelizes the computation of graph similarities in the reproducing
   kernel Hilbert space (RKHS) by computing batches of the inner products
   independently.

3. On sparse graphs, much of the computation is spent on positions/hashes
   that do not actually overlap between graph representations. ``fastwlk``
   therefore loops explicitly over the overlapping keys only, which
   outperforms numpy dot product-based implementations.
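
To make the third point concrete, here is a minimal, standard-library-only
sketch of the idea. It is not ``fastwlk``'s actual code: the function names
``wl_features`` and ``sparse_dot`` and the plain-dict graph representation are
assumptions made for this illustration. Each graph is turned into a mapping
from Weisfeiler-Lehman label hashes to counts, and the kernel value is the sum
over the keys both mappings share.

.. code-block:: python

   import hashlib
   from collections import Counter


   def wl_features(adjacency, labels, n_iter=2):
       """Count Weisfeiler-Lehman labels for a single graph.

       adjacency: dict mapping each node to an iterable of its neighbours.
       labels: dict mapping each node to its initial label (a string).
       """
       current = dict(labels)
       counts = Counter(current.values())
       for _ in range(n_iter):
           updated = {}
           for node, neighbours in adjacency.items():
               # New label = own label plus the sorted neighbour labels.
               neighbour_labels = ",".join(sorted(current[n] for n in neighbours))
               signature = current[node] + "|" + neighbour_labels
               digest = hashlib.blake2b(signature.encode(), digest_size=8)
               updated[node] = digest.hexdigest()
           current = updated
           counts.update(current.values())
       return counts


   def sparse_dot(a, b):
       """Inner product of two label-count mappings over shared keys only."""
       if len(a) > len(b):
           a, b = b, a  # iterate over the smaller mapping
       return sum(count * b[key] for key, count in a.items() if key in b)


   # A three-node path and a triangle, all nodes sharing the same label.
   path = {0: [1], 1: [0, 2], 2: [1]}
   triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
   uniform = {0: "x", 1: "x", 2: "x"}

   path_features = wl_features(path, uniform, n_iter=1)
   triangle_features = wl_features(triangle, uniform, n_iter=1)
   kernel_value = sparse_dot(path_features, triangle_features)

Because ``wl_features`` depends on one graph only, calls for different graphs
are independent of each other, which is what makes the first improvement
(parallel hash computation) possible.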

This implementation works best when graphs have relatively few connections and
are reasonably dissimilar from one another. If you are not sure whether your
graphs are sparse or dissimilar enough, benchmark this package against other
implementations on your own data.

How fast is ``fastwlk``?
-------------------------

Running the benchmark script in ``examples/benchmark.py`` shows that for the
graphs in ``data/graphs.pkl``, we get an approximately 80% speed improvement
over other implementations like `grakel`_.

To see how much faster this implementation is for your use case:

.. code-block:: console

   $ git clone https://github.com/pjhartout/fastwlk
   $ cd fastwlk
   $ poetry install
   $ poetry run python examples/benchmark.py

You will need to swap out the provided ``graphs.pkl`` with an iterable of
graphs of your own.

.. _Documentation: https://pjhartout.github.io/fastwlk/
.. _Installation: https://pjhartout.github.io/fastwlk/installation.html
.. _Usage: https://pjhartout.github.io/fastwlk/usage.html
.. _Contributing: https://pjhartout.github.io/fastwlk/contributing.html
.. _grakel: https://github.com/ysig/GraKeL

