Metadata-Version: 2.1
Name: avalon-generator
Version: 1.0.0
Summary: Extendable scalable high-performance streaming test data generator
Home-page: https://github.com/admirito/avalon
Author: Mohammad Razavi, Mohammad Reza Moghaddas
Author-email: mrazavi64@gmail.com
License: GPLv3+
Keywords: test,data generation,fake data,simulation
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Education :: Testing
Classifier: Topic :: Software Development :: Testing :: Traffic Generation
Description-Content-Type: text/x-rst
Provides-Extra: all
Provides-Extra: grpc
Provides-Extra: kafka
Provides-Extra: soap
Provides-Extra: sql
License-File: LICENSE.org

Avalon
======

``Avalon`` is an extendable, scalable, high-performance streaming data
generator that can be used to simulate real-time input for various
systems.

Installation
------------

To install ``avalon`` with all of its dependencies, you can use ``pip``:

.. code:: shell

   pip install avalon-generator[all]

Avalon supports a lot of command-line arguments, so you probably want to
enable its `argcomplete <https://github.com/kislyuk/argcomplete>`__
support for tab completion of arguments. Run the following command for a
single session, or add it to your ``~/.bashrc`` to make it persistent:

.. code:: shell

   eval "$(avalon --completion-script=bash)"

Usage
-----

In its simplest form, you can pass a ``model`` name as the command-line
argument of ``avalon`` and it will produce data for the specified model
on the standard output. The following command uses the ``--rawlog``
shortcut to generate logs similar to the `snort <https://www.snort.org/>`__
IDS:

.. code:: shell

   avalon snort --rawlog

Multiple models can be used at the same time. You can list the available
models with the following command:

.. code:: shell

   avalon --list-models

The default output format (without ``--rawlog``) is ``json-lines``,
which outputs one JSON document per line. Other formats such as ``csv``
are also supported. To see the supported formats, use the ``--help``
argument and check the options for ``--output-format``, or just enable
auto-completion and press the <tab> key to see the available options.
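
To illustrate what ``json-lines`` means, here is a plain-Python sketch
of the format itself; this is only an illustration of the format, not
Avalon's implementation:

.. code:: python

   import json

   # A small batch of items, as a model might produce them.
   batch = [
       {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "tcp"},
       {"src": "10.0.0.3", "dst": "10.0.0.4", "proto": "udp"},
   ]

   # json-lines: one JSON document per line.
   json_lines = "\n".join(json.dumps(item) for item in batch)
   print(json_lines)

Each line of the output is an independent, parseable JSON document,
which makes the format easy to stream and to consume line by line.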

Besides ``--output-format``, the output media can also be specified via
``--output-media``. Many output media, such as ``file``, ``http``,
``grpc``, ``kafka``, and direct inserts into ``sql`` databases, are
supported out of the box.

Also, the number and the rate of the outputs can be controlled via the
``--number`` and ``--rate`` arguments.

For high rates, you might want to utilize multiple CPU cores. To do so,
just prefix your model name with the number of instances you want to run
at the same time, e.g. ``10snort`` to run 10 ``snort`` instances (10
Python processes that can utilize up to 10 CPU cores).

You can utilize multiple models at the same time and provide a ratio for
the output of each model, e.g. ``10snort1000 5asa20``. That means 10
instances of the ``snort`` model and 5 instances of the ``asa`` model,
with an output ratio of 1000 for the ``snort`` producers to 20 for the
``asa`` producers.
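
As a quick sanity check of what that ratio means, in plain arithmetic
(the numbers are taken from the example above):

.. code:: python

   # In "10snort1000 5asa20", the trailing numbers are output ratios:
   # for every 1000 items emitted by the snort producers, the asa
   # producers emit 20, so snort accounts for 1000/1020 of the output.
   snort_ratio, asa_ratio = 1000, 20
   snort_share = snort_ratio / (snort_ratio + asa_ratio)
   print(round(snort_share, 4))  # ~0.9804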

Another important parameter for achieving high resource utilization is
the batch size, which can be increased with the ``--batch-size``
argument.

Also, the ``--output-writers`` argument determines the number of
simultaneous writers to the output media. If your sink is a ``file``, an
``http`` server, or any other medium that supports concurrent writes, it
is advisable to increase ``--output-writers`` to gain higher
utilization.

Here is an example that uses multiple processes to write to a CSV file
at 10000 items per second:

.. code:: shell

   avalon 20snort 5asa \
       --batch-size=1000 --rate=10000 --number=1000000 --output-writers=25 \
       --output-format=headered-csv --output-media=file \
       --output-file-name=test.csv

The Avalon command line supports many more options, which you can
explore via the ``--help`` argument or auto-completion by pressing the
<tab> key in the command line.

Architecture
------------

The Avalon architecture consists of three main abstractions that give it
great flexibility:

Model
   Each model is responsible for generating a specific kind of data. For
   example, a model might generate data similar to the logs of a
   specific application or appliance, while another model might generate
   network flows or packets.

   Model output is usually an unlimited iterator of Python
   dictionaries.

Format
   Each format (or formatter) is responsible for converting a batch of
   model data to a specific format, e.g. JSON or CSV.

   Format output is usually a string or bytes array, although other
   types could also be used according to the output media.

Media
   Each media is responsible for transferring the batched, formatted
   data to a specific data sink. For example, it could write data to a
   file or send it to a remote server via the network.

The Avalon architecture is not limited to these abstractions. For
example, a **Processor** is responsible for forking processes to run
concurrent data producers, and **Mapping** classes can be defined to
transform model data for more flexibility.
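
The three main abstractions can be sketched as a minimal pipeline in
plain Python. The class names and method signatures here are
hypothetical illustrations of the model → format → media flow, not
Avalon's actual internal API:

.. code:: python

   import io
   import itertools
   import json


   class CounterModel:
       """Hypothetical model: an unlimited iterator of dictionaries."""

       def items(self):
           for i in itertools.count():
               yield {"id": i, "message": f"event {i}"}


   class JsonLinesFormat:
       """Hypothetical formatter: a batch of dicts -> one string."""

       def format(self, batch):
           return "\n".join(json.dumps(item) for item in batch) + "\n"


   class FileMedia:
       """Hypothetical media: writes formatted batches to a sink."""

       def __init__(self, sink):
           self.sink = sink

       def write(self, data):
           self.sink.write(data)


   # Wire the three abstractions together for one batch of 3 items.
   model, fmt = CounterModel(), JsonLinesFormat()
   sink = io.StringIO()
   media = FileMedia(sink)
   batch = list(itertools.islice(model.items(), 3))
   media.write(fmt.format(batch))
   print(sink.getvalue())

The separation means any model can be combined with any format and any
media, which is what the command-line ``--output-format`` and
``--output-media`` switches select.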

Extension
---------

Models
~~~~~~

Avalon supports third-party models, so you can develop your own models
to generate data for your specific use cases and publish them publicly.
More information is available at the `model extensions
README <./avalon/models/ext/README.org>`__.

Mappings
~~~~~~~~

Although models give the user full flexibility to generate the desired
data, sometimes a different data model is needed for just a simple
transformation. For example, one might want to use different key names
in a JSON document, or different column names in a CSV file or SQL
database table, and creating multiple models just to change a key name
would be cumbersome.

A mapping can modify the model output dictionary before it is used by
the formatter, and it does not require the complex structure of a model.
Avalon supports a couple of useful mappings out of the box, but new
mappings can also be defined in a simple Python script without any
installation requirement.

For example, the following script, if put in a ``mymap.py`` file, can be
used as a mapping:

.. code:: python

   # Any valid name for the class is acceptable.
   class MyMap:
       def map(self, item):
           # Item is the dictionary generated by the models

           # Rename "foo" key to "bar"
           item["bar"] = item.pop("foo", None)

           item["new"] = "a whole new key value"

           # Don't forget to return the item
           return item

And now it can be passed to Avalon with ``--map`` as a URL:

.. code:: shell

   avalon --map=file:///path/to/mymap.py
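
To see what the ``map`` method does in isolation, the class can be
exercised directly in Python; the class body from ``mymap.py`` is
repeated here so the snippet is self-contained:

.. code:: python

   class MyMap:
       def map(self, item):
           item["bar"] = item.pop("foo", None)
           item["new"] = "a whole new key value"
           return item

   # Apply the mapping to one item, as Avalon would for each
   # dictionary produced by the models.
   item = {"foo": 1, "other": 2}
   result = MyMap().map(item)
   print(result)

The ``"foo"`` key is renamed to ``"bar"``, untouched keys pass through
unchanged, and the ``"new"`` key is added.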

Mappings can be very simple, like the example above, or as complex as
one that regenerates the whole model data. Since Avalon supports passing
multiple ``--map`` arguments, with all the provided mappings applied in
the specified order, one particularly useful approach is to define many
simple mappings and combine them to achieve the desired goal.
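
The effect of passing several ``--map`` arguments can be sketched as
ordinary chaining in Python; the mapping classes here are hypothetical,
but each follows the same ``map`` method convention as the example
above:

.. code:: python

   class RenameFoo:
       def map(self, item):
           item["bar"] = item.pop("foo", None)
           return item


   class AddMarker:
       def map(self, item):
           item["marker"] = "seen"
           return item


   # Mappings are applied in the order they are given on the
   # command line; each one receives the previous one's output.
   mappings = [RenameFoo(), AddMarker()]
   item = {"foo": 42}
   for m in mappings:
       item = m.map(item)
   print(item)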

Using curly braces, you can also apply a mapping to only a specific
model when combining multiple models. Here is an example:

.. code:: shell

   # mymap.py will be applied to the first snort, the internal jsoncolumn
   # mapping will be applied to asa, and the last snort will be used
   # without any mappings.
   avalon "snort{file:///path/to/mymap.py} asa{jsoncolumn} snort"

Etymology
---------

The ``Avalon`` name is based on the name of a legendary island featured
in Arthurian legend, and it has nothing to do with the proprietary
`Spirent
Avalanche <https://www.spirent.com/products/avalanche-security-testing>`__
traffic generator.

Authors
-------

-  Mohammad Razavi
-  Mohammad Reza Moghaddas


