Metadata-Version: 2.1
Name: hearbaseline
Version: 2021.0.1
Summary: Holistic Evaluation of Audio Representations (HEAR) 2021 -- Baseline Model
Home-page: https://github.com/neuralaudio/hear-baseline
Author: HEAR 2021 NeurIPS Competition Committee
Author-email: deep@neuralaudio.ai
License: Apache-2.0
Project-URL: Bug Tracker, https://github.com/neuralaudio/hear-baseline/issues
Project-URL: Source Code, https://github.com/neuralaudio/hear-baseline
Description: ![HEAR2021](https://neuralaudio.ai/assets/img/hear-header-sponsor.jpg)
        # HEAR 2021 Baseline
        
        A simple DSP-based audio embedding consisting of a Mel-frequency spectrogram followed
        by a random projection. It serves as the naive baseline model for HEAR 2021 and implements
        the [common API](https://neuralaudio.ai/hear2021-holistic-evaluation-of-audio-representations.html#common-api)
        required by the competition evaluation.
        
        For full details on the HEAR 2021 NeurIPS competition and for information on how to
        participate, please visit the
        [competition website.](https://neuralaudio.ai/hear2021-holistic-evaluation-of-audio-representations.html)
        
        ### Installation
        
        **Method 1: PyPI**
        ```shell
        pip install hearbaseline
        ```
        
        **Method 2: pip local source tree**
        
        This is the same method that competition organizers will use when installing
        submissions to HEAR 2021.
        ```shell
        git clone https://github.com/neuralaudio/hear-baseline.git
        python3 -m pip install ./hear-baseline
        ```
        
        ### Naive Baseline Model
        The naive baseline model produces log-scaled Mel-frequency spectrograms using a
        256-band Mel filter. Each frame of the spectrogram is then projected to 4096
        dimensions using a random projection matrix. Weights for the projection matrix were
        generated by sampling a normal distribution and are stored in this repository in the
        file `saved_models/naive_baseline.pt`.
        
        Using a random projection is less efficient than a CNN, but it is one of the
        simplest models to implement.
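        
        The pipeline described above can be sketched in a few lines of NumPy. This is an
        illustrative sketch only, not the code shipped in this package: the helper names
        (`mel_filterbank`, `log_mel_frames`), the 16 kHz sample rate, the 4096-point FFT,
        the 25 ms hop, and the freshly sampled projection matrix are all assumptions made
        for the example. The real model loads its weights from `saved_models/naive_baseline.pt`.
        ```python
        import numpy as np
        
        
        def hz_to_mel(hz):
            return 2595.0 * np.log10(1.0 + hz / 700.0)
        
        
        def mel_to_hz(mel):
            return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
        
        
        def mel_filterbank(n_mels, n_fft, sample_rate):
            """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
            fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
            mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
            hz_points = mel_to_hz(mel_points)
            lower = hz_points[:-2, None]
            center = hz_points[1:-1, None]
            upper = hz_points[2:, None]
            rising = (fft_freqs[None, :] - lower) / (center - lower)
            falling = (upper - fft_freqs[None, :]) / (upper - center)
            return np.maximum(0.0, np.minimum(rising, falling))
        
        
        def log_mel_frames(audio, sample_rate, n_fft, hop, n_mels):
            """Log-scaled Mel spectrogram of a 1-D signal, shape (n_frames, n_mels)."""
            padded = np.pad(audio, (n_fft // 2, n_fft // 2))
            n_frames = 1 + (len(padded) - n_fft) // hop
            idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
            frames = padded[idx] * np.hanning(n_fft)
            power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
            mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
            return np.log(mel + 1e-6)  # small offset avoids log(0)
        
        
        # Assumed settings for this sketch (not read from the package)
        sample_rate = 16000
        n_fft = 4096
        hop = sample_rate * 25 // 1000  # one frame every 25 ms
        n_mels = 256
        embedding_dim = 4096
        
        rng = np.random.default_rng(0)
        audio = rng.uniform(-1.0, 1.0, size=sample_rate)  # 1 second of noise
        
        # Random projection matrix sampled from a normal distribution
        projection = rng.standard_normal((n_mels, embedding_dim))
        
        log_mel = log_mel_frames(audio, sample_rate, n_fft, hop, n_mels)
        embeddings = log_mel @ projection  # one 4096-dimensional vector per frame
        ```
        
        Each row of `embeddings` corresponds to one spectrogram frame. Because the
        projection is fixed and random, nothing in this sketch is trained.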
        
        ### Usage
        
        Audio embeddings can be computed using one of two methods: 1)
        `get_scene_embeddings`, or 2) `get_timestamp_embeddings`.
        
        `get_scene_embeddings` accepts a batch of audio clips and produces a single embedding
        for each audio clip. This can be computed like so:
        ```python
        import torch
        import hearbaseline
        
        # Load model with weights - located in the root directory of this repo
        model = hearbaseline.load_model("saved_models/naive_baseline.pt")
        
        # Create a batch of 2 random-noise clips (uniform in [0, 1)) that are
        # 2 seconds long and compute scene embeddings for each clip
        audio = torch.rand((2, model.sample_rate * 2))
        embeddings = hearbaseline.get_scene_embeddings(audio, model)
        ```
        
        The `get_timestamp_embeddings` method takes the same arguments but returns a sequence
        of embeddings computed every 25 ms over the duration of the input audio, along with
        the timestamps corresponding to each embedding.
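        
        To give a sense of the 25 ms spacing, the sketch below lists the timestamps that
        evenly spaced frames would have for a 2-second clip. The helper
        `frame_timestamps_ms` is hypothetical, and it assumes the first frame sits at 0 ms
        and that a frame at the clip end is included. The package's actual framing and
        padding may differ.
        ```python
        def frame_timestamps_ms(duration_ms, hop_ms=25):
            """Timestamps in milliseconds for frames spaced hop_ms apart (assumed layout)."""
            return list(range(0, duration_ms + 1, hop_ms))
        
        
        timestamps = frame_timestamps_ms(2000)
        print(len(timestamps), timestamps[:4], timestamps[-1])
        ```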
        
        See the [common API](https://neuralaudio.ai/hear2021-holistic-evaluation-of-audio-representations.html#common-api)
        for more details.
        
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: dev
