Metadata-Version: 2.4
Name: azstoragetorch
Version: 0.1.0
Summary: Azure Storage Connector for PyTorch
Author-email: Microsoft Corporation <ascl@microsoft.com>
License: MIT License
        
        Copyright (c) 2024 Microsoft Azure
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/Azure/azure-storage-for-pytorch
Project-URL: Issues, https://github.com/Azure/azure-storage-for-pytorch/issues
Project-URL: Repository, https://github.com/Azure/azure-storage-for-pytorch
Keywords: azure,pytorch
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: azure-identity<2
Requires-Dist: azure-storage-blob<=12.24.1,>=12.24.0
Requires-Dist: torch<3,>=2.6.0
Requires-Dist: typing-extensions<5,>=4.13.2
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: check-manifest; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: furo; extra == "dev"
Requires-Dist: sphinx-copybutton; extra == "dev"
Dynamic: license-file

# Azure Storage Connector for PyTorch (`azstoragetorch`)

The Azure Storage Connector for PyTorch (`azstoragetorch`) is a library that provides
seamless, performance-optimized integrations between [Azure Storage] and [PyTorch].
Use this library to easily access and store data in Azure Storage while using PyTorch. The
library currently offers:

- [File-like object for saving and loading PyTorch models (i.e., checkpointing) with Azure Blob Storage][user guide checkpointing]
- [PyTorch datasets for loading data samples from Azure Blob Storage][user guide datasets]

## Documentation

For detailed documentation on `azstoragetorch`, we recommend visiting its
[official documentation]. It includes both a [user guide] and [API references]
for the project. Content in this README is scoped to a high-level overview of the
project and its GitHub repository policies.

## Getting started

### Prerequisites

- Python 3.9 or later
- An [Azure subscription] and an [Azure storage account]

### Installation

Install the library with [pip]:

```shell
pip install azstoragetorch
```

### Configuration

`azstoragetorch` should work without any explicit credential configuration.

By default, `azstoragetorch` interfaces authenticate with
[`DefaultAzureCredential`][defaultazurecredential guide], which automatically
retrieves [Microsoft Entra ID tokens] based on your current environment. For more
information on using credentials with `azstoragetorch`, see the
[user guide][user guide configuration].


## Features
This section highlights core features of `azstoragetorch`. For more details, see the [user guide].

### Saving and loading PyTorch models (Checkpointing)
PyTorch [supports saving and loading trained models][pytorch checkpoint tutorial]
(i.e., checkpointing). The core PyTorch interfaces for saving and loading models are
[`torch.save()`][pytorch save] and [`torch.load()`][pytorch load] respectively.
Both of these functions accept a file-like object to be written to or read from.
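
To make "file-like object" concrete, here is a minimal sketch that needs neither
PyTorch nor Azure. It assumes `pickle.dump()` and `pickle.load()` as stand-ins for
`torch.save()` and `torch.load()`, and an in-memory `io.BytesIO` buffer as the
file-like object. The `state` dictionary is a hypothetical placeholder for real
model weights.

```python
import io
import pickle

# Hypothetical stand-in for model.state_dict(); any picklable object works here.
state = {"layer.weight": [0.1, 0.2], "layer.bias": [0.0]}

# io.BytesIO plays the role of the file-like object.
buffer = io.BytesIO()

# Stand-in for torch.save(state, f): the serializer only needs to write bytes.
pickle.dump(state, buffer)

# Rewind to the start before reading the same buffer back.
buffer.seek(0)

# Stand-in for torch.load(f): the deserializer only needs to read bytes.
restored = pickle.load(buffer)

print(restored == state)
```

The point of the sketch is that the serializer never needs to know where the bytes
end up, so any object with the right read and write methods can take the buffer's place.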

`azstoragetorch` offers the [`azstoragetorch.io.BlobIO`][blobio reference] file-like
object class to save and load models directly to and from Azure Blob Storage when
using `torch.save()` and `torch.load()`:

```python
import torch
import torchvision.models  # Install separately: ``pip install torchvision``
from azstoragetorch.io import BlobIO

# Update URL with your own Azure Storage account and container name
CONTAINER_URL = "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>"

# Model to save. Replace with your own model.
model = torchvision.models.resnet18(weights="DEFAULT")

# Save trained model to Azure Blob Storage. This saves the model weights
# to a blob named "model_weights.pth" in the container specified by CONTAINER_URL.
with BlobIO(f"{CONTAINER_URL}/model_weights.pth", "wb") as f:
    torch.save(model.state_dict(), f)

# Load trained model from Azure Blob Storage. This loads the model weights
# from the blob named "model_weights.pth" in the container specified by CONTAINER_URL.
with BlobIO(f"{CONTAINER_URL}/model_weights.pth", "rb") as f:
    model.load_state_dict(torch.load(f))
```

### PyTorch Datasets

PyTorch offers the [Dataset and DataLoader primitives][pytorch dataset tutorial] for
loading data samples. `azstoragetorch` provides implementations for both types
of PyTorch datasets, [map-style and iterable-style datasets][pytorch dataset types],
to load data samples from Azure Blob Storage:

- [`azstoragetorch.datasets.BlobDataset`][blobdataset reference] - [Map-style dataset][pytorch dataset map-style]
- [`azstoragetorch.datasets.IterableBlobDataset`][iterableblobdataset reference] - [Iterable-style dataset][pytorch dataset iterable-style]
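
The difference between the two dataset styles can be sketched in plain Python,
without PyTorch or Azure. Everything below is hypothetical and for illustration
only: `FAKE_BLOBS` stands in for a container, the two classes are not the
library's implementations, and the `"data"` key is an assumed name for the blob
content in each sample.

```python
from typing import Dict, Iterator

Sample = Dict[str, object]

# Hypothetical in-memory stand-in for a container: blob URL -> blob content.
FAKE_BLOBS: Dict[str, bytes] = {
    "https://example.invalid/container/a.txt": b"first",
    "https://example.invalid/container/b.txt": b"second",
}


class MapStyleSketch:
    """Map-style: supports len() and random access by integer index."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs
        self._urls = list(blobs)

    def __len__(self) -> int:
        return len(self._urls)

    def __getitem__(self, index: int) -> Sample:
        url = self._urls[index]
        return {"url": url, "data": self._blobs[url]}


class IterableStyleSketch:
    """Iterable-style: samples are only available by iterating in order."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs

    def __iter__(self) -> Iterator[Sample]:
        for url, data in self._blobs.items():
            yield {"url": url, "data": data}


map_ds = MapStyleSketch(FAKE_BLOBS)
iter_ds = IterableStyleSketch(FAKE_BLOBS)

print(len(map_ds), map_ds[1]["url"])  # random access by index
print([sample["url"] for sample in iter_ds])  # sequential access only
```

A map-style dataset suits cases where the number of samples is known up front and
shuffling by index is useful; an iterable-style dataset suits streaming, where
samples are consumed in the order they arrive.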

Data samples returned from both datasets map one-to-one to blobs in Azure Blob
Storage. To instantiate these dataset classes, use one of their class methods:

- `from_container_url()` - Instantiate a dataset by listing blobs from an Azure Storage container.
- `from_blob_urls()` - Instantiate a dataset from provided blob URLs.


```python
from azstoragetorch.datasets import BlobDataset, IterableBlobDataset

# Update URL with your own Azure Storage account and container name
CONTAINER_URL = "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>"

# Create an iterable-style dataset by listing blobs in the container specified by CONTAINER_URL.
dataset = IterableBlobDataset.from_container_url(CONTAINER_URL)

# Print the first blob in the dataset. Default output is a dictionary with
# the blob URL and the blob data. Use `transform` keyword argument when
# creating dataset to customize output format.
print(next(iter(dataset)))

# List of blob URLs to create dataset from. Update with your own blob names.
blob_urls = [
    f"{CONTAINER_URL}/<blob-name-1>",
    f"{CONTAINER_URL}/<blob-name-2>",
    f"{CONTAINER_URL}/<blob-name-3>",
]

# Create a map-style dataset from the list of blob URLs
blob_list_dataset = BlobDataset.from_blob_urls(blob_urls)

print(blob_list_dataset[0])  # Print the first blob in the dataset
```

Once instantiated, `azstoragetorch` datasets can be provided directly to a PyTorch
[`DataLoader`][pytorch dataloader] for loading samples:

```python
from torch.utils.data import DataLoader

# Create a DataLoader to load data samples from the dataset in batches of 32
dataloader = DataLoader(dataset, batch_size=32)

for batch in dataloader:
    print(batch["url"])  # Prints the blob URLs in each batch of up to 32 samples
```
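
The `transform` keyword argument mentioned earlier controls what each sample looks
like before it reaches the `DataLoader`. The following is a rough sketch of such a
callable, not the library's documented contract: it assumes the callable receives
one object per blob that exposes a `url` attribute and a `reader()` method returning
a readable file-like object. `FakeBlob` and `to_text_sample` are hypothetical names
used only so the sketch runs without Azure; check the [API references] for the
actual interface.

```python
import io


class FakeBlob:
    """Hypothetical stand-in for the per-blob object passed to a transform."""

    def __init__(self, url: str, content: bytes) -> None:
        self.url = url
        self._content = content

    def reader(self) -> io.BytesIO:
        # Assumption: the real object returns a readable file-like object here.
        return io.BytesIO(self._content)


def to_text_sample(blob) -> dict:
    """Read the blob content, decode it as UTF-8, and return a custom sample."""
    with blob.reader() as f:
        text = f.read().decode("utf-8")
    return {"url": blob.url, "text": text, "num_chars": len(text)}


sample = to_text_sample(
    FakeBlob("https://example.invalid/container/greeting.txt", b"hello")
)
print(sample)
```

Under the same assumption, the callable would be passed as
`transform=to_text_sample` to `from_container_url()` or `from_blob_urls()`.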

## Backwards compatibility

While the project is major version `0` (i.e., version is `0.x.y`), public interfaces are not stable.
Backwards-incompatible changes may be introduced between minor version bumps (e.g., upgrading from
`0.1.0` to `0.2.0`). If backwards compatibility is needed while using the library,
we recommend pinning to a specific minor version of the library (e.g., `azstoragetorch==0.1.*`).


## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a
CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided
by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

[azure storage]: https://learn.microsoft.com/azure/storage/common/storage-introduction
[azure storage account]: https://learn.microsoft.com/azure/storage/common/storage-account-overview
[azure subscription]: https://azure.microsoft.com/free/
[microsoft entra id tokens]: https://learn.microsoft.com/azure/storage/blobs/authorize-access-azure-active-directory
[defaultazurecredential guide]: https://learn.microsoft.com/azure/developer/python/sdk/authentication/credential-chains?tabs=dac#defaultazurecredential-overview

[pip]: https://pypi.org/project/pip/

[pytorch]: https://pytorch.org/
[pytorch checkpoint tutorial]: https://pytorch.org/tutorials/beginner/saving_loading_models.html
[pytorch save]: https://pytorch.org/docs/stable/generated/torch.save.html
[pytorch load]: https://pytorch.org/docs/stable/generated/torch.load.html
[pytorch dataloader]: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
[pytorch dataset iterable-style]: https://pytorch.org/docs/stable/data.html#iterable-style-datasets
[pytorch dataset map-style]: https://pytorch.org/docs/stable/data.html#map-style-datasets
[pytorch dataset tutorial]: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#datasets-dataloaders
[pytorch dataset types]: https://pytorch.org/docs/stable/data.html#dataset-types

[official documentation]: https://azure.github.io/azure-storage-for-pytorch/
[user guide]: https://azure.github.io/azure-storage-for-pytorch/user-guide.html
[user guide configuration]: https://azure.github.io/azure-storage-for-pytorch/user-guide.html#configuration
[user guide checkpointing]: https://azure.github.io/azure-storage-for-pytorch/user-guide.html#saving-and-loading-pytorch-models-checkpointing
[user guide datasets]: https://azure.github.io/azure-storage-for-pytorch/user-guide.html#pytorch-datasets
[api references]: https://azure.github.io/azure-storage-for-pytorch/api.html
[blobio reference]: https://azure.github.io/azure-storage-for-pytorch/api.html#azstoragetorch.io.BlobIO
[blobdataset reference]: https://azure.github.io/azure-storage-for-pytorch/api.html#azstoragetorch.datasets.BlobDataset
[iterableblobdataset reference]: https://azure.github.io/azure-storage-for-pytorch/api.html#azstoragetorch.datasets.IterableBlobDataset
