Metadata-Version: 2.1
Name: argilla
Version: 1.16.0
Summary: Open-source tool for exploring, labeling, and monitoring data for NLP projects.
Author-email: argilla <contact@argilla.io>
Maintainer-email: argilla <contact@argilla.io>
License: Apache-2.0
Project-URL: homepage, https://www.argilla.io
Project-URL: documentation, https://docs.argilla.io
Project-URL: repository, https://github.com/argilla-io/argilla
Keywords: data-science,natural-language-processing,text-labeling,data-annotation,artificial-intelligence,knowledged-graph,developers-tools,human-in-the-loop,mlops
Requires-Python: <3.12,>=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx<0.24,>=0.15
Requires-Dist: deprecated~=1.2.0
Requires-Dist: packaging>=20.0
Requires-Dist: pandas<2.0.0,>=1.0.0
Requires-Dist: pydantic<2.0,>=1.10.7
Requires-Dist: wrapt<1.15,>=1.13
Requires-Dist: numpy<1.24.0
Requires-Dist: tqdm>=4.27.0
Requires-Dist: backoff
Requires-Dist: monotonic
Requires-Dist: rich!=13.1.0
Requires-Dist: typer<0.8.0,>=0.6.0
Provides-Extra: server
Requires-Dist: fastapi>=0.103.1; extra == "server"
Requires-Dist: opensearch-py~=2.0.0; extra == "server"
Requires-Dist: elasticsearch8[async]~=8.7.0; extra == "server"
Requires-Dist: uvicorn[standard]<0.21.0,>=0.15.0; extra == "server"
Requires-Dist: smart-open; extra == "server"
Requires-Dist: brotli-asgi<1.3,>=1.1; extra == "server"
Requires-Dist: alembic~=1.9.0; extra == "server"
Requires-Dist: SQLAlchemy~=2.0.0; extra == "server"
Requires-Dist: greenlet>=2.0.0; extra == "server"
Requires-Dist: aiosqlite>=0.19.0; extra == "server"
Requires-Dist: luqum<0.13,>=0.11; extra == "server"
Requires-Dist: scikit-learn>=0.24.2; extra == "server"
Requires-Dist: aiofiles<22.2,>=0.6; extra == "server"
Requires-Dist: PyYAML<6.1.0,>=5.4.1; extra == "server"
Requires-Dist: python-multipart~=0.0.5; extra == "server"
Requires-Dist: python-jose[cryptography]<3.4,>=3.2; extra == "server"
Requires-Dist: passlib[bcrypt]~=1.7.4; extra == "server"
Requires-Dist: psutil<5.10,>=5.8; extra == "server"
Requires-Dist: segment-analytics-python==2.2.0; extra == "server"
Provides-Extra: postgresql
Requires-Dist: psycopg2~=2.9.5; sys_platform != "darwin" and extra == "postgresql"
Requires-Dist: psycopg2-binary~=2.9.5; sys_platform == "darwin" and extra == "postgresql"
Requires-Dist: asyncpg>=0.27.0; extra == "postgresql"
Provides-Extra: listeners
Requires-Dist: schedule~=1.1.0; extra == "listeners"
Requires-Dist: prodict~=0.8.0; extra == "listeners"
Provides-Extra: integrations
Requires-Dist: PyYAML<6.1.0,>=5.4.1; extra == "integrations"
Requires-Dist: cleanlab~=2.0.0; extra == "integrations"
Requires-Dist: datasets!=2.3.2,>1.17.0; extra == "integrations"
Requires-Dist: huggingface_hub<0.13,>=0.5.0; extra == "integrations"
Requires-Dist: flair>=0.12.2; extra == "integrations"
Requires-Dist: faiss-cpu; extra == "integrations"
Requires-Dist: flyingsquid; extra == "integrations"
Requires-Dist: pgmpy; extra == "integrations"
Requires-Dist: plotly>=4.1.0; extra == "integrations"
Requires-Dist: snorkel>=0.9.7; extra == "integrations"
Requires-Dist: spacy==3.5.3; extra == "integrations"
Requires-Dist: spacy-transformers>=1.2.5; extra == "integrations"
Requires-Dist: transformers[torch]>=4.30.0; extra == "integrations"
Requires-Dist: evaluate; extra == "integrations"
Requires-Dist: seqeval; extra == "integrations"
Requires-Dist: sentence-transformers; extra == "integrations"
Requires-Dist: setfit; extra == "integrations"
Requires-Dist: span_marker; extra == "integrations"
Requires-Dist: openai>=0.27.10; extra == "integrations"
Requires-Dist: peft; extra == "integrations"
Requires-Dist: trl>=0.5.0; extra == "integrations"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Requires-Dist: pytest-mock; extra == "tests"
Requires-Dist: pytest-asyncio; extra == "tests"
Requires-Dist: factory_boy~=3.2.1; extra == "tests"


<h1 align="center">
  <a href=""><img src="https://github.com/dvsrepo/imgs/raw/main/rg.svg" alt="Argilla" width="150"></a>
  <br>
  ✨ Argilla ✨
  <br>
</h1>
<p align="center">
<a  href="https://pypi.org/project/argilla/">
<img  alt="CI"  src="https://img.shields.io/pypi/v/argilla.svg?style=flat-square&logo=pypi&logoColor=white">
</a>
<img alt="Codecov" src="https://codecov.io/gh/argilla-io/argilla/branch/main/graph/badge.svg?token=VDVR29VOMG"/>
<a href="https://pepy.tech/project/argilla">
<img  alt="CI"  src="https://static.pepy.tech/personalized-badge/argilla?period=month&units=international_system&left_color=grey&right_color=blue&left_text=pypi%20downloads/month">
</a>
<a  href="https://huggingface.co/new-space?template=argilla/argilla-template-space">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-sm.svg" />
</a>
</p>

<h2 align="center">Open-source data curation platform for LLMs</h2>
<br>


https://github.com/argilla-io/argilla/assets/1107111/49e28d64-9799-4cac-be49-19dce0f6bd86

<p align="center">
<a  href="https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g">
<img src="https://img.shields.io/badge/JOIN US ON SLACK-4A154B?style=for-the-badge&logo=slack&logoColor=white" />
</a>
<a href="https://linkedin.com/company/argilla-io">
<img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" />
</a>
<a  href="https://twitter.com/argilla_io">
<img src="https://img.shields.io/badge/Twitter-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white" />
</a>
</p>

<br>

<h3>
<p align="center">
<a href="https://docs.argilla.io">📄 Documentation</a> <span> | </span>
<a href="#-quickstart">🚀 Quickstart</a> <span> | </span>
<a href="#-cheatsheet">🎼 Cheatsheet</a> <span> | </span>
<a href="#-project-architecture">🛠️ Architecture</a> <span> | </span>
<a href="#-contribute">🫱🏾‍🫲🏼 Contribute</a>
</p>
</h3>

## 🚀 Quickstart

Argilla is an open-source data curation platform for LLMs. Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring.

There are different options to get started:

1. Take a look at our [quickstart page](https://docs.argilla.io/en/latest/getting_started/quickstart.html) 🚀

2. Start contributing by looking at our [contributor guidelines](#-contribute) 🫱🏾‍🫲🏼

3. Skip some steps with our [cheatsheet](#-cheatsheet) 🎼

## 🎼 Cheatsheet

<h3><a href="https://docs.argilla.io/en/latest/getting_started/installation/deployments/python.html"> Python package</a></h3>


```bash
pip install argilla
```

<hr>

<h3><a href="https://docs.argilla.io/en/latest/getting_started/installation/deployments/docker-quickstart.html"> Deploy Locally</a></h3>


```bash
docker run -d --name argilla -p 6900:6900 argilla/argilla-quickstart:latest
```

<hr>
<h3><a href="https://argilla.io/blog/launching-argilla-huggingface-hub/">Deploy on Hugging Face Hub</a></h3>
Hugging Face Spaces now offer persistent storage, which Argilla supports from version 1.11.0 onwards, but you need to activate it manually in the Space settings. Without it, unless you are on a paid Space upgrade, the Space is shut down after 48 hours of inactivity and you lose all your data. To avoid losing data, we highly recommend using the persistent storage layer offered by Hugging Face.
<a href="https://argilla.io/blog/launching-argilla-huggingface-hub/"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/spaces-argilla-embed-space.png" width="100%"></a>

<hr>
<h3><a href="https://docs.argilla.io/en/latest/guides/guides/llms/conceptual_guides/conceptual_guides.html">LLM support</a></h3>

```python
import argilla as rg

dataset = rg.FeedbackDataset(
    guidelines="Please, read the question carefully and try to answer it as accurately as possible.",
    fields=[
        rg.TextField(name="question"),
        rg.TextField(name="answer"),
    ],
    questions=[
        rg.RatingQuestion(
            name="answer_quality",
            description="How would you rate the quality of the answer?",
            values=[1, 2, 3, 4, 5],
        ),
        rg.TextQuestion(
            name="answer_correction",
            description="If you think the answer is not accurate, please, correct it.",
            required=False,
        ),
    ]
)
```

<a href="https://docs.argilla.io/en/latest/guides/guides/llms/conceptual_guides/conceptual_guides.html"><img src="https://docs.argilla.io/en/latest/_images/snapshot-feedback-demo.png" width="100%"></a>

<hr>
<h3><a href="https://docs.argilla.io/en/latest/guides/log_load_and_prepare_data.html#Argilla-Records">Create Records</a></h3>


```python
import argilla as rg

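# Build a record with model predictions and a human annotation
record = rg.TextClassificationRecord(
    text="Sun Is Closer... a parachute.",
    prediction=[("Sci/Tech", 0.75), ("World", 0.25)],
    annotation="Sci/Tech"
)
# Log the record to the "news" dataset
rg.log(records=record, name="news")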
```

<a href="https://docs.argilla.io/en/latest/guides/log_load_and_prepare_data.html#Argilla-Records"><img src="https://docs.argilla.io/en/latest/_images/features-annotate.png" width="100%"></a>

<hr>
<h3><a href="https://docs.argilla.io/en/latest/guides/query_datasets.html">Query datasets</a></h3>


```python
import argilla as rg

rg.load(name="news", query="text:spor*")
```

<a href="https://docs.argilla.io/en/latest/guides/query_datasets.html"><img src="https://docs.argilla.io/en/latest/_images/features-search.png" width="100%"></a>

<hr>
<h3><a href="https://docs.argilla.io/en/latest/guides/label_records_with_semanticsearch.html">Semantic search</a></h3>

```python
import argilla as rg

record = rg.TextClassificationRecord(
    text="Hello world, I am a vector record!",
    vectors= {"my_vector_name": [0, 42, 1984]}
)
rg.log(name="dataset", records=record)
rg.load(name="dataset", vector=("my_vector_name", [0, 43, 1985]))
```

<a href="https://docs.argilla.io/en/latest/guides/label_records_with_semanticsearch.html"><img src="https://docs.argilla.io/en/latest/_images/features-similaritysearch.png" width="100%"></a>
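
To give a feel for what the vector query above compares, here is a minimal standard-library sketch that scores the logged vector against the query vector with cosine similarity. The `cosine_similarity` helper is hypothetical and not part of Argilla. The real search runs inside the vector database, and the metric it applies depends on how that backend is configured.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


stored = [0, 42, 1984]  # vector logged with the record above
query = [0, 43, 1985]   # vector passed to rg.load above
score = cosine_similarity(stored, query)
```

A score close to 1 means the two vectors point in nearly the same direction, which is why the logged record would rank high for this query under a cosine metric.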

<hr>
<h3><a href="https://docs.argilla.io/en/latest/guides/programmatic_labeling_with_rules.html">Weak supervision</a></h3>


```python
from argilla.labeling.text_classification import add_rules, Rule

rule = Rule(query="positive impact", label="optimism")
add_rules(dataset="go_emotion", rules=[rule])
```

<a href="https://docs.argilla.io/en/latest/guides/programmatic_labeling_with_rules.html"><img src="https://docs.argilla.io/en/latest/_images/features-weak-labelling.png" width="100%"></a>
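
The idea behind a rule can be sketched in plain Python: a rule pairs a query with a label, and every record the query matches receives that label as a weak vote. The `KeywordRule` class and `weak_labels` function below are hypothetical illustrations, not Argilla APIs, and they assume a simple case-insensitive substring match, whereas Argilla evaluates rule queries in its search backend.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KeywordRule:
    """Hypothetical stand-in for a labeling rule: a phrase and the label it implies."""

    phrase: str
    label: str

    def apply(self, text: str) -> Optional[str]:
        # Vote for the label when the phrase occurs in the text, otherwise abstain.
        return self.label if self.phrase.lower() in text.lower() else None


def weak_labels(texts: List[str], rules: List[KeywordRule]) -> List[List[str]]:
    """Collect, for each text, the labels voted by every rule that matched."""
    results = []
    for text in texts:
        votes = []
        for rule in rules:
            label = rule.apply(text)
            if label is not None:
                votes.append(label)
        results.append(votes)
    return results


rule = KeywordRule(phrase="positive impact", label="optimism")
votes = weak_labels(
    ["This had a positive impact on us.", "Nothing to report."],
    [rule],
)
```

Records with no matching rule keep an empty vote list, which mirrors how a rule abstains on records its query does not cover.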


<hr>
<h3><a href="https://argilla.io/blog/introducing-argilla-trainer">Train models</a></h3>

```python
from argilla.training import ArgillaTrainer

trainer = ArgillaTrainer(name="news", workspace="recognai", framework="setfit")
trainer.train()
```

<a href="https://argilla.io/blog/introducing-argilla-trainer"><img src="https://argilla.io/blog/introducing-argilla-trainer/train.png" width="100%"></a>

## 🛠️ Project Architecture

Argilla is built on 5 core components:

- **Python SDK**: A Python SDK, installable with `pip install argilla`, to interact with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration and annotation workflows.
- **FastAPI Server**: The core of Argilla is a *Python FastAPI* server that manages the data, by pre-processing it and storing it in the vector database. Also, it stores application information in the relational database. It provides a REST API to interact with the data from the Python SDK and the Argilla UI. It also provides a web interface to visualize the data.
- **Relational Database**: A relational database to store the metadata of the records and the annotations. *SQLite* is the default built-in option and ships with the Argilla Server, but a separate *PostgreSQL* instance can be used too.
- **Vector Database**: A vector database to store the records data and perform scalable vector similarity searches and basic document searches. We currently support *Elasticsearch* and *AWS OpenSearch*, both of which can be deployed as separate Docker images.
- **Vue.js UI**: A web application to visualize and annotate your data, users and teams. It is built with *Vue.js* and is directly deployed alongside the Argilla Server within our Argilla Docker image.
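
As a rough sketch of how a script might locate the Argilla Server before handing control to the Python SDK, the snippet below resolves connection settings from the environment. `ConnectionSettings` and `resolve_settings` are hypothetical helpers, the environment variable names `ARGILLA_API_URL` and `ARGILLA_API_KEY` are assumptions about what your deployment uses, and the default URL assumes the local Docker deployment on port 6900 shown in the cheatsheet.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical container for the values the SDK needs to reach the server."""

    api_url: str
    api_key: Optional[str]


def resolve_settings(env: Mapping[str, str] = os.environ) -> ConnectionSettings:
    # Fall back to the local Docker deployment when no URL is configured.
    api_url = env.get("ARGILLA_API_URL", "http://localhost:6900").rstrip("/")
    # Leave the key unset rather than guessing a credential.
    api_key = env.get("ARGILLA_API_KEY")
    return ConnectionSettings(api_url=api_url, api_key=api_key)


settings = resolve_settings()
# The values can then be handed to the SDK, for example:
# rg.init(api_url=settings.api_url, api_key=settings.api_key)
```

Keeping the lookup in one place means the same script can target a local container or a remote server without code changes.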

## 📏 Principles
- **Open**: Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can **use and combine your preferred libraries** without implementing any specific interface.
- **End-to-end**: Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions and collect more data to improve your model over time. Argilla is designed to close this gap, enabling you to **iterate as much as you need**.
- **User and Developer Experience**: The key to sustainable NLP solutions is to make it easier for everyone to contribute to projects. _Domain experts_ should feel comfortable interpreting and annotating data. _Data scientists_ should feel free to experiment and iterate. _Engineers_ should feel in control of data pipelines. Argilla optimizes the experience for these core users to **make your teams more productive**.
- **Beyond hand-labeling**: Classical hand-labeling workflows are costly and inefficient, but having humans in the loop is essential. Easily combine hand-labeling with active learning, bulk-labeling, zero-shot models, and weak supervision in **novel data annotation workflows**.

## 🫱🏾‍🫲🏼 Contribute

We love contributors and have launched a [collaboration with JustDiggit](https://argilla.io/blog/introducing-argilla-community-growers) to hand out our very own bunds and help the re-greening of sub-Saharan Africa. To help our community with the creation of contributions, we have created our [developer](https://docs.argilla.io/en/latest/community/developer_docs.html) and [contributor](https://docs.argilla.io/en/latest/community/contributing.html) docs. Additionally, you can always [schedule a meeting](https://calendly.com/argilla-office-hours/30min) with our Developer Advocacy team so they can get you up to speed.

## 🥇 Contributors
<a  href="https://github.com/argilla-io/argilla/graphs/contributors">

<img  src="https://contrib.rocks/image?repo=argilla-io/argilla" />

</a>

## 🗺️ Roadmap

We continuously work on updating [our plans and our roadmap](https://github.com/orgs/argilla-io/projects/10/views/1) and we love to discuss those with our community. Feel encouraged to participate.

