Metadata-Version: 2.4
Name: trustworthy-llm
Version: 0.0.0
Summary: Core functionality for TLM
License-Expression: Apache-2.0
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: jsonpath-ng>=1.7.0
Requires-Dist: litellm>=1.77.2
Requires-Dist: numpy>=2.1.3
Requires-Dist: openai>=2.0.0
Requires-Dist: pandas-stubs>=2.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.9.2
Provides-Extra: dev
Requires-Dist: coverage>=7.6.4; extra == 'dev'
Requires-Dist: pre-commit>=4.0.1; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-dotenv>=0.5.2; extra == 'dev'
Requires-Dist: pytest-profiling>=1.7.0; extra == 'dev'
Requires-Dist: pytest-recording>=0.13.2; extra == 'dev'
Requires-Dist: pytest>=8.3.3; extra == 'dev'
Requires-Dist: ruff>=0.7.2; extra == 'dev'
Description-Content-Type: text/markdown

# Trustworthy Language Model (TLM)

The [Trustworthy Language Model](https://cleanlab.ai/blog/trustworthy-language-model/) scores the **trustworthiness** of outputs from *any* LLM in *real-time*.

Automatically detect hallucinated/incorrect responses in: Q&A (RAG), Chatbots, Agents, Structured Outputs, Data Extraction, Tool Calling, Classification/Tagging, Data Labeling, and other LLM applications.

Use TLM to:
- Guardrail AI mistakes before they are served to users
- Escalate untrustworthy AI responses to humans
- Discover incorrect LLM (or human) generated outputs in datasets/logs
- Boost AI accuracy
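The guardrail pattern in the first bullet can be sketched generically. Note this is an illustrative sketch, not this package's actual API: serve the LLM response only when its trustworthiness score clears a threshold, and otherwise escalate with a safe fallback.

```python
def guardrail(response: str, trust_score: float,
              threshold: float = 0.8,
              fallback: str = "I'm not sure; routing this to a human agent.") -> str:
    # Serve the LLM response only if its trustworthiness score is high enough;
    # otherwise return a safe fallback (or escalate/log for human review).
    # The threshold of 0.8 is an illustrative choice, not a recommendation.
    return response if trust_score >= threshold else fallback

print(guardrail("Paris is the capital of France.", trust_score=0.97))
print(guardrail("The Eiffel Tower is 12 meters tall.", trust_score=0.31))
```

In practice the threshold is tuned per application, trading off how many mistakes slip through against how often correct responses are needlessly escalated.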

Powered by *uncertainty estimation* techniques, TLM **works out of the box** and does **not** require: <br>
data preparation/labeling work or custom model training/serving infrastructure.
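To illustrate the underlying idea (not this package's implementation or API), one simple form of uncertainty estimation is self-consistency: sample the model several times and score trustworthiness as the fraction of samples agreeing with the most common answer. Here a toy stochastic function stands in for an LLM, and all names are hypothetical:

```python
import random
from collections import Counter

def sample_model(prompt: str, rng: random.Random) -> str:
    # Toy stand-in for an LLM sampled at nonzero temperature:
    # a question the model "knows" yields the same answer almost every time,
    # while an ambiguous question yields scattered answers.
    if "capital of France" in prompt:
        return rng.choices(["Paris", "Lyon"], weights=[0.95, 0.05])[0]
    return rng.choice(["A", "B", "C", "D"])

def consistency_score(prompt: str, n_samples: int = 20, seed: int = 0) -> tuple[str, float]:
    """Return (most common answer, fraction of samples that agree with it)."""
    rng = random.Random(seed)
    answers = [sample_model(prompt, rng) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples

answer, score = consistency_score("What is the capital of France?")
letter, low_score = consistency_score("Pick a letter.")
print(answer, score)      # high agreement -> high trustworthiness
print(letter, low_score)  # low agreement -> low trustworthiness
```

TLM's actual scoring combines multiple uncertainty signals beyond this sketch; see the linked research paper for details.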

Learn more and see precision/recall benchmarks against frontier models (from OpenAI, Anthropic, Google, etc.): <br>
[Blog](https://cleanlab.ai/blog/), [Research Paper](https://aclanthology.org/2024.acl-long.283/)

## Usage

See the [notebooks](notebooks) directory for Jupyter notebooks demonstrating example usage.
