Metadata-Version: 2.1
Name: scrapy-pagestorage
Version: 0.4.0
Summary: Scrapy extension to store info in storage service
Home-page: https://github.com/scrapy-plugins/scrapy-pagestorage
Author: Scrapy developers
Author-email: opensource@scrapinghub.com
License: BSD
Platform: Any
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.5
License-File: LICENSE

==================
scrapy-pagestorage
==================

.. image:: https://img.shields.io/pypi/v/scrapy-pagestorage.svg
   :target: https://pypi.python.org/pypi/scrapy-pagestorage
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/scrapy-pagestorage.svg
   :target: https://pypi.python.org/pypi/scrapy-pagestorage
   :alt: Python Versions

.. image:: https://github.com/scrapy-plugins/scrapy-pagestorage/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/scrapy-plugins/scrapy-pagestorage/actions/workflows/tests.yml
   :alt: Build Status

.. image:: https://img.shields.io/codecov/c/github/scrapy-plugins/scrapy-pagestorage/master.svg
   :target: https://codecov.io/github/scrapy-plugins/scrapy-pagestorage
   :alt: Coverage report

A Scrapy extension that stores request and response information in a storage service.

Installation
============

You can install scrapy-pagestorage using pip::

    pip install scrapy-pagestorage

You can then enable the middleware in your ``settings.py``::

    SPIDER_MIDDLEWARES = {
        ...
        'scrapy_pagestorage.PageStorageMiddleware': 900
    }

How to use it
=============

Enable the extension through ``settings.py``::

    PAGE_STORAGE_ENABLED = True
    PAGE_STORAGE_ON_ERROR_ENABLED = True

Configure the extension through ``settings.py``::

    PAGE_STORAGE_MODE = "VERSIONED_CACHE"
    PAGE_STORAGE_LIMIT = 100
    PAGE_STORAGE_ON_ERROR_LIMIT = 100
    PAGE_STORAGE_TRIM_HTML = True

The extension is auto-enabled for Portia spiders (``SHUB_SPIDER_TYPE=portia``).
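
As a minimal sketch, the same settings can also be scoped to a single spider
through Scrapy's ``custom_settings`` class attribute instead of the
project-wide ``settings.py``. The spider name, start URL, and limit values
below are hypothetical and only illustrate the shape of the configuration::

    import scrapy


    class ExampleSpider(scrapy.Spider):
        # Hypothetical spider; the name and URL are placeholders.
        name = "example"
        start_urls = ["https://example.com"]

        # Per-spider overrides, applied on top of the project settings.
        custom_settings = {
            "SPIDER_MIDDLEWARES": {
                "scrapy_pagestorage.PageStorageMiddleware": 900,
            },
            "PAGE_STORAGE_ENABLED": True,
            "PAGE_STORAGE_ON_ERROR_ENABLED": True,
            "PAGE_STORAGE_LIMIT": 50,            # illustrative value
            "PAGE_STORAGE_ON_ERROR_LIMIT": 50,   # illustrative value
        }

        def parse(self, response):
            yield {"url": response.url}

Scoping the settings this way keeps page storage off for the other spiders
in the project, which may be preferable when only one crawl needs its pages
stored.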

Settings
========

PAGE_STORAGE_MODE
-----------------
Default: ``None``

A string that selects where the extension stores information: the cache store or
the versioned cache store. Set ``PAGE_STORAGE_MODE = "VERSIONED_CACHE"`` to use the
versioned store.

PAGE_STORAGE_LIMIT
------------------
An integer that limits the number of visited pages to store.

PAGE_STORAGE_ON_ERROR_LIMIT
---------------------------
An integer that limits the number of error pages to store.

PAGE_STORAGE_TRIM_HTML
----------------------
Default: ``False``

When enabled, removes whitespace from the start and end of the HTML to reduce file size.


