Format Identification for Digital Objects (fido)
================================================

By [Open Preservation Foundation](http://www.openpreservation.org)

[![Build Status](https://travis-ci.org/openpreserve/fido.svg?branch=master)](https://travis-ci.org/openpreserve/fido) [![Code Coverage](https://codecov.io/gh/openpreserve/fido/branch/master/graph/badge.svg)](https://codecov.io/gh/openpreserve/fido)

FIDO is a command-line tool to identify the file formats of digital objects.
It is designed for simple integration into automated workflows.

FIDO uses the UK National Archives (TNA) PRONOM File Format and Container descriptions.
PRONOM is available from <http://www.nationalarchives.gov.uk/pronom/>
See [LICENSE](LICENSE.txt) for license information.

* Download from: <https://github.com/openpreserve/fido/releases>
* Usage guide: <http://wiki.opf-labs.org/display/KB/FIDO+usage+guide>
* Author: Adam Farquhar (BL), 2010
* Maintainer: Maurice de Rooij (OPF/NANETH), 2011, 2012, 2013, Misty de Meo 2014, 2015, 2016, Holly Becker 2016

Usage
-----

```shell
usage: fido [-h] [-v] [-q] [-recurse] [-zip] [-noextension] [-nocontainer]
            [-pronom_only] [-input INPUT] [-filename FILENAME]
            [-useformats INCLUDEPUIDS] [-nouseformats EXCLUDEPUIDS]
            [-matchprintf FORMATSTRING] [-nomatchprintf FORMATSTRING]
            [-bufsize BUFSIZE] [-sigs SIG_ACT]
            [-container_bufsize CONTAINER_BUFSIZE]
            [-loadformats XML1,...,XMLn] [-confdir CONFDIR]
            [FILE [FILE ...]]
```

positional arguments:

* `FILE`: files to check. If the file is `-`, content is read from stdin. In this case, Python must be invoked with `-u`, or it may convert the line terminators.

optional arguments:

* `-h`, `--help`: show this help message and exit
* `-v`: show version information
* `-q`: run (more) quietly
* `-recurse`: recurse into subdirectories
* `-zip`: recurse into zip and tar files
* `-noextension`: disable matching by file extension
* `-nocontainer`: disable deep scan of container documents; this increases speed but may reduce accuracy with big files
* `-pronom_only`: disable loading of the format extensions file so that only PRONOM signatures are loaded; this may reduce the accuracy of results
* `-input INPUT`: file containing a list of files to check, one per line. `-` means stdin
* `-filename FILENAME`: filename to use if file contents are passed through stdin
* `-useformats INCLUDEPUIDS`: comma separated string of formats to use in identification
* `-nouseformats EXCLUDEPUIDS`: comma separated string of formats not to use in identification
* `-matchprintf FORMATSTRING`: format string (Python style) to use on match. See the Output section below.
* `-nomatchprintf FORMATSTRING`: format string (Python style) to use if no match. See the Output section below.
* `-bufsize BUFSIZE`: size (in bytes) of the buffer to match against (default=131072 bytes)
* `-sigs SIG_ACT`: signature file action, where `SIG_ACT` is one of:
  * `check`: check whether a new version of the signature file is available for download
  * `list`: list all available signature file versions
  * `update`: automatically update to the latest available signature file
  * `n`: download and use version n
* `-container_bufsize CONTAINER_BUFSIZE`: size (in bytes) of the buffer to match container contents against (default=524288 bytes)
* `-loadformats XML1,...,XMLn`: comma separated string of XML format files to add.
* `-confdir CONFDIR`: configuration directory from which to load files such as the format specifications.

Installation
------------

(also see: <http://wiki.opf-labs.org/display/KB/FIDO+usage+guide>)

Any platform

1. Download the latest zip release from <https://github.com/openpreserve/fido/releases>
2. Unzip it into a directory of your choice
3. Open a command shell and `cd` to the directory that you placed the zip contents into
4. Run `python setup.py install` to install FIDO and its dependencies. This may require sudo on Linux/OSX or admin privileges on Windows.
5. You should now be able to see the help text:
   `fido -h`

Using pip

1. Run `pip install opf-fido`. This may require sudo on Linux/OSX or admin privileges on Windows.
2. You should now be able to see the help text:
   `fido -h`

Updating signatures
-------------------

Signatures can be updated from the OPF's signature service.
The service is pull only, and its location is set in the `versions.xml`
configuration file as

```xml
<updateSite>https://fidosigs.openpreservation.org</updateSite>
```

To check which version of the PRONOM signatures you are using,
type `fido -v`. You will see something like:

```shell
FIDO v1.6.0 (pronom-xml-95.zip, container-signature-20200121.xml, format_extensions.xml)
```

Here `pronom-xml-95.zip` denotes PRONOM version 95. To see if a more recent
set of signatures is available, type `fido -sigs check`, which will report back:

```shell
Updated signatures v104 are available, current version is v95
```

if new signatures are available or

```shell
Your signature files are up to date, current version is v104
```

if not. To update signatures to the latest version type `fido -sigs update`:

```shell
Updated signatures v104 are available, current version is v95
Updating signatures
```

If you are having trouble due to firewall restrictions, see the OPF wiki: <http://wiki.opf-labs.org/display/PT/Command+Line+Interface+proxy+usage>

Please note that this WILL NOT update the container signature file located in the 'conf' folder.
The reason for this is that the PRONOM container signature file contains special types
of sequences which need to be tested before FIDO can use them. If an update is available
for the PRONOM container signature file, it will be included in a later commit.

Dependencies
------------

FIDO 1.0 through 1.3.3 will run on Python 2.7 with no other dependencies.

FIDO 1.3.4 and later require the Python dependency 'olefile'.  This can be
installed directly using `pip install olefile`. It is also installed automatically
when you run `python setup.py install` or install FIDO with pip.

FIDO 1.3.3 and later have experimental Python 3 support.

FIDO 1.4 and later have Python 3 support.

Format Definitions
------------------

By default, FIDO loads format information from two files, `conf/formats.xml`
and `conf/format_extensions.xml`. Additional format files can be specified using
the `-loadformats` command line argument.  They should use the same syntax as
`conf/format_extensions.xml`. If more than one format file needs to be specified,
then they should be comma separated, as with the `-useformats` argument.

Output
------

Output is controlled with the two parameters `matchprintf` and `nomatchprintf`.
Each is a string that may contain formatting information.  They have access to
an object called info with the following fields:

* `printmatch`: `info.version` (file format version X), `info.alias` (format also called X), `info.apple_uti` (Apple Uniform Type Identifier), `info.group_size` and `info.group_index` (if a file has multiple (tentative) hits), `info.count` (file N)

* `printnomatch`: `info.count` (file N)

The defaults for FIDO 1.0 are:

* `printmatch`:
* `"OK,%(info.time)s,%(info.puid)s,%(info.formatname)s,%(info.signaturename)s,%(info.filesize)s,\"%(info.filename)s\",\"%(info.mimetype)s\",\"%(info.matchtype)s\"\n"`

* `printnomatch`:
* `"KO,%(info.time)s,,,,%(info.filesize)s,\"%(info.filename)s\",,\"%(info.matchtype)s\"\n"`

It can be useful to provide an empty string for either, for example to ignore all failed matches or all successful ones (see examples below).
Note that a newline needs to be added to the end of the string using `\n`.
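
To see how such a format string expands, the sketch below applies Python's `%` operator to the default match template quoted above. It assumes the values are supplied as a mapping keyed by names such as `info.puid`; how FIDO builds that mapping internally is not shown here, and every value in `sample_fields` is made up for illustration.

```python
# Default match template quoted above, written as a Python string literal.
MATCH_TEMPLATE = (
    'OK,%(info.time)s,%(info.puid)s,%(info.formatname)s,'
    '%(info.signaturename)s,%(info.filesize)s,"%(info.filename)s",'
    '"%(info.mimetype)s","%(info.matchtype)s"\n'
)

# Hypothetical field values; real values come from FIDO at run time.
sample_fields = {
    "info.time": 12,
    "info.puid": "fmt/18",
    "info.formatname": "Acrobat PDF 1.4 - Portable Document Format",
    "info.signaturename": "PDF 1.4",
    "info.filesize": 2048,
    "info.filename": "report.pdf",
    "info.mimetype": "application/pdf",
    "info.matchtype": "signature",
}

# Each %(name)s placeholder is replaced by the value stored under that key.
line = MATCH_TEMPLATE % sample_fields
print(line, end="")
```

A custom template can reference any subset of the fields, for example `"%(info.puid)s\t%(info.filename)s\n"` for a two-column, tab-separated listing.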

Matchtypes
-----------

FIDO returns the following matchtypes:

* fail:      the object could not be identified with signature or file extension
* extension: the object could only be identified by file extension
* signature: the object has been identified with (a) PRONOM signature(s)
* container: the object has been identified with (a) PRONOM container signature(s)

In some cases multiple results are returned.
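
Because one file can produce several result lines, downstream scripts usually need to group results by filename. The following is a rough sketch, assuming the default output templates from the Output section are in use; the helper names and all sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Column order of the default match and no-match lines (see the Output section).
COLUMNS = ["status", "time", "puid", "formatname", "signaturename",
           "filesize", "filename", "mimetype", "matchtype"]


def read_results(text):
    """Parse default-format FIDO output into a list of dicts, one per line."""
    rows = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, row)) for row in rows if row]


def group_by_file(results):
    """Map each identified filename to the list of PUIDs reported for it."""
    grouped = defaultdict(list)
    for result in results:
        if result["status"] == "OK":
            grouped[result["filename"]].append(result["puid"])
    return dict(grouped)


# Invented sample output: one file with two hits, and one failure.
sample_output = (
    'OK,10,fmt/18,Acrobat PDF 1.4 - Portable Document Format,PDF 1.4,2048,'
    '"report.pdf","application/pdf","signature"\n'
    'OK,14,fmt/61,Example Spreadsheet Format A,Sig A,4096,'
    '"sheet.xls","application/vnd.ms-excel","signature"\n'
    'OK,14,fmt/62,Example Spreadsheet Format B,Sig B,4096,'
    '"sheet.xls","application/vnd.ms-excel","signature"\n'
    'KO,5,,,,64,"mystery.bin",,"fail"\n'
)

results = read_results(sample_output)
matchtype_counts = Counter(r["matchtype"] for r in results)
print(group_by_file(results))
print(matchtype_counts)
```

Note that the default templates do not quote the format name, so a format name containing a comma would shift the columns. A custom `-matchprintf` string with a different separator avoids that.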

Examples running FIDO
---------------------

Identify all files in the current directory and below, sending output
into file-info.csv:
   `python fido.py -recurse . > file-info.csv`

Do the same as above, but also look inside of zip or tar files:
   `python fido.py -recurse -zip . > file-info.csv`

Take input from a list of files:

Linux:

```shell
ls > files.txt
python fido.py -input files.txt
```

Windows:

```shell
dir /b > files.txt
python fido.py -input files.txt
```

Take input from a pipe:

Linux:
   `find . -type f | python fido.py -input -`

Windows:
   `dir /b | python fido.py -input -`

Only show files that could not be identified:
   `python fido.py -matchprintf "" .`

Only show files that could be identified:
   `python fido.py -nomatchprintf "" .`
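
The same invocations can be driven from a Python script. The sketch below builds the argument list from the flags documented above and runs it with `subprocess`. It assumes FIDO is installed so that a `fido` executable is on the PATH; the function names and keyword arguments are illustrative, not a FIDO API.

```python
import subprocess


def build_fido_command(paths, recurse=False, scan_archives=False,
                       nocontainer=False, matchprintf=None, nomatchprintf=None):
    """Build a FIDO argument list using flags from the Usage section."""
    cmd = ["fido"]  # assumption: installed `fido` executable is on the PATH
    if recurse:
        cmd.append("-recurse")
    if scan_archives:
        cmd.append("-zip")
    if nocontainer:
        cmd.append("-nocontainer")
    # An empty string is a valid value and suppresses that kind of output.
    if matchprintf is not None:
        cmd += ["-matchprintf", matchprintf]
    if nomatchprintf is not None:
        cmd += ["-nomatchprintf", nomatchprintf]
    cmd.extend(paths)
    return cmd


def run_fido(paths, **options):
    """Run FIDO and return its standard output as text."""
    result = subprocess.run(build_fido_command(paths, **options),
                            capture_output=True, text=True)
    return result.stdout


# Equivalent of the "only show files that could not be identified" example.
print(build_fido_command(["."], matchprintf=""))
```

Passing the arguments as a list avoids shell quoting problems, which matters for format strings that contain quotes or percent signs.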

Deep scan of container objects
------------------------------

By default, when FIDO detects that a file is a container (compound) object,
it will start a deep (complete) scan of the file using the PRONOM container signatures.
When identifying big files, this behaviour can cause FIDO to slow down significantly.
You can disable deep scanning by invoking FIDO with the `-nocontainer` argument.
While disabling deep scan speeds up identification, it may reduce accuracy.

At the moment (version 1.0), FIDO cannot yet scan containers which are
passed through STDIN. A workaround is to save the stream to a temporary file and have
FIDO identify this file.
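
The workaround can be sketched in Python as follows. The helper `spool_to_tempfile` is hypothetical and only covers the "save the stream" step; the in-memory bytes stand in for data that would normally arrive on standard input.

```python
import io
import os
import shutil
import tempfile


def spool_to_tempfile(stream, suffix=""):
    """Copy a binary stream to a named temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


# Stand-in for a real stream; in practice this could be sys.stdin.buffer.
payload = io.BytesIO(b"PK\x03\x04 example bytes")
path = spool_to_tempfile(payload, suffix=".zip")
size = os.path.getsize(path)
print(path, size)

# The temporary file is not deleted automatically, so remove it when done.
os.remove(path)
```

Between the `spool_to_tempfile` call and `os.remove`, the returned path can be passed to FIDO like any other file, so the container scan runs on the file rather than on the stream.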

License information
-------------------

See the file "[LICENSE.txt](LICENSE.txt)" for information on the history of this
software, terms & conditions for usage, and a DISCLAIMER OF ALL
WARRANTIES...
