Metadata-Version: 2.1
Name: dhlab
Version: 2.4.0
Summary: Text and image analysis of NB's digital collection
Home-page: https://github.com/NationalLibraryOfNorway/DHLAB
Author: The Digital Humanities Lab at The National Library of Norway (NB)
Author-email: dh-lab@nb.no
License: MIT
Project-URL: Documentation, https://dhlab.readthedocs.io
Project-URL: Bug Tracker, https://github.com/NationalLibraryOfNorway/DHLAB/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# DHLAB
<!-- start dhlab-intro -->

[`dhlab`](https://pypi.org/project/dhlab/) is a Python library for accessing reduced representations of text and pictures at
the National Library of Norway (NLN), *Nasjonalbiblioteket* (*NB*) in Norwegian.
It is developed and maintained by [The Digital Humanities lab group](https://www.nb.no/dh-lab/).

The Python package includes wrapper functions for the [API](https://api.nb.no/) (Application
Programming Interface) that can be used to query the texts in [NB Digital](https://www.nb.no/search), the NLN's digital collection of books and newspapers.

The API supports both qualitative and quantitative analyses of the digital texts, e.g. by generating
word frequency lists, concordances, collocations and n-grams, as well as by
extracting names and narrative graphs.
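To make these terms concrete, here is a plain-Python sketch of what a word frequency list, a concordance, collocations and n-grams compute on a toy sentence. It does not call `dhlab` or the NB API, and the regular-expression tokenizer is a crude stand-in, not the tokenizer `dhlab` uses. The "collocations" here are raw co-occurrence counts within a window, without the statistical weighting a real collocation analysis would normally apply.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude stand-in tokenizer: lowercase, keep runs of word characters (handles æ, ø, å)
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens, n=None):
    # Word frequency list: the n most common tokens with their counts
    return Counter(tokens).most_common(n)


def concordance(tokens, word, window=3):
    # Keyword in context: (left context, keyword, right context) for every hit
    hits = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def collocations(tokens, word, window=3):
    # Raw counts of the words occurring within `window` tokens of `word`
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            counts.update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    return counts


def ngrams(tokens, n=2):
    # Count every sequence of n consecutive tokens
    return Counter(zip(*(tokens[i:] for i in range(n))))


tokens = tokenize("En katt og en hund og en fugl")
print(frequency_list(tokens, 2))
print(concordance(tokens, "hund", window=2))
print(collocations(tokens, "og", window=1))
print(ngrams(tokens, 2).most_common(1))
```

With `dhlab` the same kinds of results are computed by the NB API over whole documents or corpora rather than over a local string.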

Analyses can be performed both on a single document and on a larger corpus.
It is also possible to build one's own corpora based on bibliographic metadata.
<!-- end dhlab-intro -->

The Jupyter Notebooks in the [digital_tekstanalyse](https://github.com/NationalLibraryOfNorway/digital_tekstanalyse) repo show examples of
how to use the library, and can be run
[directly in your browser](https://mybinder.org/v2/gh/DH-LAB-NB/DHLAB/master)
without prior programming experience.



## Installation

<!-- start installation -->

Install `dhlab` in your terminal with pip: 

```
pip install dhlab
```
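To check that the installation worked, you can ask Python which version of the distribution is installed. This is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8, whereas `dhlab` itself declares support for Python 3.6 and newer); the helper name `installed_version` is only illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string such as "2.4.0", or None if pip did not install dhlab
print(installed_version("dhlab"))
```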

<!-- end installation -->


## Example use
<!-- start example-use -->

You could start by building your own [corpus](https://en.wikipedia.org/wiki/Text_corpus), e.g. of 
books published between 1814 and 1905: 

```python
from dhlab.text import Corpus

book_corpus = Corpus(doctype="digibok", from_year=1814, to_year=1905)
```

<!-- end example-use -->
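`Corpus` performs this selection against NB's catalogue through the API. As a rough mental model only, the toy sketch below applies the same kind of metadata filter to a few made-up local records. The `Record` class, its fields and the sample titles are hypothetical, and treating `from_year` and `to_year` as inclusive bounds is an assumption, not documented `dhlab` behaviour.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical bibliographic record; field names are illustrative only
    title: str
    doctype: str
    year: int


def filter_corpus(records, doctype, from_year, to_year):
    # Keep records of the requested document type published in the year range
    # (assumes both bounds are inclusive)
    return [r for r in records if r.doctype == doctype and from_year <= r.year <= to_year]


# Made-up sample records
records = [
    Record("Bok A", "digibok", 1820),
    Record("Avis B", "digavis", 1850),
    Record("Bok C", "digibok", 1910),
    Record("Bok D", "digibok", 1905),
]

selection = filter_corpus(records, "digibok", 1814, 1905)
print([r.title for r in selection])
```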

## Contact
If you have any questions, or run into any problems with the code, please log them in our [issue 
tracker](https://github.com/NationalLibraryOfNorway/DHLAB/issues) in the DHLAB repo. 


# Changelog 

## v2.4.0 (2022-07-12)

### Feat

- ner with spaCy

## v2.3.6 (2022-07-12)

### Fix

- nb_ngram to point to new endpoint

## v2.3.5 (2022-07-11)

### Fix

- word counts

## v2.3.4 (2022-07-11)

### Fix

- counts variable crossing

## v2.3.3 (2022-07-11)

### Fix

- counting api

## v2.3.2 (2022-07-11)

### Fix

- frequency and counts

## v2.3.1 (2022-06-21)

### Fix

- frequency

## v2.3.0 (2022-06-02)

### Fix

- parenthesis

### Feat

- added access to Norsk Ordbank, wordbank

## v2.2.2 (2022-05-24)

### Fix

- use custom personal access token in ci action

## v2.2.1 (2022-05-16)

### Fix

- use custom personal access token in ci action

## v2.2.0 (2022-05-13)

### Feat

- ngram, geodata


## v2.1.0 (2022-05-13)

### Feat

- geodata

## v2.0.25 (2022-04-27)

### Fix

- **setup.cfg**: make package dhlab importable

## v2.0.24 (2022-04-19)

### Fix

- add missing newline  (#50)

## v2.0.23 (2022-04-01)

### Fix

- **github-workflows**: change github access token (#47)
- **github-workflows**: change github access token (#46)

### Refactor

- expose dhlab v1 modules

## v2.0.22 (2022-03-22)

### Fix

- import all legacy modules in `__init__.py`

### Refactor
- move dhlab_v1 code into its own subpackage
- **docs/package_summary.rst**: add reference table for legacy code 

## v2.0.21 (2022-03-21)
### Refactor 
- **constants**: add global variables for URLs in constants.py 
- Reformat code with pep8 tools
- turn relative imports into absolute imports
- simplify and reduce expressions
- rename classes with CamelCase

### Docs 
- **README**: add "Example use"
- add docstrings in subpackages
- add docs/CHANGELOG.md
- **docs**: add `*.rst` documentation files 
- add autosummary of whole dhlab package
- **logo**: update logo image
- add jupyter integration and toggle feature
- add copybutton to code blocks
- add docstrings and make functions private

## v2.0.20geo (2022-03-02)

### Feat
- **dhlab.api.dhlab_api**: add function `get_places`
- **text.geo_data**: add class `GeoData`

### Fix
- **text.dispersion**: pass `**kwargs` to `plot()`

## v.2.0.18dispersion (2022-02-21)

### Feat
- **text.dispersion**: add class Dispersion
- **api.dhlab_api**: add get_dispersion

### Fix
- **requirements**: remove wordcloud  

## v2.0.17params (2022-02-08)
### Refactor
- **text.corpus**: add parameter fulltext
- **api.dhlab_api.document_corpus**: add parameter fulltext
- **text.conc_coll.Concordance**: add parameters window and limit 
- **text.conc_coll.Collocations**: add parameter samplesize

### Fix 
- **text.corpus.urnlist**: fix urnlist assignment

## v2.0.12.chunk (2022-01-29)
### Refactor 
- **text.chunking**: add attribute self.chunks

### Fix 
- imports


## v2.0.10chunks (2022-01-29)

### Feat
- **text.conc_coll**: add class Counts 
- **text.corpus**: add class Corpus_from_identifiers
- **text.chunking**: add class Chunks
- **text.chunking**: add functions get_chunks, get_chunks_para

### Fix 
- imports
- **dhlab_api.get_chunks**: return dict not dataframe
- apply autopep8

## v2.0.5 (2022-01-19)

### Refactor

- **nbtokenizer**: edit tokens for mail and web addresses

### Feat 
- add Tokens class

### Fix 
- imports


## v2.0.2a (2022-01-18)

### Fix 

- typecheck of corpus objects

## v2.0.1.alpha6 (2022-01-18)

- changed wordcloud import
- fixed corpus transfer in conc_coll


## v2.0.0.beta (2022-01-18)

### Feat
- add get_file_from_github, download_from_github in utils

### Refactor

- New package structure

### Docs

- include installation instructions in README


## v1.0.0 (2022-01-06)

- Set up GitHub Actions to run automatic linting and testing
- Set up documentation pages
- Include documentation of the code in docstrings


### Fix

- address linting issues from flake8
- reformat code

### Feat

- add documentation summaries for all modules
- add documentation for the repo
- add docstrings from README.md to nbtext.py
- add pylint config file

### Refactor

- reduce code duplication
- update workflow file reference
- change str.format to f-strings
- optimize imports
- rename workflow that packages and publishes dhlab to pypi
- use default publish workflow
- reduce compatible python versions
- update publishing workflow
- type out scope for linting explicitly
- move pylint.yml

## v0.75 (2019-09-09)

- Initial release to PyPI
