# Documents


```python
from gatenlp import Document

```


```python
# To load a document from a file with the name "file.bdocjs" into gatenlp simply use:
# doc = Document.load("file.bdocjs")

# But it is also possible to load from a file that is somewhere on the internet. For this notebook, we use
# an example document that gets loaded from a URL:
doc = Document.load("https://gatenlp.github.io/python-gatenlp/testdocument1.txt")

# We can visualize the document by printing it:
print(doc)
```

    Document(This is a test document.
    
    It contains just a few sentences. 
    Here is a sentence that mentions a few named entities like 
    the persons Barack Obama or Ursula von der Leyen, locations
    like New York City, Vienna or Beijing or companies like 
    Google, UniCredit or Huawei. 
    
    Here we include a URL https://gatenlp.github.io/python-gatenlp/ 
    and a fake email address john.doe@hiscoolserver.com as well 
    as #some #cool #hastags and a bunch of emojis like 😽 (a kissing cat),
    👩‍🏫 (a woman teacher), 🧬 (DNA), 
    🧗 (a person climbing), 
    💩 (a pile of poo). 
    
    Here we test a few different scripts, e.g. Hangul 한글 or 
    simplified Hanzi 汉字 or Farsi فارسی which goes from right to left. 
    
    
    ,features=Features({}),anns=[])


Printing the document shows the document text and indicates that there are no document features and no 
annotations, which is to be expected since we just loaded it from a plain text file. 

In a Jupyter notebook, a `gatenlp` document can also be visualized graphically by either just using the document 
as the last value of a cell or by using the IPython "display" function:


```python
from IPython.display import display
display(doc)
```


<div><style>#ZTUJDVANOM-wrapper { color: black !important; }</style>
<div id="ZTUJDVANOM-wrapper">

<div>
<style>
#ZTUJDVANOM-content {
    width: 100%;
    height: 100%;
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
}

.ZTUJDVANOM-row {
    width: 100%;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}

.ZTUJDVANOM-col {
    border: 1px solid grey;
    display: inline-block;
    min-width: 200px;
    padding: 5px;
    /* white-space: normal; */
    /* white-space: pre-wrap; */
    overflow-y: auto;
}

.ZTUJDVANOM-hdr {
    font-size: 1.2rem;
    font-weight: bold;
}

.ZTUJDVANOM-label {
    margin-bottom: -15px;
    display: block;
}

.ZTUJDVANOM-input {
    vertical-align: middle;
    position: relative;
    *overflow: hidden;
}

#ZTUJDVANOM-popup {
    display: none;
    color: black;
    position: absolute;
    margin-top: 10%;
    margin-left: 10%;
    background: #aaaaaa;
    width: 60%;
    height: 60%;
    z-index: 50;
    padding: 25px 25px 25px;
    border: 1px solid black;
    overflow: auto;
}

.ZTUJDVANOM-selection {
    margin-bottom: 5px;
}

.ZTUJDVANOM-featuretable {
    margin-top: 10px;
}

.ZTUJDVANOM-fname {
    text-align: left !important;
    font-weight: bold;
    margin-right: 10px;
}
.ZTUJDVANOM-fvalue {
    text-align: left !important;
}
</style>
  <div id="ZTUJDVANOM-content">
        <div id="ZTUJDVANOM-popup" style="display: none;">
        </div>
        <div class="ZTUJDVANOM-row" id="ZTUJDVANOM-row1" style="height:67vh; min-height:100px;">
            <div id="ZTUJDVANOM-text-wrapper" class="ZTUJDVANOM-col" style="width:70%;">
                <div class="ZTUJDVANOM-hdr" id="ZTUJDVANOM-dochdr"></div>
                <div id="ZTUJDVANOM-text">
                </div>
            </div>
            <div id="ZTUJDVANOM-chooser" class="ZTUJDVANOM-col" style="width:30%; border-left-width: 0px;"></div>
        </div>
        <div class="ZTUJDVANOM-row" id="ZTUJDVANOM-row2" style="height:30vh; min-height: 100px;">
            <div id="ZTUJDVANOM-details" class="ZTUJDVANOM-col" style="width:100%; border-top-width: 0px;">
            </div>
        </div>
    </div>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script><script src="https://unpkg.com/gatenlp-ann-viewer@1.0.9/gatenlp-ann-viewer.js"></script>
    <script type="application/json" id="ZTUJDVANOM-data">
    {"annotation_sets": {}, "text": "This is a test document.\n\nIt contains just a few sentences. \nHere is a sentence that mentions a few named entities like \nthe persons Barack Obama or Ursula von der Leyen, locations\nlike New York City, Vienna or Beijing or companies like \nGoogle, UniCredit or Huawei. \n\nHere we include a URL https://gatenlp.github.io/python-gatenlp/ \nand a fake email address john.doe@hiscoolserver.com as well \nas #some #cool #hastags and a bunch of emojis like \ud83d\ude3d (a kissing cat),\n\ud83d\udc69\u200d\ud83c\udfeb (a woman teacher), \ud83e\uddec (DNA), \n\ud83e\uddd7 (a person climbing), \n\ud83d\udca9 (a pile of poo). \n\nHere we test a few different scripts, e.g. Hangul \ud55c\uae00 or \nsimplified Hanzi \u6c49\u5b57 or Farsi \u0641\u0627\u0631\u0633\u06cc which goes from right to left. \n\n\n", "features": {}, "offset_type": "j", "name": ""}
    </script>
    <script type="text/javascript">
        gatenlp_run("ZTUJDVANOM-");
    </script>
  </div>

</div></div>


This shows the document in a layout that has three areas: the document text in the upper left,
the list of annotation set and type names in the upper right and document or annotation features
at the bottom. In the example above only the text is shown because there are no document features or 
annotations. 

## Document features

Let's add some document features:


```python
doc.features["loaded-from"] = "https://gatenlp.github.io/python-gatenlp/testdocument1.txt"
doc.features["purpose"] = "test document for gatenlp"
doc.features["someotherfeature"] = 22
doc.features["andanother"] = {"what": "a dict", "alist": [1,2,3,4,5]}
```

Document features map feature names to feature values and behave a lot like a Python dictionary. Feature names
should always be strings. Feature values can be anything, but a document can only be stored or exchanged with Java GATE if the feature values are restricted to what can be serialized as JSON: dictionaries, lists, numbers, strings and booleans. 
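
The JSON restriction can be checked before saving a document. The following is a minimal sketch using only the standard library; the helper `non_json_features` and the `"a-set"` feature are hypothetical, and a plain dict stands in for `doc.features`, assuming the features object can be iterated like a dictionary:

```python
import json


def non_json_features(features):
    """Return the names of features whose values json.dumps cannot serialize."""
    bad = []
    for name, value in features.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad.append(name)
    return bad


# A plain dict stands in for doc.features here (illustrative values only).
features = {
    "purpose": "test document for gatenlp",
    "someotherfeature": 22,
    "andanother": {"what": "a dict", "alist": [1, 2, 3, 4, 5]},
    "a-set": {1, 2, 3},  # Python sets have no JSON representation
}
print(non_json_features(features))  # ['a-set']
```

This check is only an approximation: `json.dumps` also accepts values such as tuples, which come back as lists after a round trip.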

Now that we have created document features, the document is shown like this:


```python
doc
```




<div><style>#GRHRBLBLBF-wrapper { color: black !important; }</style>
<div id="GRHRBLBLBF-wrapper">

<div>
<style>
#GRHRBLBLBF-content {
    width: 100%;
    height: 100%;
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
}

.GRHRBLBLBF-row {
    width: 100%;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}

.GRHRBLBLBF-col {
    border: 1px solid grey;
    display: inline-block;
    min-width: 200px;
    padding: 5px;
    /* white-space: normal; */
    /* white-space: pre-wrap; */
    overflow-y: auto;
}

.GRHRBLBLBF-hdr {
    font-size: 1.2rem;
    font-weight: bold;
}

.GRHRBLBLBF-label {
    margin-bottom: -15px;
    display: block;
}

.GRHRBLBLBF-input {
    vertical-align: middle;
    position: relative;
    *overflow: hidden;
}

#GRHRBLBLBF-popup {
    display: none;
    color: black;
    position: absolute;
    margin-top: 10%;
    margin-left: 10%;
    background: #aaaaaa;
    width: 60%;
    height: 60%;
    z-index: 50;
    padding: 25px 25px 25px;
    border: 1px solid black;
    overflow: auto;
}

.GRHRBLBLBF-selection {
    margin-bottom: 5px;
}

.GRHRBLBLBF-featuretable {
    margin-top: 10px;
}

.GRHRBLBLBF-fname {
    text-align: left !important;
    font-weight: bold;
    margin-right: 10px;
}
.GRHRBLBLBF-fvalue {
    text-align: left !important;
}
</style>
  <div id="GRHRBLBLBF-content">
        <div id="GRHRBLBLBF-popup" style="display: none;">
        </div>
        <div class="GRHRBLBLBF-row" id="GRHRBLBLBF-row1" style="height:67vh; min-height:100px;">
            <div id="GRHRBLBLBF-text-wrapper" class="GRHRBLBLBF-col" style="width:70%;">
                <div class="GRHRBLBLBF-hdr" id="GRHRBLBLBF-dochdr"></div>
                <div id="GRHRBLBLBF-text">
                </div>
            </div>
            <div id="GRHRBLBLBF-chooser" class="GRHRBLBLBF-col" style="width:30%; border-left-width: 0px;"></div>
        </div>
        <div class="GRHRBLBLBF-row" id="GRHRBLBLBF-row2" style="height:30vh; min-height: 100px;">
            <div id="GRHRBLBLBF-details" class="GRHRBLBLBF-col" style="width:100%; border-top-width: 0px;">
            </div>
        </div>
    </div>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script><script src="https://unpkg.com/gatenlp-ann-viewer@1.0.9/gatenlp-ann-viewer.js"></script>
    <script type="application/json" id="GRHRBLBLBF-data">
    {"annotation_sets": {}, "text": "This is a test document.\n\nIt contains just a few sentences. \nHere is a sentence that mentions a few named entities like \nthe persons Barack Obama or Ursula von der Leyen, locations\nlike New York City, Vienna or Beijing or companies like \nGoogle, UniCredit or Huawei. \n\nHere we include a URL https://gatenlp.github.io/python-gatenlp/ \nand a fake email address john.doe@hiscoolserver.com as well \nas #some #cool #hastags and a bunch of emojis like \ud83d\ude3d (a kissing cat),\n\ud83d\udc69\u200d\ud83c\udfeb (a woman teacher), \ud83e\uddec (DNA), \n\ud83e\uddd7 (a person climbing), \n\ud83d\udca9 (a pile of poo). \n\nHere we test a few different scripts, e.g. Hangul \ud55c\uae00 or \nsimplified Hanzi \u6c49\u5b57 or Farsi \u0641\u0627\u0631\u0633\u06cc which goes from right to left. \n\n\n", "features": {"loaded-from": "https://gatenlp.github.io/python-gatenlp/testdocument1.txt", "purpose": "test document for gatenlp", "someotherfeature": 22, "andanother": {"what": "a dict", "alist": [1, 2, 3, 4, 5]}}, "offset_type": "j", "name": ""}
    </script>
    <script type="text/javascript">
        gatenlp_run("GRHRBLBLBF-");
    </script>
  </div>

</div></div>




```python
# to retrieve a feature value we can do:
doc.features["purpose"]
```




    'test document for gatenlp'




```python
# With get(), a feature that does not exist yields None, or the default value if one is specified:
print(doc.features.get("doesntexist"))
print(doc.features.get("doesntexist", "MV!"))

```

    None
    MV!


## Annotations

Let's add some annotations too. Annotations are items of information for some range of characters within the document. They can be used to represent information about things like tokens, entities, sentences, paragraphs, or 
anything that corresponds to some contiguous range of offsets in the document.

Annotations consist of the following parts:
* The "start" and "end" offset to identify the text the annotation refers to
* A "type" which is an arbitrary name that identifies what kind of thing the annotation describes, e.g. "Token"
* Features: these work in the same way as for the whole document: an arbitrary set of feature name / feature value
  pairs which provide more information, e.g. for a Token the features could include the lemma, the part of speech,
  the stem, the number, etc. 

Annotations can be organized in "annotation sets". Each annotation set has a name and a set of annotations. There can be as many sets as needed. 

Annotations can overlap arbitrarily and there can be as many as needed. 
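
To make the parts of an annotation concrete, here is a rough sketch of the data model in plain Python. The class `SketchAnnotation` and its `overlaps` method are hypothetical names for illustration only; this is not how `gatenlp` implements annotations or annotation sets:

```python
from dataclasses import dataclass, field


@dataclass
class SketchAnnotation:
    """Illustrative stand-in, not the gatenlp Annotation class."""
    start: int  # offset of the first character
    end: int    # offset after the last character
    type: str
    features: dict = field(default_factory=dict)

    def overlaps(self, other):
        # Two ranges overlap if each one starts before the other one ends.
        return self.start < other.end and other.start < self.end


# An annotation set modelled as a name mapped to a list of annotations.
sets = {
    "Set1": [
        SketchAnnotation(0, 4, "Token", {"pos": "DT"}),
        SketchAnnotation(0, 24, "Sentence"),
        SketchAnnotation(5, 7, "Token", {"pos": "VBZ"}),
    ]
}
token, sentence, token2 = sets["Set1"]
print(token.overlaps(sentence))  # True
print(token.overlaps(token2))    # False
```

The sentence annotation covers both tokens, which shows that annotations of different types can share the same range of text.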

Let us manually add a few annotations to the document:


```python
# create and get an annotation set with the name "Set1"
annset = doc.annset("Set1")
```

Add an annotation to the set which refers to the first word in the document "This". The range of characters
for this word starts at offset 0 and the length of the annotation is 4, so the "start" offset is 0 and the "end" offset is 0+4=4. Note that the end offset always points to the offset *after* the last character of the range.


```python
annset.add(0,4,"Word",{"what": "our first annotation"})
```




    Annotation(0,4,Word,features=Features({'what': 'our first annotation'}),id=0)




```python
# Add more
annset.add(5,7,"Word",{"what": "our second annotation"})
annset.add(0,24,"Sentence",{"what": "our first sentence annotation"})
```




    Annotation(0,24,Sentence,features=Features({'what': 'our first sentence annotation'}),id=2)
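
The offsets used above can be checked with ordinary Python string slicing, which follows the same convention of an end offset that points after the last character. This is a small sketch on a copy of the first line of the document text, not a `gatenlp` call:

```python
text = "This is a test document."

# (start, end, type) triples matching the annotations added above.
spans = [(0, 4, "Word"), (5, 7, "Word"), (0, 24, "Sentence")]

for start, end, anntype in spans:
    covered = text[start:end]
    # The length of the covered text is always end - start.
    assert len(covered) == end - start
    print(anntype, repr(covered))
```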



If we visualize the document now, the newly created set "Set1" is shown in the right part of
the display. It shows the different annotation types that exist in the set, and how many annotations
for each type are in the set. If you click the check box, the annotation ranges are shown in the 
text with the colour associated with the annotation type. You can then click on a range / annotation in the
text and the features of the annotation are shown in the lower part. 
To show the features for a different annotation click on the coloured range for the annotation in the text.
To show the document features, click on "Document".

If you have selected more than one type, a range can be covered by more than one annotation. 
This is shown by mixing the colours. If you click at such a location, a dialog appears which lets you
select for which of the overlapping annotations you want to display the features. 


```python
doc
```




<div><style>#FLKEUFSGIS-wrapper { color: black !important; }</style>
<div id="FLKEUFSGIS-wrapper">

<div>
<style>
#FLKEUFSGIS-content {
    width: 100%;
    height: 100%;
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
}

.FLKEUFSGIS-row {
    width: 100%;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}

.FLKEUFSGIS-col {
    border: 1px solid grey;
    display: inline-block;
    min-width: 200px;
    padding: 5px;
    /* white-space: normal; */
    /* white-space: pre-wrap; */
    overflow-y: auto;
}

.FLKEUFSGIS-hdr {
    font-size: 1.2rem;
    font-weight: bold;
}

.FLKEUFSGIS-label {
    margin-bottom: -15px;
    display: block;
}

.FLKEUFSGIS-input {
    vertical-align: middle;
    position: relative;
    *overflow: hidden;
}

#FLKEUFSGIS-popup {
    display: none;
    color: black;
    position: absolute;
    margin-top: 10%;
    margin-left: 10%;
    background: #aaaaaa;
    width: 60%;
    height: 60%;
    z-index: 50;
    padding: 25px 25px 25px;
    border: 1px solid black;
    overflow: auto;
}

.FLKEUFSGIS-selection {
    margin-bottom: 5px;
}

.FLKEUFSGIS-featuretable {
    margin-top: 10px;
}

.FLKEUFSGIS-fname {
    text-align: left !important;
    font-weight: bold;
    margin-right: 10px;
}
.FLKEUFSGIS-fvalue {
    text-align: left !important;
}
</style>
  <div id="FLKEUFSGIS-content">
        <div id="FLKEUFSGIS-popup" style="display: none;">
        </div>
        <div class="FLKEUFSGIS-row" id="FLKEUFSGIS-row1" style="height:67vh; min-height:100px;">
            <div id="FLKEUFSGIS-text-wrapper" class="FLKEUFSGIS-col" style="width:70%;">
                <div class="FLKEUFSGIS-hdr" id="FLKEUFSGIS-dochdr"></div>
                <div id="FLKEUFSGIS-text">
                </div>
            </div>
            <div id="FLKEUFSGIS-chooser" class="FLKEUFSGIS-col" style="width:30%; border-left-width: 0px;"></div>
        </div>
        <div class="FLKEUFSGIS-row" id="FLKEUFSGIS-row2" style="height:30vh; min-height: 100px;">
            <div id="FLKEUFSGIS-details" class="FLKEUFSGIS-col" style="width:100%; border-top-width: 0px;">
            </div>
        </div>
    </div>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script><script src="https://unpkg.com/gatenlp-ann-viewer@1.0.9/gatenlp-ann-viewer.js"></script>
    <script type="application/json" id="FLKEUFSGIS-data">
    {"annotation_sets": {"Set1": {"name": "detached-from:Set1", "annotations": [{"type": "Word", "start": 0, "end": 4, "id": 0, "features": {"what": "our first annotation"}}, {"type": "Word", "start": 5, "end": 7, "id": 1, "features": {"what": "our second annotation"}}, {"type": "Sentence", "start": 0, "end": 24, "id": 2, "features": {"what": "our first sentence annotation"}}], "next_annid": 3}}, "text": "This is a test document.\n\nIt contains just a few sentences. \nHere is a sentence that mentions a few named entities like \nthe persons Barack Obama or Ursula von der Leyen, locations\nlike New York City, Vienna or Beijing or companies like \nGoogle, UniCredit or Huawei. \n\nHere we include a URL https://gatenlp.github.io/python-gatenlp/ \nand a fake email address john.doe@hiscoolserver.com as well \nas #some #cool #hastags and a bunch of emojis like \ud83d\ude3d (a kissing cat),\n\ud83d\udc69\u200d\ud83c\udfeb (a woman teacher), \ud83e\uddec (DNA), \n\ud83e\uddd7 (a person climbing), \n\ud83d\udca9 (a pile of poo). \n\nHere we test a few different scripts, e.g. Hangul \ud55c\uae00 or \nsimplified Hanzi \u6c49\u5b57 or Farsi \u0641\u0627\u0631\u0633\u06cc which goes from right to left. \n\n\n", "features": {"loaded-from": "https://gatenlp.github.io/python-gatenlp/testdocument1.txt", "purpose": "test document for gatenlp", "someotherfeature": 22, "andanother": {"what": "a dict", "alist": [1, 2, 3, 4, 5]}}, "offset_type": "j", "name": ""}
    </script>
    <script type="text/javascript">
        gatenlp_run("FLKEUFSGIS-");
    </script>
  </div>

</div></div>



## Loading a larger document

Let's load a larger document, this time from an HTML page: the Wikipedia article on "Natural language processing":




```python
doc2 = Document.load("https://en.m.wikipedia.org/wiki/Natural_language_processing", fmt="html", parser="html.parser")
doc2
```




<div><style>#ELGGRWQFXQ-wrapper { color: black !important; }</style>
<div id="ELGGRWQFXQ-wrapper">

<div>
<style>
#ELGGRWQFXQ-content {
    width: 100%;
    height: 100%;
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
}

.ELGGRWQFXQ-row {
    width: 100%;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}

.ELGGRWQFXQ-col {
    border: 1px solid grey;
    display: inline-block;
    min-width: 200px;
    padding: 5px;
    /* white-space: normal; */
    /* white-space: pre-wrap; */
    overflow-y: auto;
}

.ELGGRWQFXQ-hdr {
    font-size: 1.2rem;
    font-weight: bold;
}

.ELGGRWQFXQ-label {
    margin-bottom: -15px;
    display: block;
}

.ELGGRWQFXQ-input {
    vertical-align: middle;
    position: relative;
    *overflow: hidden;
}

#ELGGRWQFXQ-popup {
    display: none;
    color: black;
    position: absolute;
    margin-top: 10%;
    margin-left: 10%;
    background: #aaaaaa;
    width: 60%;
    height: 60%;
    z-index: 50;
    padding: 25px 25px 25px;
    border: 1px solid black;
    overflow: auto;
}

.ELGGRWQFXQ-selection {
    margin-bottom: 5px;
}

.ELGGRWQFXQ-featuretable {
    margin-top: 10px;
}

.ELGGRWQFXQ-fname {
    text-align: left !important;
    font-weight: bold;
    margin-right: 10px;
}
.ELGGRWQFXQ-fvalue {
    text-align: left !important;
}
</style>
  <div id="ELGGRWQFXQ-content">
        <div id="ELGGRWQFXQ-popup" style="display: none;">
        </div>
        <div class="ELGGRWQFXQ-row" id="ELGGRWQFXQ-row1" style="height:67vh; min-height:100px;">
            <div id="ELGGRWQFXQ-text-wrapper" class="ELGGRWQFXQ-col" style="width:70%;">
                <div class="ELGGRWQFXQ-hdr" id="ELGGRWQFXQ-dochdr"></div>
                <div id="ELGGRWQFXQ-text">
                </div>
            </div>
            <div id="ELGGRWQFXQ-chooser" class="ELGGRWQFXQ-col" style="width:30%; border-left-width: 0px;"></div>
        </div>
        <div class="ELGGRWQFXQ-row" id="ELGGRWQFXQ-row2" style="height:30vh; min-height: 100px;">
            <div id="ELGGRWQFXQ-details" class="ELGGRWQFXQ-col" style="width:100%; border-top-width: 0px;">
            </div>
        </div>
    </div>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script><script src="https://unpkg.com/gatenlp-ann-viewer@1.0.9/gatenlp-ann-viewer.js"></script>
    <script type="application/json" id="ELGGRWQFXQ-data">
    {"annotation_sets": {"Original markups": {"name": "detached-from:Original markups", "annotations": [{"type": "[document]", "start": 0, "end": 36793, "id": 0, "features": {}}, {"type": "html", "start": 1, "end": 36793, "id": 1, "features": {"class": "client-nojs", "lang": "en", "dir": "ltr"}}, {"type": "head", "start": 1, "end": 41, "id": 2, "features": {}}, {"type": "meta", "start": 1, "end": 1, "id": 3, "features": {"charset": "UTF-8"}}, {"type": "title", "start": 1, "end": 40, "id": 4, "features": {}}, {"type": "link", "start": 41, "end": 41, "id": 5, "features": {"rel": "stylesheet", "href": "/w/load.php?lang=en&modules=ext.cite.styles%7Cext.math.styles%7Cext.wikimediaBadges%7Cmediawiki.hlist%7Cmediawiki.ui.button%2Cicon%7Cmobile.init.styles%7Cskins.minerva.base.styles%7Cskins.minerva.content.styles%7Cskins.minerva.content.styles.images%7Cskins.minerva.icons.wikimedia%7Cskins.minerva.mainMenu.icons%2Cstyles&only=styles&skin=minerva"}}, {"type": "meta", "start": 41, "end": 41, "id": 6, "features": {"name": "generator", "content": "MediaWiki 1.36.0-wmf.11"}}, {"type": "meta", "start": 41, "end": 41, "id": 7, "features": {"name": "referrer", "content": "origin"}}, {"type": "meta", "start": 41, "end": 41, "id": 8, "features": {"name": "referrer", "content": "origin-when-crossorigin"}}, {"type": "meta", "start": 41, "end": 41, "id": 9, "features": {"name": "referrer", "content": "origin-when-cross-origin"}}, {"type": "meta", "start": 41, "end": 41, "id": 10, "features": {"name": "theme-color", "content": "#eaecf0"}}, {"type": "meta", "start": 41, "end": 41, "id": 11, "features": {"property": "og:image", "content": "https://upload.wikimedia.org/wikipedia/commons/8/8b/Automated_online_assistant.png"}}, {"type": "meta", "start": 41, "end": 41, "id": 12, "features": {"name": "viewport", "content": "width=device-width, initial-scale=1.0, user-scalable=yes, minimum-scale=0.25, maximum-scale=5.0"}}, {"type": "link", "start": 41, "end": 41, "id": 13, "features": {"rel": 
"manifest", "href": "/w/api.php?action=webapp-manifest"}}, {"type": "link", "start": 41, "end": 41, "id": 14, "features": {"rel": "alternate", "type": "application/x-wiki", "title": "Edit this page", "href": "/w/index.php?title=Natural_language_processing&action=edit"}}, {"type": "link", "start": 41, "end": 41, "id": 15, "features": {"rel": "edit", "title": "Edit this page", "href": "/w/index.php?title=Natural_language_processing&action=edit"}}, {"type": "link", "start": 41, "end": 41, "id": 16, "features": {"rel": "apple-touch-icon", "href": "/static/apple-touch/wikipedia.png"}}, {"type": "link", "start": 41, "end": 41, "id": 17, "features": {"rel": "shortcut icon", "href": "/static/favicon/wikipedia.ico"}}, {"type": "link", "start": 41, "end": 41, "id": 18, "features": {"rel": "search", "type": "application/opensearchdescription+xml", "href": "/w/opensearch_desc.php", "title": "Wikipedia (en)"}}, {"type": "link", "start": 41, "end": 41, "id": 19, "features": {"rel": "EditURI", "type": "application/rsd+xml", "href": "//en.wikipedia.org/w/api.php?action=rsd"}}, {"type": "link", "start": 41, "end": 41, "id": 20, "features": {"rel": "license", "href": "//creativecommons.org/licenses/by-sa/3.0/"}}, {"type": "link", "start": 41, "end": 41, "id": 21, "features": {"rel": "canonical", "href": "https://en.wikipedia.org/wiki/Natural_language_processing"}}, {"type": "link", "start": 41, "end": 41, "id": 22, "features": {"rel": "dns-prefetch", "href": "//login.wikimedia.org"}}, {"type": "link", "start": 41, "end": 41, "id": 23, "features": {"rel": "dns-prefetch", "href": "//meta.wikimedia.org"}}, {"type": "body", "start": 41, "end": 36792, "id": 24, "features": {"class": "mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-subject mw-editable page-Natural_language_processing rootpage-Natural_language_processing stable issues-group-B skin-minerva action-view skin--responsive"}}, {"type": "div", "start": 41, "end": 36791, "id": 25, "features": {"id": "mw-mf-viewport"}}, 
{"type": "div", "start": 41, "end": 36791, "id": 26, "features": {"id": "mw-mf-page-center"}}, {"type": "a", "start": 41, "end": 41, "id": 27, "features": {"class": "mw-mf-page-center__mask", "href": "#"}}, {"type": "header", "start": 41, "end": 133, "id": 28, "features": {"class": "header-container header-chrome"}}, {"type": "form", "start": 41, "end": 133, "id": 29, "features": {"class": "header", "action": "/w/index.php", "method": "get"}}, {"type": "nav", "start": 41, "end": 126, "id": 30, "features": {"class": "navigation-drawer toggle-list view-border-box"}}, {"type": "input", "start": 41, "end": 41, "id": 31, "features": {"type": "checkbox", "id": "main-menu-input", "class": "toggle-list__checkbox", "role": "button", "aria-labelledby": "mw-mf-main-menu-button"}}, {"type": "label", "start": 41, "end": 55, "id": 32, "features": {"for": "main-menu-input", "id": "mw-mf-main-menu-button", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-menu-base20 mw-ui-icon-flush-left toggle-list__toggle", "title": "Open main menu", "data-event-name": "ui.mainmenu"}}, {"type": "div", "start": 56, "end": 126, "id": 33, "features": {"id": "mw-mf-page-left", "class": "menu toggle-list__list view-border-box"}}, {"type": "ul", "start": 56, "end": 75, "id": 34, "features": {"id": "p-navigation"}}, {"type": "li", "start": 56, "end": 61, "id": 35, "features": {"class": ""}}, {"type": "a", "start": 56, "end": 60, "id": 36, "features": {"href": "/wiki/Main_Page", "class": "mw-ui-icon mw-ui-icon-before mw-ui-icon-minerva-home", "data-event-name": "menu.home"}}, {"type": "span", "start": 56, "end": 60, "id": 37, "features": {}}, {"type": "li", "start": 61, "end": 68, "id": 38, "features": {"class": ""}}, {"type": "a", "start": 61, "end": 67, "id": 39, "features": {"href": "/wiki/Special:Random#/random", "class": "mw-ui-icon mw-ui-icon-before mw-ui-icon-minerva-die ", "data-event-name": "menu.random"}}, {"type": "span", "start": 61, "end": 67, "id": 40, "features": {}}, {"type": 
"li", "start": 68, "end": 75, "id": 41, "features": {"class": "jsonly"}}, {"type": "a", "start": 68, "end": 74, "id": 42, "features": {"href": "/wiki/Special:Nearby", "class": "mw-ui-icon mw-ui-icon-before mw-ui-icon-minerva-mapPin nearby", "data-event-name": "menu.nearby"}}, {"type": "span", "start": 68, "end": 74, "id": 43, "features": {}}, {"type": "ul", "start": 75, "end": 82, "id": 44, "features": {"id": "p-personal"}}, {"type": "li", "start": 75, "end": 82, "id": 45, "features": {"class": ""}}, {"type": "a", "start": 75, "end": 81, "id": 46, "features": {"href": "/w/index.php?title=Special:UserLogin&returnto=Natural+language+processing", "class": "menu__item--login mw-ui-icon mw-ui-icon-before mw-ui-icon-minerva-logIn ", "data-event-name": "menu.login"}}, {"type": "span", "start": 75, "end": 81, "id": 47, "features": {}}, {"type": "ul", "start": 82, "end": 91, "id": 48, "features": {"id": "pt-preferences"}}, {"type": "li", "start": 82, "end": 91, "id": 49, "features": {"class": "jsonly"}}, {"type": "a", "start": 82, "end": 90, "id": 50, "features": {"href": "/w/index.php?title=Special:MobileOptions&returnto=Natural+language+processing", "class": "menu__item--settings mw-ui-icon mw-ui-icon-before mw-ui-icon-minerva-settings ", "data-event-name": "menu.settings"}}, {"type": "span", "start": 82, "end": 90, "id": 51, "features": {}}, {"type": "ul", "start": 91, "end": 98, "id": 52, "features": {"id": "p-donation"}}, {"type": "li", "start": 91, "end": 98, "id": 53, "features": {"class": ""}}, {"type": "a", "start": 91, "end": 97, "id": 54, "features": {"href": "https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en&utm_key=minerva", "class": "mw-ui-icon mw-ui-icon-before mw-ui-icon-minerva-heart ", "data-event-name": "menu.donate"}}, {"type": "span", "start": 91, "end": 97, "id": 55, "features": {}}, {"type": "ul", "start": 98, "end": 126, "id": 56, "features": {"class": 
"hlist"}}, {"type": "li", "start": 98, "end": 114, "id": 57, "features": {"class": ""}}, {"type": "a", "start": 98, "end": 113, "id": 58, "features": {"href": "/wiki/Wikipedia:About", "class": "", "data-event-name": ""}}, {"type": "span", "start": 98, "end": 113, "id": 59, "features": {}}, {"type": "li", "start": 114, "end": 126, "id": 60, "features": {"class": ""}}, {"type": "a", "start": 114, "end": 125, "id": 61, "features": {"href": "/wiki/Wikipedia:General_disclaimer", "class": "", "data-event-name": ""}}, {"type": "span", "start": 114, "end": 125, "id": 62, "features": {}}, {"type": "label", "start": 126, "end": 126, "id": 63, "features": {"class": "main-menu-mask", "for": "main-menu-input"}}, {"type": "div", "start": 126, "end": 126, "id": 64, "features": {"class": "branding-box"}}, {"type": "a", "start": 126, "end": 126, "id": 65, "features": {"href": "/wiki/Main_Page"}}, {"type": "span", "start": 126, "end": 126, "id": 66, "features": {}}, {"type": "img", "start": 126, "end": 126, "id": 67, "features": {"src": "/static/images/mobile/copyright/wikipedia-wordmark-en.svg", "width": "119", "height": "18", "alt": "Wikipedia"}}, {"type": "div", "start": 126, "end": 126, "id": 68, "features": {"class": "search-box"}}, {"type": "input", "start": 126, "end": 126, "id": 69, "features": {"class": "search mw-ui-background-icon-search skin-minerva-search-trigger", "type": "search", "name": "search", "id": "searchInput", "autocomplete": "off", "placeholder": "Search Wikipedia", "aria-label": "Search Wikipedia", "value": ""}}, {"type": "nav", "start": 126, "end": 133, "id": 70, "features": {"class": "minerva-user-navigation", "aria-label": "User navigation"}}, {"type": "div", "start": 126, "end": 133, "id": 71, "features": {}}, {"type": "button", "start": 126, "end": 132, "id": 72, "features": {"id": "searchIcon", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-search-base20 skin-minerva-search-trigger", "type": "submit"}}, {"type": "main", "start": 133, 
"end": 36227, "id": 73, "features": {"id": "content", "class": "mw-body"}}, {"type": "div", "start": 133, "end": 133, "id": 74, "features": {"class": "banner-container"}}, {"type": "div", "start": 133, "end": 133, "id": 75, "features": {"id": "siteNotice"}}, {"type": "div", "start": 133, "end": 181, "id": 76, "features": {"class": "pre-content heading-holder"}}, {"type": "div", "start": 133, "end": 161, "id": 77, "features": {"class": "page-heading"}}, {"type": "h1", "start": 133, "end": 161, "id": 78, "features": {"id": "section_0"}}, {"type": "div", "start": 161, "end": 161, "id": 79, "features": {"class": "tagline"}}, {"type": "nav", "start": 161, "end": 181, "id": 80, "features": {"class": "page-actions-menu"}}, {"type": "ul", "start": 161, "end": 181, "id": 81, "features": {"id": "page-actions", "class": "page-actions-menu__list"}}, {"type": "li", "start": 161, "end": 170, "id": 82, "features": {"id": "language-selector", "class": "page-actions-menu__list-item"}}, {"type": "a", "start": 161, "end": 169, "id": 83, "features": {"id": "", "href": "/wiki/Special:MobileLanguages/Natural_language_processing", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-language-base20 mw-ui-icon-with-label-desktop language-selector", "data-mw": "interface", "data-event-name": "menu.languages", "role": "button", "title": "Language"}}, {"type": "li", "start": 170, "end": 176, "id": 84, "features": {"id": "page-actions-watch", "class": "page-actions-menu__list-item"}}, {"type": "a", "start": 170, "end": 175, "id": 85, "features": {"id": "ca-watch", "href": "/w/index.php?title=Special:UserLogin&returnto=Natural+language+processing", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-star-base20 mw-ui-icon-with-label-desktop watch-this-article mw-watchlink menu__item--page-actions-watch", "data-mw": "interface", "data-event-name": "menu.watch", "role": "button", "title": "Watch"}}, {"type": "li", "start": 176, "end": 181, "id": 86, "features": {"id": 
"page-actions-edit", "class": "page-actions-menu__list-item"}}, {"type": "a", "start": 176, "end": 180, "id": 87, "features": {"id": "ca-edit", "href": "/w/index.php?title=Natural_language_processing&action=edit&section=0", "class": "edit-page menu__item--page-actions-edit mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 mw-ui-icon-with-label-desktop", "data-mw": "interface", "data-event-name": "menu.edit", "role": "button", "title": "Edit the lead section of this page"}}, {"type": "div", "start": 181, "end": 181, "id": 88, "features": {"class": "minerva__subtitle"}}, {"type": "div", "start": 181, "end": 36227, "id": 89, "features": {"id": "bodyContent", "class": "content"}}, {"type": "div", "start": 181, "end": 36122, "id": 90, "features": {"id": "mw-content-text", "lang": "en", "dir": "ltr", "class": "mw-content-ltr"}}, {"type": "div", "start": 181, "end": 36122, "id": 91, "features": {"class": "mw-parser-output"}}, {"type": "section", "start": 181, "end": 1346, "id": 92, "features": {"class": "mf-section-0", "id": "mf-section-0"}}, {"type": "p", "start": 181, "end": 463, "id": 93, "features": {}}, {"type": "b", "start": 181, "end": 208, "id": 94, "features": {}}, {"type": "b", "start": 210, "end": 213, "id": 95, "features": {}}, {"type": "a", "start": 232, "end": 243, "id": 96, "features": {"href": "/wiki/Linguistics", "title": "Linguistics"}}, {"type": "a", "start": 245, "end": 261, "id": 97, "features": {"href": "/wiki/Computer_science", "title": "Computer science"}}, {"type": "a", "start": 267, "end": 290, "id": 98, "features": {"href": "/wiki/Artificial_intelligence", "title": "Artificial intelligence"}}, {"type": "a", "start": 440, "end": 456, "id": 99, "features": {"href": "/wiki/Natural_language", "title": "Natural language"}}, {"type": "div", "start": 463, "end": 625, "id": 100, "features": {"class": "thumb tright"}}, {"type": "div", "start": 463, "end": 625, "id": 101, "features": {"class": "thumbinner", "style": "width:202px;"}}, {"type": 
"a", "start": 463, "end": 463, "id": 102, "features": {"href": "/wiki/File:Automated_online_assistant.png", "class": "image"}}, {"type": "img", "start": 463, "end": 463, "id": 103, "features": {"alt": "", "src": "//upload.wikimedia.org/wikipedia/commons/thumb/8/8b/Automated_online_assistant.png/200px-Automated_online_assistant.png", "decoding": "async", "width": "200", "height": "251", "class": "thumbimage", "data-file-width": "400", "data-file-height": "501"}}, {"type": "div", "start": 465, "end": 625, "id": 104, "features": {"class": "thumbcaption"}}, {"type": "div", "start": 465, "end": 465, "id": 105, "features": {"class": "magnify"}}, {"type": "a", "start": 465, "end": 465, "id": 106, "features": {"href": "/wiki/File:Automated_online_assistant.png", "class": "internal", "title": "Enlarge"}}, {"type": "a", "start": 468, "end": 494, "id": 107, "features": {"href": "/wiki/Automated_online_assistant", "class": "mw-redirect", "title": "Automated online assistant"}}, {"type": "a", "start": 505, "end": 521, "id": 108, "features": {"href": "/wiki/Customer_service", "title": "Customer service"}}, {"type": "sup", "start": 621, "end": 624, "id": 109, "features": {"id": "cite_ref-Kongthon_1-0", "class": "reference"}}, {"type": "a", "start": 621, "end": 624, "id": 110, "features": {"href": "#cite_note-Kongthon-1"}}, {"type": "p", "start": 625, "end": 771, "id": 111, "features": {}}, {"type": "a", "start": 686, "end": 704, "id": 112, "features": {"href": "/wiki/Speech_recognition", "title": "Speech recognition"}}, {"type": "a", "start": 706, "end": 736, "id": 113, "features": {"href": "/wiki/Natural_language_understanding", "class": "mw-redirect", "title": "Natural language understanding"}}, {"type": "a", "start": 742, "end": 769, "id": 114, "features": {"href": "/wiki/Natural-language_generation", "title": "Natural-language generation"}}, {"type": "div", "start": 771, "end": 1346, "id": 115, "features": {"id": "toc", "class": "toc", "role": "navigation", "aria-labelledby": 
"mw-toc-heading"}}, {"type": "input", "start": 771, "end": 771, "id": 116, "features": {"type": "checkbox", "role": "button", "id": "toctogglecheckbox", "class": "toctogglecheckbox", "style": "display:none"}}, {"type": "div", "start": 771, "end": 780, "id": 117, "features": {"class": "toctitle", "lang": "en", "dir": "ltr"}}, {"type": "h2", "start": 771, "end": 780, "id": 118, "features": {"id": "mw-toc-heading"}}, {"type": "span", "start": 780, "end": 780, "id": 119, "features": {"class": "toctogglespan"}}, {"type": "label", "start": 780, "end": 780, "id": 120, "features": {"class": "toctogglelabel", "for": "toctogglecheckbox"}}, {"type": "ul", "start": 780, "end": 1346, "id": 121, "features": {}}, {"type": "li", "start": 780, "end": 890, "id": 122, "features": {"class": "toclevel-1 tocsection-1"}}, {"type": "a", "start": 780, "end": 789, "id": 123, "features": {"href": "#History"}}, {"type": "span", "start": 780, "end": 781, "id": 124, "features": {"class": "tocnumber"}}, {"type": "span", "start": 782, "end": 789, "id": 125, "features": {"class": "toctext"}}, {"type": "ul", "start": 790, "end": 890, "id": 126, "features": {}}, {"type": "li", "start": 790, "end": 829, "id": 127, "features": {"class": "toclevel-2 tocsection-2"}}, {"type": "a", "start": 790, "end": 828, "id": 128, "features": {"href": "#Symbolic_NLP_(1950s_-_early_1990s)"}}, {"type": "span", "start": 790, "end": 793, "id": 129, "features": {"class": "tocnumber"}}, {"type": "span", "start": 794, "end": 828, "id": 130, "features": {"class": "toctext"}}, {"type": "li", "start": 829, "end": 865, "id": 131, "features": {"class": "toclevel-2 tocsection-3"}}, {"type": "a", "start": 829, "end": 864, "id": 132, "features": {"href": "#Statistical_NLP_(1990s_-_2010s)"}}, {"type": "span", "start": 829, "end": 832, "id": 133, "features": {"class": "tocnumber"}}, {"type": "span", "start": 833, "end": 864, "id": 134, "features": {"class": "toctext"}}, {"type": "li", "start": 865, "end": 890, "id": 135, "features": 
{"class": "toclevel-2 tocsection-4"}}, {"type": "a", "start": 865, "end": 889, "id": 136, "features": {"href": "#Neural_NLP_(present)"}}, {"type": "span", "start": 865, "end": 868, "id": 137, "features": {"class": "tocnumber"}}, {"type": "span", "start": 869, "end": 889, "id": 138, "features": {"class": "toctext"}}, {"type": "li", "start": 890, "end": 980, "id": 139, "features": {"class": "toclevel-1 tocsection-5"}}, {"type": "a", "start": 890, "end": 935, "id": 140, "features": {"href": "#Methods:_Rules,_statistics,_neural_networks"}}, {"type": "span", "start": 890, "end": 891, "id": 141, "features": {"class": "tocnumber"}}, {"type": "span", "start": 892, "end": 935, "id": 142, "features": {"class": "toctext"}}, {"type": "ul", "start": 936, "end": 980, "id": 143, "features": {}}, {"type": "li", "start": 936, "end": 960, "id": 144, "features": {"class": "toclevel-2 tocsection-6"}}, {"type": "a", "start": 936, "end": 959, "id": 145, "features": {"href": "#Statistical_methods"}}, {"type": "span", "start": 936, "end": 939, "id": 146, "features": {"class": "tocnumber"}}, {"type": "span", "start": 940, "end": 959, "id": 147, "features": {"class": "toctext"}}, {"type": "li", "start": 960, "end": 980, "id": 148, "features": {"class": "toclevel-2 tocsection-7"}}, {"type": "a", "start": 960, "end": 979, "id": 149, "features": {"href": "#Neural_networks"}}, {"type": "span", "start": 960, "end": 963, "id": 150, "features": {"class": "tocnumber"}}, {"type": "span", "start": 964, "end": 979, "id": 151, "features": {"class": "toctext"}}, {"type": "li", "start": 980, "end": 1284, "id": 152, "features": {"class": "toclevel-1 tocsection-8"}}, {"type": "a", "start": 980, "end": 998, "id": 153, "features": {"href": "#Common_NLP_Tasks"}}, {"type": "span", "start": 980, "end": 981, "id": 154, "features": {"class": "tocnumber"}}, {"type": "span", "start": 982, "end": 998, "id": 155, "features": {"class": "toctext"}}, {"type": "ul", "start": 999, "end": 1284, "id": 156, "features": {}}, 
{"type": "li", "start": 999, "end": 1030, "id": 157, "features": {"class": "toclevel-2 tocsection-9"}}, {"type": "a", "start": 999, "end": 1029, "id": 158, "features": {"href": "#Text_and_speech_processing"}}, {"type": "span", "start": 999, "end": 1002, "id": 159, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1003, "end": 1029, "id": 160, "features": {"class": "toctext"}}, {"type": "li", "start": 1030, "end": 1057, "id": 161, "features": {"class": "toclevel-2 tocsection-10"}}, {"type": "a", "start": 1030, "end": 1056, "id": 162, "features": {"href": "#Morphological_analysis"}}, {"type": "span", "start": 1030, "end": 1033, "id": 163, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1034, "end": 1056, "id": 164, "features": {"class": "toctext"}}, {"type": "li", "start": 1057, "end": 1080, "id": 165, "features": {"class": "toclevel-2 tocsection-11"}}, {"type": "a", "start": 1057, "end": 1079, "id": 166, "features": {"href": "#Syntactic_analysis"}}, {"type": "span", "start": 1057, "end": 1060, "id": 167, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1061, "end": 1079, "id": 168, "features": {"class": "toctext"}}, {"type": "li", "start": 1080, "end": 1135, "id": 169, "features": {"class": "toclevel-2 tocsection-12"}}, {"type": "a", "start": 1080, "end": 1134, "id": 170, "features": {"href": "#Lexical_semantics_(of_individual_words_in_context)"}}, {"type": "span", "start": 1080, "end": 1083, "id": 171, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1084, "end": 1134, "id": 172, "features": {"class": "toctext"}}, {"type": "li", "start": 1135, "end": 1196, "id": 173, "features": {"class": "toclevel-2 tocsection-13"}}, {"type": "a", "start": 1135, "end": 1195, "id": 174, "features": {"href": "#Relational_semantics_(semantics_of_individual_sentences)"}}, {"type": "span", "start": 1135, "end": 1138, "id": 175, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1139, "end": 1195, "id": 176, "features": 
{"class": "toctext"}}, {"type": "li", "start": 1196, "end": 1250, "id": 177, "features": {"class": "toclevel-2 tocsection-14"}}, {"type": "a", "start": 1196, "end": 1249, "id": 178, "features": {"href": "#Discourse_(semantics_beyond_individual_sentences)"}}, {"type": "span", "start": 1196, "end": 1199, "id": 179, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1200, "end": 1249, "id": 180, "features": {"class": "toctext"}}, {"type": "li", "start": 1250, "end": 1284, "id": 181, "features": {"class": "toclevel-2 tocsection-15"}}, {"type": "a", "start": 1250, "end": 1283, "id": 182, "features": {"href": "#Higher-level_NLP_applications"}}, {"type": "span", "start": 1250, "end": 1253, "id": 183, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1254, "end": 1283, "id": 184, "features": {"class": "toctext"}}, {"type": "li", "start": 1284, "end": 1304, "id": 185, "features": {"class": "toclevel-1 tocsection-16"}}, {"type": "a", "start": 1284, "end": 1303, "id": 186, "features": {"href": "#Cognition_and_NLP"}}, {"type": "span", "start": 1284, "end": 1285, "id": 187, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1286, "end": 1303, "id": 188, "features": {"class": "toctext"}}, {"type": "li", "start": 1304, "end": 1315, "id": 189, "features": {"class": "toclevel-1 tocsection-17"}}, {"type": "a", "start": 1304, "end": 1314, "id": 190, "features": {"href": "#See_also"}}, {"type": "span", "start": 1304, "end": 1305, "id": 191, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1306, "end": 1314, "id": 192, "features": {"class": "toctext"}}, {"type": "li", "start": 1315, "end": 1328, "id": 193, "features": {"class": "toclevel-1 tocsection-18"}}, {"type": "a", "start": 1315, "end": 1327, "id": 194, "features": {"href": "#References"}}, {"type": "span", "start": 1315, "end": 1316, "id": 195, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1317, "end": 1327, "id": 196, "features": {"class": "toctext"}}, {"type": 
"li", "start": 1328, "end": 1346, "id": 197, "features": {"class": "toclevel-1 tocsection-19"}}, {"type": "a", "start": 1328, "end": 1345, "id": 198, "features": {"href": "#Further_reading"}}, {"type": "span", "start": 1328, "end": 1329, "id": 199, "features": {"class": "tocnumber"}}, {"type": "span", "start": 1330, "end": 1345, "id": 200, "features": {"class": "toctext"}}, {"type": "h2", "start": 1346, "end": 1358, "id": 201, "features": {"class": "section-heading", "onclick": "javascript:mfTempOpenSection(1)"}}, {"type": "div", "start": 1346, "end": 1346, "id": 202, "features": {"class": "mw-ui-icon mw-ui-icon-element indicator mw-ui-icon-small mw-ui-icon-flush-left"}}, {"type": "span", "start": 1346, "end": 1353, "id": 203, "features": {"class": "mw-headline", "id": "History"}}, {"type": "span", "start": 1353, "end": 1357, "id": 204, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 1353, "end": 1357, "id": 205, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=1", "title": "Edit section: History", "data-section": "1", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "section", "start": 1358, "end": 7152, "id": 206, "features": {"class": "mf-section-1 collapsible-block", "id": "mf-section-1"}}, {"type": "div", "start": 1358, "end": 1418, "id": 207, "features": {"role": "note", "class": "hatnote navigation-not-searchable"}}, {"type": "a", "start": 1379, "end": 1417, "id": 208, "features": {"href": "/wiki/History_of_natural_language_processing", "title": "History of natural language processing"}}, {"type": "p", "start": 1418, "end": 1822, "id": 209, "features": {}}, {"type": "a", "start": 1491, "end": 1502, "id": 210, "features": {"href": "/wiki/Alan_Turing", "title": "Alan Turing"}}, {"type": "a", "start": 1532, "end": 1568, "id": 211, "features": {"href": "/wiki/Computing_Machinery_and_Intelligence", "title": "Computing Machinery and 
Intelligence"}}, {"type": "a", "start": 1608, "end": 1619, "id": 212, "features": {"href": "/wiki/Turing_test", "title": "Turing test"}}, {"type": "h3", "start": 1822, "end": 1861, "id": 213, "features": {"class": "in-block"}}, {"type": "span", "start": 1822, "end": 1822, "id": 214, "features": {"id": "Symbolic_NLP_.281950s_-_early_1990s.29"}}, {"type": "span", "start": 1822, "end": 1856, "id": 215, "features": {"class": "mw-headline", "id": "Symbolic_NLP_(1950s_-_early_1990s)"}}, {"type": "span", "start": 1856, "end": 1860, "id": 216, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 1856, "end": 1860, "id": 217, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=2", "title": "Edit section: Symbolic NLP (1950s - early 1990s)", "data-section": "2", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "p", "start": 1861, "end": 2178, "id": 218, "features": {}}, {"type": "a", "start": 1911, "end": 1922, "id": 219, "features": {"href": "/wiki/John_Searle", "title": "John Searle"}}, {"type": "a", "start": 1925, "end": 1937, "id": 220, "features": {"href": "/wiki/Chinese_room", "title": "Chinese room"}}, {"type": "ul", "start": 2178, "end": 4525, "id": 221, "features": {}}, {"type": "li", "start": 2178, "end": 2779, "id": 222, "features": {}}, {"type": "b", "start": 2178, "end": 2183, "id": 223, "features": {}}, {"type": "a", "start": 2189, "end": 2210, "id": 224, "features": {"href": "/wiki/Georgetown-IBM_experiment", "class": "mw-redirect", "title": "Georgetown-IBM experiment"}}, {"type": "a", "start": 2234, "end": 2255, "id": 225, "features": {"href": "/wiki/Automatic_translation", "class": "mw-redirect", "title": "Automatic translation"}}, {"type": "sup", "start": 2406, "end": 2409, "id": 226, "features": {"id": "cite_ref-2", "class": "reference"}}, {"type": "a", "start": 2406, "end": 2409, "id": 227, "features": {"href": "#cite_note-2"}}, {"type": 
"a", "start": 2465, "end": 2477, "id": 228, "features": {"href": "/wiki/ALPAC", "title": "ALPAC"}}, {"type": "a", "start": 2723, "end": 2754, "id": 229, "features": {"href": "/wiki/Statistical_machine_translation", "title": "Statistical machine translation"}}, {"type": "li", "start": 2779, "end": 3389, "id": 230, "features": {}}, {"type": "b", "start": 2779, "end": 2784, "id": 231, "features": {}}, {"type": "a", "start": 2874, "end": 2880, "id": 232, "features": {"href": "/wiki/SHRDLU", "title": "SHRDLU"}}, {"type": "a", "start": 2931, "end": 2944, "id": 233, "features": {"href": "/wiki/Blocks_world", "title": "Blocks world"}}, {"type": "a", "start": 2980, "end": 2985, "id": 234, "features": {"href": "/wiki/ELIZA", "title": "ELIZA"}}, {"type": "a", "start": 3005, "end": 3029, "id": 235, "features": {"href": "/wiki/Rogerian_psychotherapy", "class": "mw-redirect", "title": "Rogerian psychotherapy"}}, {"type": "a", "start": 3042, "end": 3059, "id": 236, "features": {"href": "/wiki/Joseph_Weizenbaum", "title": "Joseph Weizenbaum"}}, {"type": "li", "start": 3389, "end": 3811, "id": 237, "features": {}}, {"type": "b", "start": 3389, "end": 3394, "id": 238, "features": {}}, {"type": "a", "start": 3458, "end": 3468, "id": 239, "features": {"href": "/wiki/Ontology_(information_science)", "title": "Ontology (information science)"}}, {"type": "a", "start": 3771, "end": 3782, "id": 240, "features": {"href": "/wiki/Chatterbots", "class": "mw-redirect", "title": "Chatterbots"}}, {"type": "a", "start": 3803, "end": 3808, "id": 241, "features": {"href": "/wiki/PARRY", "title": "PARRY"}}, {"type": "li", "start": 3811, "end": 4525, "id": 242, "features": {}}, {"type": "b", "start": 3811, "end": 3816, "id": 243, "features": {}}, {"type": "a", "start": 3979, "end": 3983, "id": 244, "features": {"href": "/wiki/Head-driven_phrase_structure_grammar", "title": "Head-driven phrase structure grammar"}}, {"type": "a", "start": 4025, "end": 4043, "id": 245, "features": {"href": 
"/wiki/Generative_grammar", "title": "Generative grammar"}}, {"type": "sup", "start": 4084, "end": 4087, "id": 246, "features": {"id": "cite_ref-3", "class": "reference"}}, {"type": "a", "start": 4084, "end": 4087, "id": 247, "features": {"href": "#cite_note-3"}}, {"type": "a", "start": 4107, "end": 4121, "id": 248, "features": {"href": "/wiki/Lesk_algorithm", "title": "Lesk algorithm"}}, {"type": "sup", "start": 4164, "end": 4167, "id": 249, "features": {"id": "cite_ref-4", "class": "reference"}}, {"type": "a", "start": 4164, "end": 4167, "id": 250, "features": {"href": "#cite_note-4"}}, {"type": "a", "start": 4233, "end": 4260, "id": 251, "features": {"href": "/wiki/Rhetorical_structure_theory", "title": "Rhetorical structure theory"}}, {"type": "a", "start": 4345, "end": 4351, "id": 252, "features": {"href": "/wiki/Racter", "title": "Racter"}}, {"type": "a", "start": 4356, "end": 4367, "id": 253, "features": {"href": "/wiki/Jabberwacky", "title": "Jabberwacky"}}, {"type": "sup", "start": 4521, "end": 4524, "id": 254, "features": {"id": "cite_ref-5", "class": "reference"}}, {"type": "a", "start": 4521, "end": 4524, "id": 255, "features": {"href": "#cite_note-5"}}, {"type": "h3", "start": 4525, "end": 4561, "id": 256, "features": {"class": "in-block"}}, {"type": "span", "start": 4525, "end": 4525, "id": 257, "features": {"id": "Statistical_NLP_.281990s_-_2010s.29"}}, {"type": "span", "start": 4525, "end": 4556, "id": 258, "features": {"class": "mw-headline", "id": "Statistical_NLP_(1990s_-_2010s)"}}, {"type": "span", "start": 4556, "end": 4560, "id": 259, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 4556, "end": 4560, "id": 260, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=3", "title": "Edit section: Statistical NLP (1990s - 2010s)", "data-section": "3", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "p", "start": 4561, "end": 
5182, "id": 261, "features": {}}, {"type": "a", "start": 4786, "end": 4802, "id": 262, "features": {"href": "/wiki/Machine_learning", "title": "Machine learning"}}, {"type": "a", "start": 4909, "end": 4920, "id": 263, "features": {"href": "/wiki/Moore%27s_law", "title": "Moore's law"}}, {"type": "a", "start": 4968, "end": 4977, "id": 264, "features": {"href": "/wiki/Noam_Chomsky", "title": "Noam Chomsky"}}, {"type": "a", "start": 5008, "end": 5032, "id": 265, "features": {"href": "/wiki/Transformational_grammar", "title": "Transformational grammar"}}, {"type": "a", "start": 5091, "end": 5109, "id": 266, "features": {"href": "/wiki/Corpus_linguistics", "title": "Corpus linguistics"}}, {"type": "sup", "start": 5178, "end": 5181, "id": 267, "features": {"id": "cite_ref-6", "class": "reference"}}, {"type": "a", "start": 5178, "end": 5181, "id": 268, "features": {"href": "#cite_note-6"}}, {"type": "ul", "start": 5182, "end": 6774, "id": 269, "features": {}}, {"type": "li", "start": 5182, "end": 5976, "id": 270, "features": {}}, {"type": "b", "start": 5182, "end": 5187, "id": 271, "features": {}}, {"type": "a", "start": 5280, "end": 5299, "id": 272, "features": {"href": "/wiki/Machine_translation", "title": "Machine translation"}}, {"type": "a", "start": 5409, "end": 5424, "id": 273, "features": {"href": "/wiki/Text_corpus", "title": "Text corpus"}}, {"type": "a", "start": 5455, "end": 5475, "id": 274, "features": {"href": "/wiki/Parliament_of_Canada", "title": "Parliament of Canada"}}, {"type": "a", "start": 5484, "end": 5498, "id": 275, "features": {"href": "/wiki/European_Union", "title": "European Union"}}, {"type": "li", "start": 5976, "end": 6774, "id": 276, "features": {}}, {"type": "b", "start": 5976, "end": 5981, "id": 277, "features": {}}, {"type": "a", "start": 6149, "end": 6161, "id": 278, "features": {"href": "/wiki/Unsupervised_learning", "title": "Unsupervised learning"}}, {"type": "a", "start": 6166, "end": 6190, "id": 279, "features": {"href": 
"/wiki/Semi-supervised_learning", "title": "Semi-supervised learning"}}, {"type": "a", "start": 6408, "end": 6427, "id": 280, "features": {"href": "/wiki/Supervised_learning", "title": "Supervised learning"}}, {"type": "a", "start": 6636, "end": 6650, "id": 281, "features": {"href": "/wiki/World_Wide_Web", "title": "World Wide Web"}}, {"type": "a", "start": 6741, "end": 6756, "id": 282, "features": {"href": "/wiki/Time_complexity", "title": "Time complexity"}}, {"type": "h3", "start": 6774, "end": 6799, "id": 283, "features": {"class": "in-block"}}, {"type": "span", "start": 6774, "end": 6774, "id": 284, "features": {"id": "Neural_NLP_.28present.29"}}, {"type": "span", "start": 6774, "end": 6794, "id": 285, "features": {"class": "mw-headline", "id": "Neural_NLP_(present)"}}, {"type": "span", "start": 6794, "end": 6798, "id": 286, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 6794, "end": 6798, "id": 287, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=4", "title": "Edit section: Neural NLP (present)", "data-section": "4", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "p", "start": 6799, "end": 7152, "id": 288, "features": {}}, {"type": "a", "start": 6813, "end": 6836, "id": 289, "features": {"href": "/wiki/Representation_learning", "class": "mw-redirect", "title": "Representation learning"}}, {"type": "a", "start": 6841, "end": 6860, "id": 290, "features": {"href": "/wiki/Deep_learning", "title": "Deep learning"}}, {"type": "sup", "start": 7005, "end": 7008, "id": 291, "features": {"id": "cite_ref-goldberg:nnlp17_7-0", "class": "reference"}}, {"type": "a", "start": 7005, "end": 7008, "id": 292, "features": {"href": "#cite_note-goldberg:nnlp17-7"}}, {"type": "sup", "start": 7008, "end": 7011, "id": 293, "features": {"id": "cite_ref-goodfellow:book16_8-0", "class": "reference"}}, {"type": "a", "start": 7008, "end": 7011, "id": 294, 
"features": {"href": "#cite_note-goodfellow:book16-8"}}, {"type": "sup", "start": 7114, "end": 7117, "id": 295, "features": {"id": "cite_ref-jozefowicz:lm16_9-0", "class": "reference"}}, {"type": "a", "start": 7114, "end": 7117, "id": 296, "features": {"href": "#cite_note-jozefowicz:lm16-9"}}, {"type": "sup", "start": 7126, "end": 7130, "id": 297, "features": {"id": "cite_ref-choe:emnlp16_10-0", "class": "reference"}}, {"type": "a", "start": 7126, "end": 7130, "id": 298, "features": {"href": "#cite_note-choe:emnlp16-10"}}, {"type": "sup", "start": 7130, "end": 7134, "id": 299, "features": {"id": "cite_ref-vinyals:nips15_11-0", "class": "reference"}}, {"type": "a", "start": 7130, "end": 7134, "id": 300, "features": {"href": "#cite_note-vinyals:nips15-11"}}, {"type": "h2", "start": 7152, "end": 7200, "id": 301, "features": {"class": "section-heading", "onclick": "javascript:mfTempOpenSection(2)"}}, {"type": "div", "start": 7152, "end": 7152, "id": 302, "features": {"class": "mw-ui-icon mw-ui-icon-element indicator mw-ui-icon-small mw-ui-icon-flush-left"}}, {"type": "span", "start": 7152, "end": 7152, "id": 303, "features": {"id": "Methods:_Rules.2C_statistics.2C_neural_networks"}}, {"type": "span", "start": 7152, "end": 7195, "id": 304, "features": {"class": "mw-headline", "id": "Methods:_Rules,_statistics,_neural_networks"}}, {"type": "span", "start": 7195, "end": 7195, "id": 305, "features": {"class": "anchor", "id": "Statistical_natural_language_processing_(SNLP)"}}, {"type": "span", "start": 7195, "end": 7199, "id": 306, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 7195, "end": 7199, "id": 307, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=5", "title": "Edit section: Methods: Rules, statistics, neural networks", "data-section": "5", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "section", "start": 7200, "end": 12638, "id": 308, 
"features": {"class": "mf-section-2 collapsible-block", "id": "mf-section-2"}}, {"type": "p", "start": 7200, "end": 7440, "id": 309, "features": {}}, {"type": "sup", "start": 7361, "end": 7365, "id": 310, "features": {"id": "cite_ref-winograd:shrdlu71_12-0", "class": "reference"}}, {"type": "a", "start": 7361, "end": 7365, "id": 311, "features": {"href": "#cite_note-winograd:shrdlu71-12"}}, {"type": "sup", "start": 7365, "end": 7369, "id": 312, "features": {"id": "cite_ref-schank77_13-0", "class": "reference"}}, {"type": "a", "start": 7365, "end": 7369, "id": 313, "features": {"href": "#cite_note-schank77-13"}}, {"type": "a", "start": 7430, "end": 7438, "id": 314, "features": {"href": "/wiki/Stemming", "title": "Stemming"}}, {"type": "p", "start": 7440, "end": 7545, "id": 315, "features": {}}, {"type": "a", "start": 7469, "end": 7485, "id": 316, "features": {"href": "/wiki/Machine_learning", "title": "Machine learning"}}, {"type": "ul", "start": 7545, "end": 8881, "id": 317, "features": {}}, {"type": "li", "start": 7545, "end": 7747, "id": 318, "features": {}}, {"type": "li", "start": 7747, "end": 8238, "id": 319, "features": {}}, {"type": "li", "start": 8238, "end": 8881, "id": 320, "features": {}}, {"type": "p", "start": 8881, "end": 8989, "id": 321, "features": {}}, {"type": "ul", "start": 8989, "end": 9365, "id": 322, "features": {}}, {"type": "li", "start": 8989, "end": 9187, "id": 323, "features": {}}, {"type": "a", "start": 9170, "end": 9178, "id": 324, "features": {"href": "/wiki/Apertium", "title": "Apertium"}}, {"type": "li", "start": 9187, "end": 9246, "id": 325, "features": {}}, {"type": "a", "start": 9229, "end": 9241, "id": 326, "features": {"href": "/wiki/Tokenization_(lexical_analysis)", "class": "mw-redirect", "title": "Tokenization (lexical analysis)"}}, {"type": "li", "start": 9246, "end": 9365, "id": 327, "features": {}}, {"type": "a", "start": 9321, "end": 9341, "id": 328, "features": {"href": "/wiki/Knowledge_extraction", "title": "Knowledge 
extraction"}}, {"type": "h3", "start": 9365, "end": 9389, "id": 329, "features": {"class": "in-block"}}, {"type": "span", "start": 9365, "end": 9384, "id": 330, "features": {"class": "mw-headline", "id": "Statistical_methods"}}, {"type": "span", "start": 9384, "end": 9388, "id": 331, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 9384, "end": 9388, "id": 332, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=6", "title": "Edit section: Statistical methods", "data-section": "6", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "p", "start": 9389, "end": 9833, "id": 333, "features": {}}, {"type": "sup", "start": 9433, "end": 9437, "id": 334, "features": {"id": "cite_ref-johnson:eacl:ilcl09_14-0", "class": "reference"}}, {"type": "a", "start": 9433, "end": 9437, "id": 335, "features": {"href": "#cite_note-johnson:eacl:ilcl09-14"}}, {"type": "sup", "start": 9437, "end": 9441, "id": 336, "features": {"id": "cite_ref-resnik:langlog11_15-0", "class": "reference"}}, {"type": "a", "start": 9437, "end": 9441, "id": 337, "features": {"href": "#cite_note-resnik:langlog11-15"}}, {"type": "a", "start": 9611, "end": 9632, "id": 338, "features": {"href": "/wiki/Statistical_inference", "title": "Statistical inference"}}, {"type": "i", "start": 9697, "end": 9704, "id": 339, "features": {}}, {"type": "a", "start": 9697, "end": 9704, "id": 340, "features": {"href": "/wiki/Text_corpus", "title": "Text corpus"}}, {"type": "i", "start": 9725, "end": 9731, "id": 341, "features": {}}, {"type": "p", "start": 9833, "end": 10440, "id": 342, "features": {}}, {"type": "a", "start": 10087, "end": 10105, "id": 343, "features": {"href": "/wiki/Statistical_models", "class": "mw-redirect", "title": "Statistical models"}}, {"type": "a", "start": 10124, "end": 10137, "id": 344, "features": {"href": "/wiki/Probabilistic", "class": "mw-redirect", "title": "Probabilistic"}}, 
{"type": "a", "start": 10167, "end": 10178, "id": 345, "features": {"href": "/wiki/Real-valued", "class": "mw-redirect", "title": "Real-valued"}}, {"type": "p", "start": 10440, "end": 11257, "id": 346, "features": {}}, {"type": "a", "start": 10503, "end": 10517, "id": 347, "features": {"href": "/wiki/Decision_tree", "title": "Decision tree"}}, {"type": "a", "start": 10608, "end": 10630, "id": 348, "features": {"href": "/wiki/Part_of_speech_tagging", "class": "mw-redirect", "title": "Part of speech tagging"}}, {"type": "a", "start": 10653, "end": 10673, "id": 349, "features": {"href": "/wiki/Hidden_Markov_models", "class": "mw-redirect", "title": "Hidden Markov models"}}, {"type": "a", "start": 10748, "end": 10766, "id": 350, "features": {"href": "/wiki/Statistical_models", "class": "mw-redirect", "title": "Statistical models"}}, {"type": "a", "start": 10785, "end": 10798, "id": 351, "features": {"href": "/wiki/Probabilistic", "class": "mw-redirect", "title": "Probabilistic"}}, {"type": "a", "start": 10828, "end": 10839, "id": 352, "features": {"href": "/wiki/Real-valued", "class": "mw-redirect", "title": "Real-valued"}}, {"type": "a", "start": 10894, "end": 10915, "id": 353, "features": {"href": "/wiki/Cache_language_model", "title": "Cache language model"}}, {"type": "a", "start": 10932, "end": 10950, "id": 354, "features": {"href": "/wiki/Speech_recognition", "title": "Speech recognition"}}, {"type": "p", "start": 11257, "end": 11482, "id": 355, "features": {}}, {"type": "h3", "start": 11482, "end": 11502, "id": 356, "features": {"class": "in-block"}}, {"type": "span", "start": 11482, "end": 11497, "id": 357, "features": {"class": "mw-headline", "id": "Neural_networks"}}, {"type": "span", "start": 11497, "end": 11501, "id": 358, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 11497, "end": 11501, "id": 359, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=7", "title": "Edit section: Neural networks", 
"data-section": "7", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "div", "start": 11502, "end": 11549, "id": 360, "features": {"role": "note", "class": "hatnote navigation-not-searchable"}}, {"type": "a", "start": 11523, "end": 11548, "id": 361, "features": {"href": "/wiki/Artificial_neural_network", "title": "Artificial neural network"}}, {"type": "p", "start": 11549, "end": 12638, "id": 362, "features": {}}, {"type": "sup", "start": 11663, "end": 11667, "id": 363, "features": {"id": "cite_ref-16", "class": "reference"}}, {"type": "a", "start": 11663, "end": 11667, "id": 364, "features": {"href": "#cite_note-16"}}, {"type": "a", "start": 11740, "end": 11755, "id": 365, "features": {"href": "/wiki/Neural_network", "title": "Neural network"}}, {"type": "a", "start": 11816, "end": 11831, "id": 366, "features": {"href": "/wiki/Word_embedding", "title": "Word embedding"}}, {"type": "i", "start": 12330, "end": 12356, "id": 367, "features": {}}, {"type": "a", "start": 12330, "end": 12356, "id": 368, "features": {"href": "/wiki/Neural_machine_translation", "title": "Neural machine translation"}}, {"type": "a", "start": 12457, "end": 12477, "id": 369, "features": {"href": "/wiki/Seq2seq", "title": "Seq2seq"}}, {"type": "a", "start": 12599, "end": 12630, "id": 370, "features": {"href": "/wiki/Statistical_machine_translation", "title": "Statistical machine translation"}}, {"type": "h2", "start": 12638, "end": 12659, "id": 371, "features": {"class": "section-heading", "onclick": "javascript:mfTempOpenSection(3)"}}, {"type": "div", "start": 12638, "end": 12638, "id": 372, "features": {"class": "mw-ui-icon mw-ui-icon-element indicator mw-ui-icon-small mw-ui-icon-flush-left"}}, {"type": "span", "start": 12638, "end": 12654, "id": 373, "features": {"class": "mw-headline", "id": "Common_NLP_Tasks"}}, {"type": "span", "start": 12654, "end": 12658, "id": 374, "features": {"class": "mw-editsection"}}, {"type": 
"a", "start": 12654, "end": 12658, "id": 375, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=8", "title": "Edit section: Common NLP Tasks", "data-section": "8", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "section", "start": 12659, "end": 25447, "id": 376, "features": {"class": "mf-section-3 collapsible-block", "id": "mf-section-3"}}, {"type": "p", "start": 12659, "end": 12909, "id": 377, "features": {}}, {"type": "p", "start": 12909, "end": 13065, "id": 378, "features": {}}, {"type": "h3", "start": 13065, "end": 13096, "id": 379, "features": {"class": "in-block"}}, {"type": "span", "start": 13065, "end": 13091, "id": 380, "features": {"class": "mw-headline", "id": "Text_and_speech_processing"}}, {"type": "span", "start": 13091, "end": 13095, "id": 381, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 13091, "end": 13095, "id": 382, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=9", "title": "Edit section: Text and speech processing", "data-section": "9", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "dl", "start": 13096, "end": 13207, "id": 383, "features": {}}, {"type": "dt", "start": 13096, "end": 13131, "id": 384, "features": {}}, {"type": "a", "start": 13096, "end": 13125, "id": 385, "features": {"href": "/wiki/Optical_character_recognition", "title": "Optical character recognition"}}, {"type": "dd", "start": 13132, "end": 13207, "id": 386, "features": {}}, {"type": "dl", "start": 13207, "end": 14245, "id": 387, "features": {}}, {"type": "dt", "start": 13207, "end": 13225, "id": 388, "features": {}}, {"type": "a", "start": 13207, "end": 13225, "id": 389, "features": {"href": "/wiki/Speech_recognition", "title": "Speech recognition"}}, {"type": "dd", "start": 13226, "end": 14086, "id": 390, "features": {}}, {"type": 
"a", "start": 13354, "end": 13368, "id": 391, "features": {"href": "/wiki/Text_to_speech", "class": "mw-redirect", "title": "Text to speech"}}, {"type": "a", "start": 13437, "end": 13448, "id": 392, "features": {"href": "/wiki/AI-complete", "title": "AI-complete"}}, {"type": "a", "start": 13467, "end": 13481, "id": 393, "features": {"href": "/wiki/Natural_speech", "class": "mw-redirect", "title": "Natural speech"}}, {"type": "a", "start": 13545, "end": 13564, "id": 394, "features": {"href": "/wiki/Speech_segmentation", "title": "Speech segmentation"}}, {"type": "a", "start": 13734, "end": 13748, "id": 395, "features": {"href": "/wiki/Coarticulation", "title": "Coarticulation"}}, {"type": "a", "start": 13775, "end": 13788, "id": 396, "features": {"href": "/wiki/Analog_signal", "title": "Analog signal"}}, {"type": "dt", "start": 14087, "end": 14106, "id": 397, "features": {}}, {"type": "a", "start": 14087, "end": 14106, "id": 398, "features": {"href": "/wiki/Speech_segmentation", "title": "Speech segmentation"}}, {"type": "dd", "start": 14107, "end": 14245, "id": 399, "features": {}}, {"type": "a", "start": 14196, "end": 14214, "id": 400, "features": {"href": "/wiki/Speech_recognition", "title": "Speech recognition"}}, {"type": "dl", "start": 14245, "end": 14393, "id": 401, "features": {}}, {"type": "dt", "start": 14245, "end": 14259, "id": 402, "features": {}}, {"type": "a", "start": 14245, "end": 14259, "id": 403, "features": {"href": "/wiki/Text-to-speech", "class": "mw-redirect", "title": "Text-to-speech"}}, {"type": "dd", "start": 14260, "end": 14393, "id": 404, "features": {}}, {"type": "sup", "start": 14389, "end": 14393, "id": 405, "features": {"id": "cite_ref-17", "class": "reference"}}, {"type": "a", "start": 14389, "end": 14393, "id": 406, "features": {"href": "#cite_note-17"}}, {"type": "dl", "start": 14393, "end": 14927, "id": 407, "features": {}}, {"type": "dt", "start": 14393, "end": 14425, "id": 408, "features": {}}, {"type": "a", "start": 14393, 
"end": 14410, "id": 409, "features": {"href": "/wiki/Word_segmentation", "class": "mw-redirect", "title": "Word segmentation"}}, {"type": "a", "start": 14412, "end": 14424, "id": 410, "features": {"href": "/wiki/Tokenization_(lexical_analysis)", "class": "mw-redirect", "title": "Tokenization (lexical analysis)"}}, {"type": "dd", "start": 14426, "end": 14927, "id": 411, "features": {}}, {"type": "a", "start": 14503, "end": 14510, "id": 412, "features": {"href": "/wiki/English_language", "title": "English language"}}, {"type": "a", "start": 14618, "end": 14625, "id": 413, "features": {"href": "/wiki/Chinese_language", "title": "Chinese language"}}, {"type": "a", "start": 14627, "end": 14635, "id": 414, "features": {"href": "/wiki/Japanese_language", "title": "Japanese language"}}, {"type": "a", "start": 14640, "end": 14644, "id": 415, "features": {"href": "/wiki/Thai_language", "title": "Thai language"}}, {"type": "a", "start": 14782, "end": 14792, "id": 416, "features": {"href": "/wiki/Vocabulary", "title": "Vocabulary"}}, {"type": "a", "start": 14797, "end": 14807, "id": 417, "features": {"href": "/wiki/Morphology_(linguistics)", "title": "Morphology (linguistics)"}}, {"type": "a", "start": 14884, "end": 14896, "id": 418, "features": {"href": "/wiki/Bag_of_words", "class": "mw-redirect", "title": "Bag of words"}}, {"type": "h3", "start": 14928, "end": 14955, "id": 419, "features": {"class": "in-block"}}, {"type": "span", "start": 14928, "end": 14950, "id": 420, "features": {"class": "mw-headline", "id": "Morphological_analysis"}}, {"type": "span", "start": 14950, "end": 14954, "id": 421, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 14950, "end": 14954, "id": 422, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=10", "title": "Edit section: Morphological analysis", "data-section": "10", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": 
"dl", "start": 14955, "end": 16525, "id": 423, "features": {}}, {"type": "dt", "start": 14955, "end": 14968, "id": 424, "features": {}}, {"type": "a", "start": 14955, "end": 14968, "id": 425, "features": {"href": "/wiki/Lemmatisation", "title": "Lemmatisation"}}, {"type": "dd", "start": 14969, "end": 15096, "id": 426, "features": {}}, {"type": "dt", "start": 15097, "end": 15123, "id": 427, "features": {}}, {"type": "a", "start": 15097, "end": 15123, "id": 428, "features": {"href": "/wiki/Morphology_(linguistics)", "title": "Morphology (linguistics)"}}, {"type": "dd", "start": 15124, "end": 15776, "id": 429, "features": {}}, {"type": "a", "start": 15155, "end": 15164, "id": 430, "features": {"href": "/wiki/Morpheme", "title": "Morpheme"}}, {"type": "a", "start": 15275, "end": 15285, "id": 431, "features": {"href": "/wiki/Morphology_(linguistics)", "title": "Morphology (linguistics)"}}, {"type": "i", "start": 15287, "end": 15291, "id": 432, "features": {}}, {"type": "a", "start": 15351, "end": 15358, "id": 433, "features": {"href": "/wiki/English_language", "title": "English language"}}, {"type": "a", "start": 15400, "end": 15423, "id": 434, "features": {"href": "/wiki/Inflectional_morphology", "class": "mw-redirect", "title": "Inflectional morphology"}}, {"type": "i", "start": 15531, "end": 15535, "id": 435, "features": {}}, {"type": "a", "start": 15609, "end": 15616, "id": 436, "features": {"href": "/wiki/Turkish_language", "title": "Turkish language"}}, {"type": "a", "start": 15620, "end": 15626, "id": 437, "features": {"href": "/wiki/Meitei_language", "title": "Meitei language"}}, {"type": "sup", "start": 15627, "end": 15631, "id": 438, "features": {"id": "cite_ref-18", "class": "reference"}}, {"type": "a", "start": 15627, "end": 15631, "id": 439, "features": {"href": "#cite_note-18"}}, {"type": "a", "start": 15641, "end": 15653, "id": 440, "features": {"href": "/wiki/Agglutination", "title": "Agglutination"}}, {"type": "dt", "start": 15777, "end": 15799, "id": 
441, "features": {}}, {"type": "a", "start": 15777, "end": 15799, "id": 442, "features": {"href": "/wiki/Part-of-speech_tagging", "title": "Part-of-speech tagging"}}, {"type": "dd", "start": 15800, "end": 16525, "id": 443, "features": {}}, {"type": "a", "start": 15832, "end": 15846, "id": 444, "features": {"href": "/wiki/Part_of_speech", "title": "Part of speech"}}, {"type": "a", "start": 15926, "end": 15941, "id": 445, "features": {"href": "/wiki/Parts_of_speech", "class": "mw-redirect", "title": "Parts of speech"}}, {"type": "a", "start": 15972, "end": 15976, "id": 446, "features": {"href": "/wiki/Noun", "title": "Noun"}}, {"type": "a", "start": 16006, "end": 16010, "id": 447, "features": {"href": "/wiki/Verb", "title": "Verb"}}, {"type": "a", "start": 16048, "end": 16052, "id": 448, "features": {"href": "/wiki/Noun", "title": "Noun"}}, {"type": "a", "start": 16054, "end": 16058, "id": 449, "features": {"href": "/wiki/Verb", "title": "Verb"}}, {"type": "a", "start": 16062, "end": 16071, "id": 450, "features": {"href": "/wiki/Adjective", "title": "Adjective"}}, {"type": "sup", "start": 16190, "end": 16210, "id": 451, "features": {"class": "noprint Inline-Template", "style": "white-space:nowrap;"}}, {"type": "i", "start": 16191, "end": 16209, "id": 452, "features": {}}, {"type": "a", "start": 16191, "end": 16198, "id": 453, "features": {"href": "/wiki/Wikipedia:Accuracy_dispute#Disputed_statement", "title": "Wikipedia:Accuracy dispute"}}, {"type": "span", "start": 16191, "end": 16198, "id": 454, "features": {"title": "The material near this tag is possibly inaccurate or nonfactual. 
(June 2018)"}}, {"type": "span", "start": 16199, "end": 16209, "id": 455, "features": {"class": "metadata"}}, {"type": "a", "start": 16202, "end": 16209, "id": 456, "features": {"href": "/wiki/Talk:Natural_language_processing#Dubious", "title": "Talk:Natural language processing"}}, {"type": "a", "start": 16233, "end": 16256, "id": 457, "features": {"href": "/wiki/Inflectional_morphology", "class": "mw-redirect", "title": "Inflectional morphology"}}, {"type": "a", "start": 16266, "end": 16273, "id": 458, "features": {"href": "/wiki/English_language", "title": "English language"}}, {"type": "a", "start": 16317, "end": 16324, "id": 459, "features": {"href": "/wiki/Chinese_language", "title": "Chinese language"}}, {"type": "a", "start": 16368, "end": 16382, "id": 460, "features": {"href": "/wiki/Tonal_language", "class": "mw-redirect", "title": "Tonal language"}}, {"type": "dl", "start": 16525, "end": 16698, "id": 461, "features": {}}, {"type": "dt", "start": 16525, "end": 16533, "id": 462, "features": {}}, {"type": "a", "start": 16525, "end": 16533, "id": 463, "features": {"href": "/wiki/Stemming", "title": "Stemming"}}, {"type": "dd", "start": 16534, "end": 16698, "id": 464, "features": {}}, {"type": "i", "start": 16618, "end": 16622, "id": 465, "features": {}}, {"type": "h3", "start": 16699, "end": 16722, "id": 466, "features": {"class": "in-block"}}, {"type": "span", "start": 16699, "end": 16717, "id": 467, "features": {"class": "mw-headline", "id": "Syntactic_analysis"}}, {"type": "span", "start": 16717, "end": 16721, "id": 468, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 16717, "end": 16721, "id": 469, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=11", "title": "Edit section: Syntactic analysis", "data-section": "11", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "dl", "start": 16722, "end": 17777, "id": 470, "features": {}}, 
{"type": "dt", "start": 16722, "end": 16743, "id": 471, "features": {}}, {"type": "a", "start": 16722, "end": 16739, "id": 472, "features": {"href": "/wiki/Grammar_induction", "title": "Grammar induction"}}, {"type": "sup", "start": 16739, "end": 16743, "id": 473, "features": {"id": "cite_ref-19", "class": "reference"}}, {"type": "a", "start": 16739, "end": 16743, "id": 474, "features": {"href": "#cite_note-19"}}, {"type": "dd", "start": 16744, "end": 16805, "id": 475, "features": {}}, {"type": "a", "start": 16755, "end": 16769, "id": 476, "features": {"href": "/wiki/Formal_grammar", "title": "Formal grammar"}}, {"type": "dt", "start": 16806, "end": 16874, "id": 477, "features": {}}, {"type": "a", "start": 16806, "end": 16823, "id": 478, "features": {"href": "/wiki/Sentence_breaking", "class": "mw-redirect", "title": "Sentence breaking"}}, {"type": "a", "start": 16840, "end": 16872, "id": 479, "features": {"href": "/wiki/Sentence_boundary_disambiguation", "title": "Sentence boundary disambiguation"}}, {"type": "dd", "start": 16875, "end": 17085, "id": 480, "features": {}}, {"type": "a", "start": 16968, "end": 16975, "id": 481, "features": {"href": "/wiki/Full_stop", "title": "Full stop"}}, {"type": "a", "start": 16985, "end": 17002, "id": 482, "features": {"href": "/wiki/Punctuation_mark", "class": "mw-redirect", "title": "Punctuation mark"}}, {"type": "i", "start": 17056, "end": 17060, "id": 483, "features": {}}, {"type": "a", "start": 17070, "end": 17083, "id": 484, "features": {"href": "/wiki/Abbreviation", "title": "Abbreviation"}}, {"type": "dt", "start": 17086, "end": 17093, "id": 485, "features": {}}, {"type": "a", "start": 17086, "end": 17093, "id": 486, "features": {"href": "/wiki/Parsing", "title": "Parsing"}}, {"type": "dd", "start": 17094, "end": 17777, "id": 487, "features": {}}, {"type": "a", "start": 17108, "end": 17118, "id": 488, "features": {"href": "/wiki/Parse_tree", "title": "Parse tree"}}, {"type": "a", "start": 17167, "end": 17174, "id": 489, 
"features": {"href": "/wiki/Grammar", "title": "Grammar"}}, {"type": "a", "start": 17179, "end": 17196, "id": 490, "features": {"href": "/wiki/Natural_language", "title": "Natural language"}}, {"type": "a", "start": 17200, "end": 17209, "id": 491, "features": {"href": "/wiki/Ambiguous", "class": "mw-redirect", "title": "Ambiguous"}}, {"type": "i", "start": 17454, "end": 17472, "id": 492, "features": {}}, {"type": "i", "start": 17477, "end": 17497, "id": 493, "features": {}}, {"type": "a", "start": 17705, "end": 17739, "id": 494, "features": {"href": "/wiki/Probabilistic_context-free_grammar", "title": "Probabilistic context-free grammar"}}, {"type": "i", "start": 17757, "end": 17775, "id": 495, "features": {}}, {"type": "a", "start": 17757, "end": 17775, "id": 496, "features": {"href": "/wiki/Stochastic_grammar", "title": "Stochastic grammar"}}, {"type": "h3", "start": 17778, "end": 17833, "id": 497, "features": {"class": "in-block"}}, {"type": "span", "start": 17778, "end": 17778, "id": 498, "features": {"id": "Lexical_semantics_.28of_individual_words_in_context.29"}}, {"type": "span", "start": 17778, "end": 17828, "id": 499, "features": {"class": "mw-headline", "id": "Lexical_semantics_(of_individual_words_in_context)"}}, {"type": "span", "start": 17828, "end": 17832, "id": 500, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 17828, "end": 17832, "id": 501, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=12", "title": "Edit section: Lexical semantics (of individual words in context)", "data-section": "12", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "dl", "start": 17833, "end": 18935, "id": 502, "features": {}}, {"type": "dt", "start": 17833, "end": 17850, "id": 503, "features": {}}, {"type": "a", "start": 17833, "end": 17850, "id": 504, "features": {"href": "/wiki/Lexical_semantics", "title": "Lexical semantics"}}, {"type": "dd", 
"start": 17851, "end": 17916, "id": 505, "features": {}}, {"type": "dt", "start": 17917, "end": 17941, "id": 506, "features": {}}, {"type": "a", "start": 17917, "end": 17941, "id": 507, "features": {"href": "/wiki/Distributional_semantics", "title": "Distributional semantics"}}, {"type": "dd", "start": 17942, "end": 17994, "id": 508, "features": {}}, {"type": "dt", "start": 17995, "end": 18025, "id": 509, "features": {}}, {"type": "a", "start": 17995, "end": 18019, "id": 510, "features": {"href": "/wiki/Named_entity_recognition", "class": "mw-redirect", "title": "Named entity recognition"}}, {"type": "dd", "start": 18026, "end": 18935, "id": 511, "features": {}}, {"type": "a", "start": 18218, "end": 18232, "id": 512, "features": {"href": "/wiki/Capitalization", "title": "Capitalization"}}, {"type": "a", "start": 18635, "end": 18642, "id": 513, "features": {"href": "/wiki/Chinese_language", "title": "Chinese language"}}, {"type": "a", "start": 18646, "end": 18652, "id": 514, "features": {"href": "/wiki/Arabic_language", "class": "mw-redirect", "title": "Arabic language"}}, {"type": "a", "start": 18795, "end": 18801, "id": 515, "features": {"href": "/wiki/German_language", "title": "German language"}}, {"type": "a", "start": 18818, "end": 18823, "id": 516, "features": {"href": "/wiki/Noun", "title": "Noun"}}, {"type": "a", "start": 18867, "end": 18873, "id": 517, "features": {"href": "/wiki/French_language", "title": "French language"}}, {"type": "a", "start": 18878, "end": 18885, "id": 518, "features": {"href": "/wiki/Spanish_language", "title": "Spanish language"}}, {"type": "a", "start": 18924, "end": 18934, "id": 519, "features": {"href": "/wiki/Adjective", "title": "Adjective"}}, {"type": "dl", "start": 18935, "end": 19230, "id": 520, "features": {}}, {"type": "dt", "start": 18935, "end": 18994, "id": 521, "features": {}}, {"type": "a", "start": 18935, "end": 18953, "id": 522, "features": {"href": "/wiki/Sentiment_analysis", "title": "Sentiment analysis"}}, 
{"type": "a", "start": 18964, "end": 18993, "id": 523, "features": {"href": "/wiki/Multimodal_sentiment_analysis", "title": "Multimodal sentiment analysis"}}, {"type": "dd", "start": 18995, "end": 19230, "id": 524, "features": {}}, {"type": "dl", "start": 19230, "end": 19632, "id": 525, "features": {}}, {"type": "dt", "start": 19230, "end": 19230, "id": 526, "features": {}}, {"type": "dl", "start": 19230, "end": 19252, "id": 527, "features": {}}, {"type": "dt", "start": 19230, "end": 19252, "id": 528, "features": {}}, {"type": "a", "start": 19230, "end": 19252, "id": 529, "features": {"href": "/wiki/Terminology_extraction", "title": "Terminology extraction"}}, {"type": "dd", "start": 19252, "end": 19350, "id": 530, "features": {}}, {"type": "dt", "start": 19351, "end": 19376, "id": 531, "features": {}}, {"type": "a", "start": 19351, "end": 19376, "id": 532, "features": {"href": "/wiki/Word_sense_disambiguation", "class": "mw-redirect", "title": "Word sense disambiguation"}}, {"type": "dd", "start": 19377, "end": 19632, "id": 533, "features": {}}, {"type": "a", "start": 19407, "end": 19414, "id": 534, "features": {"href": "/wiki/Meaning_(linguistics)", "class": "mw-redirect", "title": "Meaning (linguistics)"}}, {"type": "a", "start": 19624, "end": 19631, "id": 535, "features": {"href": "/wiki/WordNet", "title": "WordNet"}}, {"type": "h3", "start": 19633, "end": 19694, "id": 536, "features": {"class": "in-block"}}, {"type": "span", "start": 19633, "end": 19633, "id": 537, "features": {"id": "Relational_semantics_.28semantics_of_individual_sentences.29"}}, {"type": "span", "start": 19633, "end": 19689, "id": 538, "features": {"class": "mw-headline", "id": "Relational_semantics_(semantics_of_individual_sentences)"}}, {"type": "span", "start": 19689, "end": 19693, "id": 539, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 19689, "end": 19693, "id": 540, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=13", 
"title": "Edit section: Relational semantics (semantics of individual sentences)", "data-section": "13", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "dl", "start": 19694, "end": 20570, "id": 541, "features": {}}, {"type": "dt", "start": 19694, "end": 19717, "id": 542, "features": {}}, {"type": "a", "start": 19694, "end": 19717, "id": 543, "features": {"href": "/wiki/Relationship_extraction", "title": "Relationship extraction"}}, {"type": "dd", "start": 19718, "end": 19819, "id": 544, "features": {}}, {"type": "dt", "start": 19820, "end": 19836, "id": 545, "features": {}}, {"type": "a", "start": 19820, "end": 19836, "id": 546, "features": {"href": "/wiki/Semantic_parsing", "title": "Semantic parsing"}}, {"type": "dd", "start": 19837, "end": 20338, "id": 547, "features": {}}, {"type": "a", "start": 19961, "end": 19972, "id": 548, "features": {"href": "/wiki/Abstract_Meaning_Representation", "title": "Abstract Meaning Representation"}}, {"type": "a", "start": 20026, "end": 20037, "id": 549, "features": {"href": "/wiki/Discourse_representation_theory", "title": "Discourse representation theory"}}, {"type": "dt", "start": 20339, "end": 20412, "id": 550, "features": {}}, {"type": "a", "start": 20339, "end": 20362, "id": 551, "features": {"href": "/wiki/Semantic_role_labeling", "title": "Semantic role labeling"}}, {"type": "dd", "start": 20413, "end": 20570, "id": 552, "features": {}}, {"type": "a", "start": 20498, "end": 20504, "id": 553, "features": {"href": "/wiki/Frame_semantics_(linguistics)", "title": "Frame semantics (linguistics)"}}, {"type": "a", "start": 20554, "end": 20568, "id": 554, "features": {"href": "/wiki/Semantic_roles", "class": "mw-redirect", "title": "Semantic roles"}}, {"type": "dl", "start": 20570, "end": 20571, "id": 555, "features": {}}, {"type": "dt", "start": 20570, "end": 20570, "id": 556, "features": {}}, {"type": "dt", "start": 20571, "end": 20571, "id": 557, 
"features": {}}, {"type": "h3", "start": 20571, "end": 20625, "id": 558, "features": {"class": "in-block"}}, {"type": "span", "start": 20571, "end": 20571, "id": 559, "features": {"id": "Discourse_.28semantics_beyond_individual_sentences.29"}}, {"type": "span", "start": 20571, "end": 20620, "id": 560, "features": {"class": "mw-headline", "id": "Discourse_(semantics_beyond_individual_sentences)"}}, {"type": "span", "start": 20620, "end": 20624, "id": 561, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 20620, "end": 20624, "id": 562, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=14", "title": "Edit section: Discourse (semantics beyond individual sentences)", "data-section": "14", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "dl", "start": 20625, "end": 21811, "id": 563, "features": {}}, {"type": "dt", "start": 20625, "end": 20647, "id": 564, "features": {}}, {"type": "a", "start": 20625, "end": 20647, "id": 565, "features": {"href": "/wiki/Coreference", "title": "Coreference"}}, {"type": "dd", "start": 20648, "end": 21386, "id": 566, "features": {}}, {"type": "a", "start": 20765, "end": 20784, "id": 567, "features": {"href": "/wiki/Anaphora_resolution", "class": "mw-redirect", "title": "Anaphora resolution"}}, {"type": "a", "start": 20868, "end": 20876, "id": 568, "features": {"href": "/wiki/Pronoun", "title": "Pronoun"}}, {"type": "a", "start": 21041, "end": 21062, "id": 569, "features": {"href": "/wiki/Referring_expression", "title": "Referring expression"}}, {"type": "dt", "start": 21387, "end": 21405, "id": 570, "features": {}}, {"type": "a", "start": 21387, "end": 21405, "id": 571, "features": {"href": "/wiki/Discourse_analysis", "title": "Discourse analysis"}}, {"type": "dd", "start": 21406, "end": 21811, "id": 572, "features": {}}, {"type": "a", "start": 21504, "end": 21513, "id": 573, "features": {"href": "/wiki/Discourse", 
"title": "Discourse"}}, {"type": "a", "start": 21711, "end": 21722, "id": 574, "features": {"href": "/wiki/Speech_act", "title": "Speech act"}}, {"type": "dl", "start": 21811, "end": 22401, "id": 575, "features": {}}, {"type": "dt", "start": 21811, "end": 21843, "id": 576, "features": {}}, {"type": "dd", "start": 21844, "end": 22401, "id": 577, "features": {}}, {"type": "a", "start": 21929, "end": 21935, "id": 578, "features": {"href": "/wiki/Frame_semantics_(linguistics)", "title": "Frame semantics (linguistics)"}}, {"type": "a", "start": 22382, "end": 22400, "id": 579, "features": {"href": "/wiki/Pro-drop_language", "title": "Pro-drop language"}}, {"type": "dl", "start": 22401, "end": 22586, "id": 580, "features": {}}, {"type": "dt", "start": 22401, "end": 22431, "id": 581, "features": {}}, {"type": "a", "start": 22401, "end": 22431, "id": 582, "features": {"href": "/wiki/Textual_entailment", "title": "Textual entailment"}}, {"type": "dd", "start": 22432, "end": 22586, "id": 583, "features": {}}, {"type": "sup", "start": 22582, "end": 22586, "id": 584, "features": {"id": "cite_ref-rte:11_20-0", "class": "reference"}}, {"type": "a", "start": 22582, "end": 22586, "id": 585, "features": {"href": "#cite_note-rte:11-20"}}, {"type": "dl", "start": 22586, "end": 22745, "id": 586, "features": {}}, {"type": "dt", "start": 22586, "end": 22620, "id": 587, "features": {}}, {"type": "a", "start": 22586, "end": 22604, "id": 588, "features": {"href": "/wiki/Topic_segmentation", "class": "mw-redirect", "title": "Topic segmentation"}}, {"type": "dd", "start": 22621, "end": 22745, "id": 589, "features": {}}, {"type": "h3", "start": 22746, "end": 22780, "id": 590, "features": {"class": "in-block"}}, {"type": "span", "start": 22746, "end": 22775, "id": 591, "features": {"class": "mw-headline", "id": "Higher-level_NLP_applications"}}, {"type": "span", "start": 22775, "end": 22779, "id": 592, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 22775, "end": 22779, "id": 
593, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=15", "title": "Edit section: Higher-level NLP applications", "data-section": "15", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "dl", "start": 22780, "end": 25446, "id": 594, "features": {}}, {"type": "dt", "start": 22780, "end": 22824, "id": 595, "features": {}}, {"type": "a", "start": 22780, "end": 22803, "id": 596, "features": {"href": "/wiki/Automatic_summarization", "title": "Automatic summarization"}}, {"type": "dd", "start": 22825, "end": 23008, "id": 597, "features": {}}, {"type": "dt", "start": 23009, "end": 23024, "id": 598, "features": {}}, {"type": "dd", "start": 23025, "end": 23734, "id": 599, "features": {}}, {"type": "i", "start": 23240, "end": 23281, "id": 600, "features": {}}, {"type": "sup", "start": 23283, "end": 23287, "id": 601, "features": {"id": "cite_ref-21", "class": "reference"}}, {"type": "a", "start": 23283, "end": 23287, "id": 602, "features": {"href": "#cite_note-21"}}, {"type": "i", "start": 23356, "end": 23366, "id": 603, "features": {}}, {"type": "a", "start": 23356, "end": 23366, "id": 604, "features": {"href": "/wiki/1_the_Road", "title": "1 the Road"}}, {"type": "a", "start": 23496, "end": 23511, "id": 605, "features": {"href": "/wiki/Language_model", "title": "Language model"}}, {"type": "i", "start": 23590, "end": 23611, "id": 606, "features": {}}, {"type": "sup", "start": 23629, "end": 23633, "id": 607, "features": {"id": "cite_ref-22", "class": "reference"}}, {"type": "a", "start": 23629, "end": 23633, "id": 608, "features": {"href": "#cite_note-22"}}, {"type": "i", "start": 23641, "end": 23647, "id": 609, "features": {}}, {"type": "i", "start": 23652, "end": 23662, "id": 610, "features": {}}, {"type": "dt", "start": 23735, "end": 23754, "id": 611, "features": {}}, {"type": "a", "start": 23735, "end": 23754, "id": 612, "features": {"href": 
"/wiki/Dialogue_system", "title": "Dialogue system"}}, {"type": "dd", "start": 23755, "end": 23806, "id": 613, "features": {}}, {"type": "dt", "start": 23807, "end": 23826, "id": 614, "features": {}}, {"type": "a", "start": 23807, "end": 23826, "id": 615, "features": {"href": "/wiki/Machine_translation", "title": "Machine translation"}}, {"type": "dd", "start": 23827, "end": 24159, "id": 616, "features": {}}, {"type": "a", "start": 23997, "end": 24008, "id": 617, "features": {"href": "/wiki/AI-complete", "title": "AI-complete"}}, {"type": "dt", "start": 24160, "end": 24194, "id": 618, "features": {}}, {"type": "a", "start": 24160, "end": 24187, "id": 619, "features": {"href": "/wiki/Natural_language_generation", "class": "mw-redirect", "title": "Natural language generation"}}, {"type": "dd", "start": 24195, "end": 24288, "id": 620, "features": {}}, {"type": "dt", "start": 24289, "end": 24325, "id": 621, "features": {}}, {"type": "a", "start": 24289, "end": 24319, "id": 622, "features": {"href": "/wiki/Natural_language_understanding", "class": "mw-redirect", "title": "Natural language understanding"}}, {"type": "dd", "start": 24326, "end": 25122, "id": 623, "features": {}}, {"type": "a", "start": 24390, "end": 24407, "id": 624, "features": {"href": "/wiki/First-order_logic", "title": "First-order logic"}}, {"type": "a", "start": 24439, "end": 24447, "id": 625, "features": {"href": "/wiki/Computer", "title": "Computer"}}, {"type": "a", "start": 24944, "end": 24967, "id": 626, "features": {"href": "/wiki/Closed-world_assumption", "title": "Closed-world assumption"}}, {"type": "a", "start": 24978, "end": 24999, "id": 627, "features": {"href": "/wiki/Open-world_assumption", "title": "Open-world assumption"}}, {"type": "sup", "start": 25118, "end": 25122, "id": 628, "features": {"id": "cite_ref-23", "class": "reference"}}, {"type": "a", "start": 25118, "end": 25122, "id": 629, "features": {"href": "#cite_note-23"}}, {"type": "dt", "start": 25123, "end": 25141, "id": 630, 
"features": {}}, {"type": "a", "start": 25123, "end": 25141, "id": 631, "features": {"href": "/wiki/Question_answering", "title": "Question answering"}}, {"type": "dd", "start": 25142, "end": 25446, "id": 632, "features": {}}, {"type": "sup", "start": 25442, "end": 25446, "id": 633, "features": {"id": "cite_ref-24", "class": "reference"}}, {"type": "a", "start": 25442, "end": 25446, "id": 634, "features": {"href": "#cite_note-24"}}, {"type": "h2", "start": 25447, "end": 25469, "id": 635, "features": {"class": "section-heading", "onclick": "javascript:mfTempOpenSection(4)"}}, {"type": "div", "start": 25447, "end": 25447, "id": 636, "features": {"class": "mw-ui-icon mw-ui-icon-element indicator mw-ui-icon-small mw-ui-icon-flush-left"}}, {"type": "span", "start": 25447, "end": 25464, "id": 637, "features": {"class": "mw-headline", "id": "Cognition_and_NLP"}}, {"type": "span", "start": 25464, "end": 25468, "id": 638, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 25464, "end": 25468, "id": 639, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=16", "title": "Edit section: Cognition and NLP", "data-section": "16", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "section", "start": 25469, "end": 28151, "id": 640, "features": {"class": "mf-section-4 collapsible-block", "id": "mf-section-4"}}, {"type": "p", "start": 25469, "end": 26039, "id": 641, "features": {}}, {"type": "a", "start": 25469, "end": 25478, "id": 642, "features": {"href": "/wiki/Cognition", "title": "Cognition"}}, {"type": "sup", "start": 25605, "end": 25609, "id": 643, "features": {"id": "cite_ref-25", "class": "reference"}}, {"type": "a", "start": 25605, "end": 25609, "id": 644, "features": {"href": "#cite_note-25"}}, {"type": "a", "start": 25610, "end": 25627, "id": 645, "features": {"href": "/wiki/Cognitive_science", "title": "Cognitive science"}}, {"type": "sup", "start": 
25701, "end": 25705, "id": 646, "features": {"id": "cite_ref-26", "class": "reference"}}, {"type": "a", "start": 25701, "end": 25705, "id": 647, "features": {"href": "#cite_note-26"}}, {"type": "a", "start": 25706, "end": 25727, "id": 648, "features": {"href": "/wiki/Cognitive_linguistics", "title": "Cognitive linguistics"}}, {"type": "sup", "start": 25845, "end": 25849, "id": 649, "features": {"id": "cite_ref-27", "class": "reference"}}, {"type": "a", "start": 25845, "end": 25849, "id": 650, "features": {"href": "#cite_note-27"}}, {"type": "a", "start": 25850, "end": 25863, "id": 651, "features": {"href": "/wiki/George_Lakoff", "title": "George Lakoff"}}, {"type": "a", "start": 25966, "end": 25983, "id": 652, "features": {"href": "/wiki/Cognitive_science", "title": "Cognitive science"}}, {"type": "a", "start": 26012, "end": 26033, "id": 653, "features": {"href": "/wiki/Cognitive_linguistics", "title": "Cognitive linguistics"}}, {"type": "sup", "start": 26034, "end": 26038, "id": 654, "features": {"id": "cite_ref-28", "class": "reference"}}, {"type": "a", "start": 26034, "end": 26038, "id": 655, "features": {"href": "#cite_note-28"}}, {"type": "p", "start": 26039, "end": 26284, "id": 656, "features": {}}, {"type": "a", "start": 26131, "end": 26150, "id": 657, "features": {"href": "/wiki/Conceptual_metaphor", "title": "Conceptual metaphor"}}, {"type": "sup", "start": 26279, "end": 26283, "id": 658, "features": {"id": "cite_ref-29", "class": "reference"}}, {"type": "a", "start": 26279, "end": 26283, "id": 659, "features": {"href": "#cite_note-29"}}, {"type": "p", "start": 26284, "end": 27080, "id": 660, "features": {}}, {"type": "i", "start": 26352, "end": 26357, "id": 661, "features": {}}, {"type": "a", "start": 26374, "end": 26385, "id": 662, "features": {"href": "/wiki/Comparative", "title": "Comparative"}}, {"type": "i", "start": 26393, "end": 26414, "id": 663, "features": {}}, {"type": "i", "start": 26499, "end": 26504, "id": 664, "features": {}}, {"type": "i", 
"start": 26547, "end": 26565, "id": 665, "features": {}}, {"type": "a", "start": 26638, "end": 26650, "id": 666, "features": {"href": "/wiki/Stative_verb", "title": "Stative verb"}}, {"type": "i", "start": 26658, "end": 26681, "id": 667, "features": {}}, {"type": "i", "start": 26733, "end": 26738, "id": 668, "features": {}}, {"type": "i", "start": 26762, "end": 26774, "id": 669, "features": {}}, {"type": "a", "start": 26885, "end": 26904, "id": 670, "features": {"href": "/wiki/Conceptual_metaphor", "title": "Conceptual metaphor"}}, {"type": "i", "start": 26947, "end": 26968, "id": 671, "features": {}}, {"type": "p", "start": 27080, "end": 27493, "id": 672, "features": {}}, {"type": "a", "start": 27159, "end": 27193, "id": 673, "features": {"href": "/wiki/Probabilistic_context-free_grammar", "title": "Probabilistic context-free grammar"}}, {"type": "span", "start": 27473, "end": 27490, "id": 674, "features": {"class": "citation patent"}}, {"type": "a", "start": 27473, "end": 27490, "id": 675, "features": {"rel": "nofollow", "class": "external text", "href": "https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US9269353"}}, {"type": "span", "start": 27490, "end": 27491, "id": 676, "features": {"class": "Z3988", "title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Apatent&rft.number=9269353&rft.cc=US&rft.title="}}, {"type": "span", "start": 27490, "end": 27491, "id": 677, "features": {"style": "display: none;"}}, {"type": "dl", "start": 27493, "end": 27798, "id": 678, "features": {}}, {"type": "dd", "start": 27493, "end": 27798, "id": 679, "features": {}}, {"type": "span", "start": 27493, "end": 27798, "id": 680, "features": {"class": "mwe-math-element"}}, {"type": "span", "start": 27493, "end": 27797, "id": 681, "features": {"class": "mwe-math-mathml-inline mwe-math-mathml-a11y", "style": "display: none;"}}, {"type": "math", "start": 27493, "end": 27797, "id": 682, "features": {"xmlns": "http://www.w3.org/1998/Math/MathML", "alttext": 
"{\\displaystyle {RMM(token_{N})}={PMM(token_{N})}\\times {\\frac {1}{2d}}\\left(\\sum _{i=-d}^{d}{((PMM(token_{N-1})}\\times {PF(token_{N},token_{N-1}))_{i}}\\right)}"}}, {"type": "semantics", "start": 27493, "end": 27797, "id": 683, "features": {}}, {"type": "mrow", "start": 27493, "end": 27637, "id": 684, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mstyle", "start": 27493, "end": 27637, "id": 685, "features": {"displaystyle": "true", "scriptlevel": "0"}}, {"type": "mrow", "start": 27493, "end": 27515, "id": 686, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27493, "end": 27494, "id": 687, "features": {}}, {"type": "mi", "start": 27495, "end": 27496, "id": 688, "features": {}}, {"type": "mi", "start": 27497, "end": 27498, "id": 689, "features": {}}, {"type": "mo", "start": 27499, "end": 27500, "id": 690, "features": {"stretchy": "false"}}, {"type": "mi", "start": 27501, "end": 27502, "id": 691, "features": {}}, {"type": "mi", "start": 27503, "end": 27504, "id": 692, "features": {}}, {"type": "mi", "start": 27505, "end": 27506, "id": 693, "features": {}}, {"type": "mi", "start": 27507, "end": 27508, "id": 694, "features": {}}, {"type": "msub", "start": 27509, "end": 27513, "id": 695, "features": {}}, {"type": "mi", "start": 27509, "end": 27510, "id": 696, "features": {}}, {"type": "mrow", "start": 27511, "end": 27513, "id": 697, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27511, "end": 27512, "id": 698, "features": {}}, {"type": "mo", "start": 27513, "end": 27514, "id": 699, "features": {"stretchy": "false"}}, {"type": "mo", "start": 27515, "end": 27516, "id": 700, "features": {}}, {"type": "mrow", "start": 27517, "end": 27539, "id": 701, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27517, "end": 27518, "id": 702, "features": {}}, {"type": "mi", "start": 27519, "end": 27520, "id": 703, "features": {}}, {"type": "mi", "start": 27521, "end": 27522, "id": 704, "features": {}}, {"type": "mo", 
"start": 27523, "end": 27524, "id": 705, "features": {"stretchy": "false"}}, {"type": "mi", "start": 27525, "end": 27526, "id": 706, "features": {}}, {"type": "mi", "start": 27527, "end": 27528, "id": 707, "features": {}}, {"type": "mi", "start": 27529, "end": 27530, "id": 708, "features": {}}, {"type": "mi", "start": 27531, "end": 27532, "id": 709, "features": {}}, {"type": "msub", "start": 27533, "end": 27537, "id": 710, "features": {}}, {"type": "mi", "start": 27533, "end": 27534, "id": 711, "features": {}}, {"type": "mrow", "start": 27535, "end": 27537, "id": 712, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27535, "end": 27536, "id": 713, "features": {}}, {"type": "mo", "start": 27537, "end": 27538, "id": 714, "features": {"stretchy": "false"}}, {"type": "mo", "start": 27539, "end": 27540, "id": 715, "features": {}}, {"type": "mrow", "start": 27541, "end": 27547, "id": 716, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mfrac", "start": 27541, "end": 27547, "id": 717, "features": {}}, {"type": "mn", "start": 27541, "end": 27542, "id": 718, "features": {}}, {"type": "mrow", "start": 27543, "end": 27547, "id": 719, "features": {}}, {"type": "mn", "start": 27543, "end": 27544, "id": 720, "features": {}}, {"type": "mi", "start": 27545, "end": 27546, "id": 721, "features": {}}, {"type": "mrow", "start": 27547, "end": 27637, "id": 722, "features": {}}, {"type": "mo", "start": 27547, "end": 27548, "id": 723, "features": {}}, {"type": "mrow", "start": 27549, "end": 27635, "id": 724, "features": {}}, {"type": "munderover", "start": 27549, "end": 27561, "id": 725, "features": {}}, {"type": "mo", "start": 27549, "end": 27550, "id": 726, "features": {}}, {"type": "mrow", "start": 27551, "end": 27559, "id": 727, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27551, "end": 27552, "id": 728, "features": {}}, {"type": "mo", "start": 27553, "end": 27554, "id": 729, "features": {}}, {"type": "mo", "start": 27555, "end": 27556, 
"id": 730, "features": {}}, {"type": "mi", "start": 27557, "end": 27558, "id": 731, "features": {}}, {"type": "mrow", "start": 27559, "end": 27561, "id": 732, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27559, "end": 27560, "id": 733, "features": {}}, {"type": "mrow", "start": 27561, "end": 27591, "id": 734, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mo", "start": 27561, "end": 27562, "id": 735, "features": {"stretchy": "false"}}, {"type": "mo", "start": 27563, "end": 27564, "id": 736, "features": {"stretchy": "false"}}, {"type": "mi", "start": 27565, "end": 27566, "id": 737, "features": {}}, {"type": "mi", "start": 27567, "end": 27568, "id": 738, "features": {}}, {"type": "mi", "start": 27569, "end": 27570, "id": 739, "features": {}}, {"type": "mo", "start": 27571, "end": 27572, "id": 740, "features": {"stretchy": "false"}}, {"type": "mi", "start": 27573, "end": 27574, "id": 741, "features": {}}, {"type": "mi", "start": 27575, "end": 27576, "id": 742, "features": {}}, {"type": "mi", "start": 27577, "end": 27578, "id": 743, "features": {}}, {"type": "mi", "start": 27579, "end": 27580, "id": 744, "features": {}}, {"type": "msub", "start": 27581, "end": 27589, "id": 745, "features": {}}, {"type": "mi", "start": 27581, "end": 27582, "id": 746, "features": {}}, {"type": "mrow", "start": 27583, "end": 27589, "id": 747, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27583, "end": 27584, "id": 748, "features": {}}, {"type": "mo", "start": 27585, "end": 27586, "id": 749, "features": {}}, {"type": "mn", "start": 27587, "end": 27588, "id": 750, "features": {}}, {"type": "mo", "start": 27589, "end": 27590, "id": 751, "features": {"stretchy": "false"}}, {"type": "mo", "start": 27591, "end": 27592, "id": 752, "features": {}}, {"type": "mrow", "start": 27593, "end": 27635, "id": 753, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27593, "end": 27594, "id": 754, "features": {}}, {"type": "mi", "start": 
27595, "end": 27596, "id": 755, "features": {}}, {"type": "mo", "start": 27597, "end": 27598, "id": 756, "features": {"stretchy": "false"}}, {"type": "mi", "start": 27599, "end": 27600, "id": 757, "features": {}}, {"type": "mi", "start": 27601, "end": 27602, "id": 758, "features": {}}, {"type": "mi", "start": 27603, "end": 27604, "id": 759, "features": {}}, {"type": "mi", "start": 27605, "end": 27606, "id": 760, "features": {}}, {"type": "msub", "start": 27607, "end": 27611, "id": 761, "features": {}}, {"type": "mi", "start": 27607, "end": 27608, "id": 762, "features": {}}, {"type": "mrow", "start": 27609, "end": 27611, "id": 763, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27609, "end": 27610, "id": 764, "features": {}}, {"type": "mo", "start": 27611, "end": 27612, "id": 765, "features": {}}, {"type": "mi", "start": 27613, "end": 27614, "id": 766, "features": {}}, {"type": "mi", "start": 27615, "end": 27616, "id": 767, "features": {}}, {"type": "mi", "start": 27617, "end": 27618, "id": 768, "features": {}}, {"type": "mi", "start": 27619, "end": 27620, "id": 769, "features": {}}, {"type": "msub", "start": 27621, "end": 27629, "id": 770, "features": {}}, {"type": "mi", "start": 27621, "end": 27622, "id": 771, "features": {}}, {"type": "mrow", "start": 27623, "end": 27629, "id": 772, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27623, "end": 27624, "id": 773, "features": {}}, {"type": "mo", "start": 27625, "end": 27626, "id": 774, "features": {}}, {"type": "mn", "start": 27627, "end": 27628, "id": 775, "features": {}}, {"type": "mo", "start": 27629, "end": 27630, "id": 776, "features": {"stretchy": "false"}}, {"type": "msub", "start": 27631, "end": 27635, "id": 777, "features": {}}, {"type": "mo", "start": 27631, "end": 27632, "id": 778, "features": {"stretchy": "false"}}, {"type": "mrow", "start": 27633, "end": 27635, "id": 779, "features": {"class": "MJX-TeXAtom-ORD"}}, {"type": "mi", "start": 27633, "end": 27634, "id": 
780, "features": {}}, {"type": "mo", "start": 27635, "end": 27636, "id": 781, "features": {}}, {"type": "annotation", "start": 27637, "end": 27796, "id": 782, "features": {"encoding": "application/x-tex"}}, {"type": "noscript", "start": 27797, "end": 27797, "id": 783, "features": {}}, {"type": "img", "start": 27797, "end": 27797, "id": 784, "features": {"src": "https://wikimedia.org/api/rest_v1/media/math/render/svg/145bdbd62e463df3e65c94db2e17224ecbcb2c40", "class": "mwe-math-fallback-image-inline", "aria-hidden": "true", "style": "vertical-align: -3.171ex; width:96.554ex; height:7.509ex;", "alt": "{\\displaystyle {RMM(token_{N})}={PMM(token_{N})}\\times {\\frac {1}{2d}}\\left(\\sum _{i=-d}^{d}{((PMM(token_{N-1})}\\times {PF(token_{N},token_{N-1}))_{i}}\\right)}"}}, {"type": "span", "start": 27797, "end": 27798, "id": 785, "features": {"class": "lazy-image-placeholder", "style": "width: 96.554ex;height: 7.509ex;vertical-align: -3.171ex;", "data-src": "https://wikimedia.org/api/rest_v1/media/math/render/svg/145bdbd62e463df3e65c94db2e17224ecbcb2c40", "data-alt": "{\\displaystyle {RMM(token_{N})}={PMM(token_{N})}\\times {\\frac {1}{2d}}\\left(\\sum _{i=-d}^{d}{((PMM(token_{N-1})}\\times {PF(token_{N},token_{N-1}))_{i}}\\right)}", "data-class": "mwe-math-fallback-image-inline"}}, {"type": "p", "start": 27799, "end": 28151, "id": 786, "features": {}}, {"type": "i", "start": 27799, "end": 27805, "id": 787, "features": {}}, {"type": "br", "start": 27806, "end": 27806, "id": 788, "features": {}}, {"type": "b", "start": 27811, "end": 27814, "id": 789, "features": {}}, {"type": "br", "start": 27851, "end": 27851, "id": 790, "features": {}}, {"type": "b", "start": 27856, "end": 27861, "id": 791, "features": {}}, {"type": "br", "start": 27910, "end": 27910, "id": 792, "features": {}}, {"type": "b", "start": 27915, "end": 27916, "id": 793, "features": {}}, {"type": "br", "start": 27957, "end": 27957, "id": 794, "features": {}}, {"type": "b", "start": 27962, "end": 27965, "id": 
795, "features": {}}, {"type": "br", "start": 28021, "end": 28021, "id": 796, "features": {}}, {"type": "b", "start": 28026, "end": 28027, "id": 797, "features": {}}, {"type": "b", "start": 28080, "end": 28083, "id": 798, "features": {}}, {"type": "br", "start": 28091, "end": 28091, "id": 799, "features": {}}, {"type": "b", "start": 28096, "end": 28098, "id": 800, "features": {}}, {"type": "h2", "start": 28151, "end": 28164, "id": 801, "features": {"class": "section-heading", "onclick": "javascript:mfTempOpenSection(5)"}}, {"type": "div", "start": 28151, "end": 28151, "id": 802, "features": {"class": "mw-ui-icon mw-ui-icon-element indicator mw-ui-icon-small mw-ui-icon-flush-left"}}, {"type": "span", "start": 28151, "end": 28159, "id": 803, "features": {"class": "mw-headline", "id": "See_also"}}, {"type": "span", "start": 28159, "end": 28163, "id": 804, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 28159, "end": 28163, "id": 805, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=17", "title": "Edit section: See also", "data-section": "17", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "section", "start": 28164, "end": 28919, "id": 806, "features": {"class": "mf-section-5 collapsible-block", "id": "mf-section-5"}}, {"type": "div", "start": 28164, "end": 28919, "id": 807, "features": {"class": "div-col columns column-width", "style": "-moz-column-width: 22em; -webkit-column-width: 22em; column-width: 22em;"}}, {"type": "ul", "start": 28164, "end": 28919, "id": 808, "features": {}}, {"type": "li", "start": 28164, "end": 28175, "id": 809, "features": {}}, {"type": "i", "start": 28164, "end": 28174, "id": 810, "features": {}}, {"type": "a", "start": 28164, "end": 28174, "id": 811, "features": {"href": "/wiki/1_the_Road", "title": "1 the Road"}}, {"type": "li", "start": 28175, "end": 28199, "id": 812, "features": {}}, {"type": "a", "start": 
28175, "end": 28198, "id": 813, "features": {"href": "/wiki/Automated_essay_scoring", "title": "Automated essay scoring"}}, {"type": "li", "start": 28199, "end": 28222, "id": 814, "features": {}}, {"type": "a", "start": 28199, "end": 28221, "id": 815, "features": {"href": "/wiki/Biomedical_text_mining", "title": "Biomedical text mining"}}, {"type": "li", "start": 28222, "end": 28247, "id": 816, "features": {}}, {"type": "a", "start": 28222, "end": 28246, "id": 817, "features": {"href": "/wiki/Compound_term_processing", "class": "mw-redirect", "title": "Compound term processing"}}, {"type": "li", "start": 28247, "end": 28273, "id": 818, "features": {}}, {"type": "a", "start": 28247, "end": 28272, "id": 819, "features": {"href": "/wiki/Computational_linguistics", "title": "Computational linguistics"}}, {"type": "li", "start": 28273, "end": 28301, "id": 820, "features": {}}, {"type": "a", "start": 28273, "end": 28300, "id": 821, "features": {"href": "/wiki/Computer-assisted_reviewing", "title": "Computer-assisted reviewing"}}, {"type": "li", "start": 28301, "end": 28329, "id": 822, "features": {}}, {"type": "a", "start": 28301, "end": 28328, "id": 823, "features": {"href": "/wiki/Controlled_natural_language", "title": "Controlled natural language"}}, {"type": "li", "start": 28329, "end": 28343, "id": 824, "features": {}}, {"type": "a", "start": 28329, "end": 28342, "id": 825, "features": {"href": "/wiki/Deep_learning", "title": "Deep learning"}}, {"type": "li", "start": 28343, "end": 28370, "id": 826, "features": {}}, {"type": "a", "start": 28343, "end": 28369, "id": 827, "features": {"href": "/wiki/Deep_linguistic_processing", "title": "Deep linguistic processing"}}, {"type": "li", "start": 28370, "end": 28395, "id": 828, "features": {}}, {"type": "a", "start": 28370, "end": 28394, "id": 829, "features": {"href": "/wiki/Distributional_semantics", "title": "Distributional semantics"}}, {"type": "li", "start": 28395, "end": 28424, "id": 830, "features": {}}, {"type": 
"a", "start": 28395, "end": 28423, "id": 831, "features": {"href": "/wiki/Foreign_language_reading_aid", "class": "mw-redirect", "title": "Foreign language reading aid"}}, {"type": "li", "start": 28424, "end": 28453, "id": 832, "features": {}}, {"type": "a", "start": 28424, "end": 28452, "id": 833, "features": {"href": "/wiki/Foreign_language_writing_aid", "title": "Foreign language writing aid"}}, {"type": "li", "start": 28453, "end": 28476, "id": 834, "features": {}}, {"type": "a", "start": 28453, "end": 28475, "id": 835, "features": {"href": "/wiki/Information_extraction", "title": "Information extraction"}}, {"type": "li", "start": 28476, "end": 28498, "id": 836, "features": {}}, {"type": "a", "start": 28476, "end": 28497, "id": 837, "features": {"href": "/wiki/Information_retrieval", "title": "Information retrieval"}}, {"type": "li", "start": 28498, "end": 28538, "id": 838, "features": {}}, {"type": "a", "start": 28498, "end": 28537, "id": 839, "features": {"href": "/wiki/Language_and_Communication_Technologies", "title": "Language and Communication Technologies"}}, {"type": "li", "start": 28538, "end": 28558, "id": 840, "features": {}}, {"type": "a", "start": 28538, "end": 28557, "id": 841, "features": {"href": "/wiki/Language_technology", "title": "Language technology"}}, {"type": "li", "start": 28558, "end": 28583, "id": 842, "features": {}}, {"type": "a", "start": 28558, "end": 28582, "id": 843, "features": {"href": "/wiki/Latent_semantic_indexing", "class": "mw-redirect", "title": "Latent semantic indexing"}}, {"type": "li", "start": 28583, "end": 28614, "id": 844, "features": {}}, {"type": "a", "start": 28583, "end": 28613, "id": 845, "features": {"href": "/wiki/Native-language_identification", "title": "Native-language identification"}}, {"type": "li", "start": 28614, "end": 28643, "id": 846, "features": {}}, {"type": "a", "start": 28614, "end": 28642, "id": 847, "features": {"href": "/wiki/Natural_language_programming", "class": "mw-redirect", "title": 
"Natural language programming"}}, {"type": "li", "start": 28643, "end": 28667, "id": 848, "features": {}}, {"type": "a", "start": 28643, "end": 28666, "id": 849, "features": {"href": "/wiki/Natural_language_user_interface", "class": "mw-redirect", "title": "Natural language user interface"}}, {"type": "li", "start": 28667, "end": 28706, "id": 850, "features": {}}, {"type": "a", "start": 28667, "end": 28705, "id": 851, "features": {"href": "/wiki/Outline_of_natural_language_processing", "title": "Outline of natural language processing"}}, {"type": "li", "start": 28706, "end": 28722, "id": 852, "features": {}}, {"type": "a", "start": 28706, "end": 28721, "id": 853, "features": {"href": "/wiki/Query_expansion", "title": "Query expansion"}}, {"type": "li", "start": 28722, "end": 28742, "id": 854, "features": {}}, {"type": "a", "start": 28722, "end": 28741, "id": 855, "features": {"href": "/wiki/Query_understanding", "title": "Query understanding"}}, {"type": "li", "start": 28742, "end": 28768, "id": 856, "features": {}}, {"type": "a", "start": 28742, "end": 28767, "id": 857, "features": {"href": "/wiki/Reification_(linguistics)", "title": "Reification (linguistics)"}}, {"type": "li", "start": 28768, "end": 28786, "id": 858, "features": {}}, {"type": "a", "start": 28768, "end": 28785, "id": 859, "features": {"href": "/wiki/Speech_processing", "title": "Speech processing"}}, {"type": "li", "start": 28786, "end": 28809, "id": 860, "features": {}}, {"type": "a", "start": 28786, "end": 28808, "id": 861, "features": {"href": "/wiki/Spoken_dialogue_system", "class": "mw-redirect", "title": "Spoken dialogue system"}}, {"type": "li", "start": 28809, "end": 28823, "id": 862, "features": {}}, {"type": "a", "start": 28809, "end": 28822, "id": 863, "features": {"href": "/wiki/Text-proofing", "class": "mw-redirect", "title": "Text-proofing"}}, {"type": "li", "start": 28823, "end": 28843, "id": 864, "features": {}}, {"type": "a", "start": 28823, "end": 28842, "id": 865, "features": 
{"href": "/wiki/Text_simplification", "title": "Text simplification"}}, {"type": "li", "start": 28843, "end": 28880, "id": 866, "features": {}}, {"type": "a", "start": 28843, "end": 28879, "id": 867, "features": {"href": "/wiki/Transformer_(machine_learning_model)", "title": "Transformer (machine learning model)"}}, {"type": "li", "start": 28880, "end": 28891, "id": 868, "features": {}}, {"type": "a", "start": 28880, "end": 28890, "id": 869, "features": {"href": "/wiki/Truecasing", "title": "Truecasing"}}, {"type": "li", "start": 28891, "end": 28910, "id": 870, "features": {}}, {"type": "a", "start": 28891, "end": 28909, "id": 871, "features": {"href": "/wiki/Question_answering", "title": "Question answering"}}, {"type": "li", "start": 28910, "end": 28919, "id": 872, "features": {}}, {"type": "a", "start": 28910, "end": 28918, "id": 873, "features": {"href": "/wiki/Word2vec", "title": "Word2vec"}}, {"type": "h2", "start": 28919, "end": 28934, "id": 874, "features": {"class": "section-heading", "onclick": "javascript:mfTempOpenSection(6)"}}, {"type": "div", "start": 28919, "end": 28919, "id": 875, "features": {"class": "mw-ui-icon mw-ui-icon-element indicator mw-ui-icon-small mw-ui-icon-flush-left"}}, {"type": "span", "start": 28919, "end": 28929, "id": 876, "features": {"class": "mw-headline", "id": "References"}}, {"type": "span", "start": 28929, "end": 28933, "id": 877, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 28929, "end": 28933, "id": 878, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=18", "title": "Edit section: References", "data-section": "18", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "section", "start": 28934, "end": 34687, "id": 879, "features": {"class": "mf-section-6 collapsible-block", "id": "mf-section-6"}}, {"type": "div", "start": 28934, "end": 34687, "id": 880, "features": {"class": "reflist columns 
references-column-width", "style": "-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;"}}, {"type": "ol", "start": 28934, "end": 34687, "id": 881, "features": {"class": "references"}}, {"type": "li", "start": 28934, "end": 29251, "id": 882, "features": {"id": "cite_note-Kongthon-1"}}, {"type": "span", "start": 28934, "end": 28935, "id": 883, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 28934, "end": 28935, "id": 884, "features": {}}, {"type": "a", "start": 28934, "end": 28935, "id": 885, "features": {"href": "#cite_ref-Kongthon_1-0"}}, {"type": "span", "start": 28936, "end": 29250, "id": 886, "features": {"class": "reference-text"}}, {"type": "cite", "start": 28936, "end": 29250, "id": 887, "features": {"id": "CITEREFKongthonSangkeettrakarnKongyoungHaruechaiyasak2009", "class": "citation conference cs1"}}, {"type": "i", "start": 29052, "end": 29121, "id": 888, "features": {}}, {"type": "a", "start": 29222, "end": 29225, "id": 889, "features": {"href": "/wiki/Doi_(identifier)", "class": "mw-redirect", "title": "Doi (identifier)"}}, {"type": "a", "start": 29226, "end": 29249, "id": 890, "features": {"rel": "nofollow", "class": "external text", "href": "https://doi.org/10.1145%2F1643823.1643908"}}, {"type": "span", "start": 29250, "end": 29250, "id": 891, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=conference&rft.btitle=Implementing+an+online+help+desk+system+based+on+conversational+agent&rft.place=France&rft.pub=ACM&rft.date=2009-10-27%2F2009-10-30&rft_id=info%3Adoi%2F10.1145%2F1643823.1643908&rft.aulast=Kongthon&rft.aufirst=Alisa&rft.au=Sangkeettrakarn%2C+Chatchawal&rft.au=Kongyoung%2C+Sarawoot&rft.au=Haruechaiyasak%2C+Choochart&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "li", "start": 29251, "end": 29355, "id": 892, "features": {"id": "cite_note-2"}}, {"type": "span", "start": 29251, 
"end": 29252, "id": 893, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 29251, "end": 29252, "id": 894, "features": {}}, {"type": "a", "start": 29251, "end": 29252, "id": 895, "features": {"href": "#cite_ref-2"}}, {"type": "span", "start": 29253, "end": 29354, "id": 896, "features": {"class": "reference-text"}}, {"type": "cite", "start": 29253, "end": 29331, "id": 897, "features": {"id": "CITEREFHutchins,_J.2005", "class": "citation web cs1"}}, {"type": "a", "start": 29274, "end": 29324, "id": 898, "features": {"rel": "nofollow", "class": "external text", "href": "http://www.hutchinsweb.me.uk/Nutshell-2005.pdf"}}, {"type": "span", "start": 29325, "end": 29330, "id": 899, "features": {"class": "cs1-format"}}, {"type": "span", "start": 29331, "end": 29331, "id": 900, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=unknown&rft.btitle=The+history+of+machine+translation+in+a+nutshell&rft.date=2005&rft.au=Hutchins%2C+J.&rft_id=http%3A%2F%2Fwww.hutchinsweb.me.uk%2FNutshell-2005.pdf&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 29331, "end": 29331, "id": 901, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "sup", "start": 29331, "end": 29354, "id": 902, "features": {"class": "noprint Inline-Template", "style": "white-space:nowrap;"}}, {"type": "i", "start": 29332, "end": 29353, "id": 903, "features": {}}, {"type": "a", "start": 29332, "end": 29353, "id": 904, "features": {"href": "/wiki/Wikipedia:Verifiability#Self-published_sources", "title": "Wikipedia:Verifiability"}}, {"type": "span", "start": 29332, "end": 29353, "id": 905, "features": {"title": "This reference citation appears to be to a self-published source. 
(December 2013)"}}, {"type": "li", "start": 29355, "end": 29541, "id": 906, "features": {"id": "cite_note-3"}}, {"type": "span", "start": 29355, "end": 29356, "id": 907, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 29355, "end": 29356, "id": 908, "features": {}}, {"type": "a", "start": 29355, "end": 29356, "id": 909, "features": {"href": "#cite_ref-3"}}, {"type": "span", "start": 29357, "end": 29540, "id": 910, "features": {"class": "reference-text"}}, {"type": "cite", "start": 29357, "end": 29540, "id": 911, "features": {"id": "CITEREFKoskenniemi1983", "class": "citation cs2"}}, {"type": "a", "start": 29357, "end": 29375, "id": 912, "features": {"href": "/wiki/Kimmo_Koskenniemi", "title": "Kimmo Koskenniemi"}}, {"type": "a", "start": 29384, "end": 29475, "id": 913, "features": {"rel": "nofollow", "class": "external text", "href": "http://www.ling.helsinki.fi/~koskenni/doc/Two-LevelMorphology.pdf"}}, {"type": "i", "start": 29384, "end": 29475, "id": 914, "features": {}}, {"type": "span", "start": 29476, "end": 29481, "id": 915, "features": {"class": "cs1-format"}}, {"type": "a", "start": 29518, "end": 29540, "id": 916, "features": {"href": "/wiki/University_of_Helsinki", "title": "University of Helsinki"}}, {"type": "span", "start": 29540, "end": 29540, "id": 917, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Two-level+morphology%3A+A+general+computational+model+of+word-form+recognition+and+production&rft.pub=Department+of+General+Linguistics%2C+University+of+Helsinki&rft.date=1983&rft.aulast=Koskenniemi&rft.aufirst=Kimmo&rft_id=http%3A%2F%2Fwww.ling.helsinki.fi%2F~koskenni%2Fdoc%2FTwo-LevelMorphology.pdf&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 29540, "end": 29540, "id": 918, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 
29541, "end": 29690, "id": 919, "features": {"id": "cite_note-4"}}, {"type": "span", "start": 29541, "end": 29542, "id": 920, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 29541, "end": 29542, "id": 921, "features": {}}, {"type": "a", "start": 29541, "end": 29542, "id": 922, "features": {"href": "#cite_ref-4"}}, {"type": "span", "start": 29543, "end": 29689, "id": 923, "features": {"class": "reference-text"}}, {"type": "a", "start": 29589, "end": 29664, "id": 924, "features": {"rel": "nofollow", "class": "external text", "href": "https://www.ijcai.org/Proceedings/81-1/Papers/071.pdf"}}, {"type": "i", "start": 29669, "end": 29674, "id": 925, "features": {}}, {"type": "li", "start": 29690, "end": 29907, "id": 926, "features": {"id": "cite_note-5"}}, {"type": "span", "start": 29690, "end": 29691, "id": 927, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 29690, "end": 29691, "id": 928, "features": {}}, {"type": "a", "start": 29690, "end": 29691, "id": 929, "features": {"href": "#cite_ref-5"}}, {"type": "span", "start": 29692, "end": 29906, "id": 930, "features": {"class": "reference-text"}}, {"type": "cite", "start": 29692, "end": 29906, "id": 931, "features": {"id": "CITEREFGuidaMauri1986", "class": "citation journal cs1"}}, {"type": "i", "start": 29802, "end": 29825, "id": 932, "features": {}}, {"type": "b", "start": 29827, "end": 29829, "id": 933, "features": {}}, {"type": "a", "start": 29846, "end": 29849, "id": 934, "features": {"href": "/wiki/Doi_(identifier)", "class": "mw-redirect", "title": "Doi (identifier)"}}, {"type": "a", "start": 29850, "end": 29873, "id": 935, "features": {"rel": "nofollow", "class": "external text", "href": "https://doi.org/10.1109%2FPROC.1986.13580"}}, {"type": "a", "start": 29875, "end": 29879, "id": 936, "features": {"href": "/wiki/ISSN_(identifier)", "class": "mw-redirect", "title": "ISSN (identifier)"}}, {"type": "a", "start": 29880, "end": 29889, "id": 937, "features": {"rel": "nofollow", 
"class": "external text", "href": "//www.worldcat.org/issn/1558-2256"}}, {"type": "a", "start": 29891, "end": 29896, "id": 938, "features": {"href": "/wiki/S2CID_(identifier)", "class": "mw-redirect", "title": "S2CID (identifier)"}}, {"type": "a", "start": 29897, "end": 29905, "id": 939, "features": {"rel": "nofollow", "class": "external text", "href": "https://api.semanticscholar.org/CorpusID:30688575"}}, {"type": "span", "start": 29906, "end": 29906, "id": 940, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Proceedings+of+the+IEEE&rft.atitle=Evaluation+of+natural+language+processing+systems%3A+Issues+and+approaches&rft.volume=74&rft.issue=7&rft.pages=1026-1035&rft.date=1986-07&rft_id=https%3A%2F%2Fapi.semanticscholar.org%2FCorpusID%3A30688575&rft.issn=1558-2256&rft_id=info%3Adoi%2F10.1109%2FPROC.1986.13580&rft.aulast=Guida&rft.aufirst=G.&rft.au=Mauri%2C+G.&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 29906, "end": 29906, "id": 941, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 29907, "end": 30756, "id": 942, "features": {"id": "cite_note-6"}}, {"type": "span", "start": 29907, "end": 29908, "id": 943, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 29907, "end": 29908, "id": 944, "features": {}}, {"type": "a", "start": 29907, "end": 29908, "id": 945, "features": {"href": "#cite_ref-6"}}, {"type": "span", "start": 29909, "end": 30755, "id": 946, "features": {"class": "reference-text"}}, {"type": "a", "start": 29964, "end": 29976, "id": 947, "features": {"href": "/wiki/Corner_case", "title": "Corner case"}}, {"type": "a", "start": 30042, "end": 30054, "id": 948, "features": {"href": "/wiki/Pathological_(mathematics)", "title": "Pathological (mathematics)"}}, {"type": "a", "start": 30106, "end": 30125, "id": 949, "features": 
{"href": "/wiki/Thought_experiment", "title": "Thought experiment"}}, {"type": "a", "start": 30238, "end": 30256, "id": 950, "features": {"href": "/wiki/Corpus_linguistics", "title": "Corpus linguistics"}}, {"type": "a", "start": 30288, "end": 30295, "id": 951, "features": {"href": "/wiki/Text_corpus", "title": "Text corpus"}}, {"type": "a", "start": 30489, "end": 30512, "id": 952, "features": {"href": "/wiki/Poverty_of_the_stimulus", "title": "Poverty of the stimulus"}}, {"type": "li", "start": 30756, "end": 30964, "id": 953, "features": {"id": "cite_note-goldberg:nnlp17-7"}}, {"type": "span", "start": 30756, "end": 30757, "id": 954, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 30756, "end": 30757, "id": 955, "features": {}}, {"type": "a", "start": 30756, "end": 30757, "id": 956, "features": {"href": "#cite_ref-goldberg:nnlp17_7-0"}}, {"type": "span", "start": 30758, "end": 30963, "id": 957, "features": {"class": "reference-text"}}, {"type": "cite", "start": 30758, "end": 30963, "id": 958, "features": {"id": "CITEREFGoldberg2016", "class": "citation journal cs1"}}, {"type": "i", "start": 30850, "end": 30893, "id": 959, "features": {}}, {"type": "b", "start": 30895, "end": 30897, "id": 960, "features": {}}, {"type": "a", "start": 30908, "end": 30913, "id": 961, "features": {"href": "/wiki/ArXiv_(identifier)", "class": "mw-redirect", "title": "ArXiv (identifier)"}}, {"type": "span", "start": 30914, "end": 30924, "id": 962, "features": {"class": "cs1-lock-free", "title": "Freely accessible"}}, {"type": "a", "start": 30914, "end": 30924, "id": 963, "features": {"rel": "nofollow", "class": "external text", "href": "//arxiv.org/abs/1807.10854"}}, {"type": "a", "start": 30926, "end": 30929, "id": 964, "features": {"href": "/wiki/Doi_(identifier)", "class": "mw-redirect", "title": "Doi (identifier)"}}, {"type": "a", "start": 30930, "end": 30947, "id": 965, "features": {"rel": "nofollow", "class": "external text", "href": 
"https://doi.org/10.1613%2Fjair.4992"}}, {"type": "a", "start": 30949, "end": 30954, "id": 966, "features": {"href": "/wiki/S2CID_(identifier)", "class": "mw-redirect", "title": "S2CID (identifier)"}}, {"type": "a", "start": 30955, "end": 30962, "id": 967, "features": {"rel": "nofollow", "class": "external text", "href": "https://api.semanticscholar.org/CorpusID:8273530"}}, {"type": "span", "start": 30963, "end": 30963, "id": 968, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Journal+of+Artificial+Intelligence+Research&rft.atitle=A+Primer+on+Neural+Network+Models+for+Natural+Language+Processing&rft.volume=57&rft.pages=345-420&rft.date=2016&rft_id=info%3Aarxiv%2F1807.10854&rft_id=https%3A%2F%2Fapi.semanticscholar.org%2FCorpusID%3A8273530&rft_id=info%3Adoi%2F10.1613%2Fjair.4992&rft.aulast=Goldberg&rft.aufirst=Yoav&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 30963, "end": 30963, "id": 969, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 30964, "end": 31050, "id": 970, "features": {"id": "cite_note-goodfellow:book16-8"}}, {"type": "span", "start": 30964, "end": 30965, "id": 971, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 30964, "end": 30965, "id": 972, "features": {}}, {"type": "a", "start": 30964, "end": 30965, "id": 973, "features": {"href": "#cite_ref-goodfellow:book16_8-0"}}, {"type": "span", "start": 30966, "end": 31049, "id": 974, "features": {"class": "reference-text"}}, {"type": "cite", "start": 30966, "end": 31049, "id": 975, "features": {"id": "CITEREFGoodfellowBengioCourville2016", "class": "citation book cs1"}}, {"type": "a", "start": 31024, "end": 31037, "id": 976, "features": {"rel": "nofollow", "class": "external text", "href": "http://www.deeplearningbook.org/"}}, {"type": "i", "start": 31024, "end": 31037, 
"id": 977, "features": {}}, {"type": "span", "start": 31049, "end": 31049, "id": 978, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Deep+Learning&rft.pub=MIT+Press&rft.date=2016&rft.aulast=Goodfellow&rft.aufirst=Ian&rft.au=Bengio%2C+Yoshua&rft.au=Courville%2C+Aaron&rft_id=http%3A%2F%2Fwww.deeplearningbook.org%2F&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 31049, "end": 31049, "id": 979, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 31050, "end": 31228, "id": 980, "features": {"id": "cite_note-jozefowicz:lm16-9"}}, {"type": "span", "start": 31050, "end": 31051, "id": 981, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 31050, "end": 31051, "id": 982, "features": {}}, {"type": "a", "start": 31050, "end": 31051, "id": 983, "features": {"href": "#cite_ref-jozefowicz:lm16_9-0"}}, {"type": "span", "start": 31052, "end": 31227, "id": 984, "features": {"class": "reference-text"}}, {"type": "cite", "start": 31052, "end": 31227, "id": 985, "features": {"id": "CITEREFJozefowiczVinyalsSchusterShazeer2016", "class": "citation book cs1"}}, {"type": "i", "start": 31138, "end": 31179, "id": 986, "features": {}}, {"type": "a", "start": 31181, "end": 31186, "id": 987, "features": {"href": "/wiki/ArXiv_(identifier)", "class": "mw-redirect", "title": "ArXiv (identifier)"}}, {"type": "span", "start": 31187, "end": 31197, "id": 988, "features": {"class": "cs1-lock-free", "title": "Freely accessible"}}, {"type": "a", "start": 31187, "end": 31197, "id": 989, "features": {"rel": "nofollow", "class": "external text", "href": "//arxiv.org/abs/1602.02410"}}, {"type": "a", "start": 31199, "end": 31206, "id": 990, "features": {"href": "/wiki/Bibcode_(identifier)", "class": "mw-redirect", "title": "Bibcode (identifier)"}}, {"type": "a", "start": 31207, "end": 
31226, "id": 991, "features": {"rel": "nofollow", "class": "external text", "href": "https://ui.adsabs.harvard.edu/abs/2016arXiv160202410J"}}, {"type": "span", "start": 31227, "end": 31227, "id": 992, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Exploring+the+Limits+of+Language+Modeling&rft.date=2016&rft_id=info%3Aarxiv%2F1602.02410&rft_id=info%3Abibcode%2F2016arXiv160202410J&rft.aulast=Jozefowicz&rft.aufirst=Rafal&rft.au=Vinyals%2C+Oriol&rft.au=Schuster%2C+Mike&rft.au=Shazeer%2C+Noam&rft.au=Wu%2C+Yonghui&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 31227, "end": 31227, "id": 993, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 31228, "end": 31307, "id": 994, "features": {"id": "cite_note-choe:emnlp16-10"}}, {"type": "span", "start": 31228, "end": 31229, "id": 995, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 31228, "end": 31229, "id": 996, "features": {}}, {"type": "a", "start": 31228, "end": 31229, "id": 997, "features": {"href": "#cite_ref-choe:emnlp16_10-0"}}, {"type": "span", "start": 31230, "end": 31306, "id": 998, "features": {"class": "reference-text"}}, {"type": "cite", "start": 31230, "end": 31306, "id": 999, "features": {"id": "CITEREFChoeCharniak", "class": "citation journal cs1"}}, {"type": "a", "start": 31263, "end": 31293, "id": 1000, "features": {"rel": "nofollow", "class": "external text", "href": "https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257"}}, {"type": "i", "start": 31295, "end": 31305, "id": 1001, "features": {}}, {"type": "span", "start": 31306, "end": 31306, "id": 1002, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Emnlp+2016&rft.atitle=Parsing+as+Language+Modeling&rft.aulast=Choe&rft.aufirst=Do+Kook&rft.au=Charniak%2C+Eugene&rft_id=https%3A%2F%2Faclanthology.coli.uni-saarland.de%2Fpapers%2FD16-1257%2Fd16-1257&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 31306, "end": 31306, "id": 1003, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 31307, "end": 31436, "id": 1004, "features": {"id": "cite_note-vinyals:nips15-11"}}, {"type": "span", "start": 31307, "end": 31308, "id": 1005, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 31307, "end": 31308, "id": 1006, "features": {}}, {"type": "a", "start": 31307, "end": 31308, "id": 1007, "features": {"href": "#cite_ref-vinyals:nips15_11-0"}}, {"type": "span", "start": 31309, "end": 31435, "id": 1008, "features": {"class": "reference-text"}}, {"type": "cite", "start": 31309, "end": 31435, "id": 1009, "features": {"id": "CITEREFVinyalsKaiser2014", "class": "citation journal cs1"}}, {"type": "a", "start": 31341, "end": 31372, "id": 1010, "features": {"rel": "nofollow", "class": "external text", "href": "https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf"}}, {"type": "span", "start": 31373, "end": 31378, "id": 1011, "features": {"class": "cs1-format"}}, {"type": "i", "start": 31380, "end": 31388, "id": 1012, "features": {}}, {"type": "a", "start": 31390, "end": 31395, "id": 1013, "features": {"href": "/wiki/ArXiv_(identifier)", "class": "mw-redirect", "title": "ArXiv (identifier)"}}, {"type": "span", "start": 31396, "end": 31405, "id": 1014, "features": {"class": "cs1-lock-free", "title": "Freely accessible"}}, {"type": "a", "start": 31396, "end": 31405, "id": 1015, "features": {"rel": "nofollow", "class": "external text", "href": "//arxiv.org/abs/1412.7449"}}, 
{"type": "a", "start": 31407, "end": 31414, "id": 1016, "features": {"href": "/wiki/Bibcode_(identifier)", "class": "mw-redirect", "title": "Bibcode (identifier)"}}, {"type": "a", "start": 31415, "end": 31434, "id": 1017, "features": {"rel": "nofollow", "class": "external text", "href": "https://ui.adsabs.harvard.edu/abs/2014arXiv1412.7449V"}}, {"type": "span", "start": 31435, "end": 31435, "id": 1018, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Nips2015&rft.atitle=Grammar+as+a+Foreign+Language&rft.date=2014&rft_id=info%3Aarxiv%2F1412.7449&rft_id=info%3Abibcode%2F2014arXiv1412.7449V&rft.aulast=Vinyals&rft.aufirst=Oriol&rft.au=Kaiser%2C+Lukasz&rft_id=https%3A%2F%2Fpapers.nips.cc%2Fpaper%2F5635-grammar-as-a-foreign-language.pdf&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 31435, "end": 31435, "id": 1019, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 31436, "end": 31569, "id": 1020, "features": {"id": "cite_note-winograd:shrdlu71-12"}}, {"type": "span", "start": 31436, "end": 31437, "id": 1021, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 31436, "end": 31437, "id": 1022, "features": {}}, {"type": "a", "start": 31436, "end": 31437, "id": 1023, "features": {"href": "#cite_ref-winograd:shrdlu71_12-0"}}, {"type": "span", "start": 31438, "end": 31568, "id": 1024, "features": {"class": "reference-text"}}, {"type": "cite", "start": 31438, "end": 31568, "id": 1025, "features": {"id": "CITEREFWinograd1971", "class": "citation thesis cs1"}}, {"type": "a", "start": 31462, "end": 31558, "id": 1026, "features": {"rel": "nofollow", "class": "external text", "href": "http://hci.stanford.edu/winograd/shrdlu/"}}, {"type": "i", "start": 31462, "end": 31558, "id": 1027, "features": {}}, {"type": "span", "start": 31568, "end": 31568, "id": 
1028, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adissertation&rft.title=Procedures+as+a+Representation+for+Data+in+a+Computer+Program+for+Understanding+Natural+Language&rft.date=1971&rft.aulast=Winograd&rft.aufirst=Terry&rft_id=http%3A%2F%2Fhci.stanford.edu%2Fwinograd%2Fshrdlu%2F&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 31568, "end": 31568, "id": 1029, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 31569, "end": 31744, "id": 1030, "features": {"id": "cite_note-schank77-13"}}, {"type": "span", "start": 31569, "end": 31570, "id": 1031, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 31569, "end": 31570, "id": 1032, "features": {}}, {"type": "a", "start": 31569, "end": 31570, "id": 1033, "features": {"href": "#cite_ref-schank77_13-0"}}, {"type": "span", "start": 31571, "end": 31743, "id": 1034, "features": {"class": "reference-text"}}, {"type": "cite", "start": 31571, "end": 31743, "id": 1035, "features": {"id": "CITEREFSchankAbelson1977", "class": "citation book cs1"}}, {"type": "i", "start": 31616, "end": 31700, "id": 1036, "features": {}}, {"type": "a", "start": 31722, "end": 31726, "id": 1037, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 31727, "end": 31742, "id": 1038, "features": {"href": "/wiki/Special:BookSources/0-470-99033-3", "title": "Special:BookSources/0-470-99033-3"}}, {"type": "bdi", "start": 31728, "end": 31742, "id": 1039, "features": {}}, {"type": "span", "start": 31743, "end": 31743, "id": 1040, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Scripts%2C+Plans%2C+Goals%2C+and+Understanding%3A+An+Inquiry+Into+Human+Knowledge+Structures&rft.place=Hillsdale&rft.pub=Erlbaum&rft.date=1977&rft.isbn=0-470-99033-3&rft.aulast=Schank&rft.aufirst=Roger+C.&rft.au=Abelson%2C+Robert+P.&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 31743, "end": 31743, "id": 1041, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 31744, "end": 31936, "id": 1042, "features": {"id": "cite_note-johnson:eacl:ilcl09-14"}}, {"type": "span", "start": 31744, "end": 31745, "id": 1043, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 31744, "end": 31745, "id": 1044, "features": {}}, {"type": "a", "start": 31744, "end": 31745, "id": 1045, "features": {"href": "#cite_ref-johnson:eacl:ilcl09_14-0"}}, {"type": "span", "start": 31746, "end": 31935, "id": 1046, "features": {"class": "reference-text"}}, {"type": "a", "start": 31746, "end": 31827, "id": 1047, "features": {"rel": "nofollow", "class": "external text", "href": "http://www.aclweb.org/anthology/W09-0103"}}, {"type": "li", "start": 31936, "end": 32003, "id": 1048, "features": {"id": "cite_note-resnik:langlog11-15"}}, {"type": "span", "start": 31936, "end": 31937, "id": 1049, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 31936, "end": 31937, "id": 1050, "features": {}}, {"type": "a", "start": 31936, "end": 31937, "id": 1051, "features": {"href": "#cite_ref-resnik:langlog11_15-0"}}, {"type": "span", "start": 31938, "end": 32002, "id": 1052, "features": {"class": "reference-text"}}, {"type": "a", "start": 31938, "end": 31970, "id": 1053, "features": {"rel": "nofollow", "class": "external text", "href": "http://languagelog.ldc.upenn.edu/nll/?p=2946"}}, {"type": "li", "start": 32003, "end": 32414, "id": 1054, "features": 
{"id": "cite_note-16"}}, {"type": "span", "start": 32003, "end": 32004, "id": 1055, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 32003, "end": 32004, "id": 1056, "features": {}}, {"type": "a", "start": 32003, "end": 32004, "id": 1057, "features": {"href": "#cite_ref-16"}}, {"type": "span", "start": 32005, "end": 32413, "id": 1058, "features": {"class": "reference-text"}}, {"type": "cite", "start": 32005, "end": 32102, "id": 1059, "features": {"id": "CITEREFSocher", "class": "citation web cs1"}}, {"type": "a", "start": 32022, "end": 32063, "id": 1060, "features": {"rel": "nofollow", "class": "external text", "href": "https://www.socher.org/index.php/Main/DeepLearningForNLP-ACL2012Tutorial"}}, {"type": "i", "start": 32065, "end": 32079, "id": 1061, "features": {}}, {"type": "span", "start": 32079, "end": 32101, "id": 1062, "features": {"class": "reference-accessdate"}}, {"type": "span", "start": 32091, "end": 32101, "id": 1063, "features": {"class": "nowrap"}}, {"type": "span", "start": 32102, "end": 32102, "id": 1064, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=unknown&rft.jtitle=www.socher.org&rft.atitle=Deep+Learning+For+NLP-ACL+2012+Tutorial&rft.aulast=Socher&rft.aufirst=Richard&rft_id=https%3A%2F%2Fwww.socher.org%2Findex.php%2FMain%2FDeepLearningForNLP-ACL2012Tutorial&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 32102, "end": 32102, "id": 1065, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 32414, "end": 32683, "id": 1066, "features": {"id": "cite_note-17"}}, {"type": "span", "start": 32414, "end": 32415, "id": 1067, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 32414, "end": 32415, "id": 1068, "features": {}}, {"type": "a", "start": 32414, "end": 32415, "id": 1069, "features": {"href": "#cite_ref-17"}}, {"type": 
"span", "start": 32416, "end": 32683, "id": 1070, "features": {"class": "reference-text"}}, {"type": "cite", "start": 32416, "end": 32683, "id": 1071, "features": {"id": "CITEREFYiTian2012", "class": "citation cs2"}}, {"type": "i", "start": 32517, "end": 32563, "id": 1072, "features": {}}, {"type": "a", "start": 32604, "end": 32613, "id": 1073, "features": {"href": "/wiki/CiteSeerX_(identifier)", "class": "mw-redirect", "title": "CiteSeerX (identifier)"}}, {"type": "span", "start": 32614, "end": 32628, "id": 1074, "features": {"class": "cs1-lock-free", "title": "Freely accessible"}}, {"type": "a", "start": 32614, "end": 32628, "id": 1075, "features": {"rel": "nofollow", "class": "external text", "href": "//citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.668.869"}}, {"type": "a", "start": 32630, "end": 32633, "id": 1076, "features": {"href": "/wiki/Doi_(identifier)", "class": "mw-redirect", "title": "Doi (identifier)"}}, {"type": "a", "start": 32634, "end": 32661, "id": 1077, "features": {"rel": "nofollow", "class": "external text", "href": "https://doi.org/10.1007%2F978-3-642-29364-1_2"}}, {"type": "a", "start": 32663, "end": 32667, "id": 1078, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 32668, "end": 32683, "id": 1079, "features": {"href": "/wiki/Special:BookSources/9783642293634", "title": "Special:BookSources/9783642293634"}}, {"type": "bdi", "start": 32669, "end": 32683, "id": 1080, "features": {}}, {"type": "span", "start": 32683, "end": 32683, "id": 1081, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Camera-Based+Document+Analysis+and+Recognition&rft.atitle=Assistive+Text+Reading+from+Complex+Background+for+Blind+Persons&rft.pages=15-28&rft.date=2012&rft_id=%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fsummary%3Fdoi%3D10.1.1.668.869&rft_id=info%3Adoi%2F10.1007%2F978-3-642-29364-1_2&rft.isbn=9783642293634&rft.aulast=Yi&rft.aufirst=Chucai&rft.au=Tian%2C+Yingli&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 32683, "end": 32683, "id": 1082, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 32683, "end": 32957, "id": 1083, "features": {"id": "cite_note-18"}}, {"type": "span", "start": 32683, "end": 32684, "id": 1084, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 32683, "end": 32684, "id": 1085, "features": {}}, {"type": "a", "start": 32683, "end": 32684, "id": 1086, "features": {"href": "#cite_ref-18"}}, {"type": "span", "start": 32685, "end": 32956, "id": 1087, "features": {"class": "reference-text"}}, {"type": "cite", "start": 32685, "end": 32930, "id": 1088, "features": {"id": "CITEREFKishorjitVidyaNirmalSivaji2012", "class": "citation journal cs1"}}, {"type": "a", "start": 32747, "end": 32781, "id": 1089, "features": {"rel": "nofollow", "class": "external text", "href": "http://aclweb.org/anthology//W/W12/W12-5008.pdf"}}, {"type": "span", "start": 32782, "end": 32787, "id": 1090, "features": {"class": "cs1-format"}}, {"type": "i", "start": 32789, "end": 32885, "id": 1091, "features": {}}, {"type": "span", "start": 32930, "end": 32930, "id": 1092, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Proceedings+of+the+3rd+Workshop+on+South+and+Southeast+Asian+Natural+Language+Processing+%28SANLP%29&rft.atitle=Manipuri+Morpheme+Identification&rft.pages=95-108&rft.date=2012&rft.aulast=Kishorjit&rft.aufirst=N.&rft.au=Vidya%2C+Raj+RK.&rft.au=Nirmal%2C+Y.&rft.au=Sivaji%2C+B.&rft_id=http%3A%2F%2Faclweb.org%2Fanthology%2F%2FW%2FW12%2FW12-5008.pdf&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "span", "start": 32930, "end": 32956, "id": 1093, "features": {"class": "cs1-maint citation-comment"}}, {"type": "a", "start": 32951, "end": 32955, "id": 1094, "features": {"href": "/wiki/Category:CS1_maint:_location", "title": "Category:CS1 maint: location"}}, {"type": "link", "start": 32956, "end": 32956, "id": 1095, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 32957, "end": 33132, "id": 1096, "features": {"id": "cite_note-19"}}, {"type": "span", "start": 32957, "end": 32958, "id": 1097, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 32957, "end": 32958, "id": 1098, "features": {}}, {"type": "a", "start": 32957, "end": 32958, "id": 1099, "features": {"href": "#cite_ref-19"}}, {"type": "span", "start": 32959, "end": 33131, "id": 1100, "features": {"class": "reference-text"}}, {"type": "cite", "start": 32959, "end": 33131, "id": 1101, "features": {"id": "CITEREFKleinManning2002", "class": "citation journal cs1"}}, {"type": "a", "start": 33003, "end": 33073, "id": 1102, "features": {"rel": "nofollow", "class": "external text", "href": "http://papers.nips.cc/paper/1945-natural-language-grammar-induction-using-a-constituent-context-model.pdf"}}, {"type": "span", "start": 33074, "end": 33079, "id": 1103, "features": {"class": "cs1-format"}}, {"type": "i", "start": 33081, "end": 33130, "id": 1104, "features": {}}, {"type": "span", 
"start": 33131, "end": 33131, "id": 1105, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Advances+in+Neural+Information+Processing+Systems&rft.atitle=Natural+language+grammar+induction+using+a+constituent-context+model&rft.date=2002&rft.aulast=Klein&rft.aufirst=Dan&rft.au=Manning%2C+Christopher+D.&rft_id=http%3A%2F%2Fpapers.nips.cc%2Fpaper%2F1945-natural-language-grammar-induction-using-a-constituent-context-model.pdf&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 33131, "end": 33131, "id": 1106, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 33132, "end": 33222, "id": 1107, "features": {"id": "cite_note-rte:11-20"}}, {"type": "span", "start": 33132, "end": 33133, "id": 1108, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 33132, "end": 33133, "id": 1109, "features": {}}, {"type": "a", "start": 33132, "end": 33133, "id": 1110, "features": {"href": "#cite_ref-rte:11_20-0"}}, {"type": "span", "start": 33134, "end": 33221, "id": 1111, "features": {"class": "reference-text"}}, {"type": "a", "start": 33190, "end": 33221, "id": 1112, "features": {"rel": "nofollow", "class": "external free", "href": "https://tac.nist.gov//2011/RTE/"}}, {"type": "li", "start": 33222, "end": 33284, "id": 1113, "features": {"id": "cite_note-21"}}, {"type": "span", "start": 33222, "end": 33223, "id": 1114, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 33222, "end": 33223, "id": 1115, "features": {}}, {"type": "a", "start": 33222, "end": 33223, "id": 1116, "features": {"href": "#cite_ref-21"}}, {"type": "span", "start": 33224, "end": 33283, "id": 1117, "features": {"class": "reference-text"}}, {"type": "cite", "start": 33224, "end": 33283, "id": 1118, "features": {"class": "citation web cs1"}}, {"type": "a", "start": 33224, "end": 33247, 
"id": 1119, "features": {"rel": "nofollow", "class": "external text", "href": "http://www.ubu.com/historical/racter/index.html"}}, {"type": "i", "start": 33249, "end": 33260, "id": 1120, "features": {}}, {"type": "span", "start": 33260, "end": 33282, "id": 1121, "features": {"class": "reference-accessdate"}}, {"type": "span", "start": 33272, "end": 33282, "id": 1122, "features": {"class": "nowrap"}}, {"type": "span", "start": 33283, "end": 33283, "id": 1123, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=unknown&rft.jtitle=www.ubu.com&rft.atitle=U+B+U+W+E+B+%3A%3A+Racter&rft_id=http%3A%2F%2Fwww.ubu.com%2Fhistorical%2Fracter%2Findex.html&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 33283, "end": 33283, "id": 1124, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 33284, "end": 33387, "id": 1125, "features": {"id": "cite_note-22"}}, {"type": "span", "start": 33284, "end": 33285, "id": 1126, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 33284, "end": 33285, "id": 1127, "features": {}}, {"type": "a", "start": 33284, "end": 33285, "id": 1128, "features": {"href": "#cite_ref-22"}}, {"type": "span", "start": 33286, "end": 33386, "id": 1129, "features": {"class": "reference-text"}}, {"type": "cite", "start": 33286, "end": 33386, "id": 1130, "features": {"id": "CITEREFWriter2019", "class": "citation book cs1"}}, {"type": "i", "start": 33307, "end": 33328, "id": 1131, "features": {}}, {"type": "a", "start": 33330, "end": 33333, "id": 1132, "features": {"href": "/wiki/Doi_(identifier)", "class": "mw-redirect", "title": "Doi (identifier)"}}, {"type": "a", "start": 33334, "end": 33359, "id": 1133, "features": {"rel": "nofollow", "class": "external text", "href": "https://doi.org/10.1007%2F978-3-030-16800-1"}}, {"type": "a", "start": 33361, "end": 33365, "id": 
1134, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 33366, "end": 33385, "id": 1135, "features": {"href": "/wiki/Special:BookSources/978-3-030-16799-8", "title": "Special:BookSources/978-3-030-16799-8"}}, {"type": "bdi", "start": 33367, "end": 33385, "id": 1136, "features": {}}, {"type": "span", "start": 33386, "end": 33386, "id": 1137, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Lithium-Ion+Batteries&rft.date=2019&rft_id=info%3Adoi%2F10.1007%2F978-3-030-16800-1&rft.isbn=978-3-030-16799-8&rft.aulast=Writer&rft.aufirst=Beta&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 33386, "end": 33386, "id": 1138, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 33387, "end": 33633, "id": 1139, "features": {"id": "cite_note-23"}}, {"type": "span", "start": 33387, "end": 33388, "id": 1140, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 33387, "end": 33388, "id": 1141, "features": {}}, {"type": "a", "start": 33387, "end": 33388, "id": 1142, "features": {"href": "#cite_ref-23"}}, {"type": "span", "start": 33389, "end": 33632, "id": 1143, "features": {"class": "reference-text"}}, {"type": "cite", "start": 33389, "end": 33632, "id": 1144, "features": {"id": "CITEREFDuanCruz2011", "class": "citation journal cs1"}}, {"type": "a", "start": 33428, "end": 33511, "id": 1145, "features": {"rel": "nofollow", "class": "external text", "href": "https://web.archive.org/web/20111009135952/http://www.ijimt.org/abstract/100-E00187.htm"}}, {"type": "i", "start": 33513, "end": 33575, "id": 1146, "features": {}}, {"type": "b", "start": 33577, "end": 33578, "id": 1147, "features": {}}, {"type": "a", "start": 33605, "end": 33617, "id": 1148, "features": {"rel": "nofollow", "class": "external 
text", "href": "http://www.ijimt.org/abstract/100-E00187.htm"}}, {"type": "span", "start": 33632, "end": 33632, "id": 1149, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=International+Journal+of+Innovation%2C+Management+and+Technology&rft.atitle=Formalizing+Semantic+of+Natural+Language+through+Conceptualization+from+Existence&rft.volume=2&rft.issue=1&rft.pages=37-42&rft.date=2011&rft.aulast=Duan&rft.aufirst=Yucong&rft.au=Cruz%2C+Christophe&rft_id=http%3A%2F%2Fwww.ijimt.org%2Fabstract%2F100-E00187.htm&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 33632, "end": 33632, "id": 1150, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 33633, "end": 33836, "id": 1151, "features": {"id": "cite_note-24"}}, {"type": "span", "start": 33633, "end": 33634, "id": 1152, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 33633, "end": 33634, "id": 1153, "features": {}}, {"type": "a", "start": 33633, "end": 33634, "id": 1154, "features": {"href": "#cite_ref-24"}}, {"type": "span", "start": 33635, "end": 33835, "id": 1155, "features": {"class": "reference-text"}}, {"type": "cite", "start": 33635, "end": 33835, "id": 1156, "features": {"id": "CITEREFMittal2011", "class": "citation journal cs1"}}, {"type": "a", "start": 33650, "end": 33709, "id": 1157, "features": {"rel": "nofollow", "class": "external text", "href": "https://hal.archives-ouvertes.fr/hal-01104648/file/Mittal_VersatileQA_IJIIDS.pdf"}}, {"type": "span", "start": 33710, "end": 33715, "id": 1158, "features": {"class": "cs1-format"}}, {"type": "i", "start": 33717, "end": 33786, "id": 1159, "features": {}}, {"type": "b", "start": 33788, "end": 33789, "id": 1160, "features": {}}, {"type": "a", "start": 33804, "end": 33807, "id": 1161, "features": {"href": "/wiki/Doi_(identifier)", "class": 
"mw-redirect", "title": "Doi (identifier)"}}, {"type": "a", "start": 33808, "end": 33834, "id": 1162, "features": {"rel": "nofollow", "class": "external text", "href": "https://doi.org/10.1504%2FIJIIDS.2011.038968"}}, {"type": "span", "start": 33835, "end": 33835, "id": 1163, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=International+Journal+of+Intelligent+Information+and+Database+Systems&rft.atitle=Versatile+question+answering+systems%3A+seeing+in+synthesis&rft.volume=5&rft.issue=2&rft.pages=119-142&rft.date=2011&rft_id=info%3Adoi%2F10.1504%2FIJIIDS.2011.038968&rft.au=Mittal&rft_id=https%3A%2F%2Fhal.archives-ouvertes.fr%2Fhal-01104648%2Ffile%2FMittal_VersatileQA_IJIIDS.pdf&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 33835, "end": 33835, "id": 1164, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 33836, "end": 33925, "id": 1165, "features": {"id": "cite_note-25"}}, {"type": "span", "start": 33836, "end": 33837, "id": 1166, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 33836, "end": 33837, "id": 1167, "features": {}}, {"type": "a", "start": 33836, "end": 33837, "id": 1168, "features": {"href": "#cite_ref-25"}}, {"type": "span", "start": 33838, "end": 33924, "id": 1169, "features": {"class": "reference-text"}}, {"type": "cite", "start": 33838, "end": 33924, "id": 1170, "features": {"class": "citation web cs1"}}, {"type": "a", "start": 33838, "end": 33849, "id": 1171, "features": {"rel": "nofollow", "class": "external text", "href": "https://www.lexico.com/definition/cognition"}}, {"type": "i", "start": 33851, "end": 33857, "id": 1172, "features": {}}, {"type": "a", "start": 33859, "end": 33882, "id": 1173, "features": {"href": "/wiki/Oxford_University_Press", "title": "Oxford University Press"}}, {"type": "a", "start": 33887, 
"end": 33901, "id": 1174, "features": {"href": "/wiki/Dictionary.com", "title": "Dictionary.com"}}, {"type": "span", "start": 33901, "end": 33923, "id": 1175, "features": {"class": "reference-accessdate"}}, {"type": "span", "start": 33913, "end": 33918, "id": 1176, "features": {"class": "nowrap"}}, {"type": "span", "start": 33924, "end": 33924, "id": 1177, "features": {"title": "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=unknown&rft.jtitle=Lexico&rft.atitle=Cognition&rft_id=https%3A%2F%2Fwww.lexico.com%2Fdefinition%2Fcognition&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 33924, "end": 33924, "id": 1178, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 33925, "end": 34194, "id": 1179, "features": {"id": "cite_note-26"}}, {"type": "span", "start": 33925, "end": 33926, "id": 1180, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 33925, "end": 33926, "id": 1181, "features": {}}, {"type": "a", "start": 33925, "end": 33926, "id": 1182, "features": {"href": "#cite_ref-26"}}, {"type": "span", "start": 33927, "end": 34193, "id": 1183, "features": {"class": "reference-text"}}, {"type": "cite", "start": 33927, "end": 34193, "id": 1184, "features": {"class": "citation web cs1"}}, {"type": "a", "start": 33927, "end": 33956, "id": 1185, "features": {"rel": "nofollow", "class": "external text", "href": "http://www.aft.org/newspubs/periodicals/ae/summer2002/willingham.cfm"}}, {"type": "i", "start": 33958, "end": 33989, "id": 1186, "features": {}}, {"type": "q", "start": 34006, "end": 34193, "id": 1187, "features": {}}, {"type": "span", "start": 34193, "end": 34193, "id": 1188, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=unknown&rft.jtitle=American+Federation+of+Teachers&rft.atitle=Ask+the+Cognitive+Scientist&rft.date=2014-08-08&rft_id=http%3A%2F%2Fwww.aft.org%2Fnewspubs%2Fperiodicals%2Fae%2Fsummer2002%2Fwillingham.cfm&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 34193, "end": 34193, "id": 1189, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 34194, "end": 34333, "id": 1190, "features": {"id": "cite_note-27"}}, {"type": "span", "start": 34194, "end": 34195, "id": 1191, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 34194, "end": 34195, "id": 1192, "features": {}}, {"type": "a", "start": 34194, "end": 34195, "id": 1193, "features": {"href": "#cite_ref-27"}}, {"type": "span", "start": 34196, "end": 34332, "id": 1194, "features": {"class": "reference-text"}}, {"type": "cite", "start": 34196, "end": 34332, "id": 1195, "features": {"id": "CITEREFRobinson2008", "class": "citation book cs1"}}, {"type": "i", "start": 34220, "end": 34285, "id": 1196, "features": {}}, {"type": "a", "start": 34307, "end": 34311, "id": 1197, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 34312, "end": 34331, "id": 1198, "features": {"href": "/wiki/Special:BookSources/978-0-805-85352-0", "title": "Special:BookSources/978-0-805-85352-0"}}, {"type": "bdi", "start": 34313, "end": 34331, "id": 1199, "features": {}}, {"type": "span", "start": 34332, "end": 34332, "id": 1200, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Handbook+of+Cognitive+Linguistics+and+Second+Language+Acquisition&rft.pages=3-8&rft.pub=Routledge&rft.date=2008&rft.isbn=978-0-805-85352-0&rft.aulast=Robinson&rft.aufirst=Peter&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 34332, "end": 34332, "id": 1201, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 34333, "end": 34553, "id": 1202, "features": {"id": "cite_note-28"}}, {"type": "span", "start": 34333, "end": 34334, "id": 1203, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 34333, "end": 34334, "id": 1204, "features": {}}, {"type": "a", "start": 34333, "end": 34334, "id": 1205, "features": {"href": "#cite_ref-28"}}, {"type": "span", "start": 34335, "end": 34552, "id": 1206, "features": {"class": "reference-text"}}, {"type": "cite", "start": 34335, "end": 34552, "id": 1207, "features": {"id": "CITEREFLakoff1999", "class": "citation book cs1"}}, {"type": "i", "start": 34358, "end": 34490, "id": 1208, "features": {}}, {"type": "a", "start": 34527, "end": 34531, "id": 1209, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 34532, "end": 34551, "id": 1210, "features": {"href": "/wiki/Special:BookSources/978-0-465-05674-3", "title": "Special:BookSources/978-0-465-05674-3"}}, {"type": "bdi", "start": 34533, "end": 34551, "id": 1211, "features": {}}, {"type": "span", "start": 34552, "end": 34552, "id": 1212, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Philosophy+in+the+Flesh%3A+The+Embodied+Mind+and+Its+Challenge+to+Western+Philosophy%3B+Appendix%3A+The+Neural+Theory+of+Language+Paradigm&rft.pages=569-583&rft.pub=New+York+Basic+Books&rft.date=1999&rft.isbn=978-0-465-05674-3&rft.aulast=Lakoff&rft.aufirst=George&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 34552, "end": 34552, "id": 1213, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 34553, "end": 34687, "id": 1214, "features": {"id": "cite_note-29"}}, {"type": "span", "start": 34553, "end": 34554, "id": 1215, "features": {"class": "mw-cite-backlink"}}, {"type": "b", "start": 34553, "end": 34554, "id": 1216, "features": {}}, {"type": "a", "start": 34553, "end": 34554, "id": 1217, "features": {"href": "#cite_ref-29"}}, {"type": "span", "start": 34555, "end": 34686, "id": 1218, "features": {"class": "reference-text"}}, {"type": "cite", "start": 34555, "end": 34686, "id": 1219, "features": {"id": "CITEREFStrauss1999", "class": "citation book cs1"}}, {"type": "i", "start": 34580, "end": 34618, "id": 1220, "features": {}}, {"type": "a", "start": 34661, "end": 34665, "id": 1221, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 34666, "end": 34685, "id": 1222, "features": {"href": "/wiki/Special:BookSources/978-0-521-59541-4", "title": "Special:BookSources/978-0-521-59541-4"}}, {"type": "bdi", "start": 34667, "end": 34685, "id": 1223, "features": {}}, {"type": "span", "start": 34686, "end": 34686, "id": 1224, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=A+Cognitive+Theory+of+Cultural+Meaning&rft.pages=156-164&rft.pub=Cambridge+University+Press&rft.date=1999&rft.isbn=978-0-521-59541-4&rft.aulast=Strauss&rft.aufirst=Claudia&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 34686, "end": 34686, "id": 1225, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "h2", "start": 34687, "end": 34707, "id": 1226, "features": {"class": "section-heading", "onclick": "javascript:mfTempOpenSection(7)"}}, {"type": "div", "start": 34687, "end": 34687, "id": 1227, "features": {"class": "mw-ui-icon mw-ui-icon-element indicator mw-ui-icon-small mw-ui-icon-flush-left"}}, {"type": "span", "start": 34687, "end": 34702, "id": 1228, "features": {"class": "mw-headline", "id": "Further_reading"}}, {"type": "span", "start": 34702, "end": 34706, "id": 1229, "features": {"class": "mw-editsection"}}, {"type": "a", "start": 34702, "end": 34706, "id": 1230, "features": {"href": "/w/index.php?title=Natural_language_processing&action=edit&section=19", "title": "Edit section: Further reading", "data-section": "19", "class": "mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right"}}, {"type": "section", "start": 34707, "end": 36122, "id": 1231, "features": {"class": "mf-section-7 collapsible-block", "id": "mf-section-7"}}, {"type": "div", "start": 34707, "end": 36054, "id": 1232, "features": {"class": "refbegin reflist", "style": ""}}, {"type": "ul", "start": 34707, "end": 36054, "id": 1233, "features": {}}, {"type": "li", "start": 34707, "end": 34952, "id": 1234, "features": {}}, {"type": "cite", "start": 34707, "end": 34951, "id": 1235, "features": {"id": "CITEREFBates1995", "class": "citation journal cs1"}}, {"type": "a", "start": 34724, "end": 34766, "id": 1236, "features": {"rel": "nofollow", 
"class": "external text", "href": "//www.ncbi.nlm.nih.gov/pmc/articles/PMC40721"}}, {"type": "i", "start": 34768, "end": 34847, "id": 1237, "features": {}}, {"type": "b", "start": 34849, "end": 34851, "id": 1238, "features": {}}, {"type": "a", "start": 34869, "end": 34876, "id": 1239, "features": {"href": "/wiki/Bibcode_(identifier)", "class": "mw-redirect", "title": "Bibcode (identifier)"}}, {"type": "a", "start": 34877, "end": 34896, "id": 1240, "features": {"rel": "nofollow", "class": "external text", "href": "https://ui.adsabs.harvard.edu/abs/1995PNAS...92.9977B"}}, {"type": "a", "start": 34898, "end": 34901, "id": 1241, "features": {"href": "/wiki/Doi_(identifier)", "class": "mw-redirect", "title": "Doi (identifier)"}}, {"type": "a", "start": 34902, "end": 34925, "id": 1242, "features": {"rel": "nofollow", "class": "external text", "href": "https://doi.org/10.1073%2Fpnas.92.22.9977"}}, {"type": "a", "start": 34927, "end": 34930, "id": 1243, "features": {"href": "/wiki/PMC_(identifier)", "class": "mw-redirect", "title": "PMC (identifier)"}}, {"type": "span", "start": 34931, "end": 34936, "id": 1244, "features": {"class": "cs1-lock-free", "title": "Freely accessible"}}, {"type": "a", "start": 34931, "end": 34936, "id": 1245, "features": {"rel": "nofollow", "class": "external text", "href": "//www.ncbi.nlm.nih.gov/pmc/articles/PMC40721"}}, {"type": "a", "start": 34938, "end": 34942, "id": 1246, "features": {"href": "/wiki/PMID_(identifier)", "class": "mw-redirect", "title": "PMID (identifier)"}}, {"type": "a", "start": 34943, "end": 34950, "id": 1247, "features": {"rel": "nofollow", "class": "external text", "href": "//pubmed.ncbi.nlm.nih.gov/7479812"}}, {"type": "span", "start": 34951, "end": 34951, "id": 1248, "features": {"title": 
"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+of+the+United+States+of+America&rft.atitle=Models+of+natural+language+understanding&rft.volume=92&rft.issue=22&rft.pages=9977-9982&rft.date=1995&rft_id=%2F%2Fwww.ncbi.nlm.nih.gov%2Fpmc%2Farticles%2FPMC40721&rft_id=info%3Apmid%2F7479812&rft_id=info%3Adoi%2F10.1073%2Fpnas.92.22.9977&rft_id=info%3Abibcode%2F1995PNAS...92.9977B&rft.aulast=Bates&rft.aufirst=M&rft_id=%2F%2Fwww.ncbi.nlm.nih.gov%2Fpmc%2Farticles%2FPMC40721&rfr_id=info%3Asid%2Fen.wikipedia.org%3ANatural+language+processing", "class": "Z3988"}}, {"type": "link", "start": 34951, "end": 34951, "id": 1249, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "li", "start": 34952, "end": 35083, "id": 1250, "features": {}}, {"type": "i", "start": 35002, "end": 35041, "id": 1251, "features": {}}, {"type": "link", "start": 35059, "end": 35059, "id": 1252, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "a", "start": 35059, "end": 35063, "id": 1253, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 35064, "end": 35081, "id": 1254, "features": {"href": "/wiki/Special:BookSources/978-0-596-51649-9", "title": "Special:BookSources/978-0-596-51649-9"}}, {"type": "li", "start": 35083, "end": 35219, "id": 1255, "features": {}}, {"type": "i", "start": 35127, "end": 35157, "id": 1256, "features": {}}, {"type": "link", "start": 35195, "end": 35195, "id": 1257, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "a", "start": 35195, "end": 35199, "id": 1258, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 35200, "end": 35217, "id": 1259, "features": {"href": 
"/wiki/Special:BookSources/978-0-13-187321-6", "title": "Special:BookSources/978-0-13-187321-6"}}, {"type": "li", "start": 35219, "end": 35383, "id": 1260, "features": {}}, {"type": "i", "start": 35249, "end": 35338, "id": 1261, "features": {}}, {"type": "link", "start": 35362, "end": 35362, "id": 1262, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "a", "start": 35362, "end": 35366, "id": 1263, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 35367, "end": 35381, "id": 1264, "features": {"href": "/wiki/Special:BookSources/978-1848218482", "title": "Special:BookSources/978-1848218482"}}, {"type": "li", "start": 35383, "end": 35555, "id": 1265, "features": {}}, {"type": "i", "start": 35413, "end": 35510, "id": 1266, "features": {}}, {"type": "link", "start": 35534, "end": 35534, "id": 1267, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "a", "start": 35534, "end": 35538, "id": 1268, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 35539, "end": 35553, "id": 1269, "features": {"href": "/wiki/Special:BookSources/978-1848219212", "title": "Special:BookSources/978-1848219212"}}, {"type": "li", "start": 35555, "end": 35775, "id": 1270, "features": {}}, {"type": "i", "start": 35627, "end": 35664, "id": 1271, "features": {}}, {"type": "link", "start": 35694, "end": 35694, "id": 1272, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "a", "start": 35694, "end": 35698, "id": 1273, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 35699, "end": 35716, "id": 1274, "features": {"href": "/wiki/Special:BookSources/978-0-521-86571-5", "title": 
"Special:BookSources/978-0-521-86571-5"}}, {"type": "a", "start": 35718, "end": 35774, "id": 1275, "features": {"rel": "nofollow", "class": "external text", "href": "http://nlp.stanford.edu/IR-book/"}}, {"type": "li", "start": 35775, "end": 35921, "id": 1276, "features": {}}, {"type": "i", "start": 35826, "end": 35880, "id": 1277, "features": {}}, {"type": "link", "start": 35897, "end": 35897, "id": 1278, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "a", "start": 35897, "end": 35901, "id": 1279, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 35902, "end": 35919, "id": 1280, "features": {"href": "/wiki/Special:BookSources/978-0-262-13360-9", "title": "Special:BookSources/978-0-262-13360-9"}}, {"type": "li", "start": 35921, "end": 36054, "id": 1281, "features": {}}, {"type": "i", "start": 35975, "end": 36011, "id": 1282, "features": {}}, {"type": "link", "start": 36030, "end": 36030, "id": 1283, "features": {"rel": "mw-deduplicated-inline-style", "href": "mw-data:TemplateStyles:r982806391"}}, {"type": "a", "start": 36030, "end": 36034, "id": 1284, "features": {"href": "/wiki/ISBN_(identifier)", "class": "mw-redirect", "title": "ISBN (identifier)"}}, {"type": "a", "start": 36035, "end": 36052, "id": 1285, "features": {"href": "/wiki/Special:BookSources/978-0-387-19557-5", "title": "Special:BookSources/978-0-387-19557-5"}}, {"type": "table", "start": 36054, "end": 36122, "id": 1286, "features": {"role": "presentation", "class": "mbox-small plainlinks sistersitebox", "style": "background-color:#f9f9f9;border:1px solid #aaa;color:#000"}}, {"type": "tbody", "start": 36054, "end": 36122, "id": 1287, "features": {}}, {"type": "tr", "start": 36054, "end": 36122, "id": 1288, "features": {}}, {"type": "td", "start": 36054, "end": 36121, "id": 1289, "features": {"class": "mbox-text plainlist"}}, {"type": "i", "start": 36093, "end": 
36120, "id": 1290, "features": {}}, {"type": "b", "start": 36093, "end": 36120, "id": 1291, "features": {}}, {"type": "a", "start": 36093, "end": 36120, "id": 1292, "features": {"href": "https://commons.wikimedia.org/wiki/Category:Natural_language_processing", "class": "extiw", "title": "commons:Category:Natural language processing"}}, {"type": "noscript", "start": 36122, "end": 36122, "id": 1293, "features": {}}, {"type": "img", "start": 36122, "end": 36122, "id": 1294, "features": {"src": "//en.wikipedia.org/wiki/Special:CentralAutoLogin/start?type=1x1&mobile=1", "alt": "", "title": "", "width": "1", "height": "1", "style": "border: none; position: absolute;"}}, {"type": "div", "start": 36122, "end": 36227, "id": 1295, "features": {"class": "printfooter"}}, {"type": "a", "start": 36139, "end": 36225, "id": 1296, "features": {"dir": "ltr", "href": "https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=983013403"}}, {"type": "div", "start": 36227, "end": 36227, "id": 1297, "features": {"class": "post-content", "id": "page-secondary-actions"}}, {"type": "footer", "start": 36227, "end": 36791, "id": 1298, "features": {"class": "mw-footer minerva-footer", "role": "contentinfo"}}, {"type": "div", "start": 36227, "end": 36268, "id": 1299, "features": {"class": "last-modified-bar"}}, {"type": "div", "start": 36227, "end": 36268, "id": 1300, "features": {"class": "post-content last-modified-bar__content"}}, {"type": "span", "start": 36227, "end": 36227, "id": 1301, "features": {"class": "last-modified-bar__icon mw-ui-icon mw-ui-icon-mw-ui-icon-small mw-ui-icon-wikimedia-history-base20 "}}, {"type": "a", "start": 36227, "end": 36268, "id": 1302, "features": {"class": "last-modified-bar__text modified-enhancement", "href": "/wiki/Special:History/Natural_language_processing", "data-user-name": "Tom.Reding", "data-user-gender": "male", "data-timestamp": "1602441907"}}, {"type": "span", "start": 36227, "end": 36267, "id": 1303, "features": {}}, {"type": 
"span", "start": 36268, "end": 36268, "id": 1304, "features": {"class": "mw-ui-icon mw-ui-icon-small mw-ui-icon-mf-expand-gray mf-mw-ui-icon-rotate-anti-clockwise indicator"}}, {"type": "div", "start": 36268, "end": 36791, "id": 1305, "features": {"class": "post-content footer-content"}}, {"type": "div", "start": 36268, "end": 36268, "id": 1306, "features": {"id": "mw-data-after-content"}}, {"type": "div", "start": 36268, "end": 36268, "id": 1307, "features": {"class": "read-more-container"}}, {"type": "h2", "start": 36268, "end": 36268, "id": 1308, "features": {}}, {"type": "img", "start": 36268, "end": 36268, "id": 1309, "features": {"src": "/static/images/mobile/copyright/wikipedia-wordmark-en.svg", "width": "119", "height": "18", "alt": "Wikipedia"}}, {"type": "div", "start": 36268, "end": 36332, "id": 1310, "features": {"class": "license"}}, {"type": "a", "start": 36295, "end": 36307, "id": 1311, "features": {"class": "external", "rel": "nofollow", "href": "https://creativecommons.org/licenses/by-sa/3.0/"}}, {"type": "ul", "start": 36332, "end": 36670, "id": 1312, "features": {"class": "footer-info hlist hlist-separated"}}, {"type": "li", "start": 36332, "end": 36395, "id": 1313, "features": {"id": "footer-info-lastmod"}}, {"type": "span", "start": 36387, "end": 36393, "id": 1314, "features": {"class": "anonymous-show"}}, {"type": "li", "start": 36395, "end": 36670, "id": 1315, "features": {"id": "footer-info-copyright"}}, {"type": "a", "start": 36423, "end": 36470, "id": 1316, "features": {"rel": "license", "href": "//en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License"}}, {"type": "a", "start": 36470, "end": 36470, "id": 1317, "features": {"rel": "license", "href": "//creativecommons.org/licenses/by-sa/3.0/", "style": "display:none;"}}, {"type": "a", "start": 36538, "end": 36550, "id": 1318, "features": {"href": "//foundation.wikimedia.org/wiki/Terms_of_Use"}}, {"type": "a", "start": 36555, "end": 36569, "id": 
1319, "features": {"href": "//foundation.wikimedia.org/wiki/Privacy_policy"}}, {"type": "a", "start": 36615, "end": 36641, "id": 1320, "features": {"href": "//www.wikimediafoundation.org/"}}, {"type": "ul", "start": 36670, "end": 36791, "id": 1321, "features": {"class": "footer-places hlist hlist-separated"}}, {"type": "li", "start": 36670, "end": 36685, "id": 1322, "features": {"id": "footer-places-privacy"}}, {"type": "a", "start": 36670, "end": 36684, "id": 1323, "features": {"href": "https://foundation.wikimedia.org/wiki/Privacy_policy", "class": "extiw", "title": "wmf:Privacy policy"}}, {"type": "li", "start": 36685, "end": 36701, "id": 1324, "features": {"id": "footer-places-about"}}, {"type": "a", "start": 36685, "end": 36700, "id": 1325, "features": {"href": "/wiki/Wikipedia:About", "title": "Wikipedia:About"}}, {"type": "li", "start": 36701, "end": 36713, "id": 1326, "features": {"id": "footer-places-disclaimer"}}, {"type": "a", "start": 36701, "end": 36712, "id": 1327, "features": {"href": "/wiki/Wikipedia:General_disclaimer", "title": "Wikipedia:General disclaimer"}}, {"type": "li", "start": 36713, "end": 36731, "id": 1328, "features": {"id": "footer-places-contact"}}, {"type": "a", "start": 36713, "end": 36730, "id": 1329, "features": {"href": "//en.wikipedia.org/wiki/Wikipedia:Contact_us"}}, {"type": "li", "start": 36731, "end": 36744, "id": 1330, "features": {"id": "footer-places-terms-use"}}, {"type": "a", "start": 36731, "end": 36743, "id": 1331, "features": {"href": "//m.wikimediafoundation.org/wiki/Terms_of_Use"}}, {"type": "li", "start": 36744, "end": 36752, "id": 1332, "features": {"id": "footer-places-desktop-toggle"}}, {"type": "a", "start": 36744, "end": 36751, "id": 1333, "features": {"id": "mw-mf-display-toggle", "href": "//en.wikipedia.org/w/index.php?title=Natural_language_processing&mobileaction=toggle_view_desktop"}}, {"type": "li", "start": 36752, "end": 36763, "id": 1334, "features": {"id": "footer-places-developers"}}, {"type": "a", 
"start": 36752, "end": 36762, "id": 1335, "features": {"href": "https://www.mediawiki.org/wiki/Special:MyLanguage/How_to_contribute"}}, {"type": "li", "start": 36763, "end": 36774, "id": 1336, "features": {"id": "footer-places-statslink"}}, {"type": "a", "start": 36763, "end": 36773, "id": 1337, "features": {"href": "https://stats.wikimedia.org/#/en.wikipedia.org"}}, {"type": "li", "start": 36774, "end": 36791, "id": 1338, "features": {"id": "footer-places-cookiestatement"}}, {"type": "a", "start": 36774, "end": 36790, "id": 1339, "features": {"href": "https://foundation.wikimedia.org/wiki/Cookie_statement"}}, {"type": "div", "start": 36791, "end": 36791, "id": 1340, "features": {"class": "mw-notification-area", "data-mw": "interface"}}], "next_annid": 1341}}, "text": "\nNatural language processing - Wikipedia\nOpen main menu\nHome\nRandom\nNearby\nLog in\nSettings\nDonate\nAbout Wikipedia\nDisclaimers\nSearch\nNatural language processing\nLanguage\nWatch\nEdit\nNatural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.\n \nAn automated online assistant providing customer service on a web page, an example of an application where natural language processing is a major component.[1]\nChallenges in natural language processing frequently involve speech recognition, natural language understanding, and natural-language generation.\nContents\n1 History\n1.1 Symbolic NLP (1950s - early 1990s)\n1.2 Statistical NLP (1990s - 2010s)\n1.3 Neural NLP (present)\n2 Methods: Rules, statistics, neural networks\n2.1 Statistical methods\n2.2 Neural networks\n3 Common NLP Tasks\n3.1 Text and speech processing\n3.2 Morphological analysis\n3.3 Syntactic analysis\n3.4 Lexical semantics (of individual words in context)\n3.5 Relational semantics (semantics of individual 
sentences)\n3.6 Discourse (semantics beyond individual sentences)\n3.7 Higher-level NLP applications\n4 Cognition and NLP\n5 See also\n6 References\n7 Further reading\nHistoryEdit\nFurther information: History of natural language processing\nNatural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled \"Computing Machinery and Intelligence\" which proposed what is now called the Turing test as a criterion of intelligence, a task that involves the automated interpretation and generation of natural language, but at the time not articulated as a problem separate from artificial intelligence.\nSymbolic NLP (1950s - early 1990s)Edit\nThe premise of symbolic NLP is well-summarized by John Searle's Chinese room experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it is confronted with.\n1950s: The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem.[2]  However, real progress was much slower, and after the ALPAC report in 1966, which found that ten-year-long research had failed to fulfill the expectations, funding for machine translation was dramatically reduced.  Little further research in machine translation was conducted until the late 1980s when the first statistical machine translation systems were developed.\n1960s: Some notably successful natural language processing systems developed in the 1960s were SHRDLU, a natural language system working in restricted \"blocks worlds\" with restricted vocabularies, and ELIZA, a simulation of a Rogerian psychotherapist, written by Joseph Weizenbaum between 1964 and 1966.  
Using almost no information about human thought or emotion, ELIZA sometimes provided a startlingly human-like interaction. When the \"patient\" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to \"My head hurts\" with \"Why do you say your head hurts?\".\n1970s: During the 1970s, many programmers began to write \"conceptual ontologies\", which structured real-world information into computer-understandable data.  Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981).  During this time, the first many chatterbots were written (e.g., PARRY).\n1980s: The 1980s and early 1990s mark the hey-day of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of HPSG as a computational operationalization of generative grammar), morphology (e.g., two-level morphology[3]), semantics (e.g., Lesk algorithm), reference (e.g., within Centering Theory[4]) and other areas of natural language understanding (e.g., in the Rhetorical Structure Theory). Other lines of research were continued, e.g., the development of chatterbots with Racter and Jabberwacky. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period.[5]\nStatistical NLP (1990s - 2010s)Edit\nUp to the 1980s, most natural language processing systems were based on complex sets of hand-written rules.  Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing.  This was due to both the steady increase in computational power (see Moore's law) and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. 
transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing.[6]\n1990s: Many of the notable early successes on statistical methods in NLP occurred in the field of machine translation, due especially to work at IBM Research.  These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government.  However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data.\n2000s: With the growth of the web, increasing amounts of raw (unannotated) language data has become available since the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning algorithms.  Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data.  Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data.  
However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results if the algorithm used has a low enough time complexity to be practical.\nNeural NLP (present)Edit\nIn the 2010s, representation learning and deep neural network-style machine learning methods became widespread in natural language processing, due in part to a flurry of results showing that such techniques[7][8] can achieve state-of-the-art results in many natural language tasks, for example in language modeling,[9] parsing,[10][11] and many others.\nMethods: Rules, statistics, neural networksEdit\nIn the early days, many language-processing systems were designed by symbolic methods, i.e., the hand-coding of a set of rules, coupled with a dictionary lookup:[12][13] such as by writing grammars or devising heuristic rules for stemming.\nMore recent systems based on machine-learning algorithms have many advantages over hand-produced rules: \nThe learning procedures used during machine learning automatically focus on the most common cases, whereas when writing rules by hand it is often not at all obvious where the effort should be directed.\nAutomatic learning procedures can make use of statistical inference algorithms to produce models that are robust to unfamiliar input (e.g. containing words or structures that have not been seen before) and to erroneous input (e.g. with misspelled words or words accidentally omitted). Generally, handling such input gracefully with handwritten rules, or, more generally, creating systems of handwritten rules that make soft decisions, is extremely difficult, error-prone and time-consuming.\nSystems based on automatically learning the rules can be made more accurate simply by supplying more input data. 
However, systems based on handwritten rules can only be made more accurate by increasing the complexity of the rules, which is a much more difficult task. In particular, there is a limit to the complexity of systems based on handwritten rules, beyond which the systems become more and more unmanageable. However, creating more data to input to machine-learning systems simply requires a corresponding increase in the number of man-hours worked, generally without significant increases in the complexity of the annotation process.\nDespite the popularity of machine learning in NLP research, symbolic methods are still (2020) commonly used\nwhen the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages such as provided by the Apertium system,\nfor preprocessing in NLP pipelines, e.g., tokenization, or\nfor postprocessing and transforming the output of NLP pipelines, e.g., for knowledge extraction from syntactic parses.\nStatistical methodsEdit\nSince the so-called \"statistical revolution\"[14][15] in the late 1980s and mid-1990s, much natural language processing research has relied heavily on machine learning. The machine-learning paradigm calls instead for using statistical inference to automatically learn such rules through the analysis of large corpora (the plural form of corpus, is a set of documents, possibly with human or computer annotations) of typical real-world examples.\nMany different classes of machine-learning algorithms have been applied to natural-language-processing tasks. These algorithms take as input a large set of \"features\" that are generated from the input data. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to each input feature. 
Such models have the advantage that they can express the relative certainty of many different possible answers rather than only one, producing more reliable results when such a model is included as a component of a larger system.\nSome of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules.  However, part-of-speech tagging introduced the use of hidden Markov models to natural language processing, and increasingly, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models.  Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.\nSince the neural turn, statistical methods in NLP research have been largely replaced by neural networks. However, they continue to be relevant for contexts in which statistical interpretability and transparency is required.\nNeural networksEdit\nFurther information: Artificial neural network\nA major drawback of statistical methods is that they require elaborate feature engineering. Since the early 2010s,[16] the field has thus largely abandoned statistical methods and shifted to neural networks for machine learning. Popular techniques include the use of word embeddings to capture semantic properties of words, and an increase in end-to-end learning of a higher-level task (e.g., question answering) instead of relying on a pipeline of separate intermediate tasks (e.g., part-of-speech tagging and dependency parsing). 
In some areas, this shift has entailed substantial changes in how NLP systems are designed, such that deep neural network-based approaches may be viewed as a new paradigm distinct from statistical natural language processing. For instance, the term neural machine translation (NMT) emphasizes the fact that deep learning-based approaches to machine translation directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word alignment and language modeling that was used in statistical machine translation (SMT).\nCommon NLP TasksEdit\nThe following is a list of some of the most commonly researched tasks in natural language processing. Some of these tasks have direct real-world applications, while others more commonly serve as subtasks that are used to aid in solving larger tasks.\nThough natural language processing tasks are closely intertwined, they can be subdivided into categories for convenience. A coarse division is given below.\nText and speech processingEdit\nOptical character recognition (OCR)\nGiven an image representing printed text, determine the corresponding text.Speech recognition\nGiven a sound clip of a person or people speaking, determine the textual representation of the speech.  This is the opposite of text to speech and is one of the extremely difficult problems colloquially termed \"AI-complete\" (see above).  In natural speech there are hardly any pauses between successive words, and thus speech segmentation is a necessary subtask of speech recognition (see below). In most spoken languages, the sounds representing successive letters blend into each other in a process termed coarticulation, so the conversion of the analog signal to discrete characters can be a very difficult process. 
Also, given that words in the same language are spoken by people with different accents, the speech recognition software must be able to recognize the wide variety of input as being identical to each other in terms of its textual equivalent.\nSpeech segmentation\nGiven a sound clip of a person or people speaking, separate it into words.  A subtask of speech recognition and typically grouped with it.Text-to-speech\nGiven a text, transform those units and produce a spoken representation. Text-to-speech can be used to aid the visually impaired.[17]Word segmentation (Tokenization)\nSeparate a chunk of continuous text into separate words. For a language like English, this is fairly trivial, since words are usually separated by spaces. However, some written languages like Chinese, Japanese and Thai do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the vocabulary and morphology of words in the language. Sometimes this process is also used in cases like bag of words (BOW) creation in data mining.\nMorphological analysisEdit\nLemmatization\nThe task of removing inflectional endings only and to return the base dictionary form of a word which is also known as a lemma.\nMorphological segmentation\nSeparate words into individual morphemes and identify the class of the morphemes. The difficulty of this task depends greatly on the complexity of the morphology (i.e., the structure of words) of the language being considered. English has fairly simple morphology, especially inflectional morphology, and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g., \"open, opens, opened, opening\") as separate words. 
In languages such as Turkish or Meitei,[18] a highly agglutinated Indian language, however, such an approach is not possible, as each dictionary entry has thousands of possible word forms.\nPart-of-speech tagging\nGiven a sentence, determine the part of speech (POS) for each word. Many words, especially common ones, can serve as multiple parts of speech. For example, \"book\" can be a noun (\"the book on the table\") or verb (\"to book a flight\"); \"set\" can be a noun, verb or adjective; and \"out\" can be any of at least five different parts of speech. Some languages have more such ambiguity than others.[dubious  \u2013 discuss] Languages with little inflectional morphology, such as English, are particularly prone to such ambiguity. Chinese is prone to such ambiguity because it is a tonal language during verbalization. Such inflection is not readily conveyed via the entities employed within the orthography to convey the intended meaning.Stemming\nThe process of reducing inflected (or sometimes derived) words to their root form. (e.g., \"close\" will be the root for \"closed\", \"closing\", \"close\", \"closer\" etc.).\nSyntactic analysisEdit\nGrammar induction[19]\nGenerate a formal grammar that describes a language's syntax.\nSentence breaking (also known as \"sentence boundary disambiguation\")\nGiven a chunk of text, find the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation marks, but these same characters can serve other purposes (e.g., marking abbreviations).\nParsing\nDetermine the parse tree (grammatical analysis) of a given sentence. The grammar for natural languages is ambiguous and typical sentences have multiple possible analyses: perhaps surprisingly, for a typical sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human). There are two primary types of parsing: dependency parsing and constituency parsing. 
Dependency parsing focuses on the relationships between words in a sentence (marking things like primary objects and predicates), whereas constituency parsing focuses on building out the parse tree using a probabilistic context-free grammar (PCFG) (see also stochastic grammar).\nLexical semantics (of individual words in context)Edit\nLexical semantics\nWhat is the computational meaning of individual words in context?\nDistributional semantics\nHow can we learn semantic representations from data?\nNamed entity recognition (NER)\nGiven a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization). Although capitalization can aid in recognizing named entities in languages such as English, this information cannot aid in determining the type of named entity, and in any case, is often inaccurate or insufficient.  For example, the first letter of a sentence is also capitalized, and named entities often span several words, only some of which are capitalized.  Furthermore, many other languages in non-Western scripts (e.g. Chinese or Arabic) do not have any capitalization at all, and even languages with capitalization may not consistently use it to distinguish names. For example, German capitalizes all nouns, regardless of whether they are names, and French and Spanish do not capitalize names that serve as adjectives.Sentiment analysis (see also multimodal sentiment analysis)\nExtract subjective information usually from a set of documents, often using online reviews to determine \"polarity\" about specific objects. It is especially useful for identifying trends of public opinion in social media, for marketing.Terminology extractionThe goal of terminology extraction is to automatically extract relevant terms from a given corpus.\nWord sense disambiguation\nMany words have more than one meaning; we have to select the meaning which makes the most sense in context.  
For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary or an online resource such as WordNet.\nRelational semantics (semantics of individual sentences)Edit\nRelationship extraction\nGiven a chunk of text, identify the relationships among named entities (e.g. who is married to whom).\nSemantic Parsing\nGiven a piece of text (typically a sentence), produce a formal representation of its semantics, either as a graph (e.g., in AMR parsing) or in accordance with a logical formalism (e.g., in DRT parsing). This challenge typically includes aspects of several more elementary NLP tasks from semantics (e.g., semantic role labelling, word sense disambiguation) and can be extended to include full-fledged discourse analysis (e.g., discourse analysis, coreference; see Natural Language Understanding below).\nSemantic Role Labelling (see also implicit semantic role labelling below)\nGiven a single sentence, identify and disambiguate semantic predicates (e.g., verbal frames), then identify and classify the frame elements (semantic roles).\nDiscourse (semantics beyond individual sentences)Edit\nCoreference resolution\nGiven a sentence or larger chunk of text, determine which words (\"mentions\") refer to the same objects (\"entities\"). Anaphora resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the nouns or names to which they refer. The more general task of coreference resolution also includes identifying so-called \"bridging relationships\" involving referring expressions. For example, in a sentence such as \"He entered John's house through the front door\", \"the front door\" is a referring expression and the bridging relationship to be identified is the fact that the door being referred to is the front door of John's house (rather than of some other structure that might also be referred to).\nDiscourse analysis\nThis rubric includes several related tasks.  
One task is discourse parsing, i.e., identifying the discourse structure of a connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation, contrast).  Another possible task is recognizing and classifying the speech acts in a chunk of text (e.g. yes-no question, content question, statement, assertion, etc.).Implicit Semantic Role Labelling\nGiven a single sentence, identify and disambiguate semantic predicates (e.g., verbal frames) and their explicit semantic roles in the current sentence (see Semantic Role Labelling above). Then, identify semantic roles that are not explicitly realized in the current sentence, classify them into arguments that are explicitly realized elsewhere in the text and those that are not specified, and resolve the former against the local text. A closely related task is zero anaphora resolution, i.e., the extension of coreference resolution to pro-drop languages.Recognizing Textual entailment\nGiven two text fragments, determine if one being true entails the other, entails the other's negation, or allows the other to be either true or false.[20]Topic segmentation and recognition\nGiven a chunk of text, separate it into segments each of which is devoted to a topic, and identify the topic of the segment.\nHigher-level NLP applicationsEdit\nAutomatic summarization (text summarization)\nProduce a readable summary of a chunk of text.  Often used to provide summaries of the text of a known type, such as research papers, articles in the financial section of a newspaper.\nBook generation\nNot an NLP task proper but an extension of Natural Language Generation and other NLP tasks is the creation of full-fledged books. The first machine-generated book was created by a rule-based system in 1984 (Racter, The policemen's beard is half-constructed).[21] The first published work by a neural network was published in 2018, 1 the Road, marketed as a novel, contains sixty million words. 
Both these systems are basically elaborate but non-sensical (semantics-free) language models. The first machine-generated science book was published in 2019 (Beta Writer, Lithium-Ion Batteries, Springer, Cham).[22] Unlike Racter and 1 the Road, this is grounded on factual knowledge and based on text summarization.\nDialogue management\nComputer systems intended to converse with a human.\nMachine translation\nAutomatically translate text from one human language to another.  This is one of the most difficult problems, and is a member of a class of problems colloquially termed \"AI-complete\", i.e. requiring all of the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) to solve properly.\nNatural language generation (NLG):\nConvert information from computer databases or semantic intents into readable human language.\nNatural language understanding (NLU)\nConvert chunks of text into more formal representations such as first-order logic structures that are easier for computer programs to manipulate. Natural language understanding involves the identification of the intended semantic from the multiple possible semantics which can be derived from a natural language expression which usually takes the form of organized notations of natural language concepts. Introduction and creation of language metamodel and ontology are efficient however empirical solutions. An explicit formalization of natural language semantics without confusions with implicit assumptions such as closed-world assumption (CWA) vs. open-world assumption, or subjective Yes/No vs. objective True/False is expected for the construction of a basis of semantics formalization.[23]\nQuestion answering\nGiven a human-language question, determine its answer.  Typical questions have a specific right answer (such as \"What is the capital of Canada?\"), but sometimes open-ended questions are also considered (such as \"What is the meaning of life?\"). 
Recent works have looked at even more complex questions.[24]\nCognition and NLPEdit\nCognition refers to \"the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses.\"[25] Cognitive science is the interdisciplinary, scientific study of the mind and its processes.[26] Cognitive linguistics is an interdisciplinary branch of linguistics, combining knowledge and research from both psychology and linguistics.[27] George Lakoff offers a methodology to build Natural language processing (NLP) algorithms through the perspective of Cognitive science, along with the findings of Cognitive linguistics:[28]\nThe first defining aspect of this cognitive task of NLP is the application of the theory of Conceptual metaphor, explained by Lakoff as \u201cthe understanding of one idea, in terms of another\u201d which provides an idea of the intent of the author.[29]\nFor example, consider some of the meanings, in English, of the word \u201cbig\u201d. When used as a Comparative, as in \u201cThat is a big tree,\u201d a likely inference of the intent of the author is that the author is using the word \u201cbig\u201d to imply a statement about the tree being \u201dphysically large\u201d in comparison to other trees or the authors experience.  When used as a Stative verb, as in \u201dTomorrow is a big day\u201d, a likely inference of the author\u2019s intent it that \u201dbig\u201d is being used to imply \u201dimportance\u201d.  These examples are not presented to be complete, but merely as indicators of the implication of the idea of Conceptual metaphor.  
The intent behind other usages, like in \u201dShe is a big person\u201d will remain somewhat ambiguous to a person and a cognitive NLP algorithm alike without additional information.\nThis leads to the second defining aspect of this cognitive task of NLP, namely Probabilistic context-free grammar (PCFG) which enables cognitive NLP algorithms to assign relative measures of meaning  to a word, phrase, sentence or piece of text based on the information presented before and after the piece of text being analyzed. The mathematical equation for such algorithms is presented in US patent 9269353\u00a0:\nR\nM\nM\n(\nt\no\nk\ne\nn\nN\n)\n=\nP\nM\nM\n(\nt\no\nk\ne\nn\nN\n)\n\u00d7\n1\n2\nd\n(\n\u2211\ni\n=\n\u2212\nd\nd\n(\n(\nP\nM\nM\n(\nt\no\nk\ne\nn\nN\n\u2212\n1\n)\n\u00d7\nP\nF\n(\nt\no\nk\ne\nn\nN\n,\nt\no\nk\ne\nn\nN\n\u2212\n1\n)\n)\ni\n)\n{\\displaystyle {RMM(token_{N})}={PMM(token_{N})}\\times {\\frac {1}{2d}}\\left(\\sum _{i=-d}^{d}{((PMM(token_{N-1})}\\times {PF(token_{N},token_{N-1}))_{i}}\\right)}\n\u00a0\nWhere,\n\u00a0 \u2003 \u00a0RMM, is the Relative Measure of Meaning\n\u00a0 \u2003 \u00a0token, is any block of text, sentence, phrase or word\n\u00a0 \u2003 \u00a0N, is the number of tokens being analyzed\n\u00a0 \u2003 \u00a0PMM, is the Probable Measure of Meaning based on a corpora\n\u00a0 \u2003 \u00a0d, is the location of the token along the sequence of N-1 tokens\n\u00a0 \u2003 \u00a0PF, is the Probability Function specific to a language\nSee alsoEdit\n1 the Road\nAutomated essay scoring\nBiomedical text mining\nCompound term processing\nComputational linguistics\nComputer-assisted reviewing\nControlled natural language\nDeep learning\nDeep linguistic processing\nDistributional semantics\nForeign language reading aid\nForeign language writing aid\nInformation extraction\nInformation retrieval\nLanguage and Communication Technologies\nLanguage technology\nLatent semantic indexing\nNative-language identification\nNatural language programming\nNatural 
language search\nOutline of natural language processing\nQuery expansion\nQuery understanding\nReification (linguistics)\nSpeech processing\nSpoken dialogue system\nText-proofing\nText simplification\nTransformer (machine learning model)\nTruecasing\nQuestion answering\nWord2vec\nReferencesEdit\n^ Kongthon, Alisa; Sangkeettrakarn, Chatchawal; Kongyoung, Sarawoot; Haruechaiyasak, Choochart (October 27\u201330, 2009). Implementing an online help desk system based on conversational agent. MEDES '09: The International Conference on Management of Emergent Digital EcoSystems. France: ACM. doi:10.1145/1643823.1643908.\n^ Hutchins, J. (2005). \"The history of machine translation in a nutshell\" (PDF).[self-published source]\n^ Koskenniemi, Kimmo (1983), Two-level morphology: A general computational model of word-form recognition and production (PDF), Department of General Linguistics, University of Helsinki\n^ Joshi, A. K., & Weinstein, S. (1981, August). Control of Inference: Role of Some Aspects of Discourse Structure-Centering. In IJCAI (pp. 385-387).\n^ Guida, G.; Mauri, G. (July 1986). \"Evaluation of natural language processing systems: Issues and approaches\". Proceedings of the IEEE. 74 (7): 1026\u20131035. doi:10.1109/PROC.1986.13580. ISSN\u00a01558-2256. S2CID\u00a030688575.\n^ Chomskyan linguistics encourages the investigation of \"corner cases\" that stress the limits of its theoretical models (comparable to pathological phenomena in mathematics), typically created using thought experiments, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in corpus linguistics.  The creation and use of such corpora of real-world data is a fundamental part of machine-learning algorithms for natural language processing.  
In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called \"poverty of the stimulus\" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing.  As a result, the Chomskyan paradigm discouraged the application of such models to language processing.\n^ Goldberg, Yoav (2016). \"A Primer on Neural Network Models for Natural Language Processing\". Journal of Artificial Intelligence Research. 57: 345\u2013420. arXiv:1807.10854. doi:10.1613/jair.4992. S2CID\u00a08273530.\n^ Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. MIT Press.\n^ Jozefowicz, Rafal; Vinyals, Oriol; Schuster, Mike; Shazeer, Noam; Wu, Yonghui (2016). Exploring the Limits of Language Modeling. arXiv:1602.02410. Bibcode:2016arXiv160202410J.\n^ Choe, Do Kook; Charniak, Eugene. \"Parsing as Language Modeling\". Emnlp 2016.\n^ Vinyals, Oriol;  et al. (2014). \"Grammar as a Foreign Language\" (PDF). Nips2015. arXiv:1412.7449. Bibcode:2014arXiv1412.7449V.\n^ Winograd, Terry (1971). Procedures as a Representation for Data in a Computer Program for Understanding Natural Language (Thesis).\n^ Schank, Roger C.; Abelson, Robert P. (1977). Scripts, Plans, Goals, and Understanding: An Inquiry Into Human Knowledge Structures. Hillsdale: Erlbaum. ISBN\u00a0\n0-470-99033-3\n.\n^ Mark Johnson. How the statistical revolution changes (computational) linguistics. Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics.\n^ Philip Resnik. Four revolutions. Language Log, February 5, 2011.\n^ Socher, Richard. \"Deep Learning For NLP-ACL 2012 Tutorial\". www.socher.org. Retrieved 2020-08-17. This was an early Deep Learning tutorial at the ACL 2012, and met with both interest and (at the time) scepticism by most participants. Until then, neural learning was basically rejected because of its lack of statistical interpretability. 
Until 2015, deep learning had evolved into the major framework of NLP.\n^ Yi, Chucai; Tian, Yingli (2012), \"Assistive Text Reading from Complex Background for Blind Persons\", Camera-Based Document Analysis and Recognition, Springer Berlin Heidelberg, pp.\u00a015\u201328, CiteSeerX\u00a010.1.1.668.869, doi:10.1007/978-3-642-29364-1_2, ISBN\u00a0\n9783642293634\n^ Kishorjit, N.; Vidya, Raj RK.; Nirmal, Y.; Sivaji, B. (2012). \"Manipuri Morpheme Identification\" (PDF). Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP). COLING 2012, Mumbai, December 2012: 95\u2013108.CS1 maint: location (link)\n^ Klein, Dan; Manning, Christopher D. (2002). \"Natural language grammar induction using a constituent-context model\" (PDF). Advances in Neural Information Processing Systems.\n^ PASCAL Recognizing Textual Entailment Challenge (RTE-7) https://tac.nist.gov//2011/RTE/\n^ \"U B U W E B\u00a0:: Racter\". www.ubu.com. Retrieved 2020-08-17.\n^ Writer, Beta (2019). Lithium-Ion Batteries. doi:10.1007/978-3-030-16800-1. ISBN\u00a0\n978-3-030-16799-8\n.\n^ Duan, Yucong; Cruz, Christophe (2011). \"Formalizing Semantic of Natural Language through Conceptualization from Existence\". International Journal of Innovation, Management and Technology. 2 (1): 37\u201342. Archived from the original on 2011-10-09.\n^ Mittal (2011). \"Versatile question answering systems: seeing in synthesis\" (PDF). International Journal of Intelligent Information and Database Systems. 5 (2): 119\u2013142. doi:10.1504/IJIIDS.2011.038968.\n^ \"Cognition\". Lexico. Oxford University Press and Dictionary.com. Retrieved 6 May 2020.\n^ \"Ask the Cognitive Scientist\". American Federation of Teachers. 8 August 2014. Cognitive science is an interdisciplinary field of researchers from Linguistics, psychology, neuroscience, philosophy, computer science, and anthropology that seek to understand the mind.\n^ Robinson, Peter (2008). 
Handbook of Cognitive Linguistics and Second Language Acquisition. Routledge. pp.\u00a03\u20138. ISBN\u00a0\n978-0-805-85352-0\n.\n^ Lakoff, George (1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Philosophy; Appendix: The Neural Theory of Language Paradigm. New York Basic Books. pp.\u00a0569\u2013583. ISBN\u00a0\n978-0-465-05674-3\n.\n^ Strauss, Claudia (1999). A Cognitive Theory of Cultural Meaning. Cambridge University Press. pp.\u00a0156\u2013164. ISBN\u00a0\n978-0-521-59541-4\n.\nFurther readingEdit\nBates, M (1995). \"Models of natural language understanding\". Proceedings of the National Academy of Sciences of the United States of America. 92 (22): 9977\u20139982. Bibcode:1995PNAS...92.9977B. doi:10.1073/pnas.92.22.9977. PMC\u00a040721. PMID\u00a07479812.\nSteven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O'Reilly Media. ISBN\u00a0978-0-596-51649-9.\nDaniel Jurafsky and James H. Martin (2008). Speech and Language Processing, 2nd edition. Pearson Prentice Hall. ISBN\u00a0978-0-13-187321-6.\nMohamed Zakaria Kurdi (2016). Natural Language Processing and Computational Linguistics: speech, morphology, and syntax, Volume 1. ISTE-Wiley. ISBN\u00a0978-1848218482.\nMohamed Zakaria Kurdi (2017). Natural Language Processing and Computational Linguistics: semantics, discourse, and applications, Volume 2. ISTE-Wiley. ISBN\u00a0978-1848219212.\nChristopher D. Manning, Prabhakar Raghavan, and Hinrich Sch\u00fctze (2008). Introduction to Information Retrieval. Cambridge University Press. ISBN\u00a0978-0-521-86571-5. Official html and pdf versions available without charge.\nChristopher D. Manning and Hinrich Sch\u00fctze (1999). Foundations of Statistical Natural Language Processing. The MIT Press. ISBN\u00a0978-0-262-13360-9.\nDavid M. W. Powers and Christopher C. R. Turk (1989). Machine Learning of Natural Language. Springer-Verlag. 
ISBN\u00a0978-0-387-19557-5.\nWikimedia Commons has media related to Natural language processing.\n\nRetrieved from \"https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=983013403\"\nLast edited on 11 October 2020, at 18:45\nContent is available under CC BY-SA 3.0 unless otherwise noted.\n This page was last edited on 11 October 2020, at 18:45\u00a0(UTC).\nText is available under the Creative Commons Attribution-ShareAlike License;\nadditional terms may apply.  By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia\u00ae is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.\nPrivacy policy\nAbout Wikipedia\nDisclaimers\nContact Wikipedia\nTerms of Use\nDesktop\nDevelopers\nStatistics\nCookie statement\n \n", "features": {}, "offset_type": "j", "name": ""}
    </script>
    <script type="text/javascript">
        gatenlp_run("ELGGRWQFXQ-");
    </script>
  </div>

</div></div>



The markup in the original HTML file is converted into annotations in the annotation set named "Original markups". For example, all HTML links are present as annotations of type "a" (there are 449 of those), the level 3 headings are present as annotations of type "h3", and so on.



## Loading and saving using various document formats

GateNLP documents can be loaded from a number of different text representations. When you run
`Document.load(filepath)`, gatenlp tries to determine the format of the document automatically from the file 
extension. If that fails, the format can be specified explicitly using the `fmt=` keyword argument, 
which accepts either a mnemonic or a MIME type for the format. 

The following formats are known. Each entry lists the mnemonic first, if one exists, then the MIME type, and then 
a description of the format. All of the following formats can be both loaded and saved:  

* `text`, `text/plain`: Plain text, extension `.txt`. 
  By default this is expected to be encoded in UTF-8, but a different encoding
  can be specified using the `encoding=` keyword argument. 
* `text/plain+gzip`: Gzip compressed plain text, same as `text` but gzip compressed.
* `bdocjs`, `json`, `text/bdocjs`: BDOC Json Format, extension `.bdocjs`,  which can be exchanged with Java GATE via the 
  format BDOC plugin (https://gatenlp.github.io/gateplugin-Format_Bdoc/)
* `bdocjsgz`, `jsongz`, `text/bdocjs+gzip`: BDOC Json Format, GZip compressed, extension `.bdocjs.gz`
* `yaml`, `text/bdocym`: BDOC Yaml Format, extension `.bdocym`,  which can be exchanged with Java GATE 
   via the format BDOC plugin. This format allows for serialization of shared nested arrays/maps and exchange
   of these between Java GATE and Python GateNLP. 
* `yamlgz`, `text/bdocym+gzip`: BDOC Yaml Format, GZip compressed, extension `.bdocym.gz`
* `msgpack`, `application/msgpack`: BDOC Message Pack format, extension `.bdocmp`. Can be exchanged with Java
   GATE via the format BDOC plugin
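
To illustrate how a file name can be mapped to one of the mnemonics above, here is a minimal sketch in plain Python. It is not gatenlp's actual detection code: the table `EXTENSION_TO_FMT` and the helper `guess_fmt` are hypothetical names introduced for this example, and only the extensions listed above are covered.

```python
# Hypothetical lookup table built from the extensions listed above.
EXTENSION_TO_FMT = {
    ".bdocjs.gz": "bdocjsgz",
    ".bdocym.gz": "yamlgz",
    ".bdocjs": "bdocjs",
    ".bdocym": "yaml",
    ".bdocmp": "msgpack",
    ".txt": "text",
}


def guess_fmt(filename):
    """Return the mnemonic for a file name, or None if the extension is unknown."""
    lowered = filename.lower()
    # Check longer extensions first so that a compound extension always wins.
    for ext in sorted(EXTENSION_TO_FMT, key=len, reverse=True):
        if lowered.endswith(ext):
            return EXTENSION_TO_FMT[ext]
    return None


for name in ["test2a.bdocjs", "corpus/doc1.bdocjs.gz", "notes.txt", "unknown.dat"]:
    print(name, "->", guess_fmt(name))
```

A result of `None` corresponds to the case where the format cannot be determined from the extension and should be passed explicitly through `fmt=`.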
   
The following formats can only be loaded:

* `html`, `text/html`: HTML files can be loaded and will be parsed to obtain the text and to create annotations
   that correspond to the HTML markup (these annotations are in annotation set "Original markups"). Note that
   not all HTML can be parsed without problems and this will NOT load the *rendered* form of the HTML page, i.e.
   anything created or influenced by JavaScript code on the page is not loaded. 
* `gatexml`: Java GATE XML format, extension `.xml`. Java-specific data is not supported:
   if, for example, features have Java lists, arrays or similar as a value, the load will fail unless the keyword 
   argument `ignore_unknown_types=True` is specified. 

The following formats can only be saved:

* `html-ann-viewer`: This creates an HTML file which can be used to visualize the document. The following 
  keyword arguments can be used: `notebook=True` to create a div instead of a complete HTML document, 
  `offline=True` to include all JavaScript code necessary for visualization in the document instead of loading
  it from the internet, and `htmlid="somename"` to make all HTML, CSS and JavaScript definitions for the generated
  HTML code unique, so that several different pieces of HTML code can be embedded in the same page. 
  
Documents can also be saved and loaded using Python pickle.

Documents can also be converted to and from a Python-only representation using the methods `doc.to_dict()` and `Document.from_dict(thedict)`. This representation can be used to serialize or transfer the document in many other formats. 


```python
# Convert the document to a dictionary representation:
as_dict = doc.to_dict()
as_dict
```




    {'annotation_sets': {'Set1': {'name': 'Set1',
       'annotations': [{'type': 'Word',
         'start': 0,
         'end': 4,
         'id': 0,
         'features': {'what': 'our first annotation'}},
        {'type': 'Word',
         'start': 5,
         'end': 7,
         'id': 1,
         'features': {'what': 'our second annotation'}},
        {'type': 'Sentence',
         'start': 0,
         'end': 24,
         'id': 2,
         'features': {'what': 'our first sentence annotation'}}],
       'next_annid': 3}},
     'text': 'This is a test document.\n\nIt contains just a few sentences. \nHere is a sentence that mentions a few named entities like \nthe persons Barack Obama or Ursula von der Leyen, locations\nlike New York City, Vienna or Beijing or companies like \nGoogle, UniCredit or Huawei. \n\nHere we include a URL https://gatenlp.github.io/python-gatenlp/ \nand a fake email address john.doe@hiscoolserver.com as well \nas #some #cool #hastags and a bunch of emojis like 😽 (a kissing cat),\n👩\u200d🏫 (a woman teacher), 🧬 (DNA), \n🧗 (a person climbing), \n💩 (a pile of poo). \n\nHere we test a few different scripts, e.g. Hangul 한글 or \nsimplified Hanzi 汉字 or Farsi فارسی which goes from right to left. \n\n\n',
     'features': {'loaded-from': 'https://gatenlp.github.io/python-gatenlp/testdocument1.txt',
      'purpose': 'test document for gatenlp',
      'someotherfeature': 22,
      'andanother': {'what': 'a dict', 'alist': [1, 2, 3, 4, 5]}},
     'offset_type': 'p',
     'name': ''}
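
Because the dictionary shown above holds only plain Python values (dicts, lists, strings and numbers), it can be passed to a general-purpose serializer. The following sketch uses the standard `json` module on a shortened, hand-written dictionary with the same keys as the output above, so it runs without gatenlp. With a real document one would presumably pass the result of `doc.to_dict()` instead and rebuild the document with `Document.from_dict`.

```python
import json

# Shortened stand-in for the result of doc.to_dict(); written by hand for this example.
as_dict = {
    "annotation_sets": {
        "Set1": {
            "name": "Set1",
            "annotations": [
                {"type": "Word", "start": 0, "end": 4, "id": 0,
                 "features": {"what": "our first annotation"}},
            ],
            "next_annid": 1,
        }
    },
    "text": "This is a test document.",
    "features": {"purpose": "test document for gatenlp", "someotherfeature": 22},
    "offset_type": "p",
    "name": "",
}

# Serialize to a JSON string, e.g. for sending over a network or storing in a database.
payload = json.dumps(as_dict, ensure_ascii=False)

# Parse the string back and compare it with the original dictionary.
restored = json.loads(payload)
print(restored == as_dict)
```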




```python
# create a copy by creating a new Document from the dictionary representation
doc_copy = Document.from_dict(as_dict)
doc_copy
```




<div><style>#WTNWKPGHBI-wrapper { color: black !important; }</style>
<div id="WTNWKPGHBI-wrapper">

<div>
<style>
#WTNWKPGHBI-content {
    width: 100%;
    height: 100%;
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
}

.WTNWKPGHBI-row {
    width: 100%;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}

.WTNWKPGHBI-col {
    border: 1px solid grey;
    display: inline-block;
    min-width: 200px;
    padding: 5px;
    /* white-space: normal; */
    /* white-space: pre-wrap; */
    overflow-y: auto;
}

.WTNWKPGHBI-hdr {
    font-size: 1.2rem;
    font-weight: bold;
}

.WTNWKPGHBI-label {
    margin-bottom: -15px;
    display: block;
}

.WTNWKPGHBI-input {
    vertical-align: middle;
    position: relative;
    *overflow: hidden;
}

#WTNWKPGHBI-popup {
    display: none;
    color: black;
    position: absolute;
    margin-top: 10%;
    margin-left: 10%;
    background: #aaaaaa;
    width: 60%;
    height: 60%;
    z-index: 50;
    padding: 25px 25px 25px;
    border: 1px solid black;
    overflow: auto;
}

.WTNWKPGHBI-selection {
    margin-bottom: 5px;
}

.WTNWKPGHBI-featuretable {
    margin-top: 10px;
}

.WTNWKPGHBI-fname {
    text-align: left !important;
    font-weight: bold;
    margin-right: 10px;
}
.WTNWKPGHBI-fvalue {
    text-align: left !important;
}
</style>
  <div id="WTNWKPGHBI-content">
        <div id="WTNWKPGHBI-popup" style="display: none;">
        </div>
        <div class="WTNWKPGHBI-row" id="WTNWKPGHBI-row1" style="height:67vh; min-height:100px;">
            <div id="WTNWKPGHBI-text-wrapper" class="WTNWKPGHBI-col" style="width:70%;">
                <div class="WTNWKPGHBI-hdr" id="WTNWKPGHBI-dochdr"></div>
                <div id="WTNWKPGHBI-text">
                </div>
            </div>
            <div id="WTNWKPGHBI-chooser" class="WTNWKPGHBI-col" style="width:30%; border-left-width: 0px;"></div>
        </div>
        <div class="WTNWKPGHBI-row" id="WTNWKPGHBI-row2" style="height:30vh; min-height: 100px;">
            <div id="WTNWKPGHBI-details" class="WTNWKPGHBI-col" style="width:100%; border-top-width: 0px;">
            </div>
        </div>
    </div>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script><script src="https://unpkg.com/gatenlp-ann-viewer@1.0.9/gatenlp-ann-viewer.js"></script>
    <script type="application/json" id="WTNWKPGHBI-data">
    {"annotation_sets": {"Set1": {"name": "detached-from:Set1", "annotations": [{"type": "Word", "start": 0, "end": 4, "id": 0, "features": {"what": "our first annotation"}}, {"type": "Word", "start": 5, "end": 7, "id": 1, "features": {"what": "our second annotation"}}, {"type": "Sentence", "start": 0, "end": 24, "id": 2, "features": {"what": "our first sentence annotation"}}], "next_annid": 3}}, "text": "This is a test document.\n\nIt contains just a few sentences. \nHere is a sentence that mentions a few named entities like \nthe persons Barack Obama or Ursula von der Leyen, locations\nlike New York City, Vienna or Beijing or companies like \nGoogle, UniCredit or Huawei. \n\nHere we include a URL https://gatenlp.github.io/python-gatenlp/ \nand a fake email address john.doe@hiscoolserver.com as well \nas #some #cool #hastags and a bunch of emojis like \ud83d\ude3d (a kissing cat),\n\ud83d\udc69\u200d\ud83c\udfeb (a woman teacher), \ud83e\uddec (DNA), \n\ud83e\uddd7 (a person climbing), \n\ud83d\udca9 (a pile of poo). \n\nHere we test a few different scripts, e.g. Hangul \ud55c\uae00 or \nsimplified Hanzi \u6c49\u5b57 or Farsi \u0641\u0627\u0631\u0633\u06cc which goes from right to left. \n\n\n", "features": {"loaded-from": "https://gatenlp.github.io/python-gatenlp/testdocument1.txt", "purpose": "test document for gatenlp", "someotherfeature": 22, "andanother": {"what": "a dict", "alist": [1, 2, 3, 4, 5]}}, "offset_type": "j", "name": ""}
    </script>
    <script type="text/javascript">
        gatenlp_run("WTNWKPGHBI-");
    </script>
  </div>

</div></div>




```python
# Save the document in bdocjs format (a JSON serialization of the document)
doc.save("tmpdoc.bdocjs")

# Show what the saved file looks like
with open("tmpdoc.bdocjs", "rt", encoding="utf-8") as infp:
    print(infp.read())
```

    {"annotation_sets": {"Set1": {"name": "Set1", "annotations": [{"type": "Word", "start": 0, "end": 4, "id": 0, "features": {"what": "our first annotation"}}, {"type": "Word", "start": 5, "end": 7, "id": 1, "features": {"what": "our second annotation"}}, {"type": "Sentence", "start": 0, "end": 24, "id": 2, "features": {"what": "our first sentence annotation"}}], "next_annid": 3}}, "text": "This is a test document.\n\nIt contains just a few sentences. \nHere is a sentence that mentions a few named entities like \nthe persons Barack Obama or Ursula von der Leyen, locations\nlike New York City, Vienna or Beijing or companies like \nGoogle, UniCredit or Huawei. \n\nHere we include a URL https://gatenlp.github.io/python-gatenlp/ \nand a fake email address john.doe@hiscoolserver.com as well \nas #some #cool #hastags and a bunch of emojis like \ud83d\ude3d (a kissing cat),\n\ud83d\udc69\u200d\ud83c\udfeb (a woman teacher), \ud83e\uddec (DNA), \n\ud83e\uddd7 (a person climbing), \n\ud83d\udca9 (a pile of poo). \n\nHere we test a few different scripts, e.g. Hangul \ud55c\uae00 or \nsimplified Hanzi \u6c49\u5b57 or Farsi \u0641\u0627\u0631\u0633\u06cc which goes from right to left. \n\n\n", "features": {"loaded-from": "https://gatenlp.github.io/python-gatenlp/testdocument1.txt", "purpose": "test document for gatenlp", "someotherfeature": 22, "andanother": {"what": "a dict", "alist": [1, 2, 3, 4, 5]}}, "offset_type": "p", "name": ""}
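
Since the saved file shown above is plain JSON, it can also be inspected without `gatenlp`. The following is a minimal sketch using only the standard library; the string `bdocjs_text` is a hand-abbreviated copy of the output above (shortened text, fewer features), not the real file, and the variable names are chosen for illustration only.

```python
import json

# Abbreviated, hand-copied excerpt of the bdocjs output shown above.
# The document text and the features are shortened for brevity.
bdocjs_text = """
{"annotation_sets": {"Set1": {"name": "Set1", "annotations": [
  {"type": "Word", "start": 0, "end": 4, "id": 0, "features": {"what": "our first annotation"}},
  {"type": "Word", "start": 5, "end": 7, "id": 1, "features": {"what": "our second annotation"}},
  {"type": "Sentence", "start": 0, "end": 24, "id": 2, "features": {"what": "our first sentence annotation"}}],
  "next_annid": 3}},
 "text": "This is a test document.\\n\\nIt contains just a few sentences. \\n",
 "features": {"purpose": "test document for gatenlp"},
 "offset_type": "p", "name": ""}
"""

data = json.loads(bdocjs_text)
text = data["text"]

# Collect the text span covered by each annotation in "Set1".
covered = []
for ann in data["annotation_sets"]["Set1"]["annotations"]:
    span = text[ann["start"]:ann["end"]]
    covered.append((ann["type"], span))
    print(ann["id"], ann["type"], repr(span))
```

Slicing the text with `start` and `end` works directly here because the file was written with `"offset_type": "p"`, which is assumed to mean offsets that match Python string indices.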



```python
# load the document from the saved bdocjs format file
Document.load("tmpdoc.bdocjs")
```

    RUNNING load with from_ext= tmpdoc.bdocjs  from_mem= None
    DEBUG: not a URL !!!





<div><style>#IGYGCIMSON-wrapper { color: black !important; }</style>
<div id="IGYGCIMSON-wrapper">

<div>
<style>
#IGYGCIMSON-content {
    width: 100%;
    height: 100%;
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
}

.IGYGCIMSON-row {
    width: 100%;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}

.IGYGCIMSON-col {
    border: 1px solid grey;
    display: inline-block;
    min-width: 200px;
    padding: 5px;
    /* white-space: normal; */
    /* white-space: pre-wrap; */
    overflow-y: auto;
}

.IGYGCIMSON-hdr {
    font-size: 1.2rem;
    font-weight: bold;
}

.IGYGCIMSON-label {
    margin-bottom: -15px;
    display: block;
}

.IGYGCIMSON-input {
    vertical-align: middle;
    position: relative;
    *overflow: hidden;
}

#IGYGCIMSON-popup {
    display: none;
    color: black;
    position: absolute;
    margin-top: 10%;
    margin-left: 10%;
    background: #aaaaaa;
    width: 60%;
    height: 60%;
    z-index: 50;
    padding: 25px 25px 25px;
    border: 1px solid black;
    overflow: auto;
}

.IGYGCIMSON-selection {
    margin-bottom: 5px;
}

.IGYGCIMSON-featuretable {
    margin-top: 10px;
}

.IGYGCIMSON-fname {
    text-align: left !important;
    font-weight: bold;
    margin-right: 10px;
}
.IGYGCIMSON-fvalue {
    text-align: left !important;
}
</style>
  <div id="IGYGCIMSON-content">
        <div id="IGYGCIMSON-popup" style="display: none;">
        </div>
        <div class="IGYGCIMSON-row" id="IGYGCIMSON-row1" style="height:67vh; min-height:100px;">
            <div id="IGYGCIMSON-text-wrapper" class="IGYGCIMSON-col" style="width:70%;">
                <div class="IGYGCIMSON-hdr" id="IGYGCIMSON-dochdr"></div>
                <div id="IGYGCIMSON-text">
                </div>
            </div>
            <div id="IGYGCIMSON-chooser" class="IGYGCIMSON-col" style="width:30%; border-left-width: 0px;"></div>
        </div>
        <div class="IGYGCIMSON-row" id="IGYGCIMSON-row2" style="height:30vh; min-height: 100px;">
            <div id="IGYGCIMSON-details" class="IGYGCIMSON-col" style="width:100%; border-top-width: 0px;">
            </div>
        </div>
    </div>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script><script src="https://unpkg.com/gatenlp-ann-viewer@1.0.9/gatenlp-ann-viewer.js"></script>
    <script type="application/json" id="IGYGCIMSON-data">
    {"annotation_sets": {"Set1": {"name": "detached-from:Set1", "annotations": [{"type": "Word", "start": 0, "end": 4, "id": 0, "features": {"what": "our first annotation"}}, {"type": "Word", "start": 5, "end": 7, "id": 1, "features": {"what": "our second annotation"}}, {"type": "Sentence", "start": 0, "end": 24, "id": 2, "features": {"what": "our first sentence annotation"}}], "next_annid": 3}}, "text": "This is a test document.\n\nIt contains just a few sentences. \nHere is a sentence that mentions a few named entities like \nthe persons Barack Obama or Ursula von der Leyen, locations\nlike New York City, Vienna or Beijing or companies like \nGoogle, UniCredit or Huawei. \n\nHere we include a URL https://gatenlp.github.io/python-gatenlp/ \nand a fake email address john.doe@hiscoolserver.com as well \nas #some #cool #hastags and a bunch of emojis like \ud83d\ude3d (a kissing cat),\n\ud83d\udc69\u200d\ud83c\udfeb (a woman teacher), \ud83e\uddec (DNA), \n\ud83e\uddd7 (a person climbing), \n\ud83d\udca9 (a pile of poo). \n\nHere we test a few different scripts, e.g. Hangul \ud55c\uae00 or \nsimplified Hanzi \u6c49\u5b57 or Farsi \u0641\u0627\u0631\u0633\u06cc which goes from right to left. \n\n\n", "features": {"loaded-from": "https://gatenlp.github.io/python-gatenlp/testdocument1.txt", "purpose": "test document for gatenlp", "someotherfeature": 22, "andanother": {"what": "a dict", "alist": [1, 2, 3, 4, 5]}}, "offset_type": "j", "name": ""}
    </script>
    <script type="text/javascript">
        gatenlp_run("IGYGCIMSON-");
    </script>
  </div>

</div></div>
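
Note that the data embedded in the viewer output uses `"offset_type": "j"`, while the dictionary and the saved file use `"offset_type": "p"`. Assuming that `"p"` stands for Python offsets (Unicode code points) and `"j"` for Java/JavaScript offsets (UTF-16 code units), the two differ only after characters outside the Basic Multilingual Plane, such as the emojis in the test document. The sketch below illustrates that difference with the standard library only; the helper `codepoint_to_utf16_offset` is a hypothetical name and not part of `gatenlp`.

```python
def codepoint_to_utf16_offset(text: str, offset: int) -> int:
    """Convert a Python (code point) offset into a UTF-16 code unit offset."""
    # Every UTF-16 code unit is two bytes in the "utf-16-le" encoding.
    return len(text[:offset].encode("utf-16-le")) // 2


# A fragment of the test document that contains an emoji.
sample = "emojis like 😽 (a kissing cat)"

py_start = sample.index("(")  # offset counted in code points
utf16_start = codepoint_to_utf16_offset(sample, py_start)  # offset counted in UTF-16 units
print(py_start, utf16_start)

# Before the first emoji both offset types agree.
print(codepoint_to_utf16_offset(sample, 6))
```

The three annotations in this notebook all lie within the first 24 characters, before any emoji, which is why their `start` and `end` values are identical in the `"p"` and `"j"` representations.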




```python
# Clean up: remove the temporary file created above
import os
os.remove("tmpdoc.bdocjs")
```
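
As an alternative to removing the file by hand, the temporary file could be written into a directory that is deleted automatically. This is a minimal sketch using only the standard library; the plain `write` call is a stand-in for `doc.save(path)`, and the file content is a placeholder rather than a real bdocjs document.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tmpdoc.bdocjs")
    # Stand-in for doc.save(path): any file written into tmpdir
    # is removed together with the directory.
    with open(path, "wt", encoding="utf-8") as outfp:
        outfp.write('{"text": "This is a test document."}')
    existed_inside = os.path.exists(path)

# After the with-block the directory and the file are gone.
exists_after = os.path.exists(path)
print(existed_inside, exists_after)
```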


