MLP also supports entity extraction using lists of predefined entities. These lists come with MLP:
* Companies (Estonian)
* Addresses (Estonian and Russian)
* Currencies (Estonian, Russian, and English)
## Custom List-based Entities
MLP also supports defining custom entity lists. Custom lists must be placed in the **entity_mapper** directory inside the **data** directory.
Entities are defined as JSON files:
```
{
"MY_ENTITY": [
"foo",
"bar"
]
}
```
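A custom list like the one above can also be generated from Python. The following is a minimal sketch, not part of MLP itself: the helper `write_entity_list`, the file name `my_entity.json`, and the location of the **data** directory are assumptions made for illustration.

```python
import json
from pathlib import Path


def write_entity_list(data_dir, file_name, entity_type, values):
    """Write a custom entity list as JSON into <data_dir>/entity_mapper/.

    Hypothetical helper: data_dir is assumed to be MLP's data directory.
    """
    target_dir = Path(data_dir) / "entity_mapper"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    with target.open("w", encoding="utf-8") as f:
        # Same structure as the JSON example: {"ENTITY_TYPE": [values...]}
        json.dump({entity_type: list(values)}, f, ensure_ascii=False, indent=2)
    return target
```

For example, `write_entity_list("data", "my_entity.json", "MY_ENTITY", ["foo", "bar"])` would produce a file with the same content as the JSON shown above.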
## Usage
...
...
You can choose the parsers like so:
```
>>> mlp.process(analyzers=["lemmas", "phone_high_precision"], raw_text="My phone number is 12 34 56 77.")
```
### Concatenate close entities
Let's test MLP() and Concatenator() on the following three letters.
Letter 1:
```
Dear all,
Let`s not forget that I intend to concure the whole of Persian Empire!
Best wishes,
Alexander Great
aleksandersuur356eKr@mail.ee
phone: 76883266
```
Letter 2:
```
От: Terry Pratchett < tpratchett@gmail.com >
Кому: Joe Abercrombie < jabercrombie@gmail.com >
Название: Разъяснение
Дорогой Joe,
Как вы? Надеюсь, у тебя все хорошо. Последний месяц я писал свой новый роман,
который обещал представить в начале лета. Я тоже немного почитал и обожаю твою
новую книгу!
Я просто хотел уточнить, что Alexander Great жил в Македонии.
Лучший,
Terry
```
Letter 3:
```
Dear Terry!
Terry Pratchett already created Discworld. This name is taken. Other than that I found
the piece fascanating and see great potential in you! I strongly encourage you to take
action in publishing your works. Btw, if you would like to show your works to Pratchett
as well, he`s interested. I talked about you to him. His email is tpratchett@gmail.com.
Feel free to write him!
Joe
From: Terry Berry < bigfan@gmail.com >
To: Joe Abercrombie < jabercrombie@gmail.com >
Title: Question
Hi Joe,
I finally finished my draft and I`m sending it to you. The hardest part
was creating new places. What do you think of the names of the places I created?
Terry Berry
```
Let's read all those letters into a list called "mailbox". We will process the letters as described above and save them into a jsonlines file.
```
import jsonlines
from texta_mlp.mlp import MLP

mlp = MLP(language_codes=["et", "en", "ru"])

# Process every letter in the mailbox with MLP.
processed_letters = [mlp.process(letter) for letter in mailbox]

# Save the processed letters, one JSON document per line.
with jsonlines.open("letters.jsonl", mode="w") as writer:
    writer.write_all(processed_letters)
```
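The snippet above assumes that `mailbox` already exists. One possible way to build it is sketched here, assuming each letter is stored as a separate UTF-8 `.txt` file; the helper `read_mailbox` and the directory layout are hypothetical and not part of MLP.

```python
from pathlib import Path


def read_mailbox(directory):
    """Return the text of every .txt file in `directory`, ordered by file name."""
    paths = sorted(Path(directory).glob("*.txt"))
    return [path.read_text(encoding="utf-8") for path in paths]
```

With the three letters saved in a directory such as `letters/`, `mailbox = read_mailbox("letters")` would give a list of three strings.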
MLP() already creates a BOUNDED fact, which binds together the entities that occur closest to each other within a letter. To organize the information across the whole mailbox, we have to concatenate the BOUNDED facts, that is, build a database of personal information gathered from the different letters. For that we use Concatenator(), which takes the processed letters as input.
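To make the idea of concatenation concrete, here is a purely conceptual sketch. It is not the Concatenator's actual algorithm or API, and it does not use MLP's real BOUNDED fact format: it assumes each bounded group is simply a set of entity strings, and it merges groups that share at least one entity. The function name `merge_bounded_groups` is hypothetical.

```python
def merge_bounded_groups(groups):
    """Merge entity groups that share at least one entity.

    Illustration only: each group is a plain set of strings, which is an
    assumption and not the structure of MLP's BOUNDED facts.
    """
    merged = []  # invariant: the sets in `merged` are pairwise disjoint
    for group in groups:
        current = set(group)
        remaining = []
        for existing in merged:
            if existing & current:
                # Overlap found: absorb the existing group into the current one.
                current |= existing
            else:
                remaining.append(existing)
        remaining.append(current)
        merged = remaining
    return merged


# Hand-written example input (not real MLP output).
groups = [
    {"Terry Pratchett", "tpratchett@gmail.com"},
    {"Joe Abercrombie", "jabercrombie@gmail.com"},
    {"Pratchett", "tpratchett@gmail.com"},
]
people = merge_bounded_groups(groups)
```

In this example the first and third groups share an email address, so they end up in one merged record, while the group for Joe Abercrombie stays separate.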
We can also use Elasticsearch with Concatenator(). Here's a snippet that fetches documents already processed by MLP() from Elasticsearch, processes them, and uploads the results to a new index.