If the passed document already contains a language in the field "{doc_path}_mlp.language.detected", use that language as detected_language
instead of applying langdetect again.
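A minimal sketch of this lookup, assuming the document is a nested dict and doc_path is a single top-level key. The function name resolve_detected_language and the detect callback are hypothetical and are not taken from texta_mlp:

```python
from typing import Callable, Optional


def resolve_detected_language(
    doc: dict,
    doc_path: str,
    detect: Callable[[str], str],
) -> Optional[str]:
    """Prefer the language already stored in the document over re-detection.

    Assumption: the field "{doc_path}_mlp.language.detected" is stored as
    nested dicts, i.e. doc[f"{doc_path}_mlp"]["language"]["detected"].
    """
    stored = doc.get(f"{doc_path}_mlp", {}).get("language", {}).get("detected")
    if stored:
        # A language is already present, so the detector is not called.
        return stored
    text = doc.get(doc_path, "")
    # Fall back to detection only when nothing is stored and there is text.
    return detect(text) if text else None
```

The detect argument stands in for whatever detector is in use (for example langdetect.detect), so the sketch itself needs no third-party dependency.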
1.21.0
Add support for forcing a specific language to be used when analysing the document with Stanza analyzers.
1.21.0
Current behaviour:
The language detected with langdetect is returned only if it is present in the supported languages (function process_docs).
Where it goes wrong:
1. If the detected language isn't present in the supported languages, it defaults to default_lang here: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L351
2. The result of step 1 is then passed to the function generate_document: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L383
3. As the language was passed in step 2, the language is not detected again: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L185
4. Finally, the result of step 1 is returned as dominant_language here: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L208
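The steps above can be sketched as follows. This is a simplified illustration of the described behaviour, not the code in mlp.py; the function name and parameters are hypothetical:

```python
from typing import Sequence


def dominant_language_current(
    detected: str,
    supported_langs: Sequence[str],
    default_lang: str,
) -> str:
    """Mimic the reported behaviour: an unsupported language is overwritten."""
    lang = detected
    if lang not in supported_langs:
        # Step 1: the detected language is replaced by the default.
        lang = default_lang
    # Steps 2-4: the same value is passed on, not re-detected, and
    # finally reported as the dominant language.
    return lang
```

For a document detected as "fi" with supported languages ["et", "en"] and default "en", this reports "en", so the information that the text was Finnish is lost.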
Expected behaviour:
The language detected with langdetect is ALWAYS returned (function process_docs).
Removing lines https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L351 and https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L352 should fix this issue, unless that breaks something else...
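One way to get the expected behaviour without breaking analysis is to keep two values apart: the language that was detected and the language used for analysis. The sketch below assumes that split; the function name and the two-value return are hypothetical and are not taken from mlp.py:

```python
from typing import Sequence, Tuple


def choose_languages(
    detected: str,
    supported_langs: Sequence[str],
    default_lang: str,
) -> Tuple[str, str]:
    """Return (reported_language, analysis_language).

    The detected language is always reported unchanged; the fallback to
    default_lang applies only to the language used for analysis.
    """
    analysis_lang = detected if detected in supported_langs else default_lang
    return detected, analysis_lang
```

With this split, a document detected as "fi" would still be reported as "fi" even if it is analysed with the default language.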
1.21.0
Marit Asula (1b7bfd72) at 29 Nov 11:16
Merge branch 'manual_set_language' into 'master'
Marit Asula (a4550f86) at 29 Nov 09:24
Merge branch 'manual_set_language' into 'master'
... and 6 more commits
Changelog: update
Marit Asula (01e4e7d5) at 29 Nov 09:23
update VERSION to 1.21.0
Changelog: update
Marit Asula (937b237f) at 28 Nov 13:28
pass lang to analysis_lang and use None for detected lang in mlp.pr...
Marit Asula (b1878890) at 28 Nov 12:36
fix tests
Marit Asula (efb6bb53) at 28 Nov 11:59
add support for using detected language retrieved from the document...
Raul Sirel (0a2d21a2) at 15 Nov 09:23
add lithuanian support
Raul Sirel (628f7b6e) at 15 Nov 09:22
add lithuanian support
Marit Asula (ca3475c7) at 06 Nov 15:11
Added references to Latvian NER model