Always return the language detected with langdetect in process_docs
Current behaviour:
The language detected with langdetect is returned only if it is present in the supported langs (function process_docs).
Where it goes wrong:
- The detected language defaults to default_lang here if it isn't present in the supported langs: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L351
- The result of step 1 is then passed to function generate_document: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L383
- As the language was passed in step 2, there is no need to detect the language again: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L185
- Finally, the result of step 1 is returned as dominant_language here: https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L208
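The flow above can be illustrated with a minimal sketch. This is not the code from mlp.py: the function name resolve_language_current, its parameters, and the fake_detect stand-in for langdetect are all hypothetical and only mimic the behaviour described in this issue.

```python
from typing import Callable, Set


def resolve_language_current(
    text: str,
    detect: Callable[[str], str],
    supported_langs: Set[str],
    default_lang: str,
) -> str:
    """Hypothetical sketch of the behaviour described in this issue."""
    lang = detect(text)
    if lang not in supported_langs:
        # The detected language is overwritten here, so the caller
        # never sees what langdetect actually reported.
        lang = default_lang
    return lang


def fake_detect(text: str) -> str:
    # Hypothetical stand-in for a langdetect call, kept deterministic.
    return "fi" if "hyvää" in text else "en"


# "fi" is detected but not supported, so the default is returned instead.
dominant_language = resolve_language_current(
    "hyvää päivää", fake_detect, {"en", "et", "ru"}, "en"
)
```

In this sketch dominant_language ends up as the default language even though a different language was detected, which is the problem reported here.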
Expected behaviour:
The language detected with langdetect is ALWAYS returned (function process_docs).
Removing lines https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L351 and https://git.texta.ee/texta/texta-mlp-python/-/blob/master/texta_mlp/mlp.py#L352 should fix this issue, unless it breaks something else.
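If removing the fallback does turn out to break later processing steps, one possible alternative is to keep the detected language and the language used for processing as two separate values. The sketch below is only an illustration under that assumption: LanguageChoice, resolve_language, and fake_detect are hypothetical names and do not exist in texta-mlp-python.

```python
from typing import Callable, NamedTuple, Set


class LanguageChoice(NamedTuple):
    detected: str    # what the detector reported, always preserved
    processing: str  # language actually used for analysis


def resolve_language(
    text: str,
    detect: Callable[[str], str],
    supported_langs: Set[str],
    default_lang: str,
) -> LanguageChoice:
    """Hypothetical sketch: keep the detected language, fall back only for processing."""
    detected = detect(text)
    processing = detected if detected in supported_langs else default_lang
    return LanguageChoice(detected=detected, processing=processing)


def fake_detect(text: str) -> str:
    # Hypothetical stand-in for a langdetect call, kept deterministic.
    return "fi" if "hyvää" in text else "en"


choice = resolve_language("hyvää päivää", fake_detect, {"en", "et", "ru"}, "en")
```

Here choice.detected could be returned as dominant_language, while choice.processing would still be limited to a supported language.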