The Stanza 1.2 models change how punctuation is tokenized, which breaks address and phone number parsing.

Steps to reproduce:
- Delete the data folder with Stanza models.
- Update Stanza to 1.2.*.
- Run `pytest -v tests` inside the repo. The models will take a while to re-download.
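The first step can also be scripted. This is only a minimal sketch: `clear_model_cache` is a hypothetical helper, and the folder path is an argument because the location of the data folder depends on your checkout.

```python
import shutil
from pathlib import Path


def clear_model_cache(model_dir: str) -> bool:
    """Delete the folder holding downloaded models.

    Returns True if the folder existed and was removed, False otherwise.
    The path is supplied by the caller; nothing here is specific to Stanza.
    """
    path = Path(model_dir)
    if not path.is_dir():
        # Nothing to delete, so the next test run downloads models anyway.
        return False
    shutil.rmtree(path)
    return True
```

Call it with the path of the data folder, then upgrade Stanza and run the tests as described above.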
Some of the resulting test failures:
> AssertionError: assert 'ул. Матросская Тишина , д. 14А' in ['ул. Матросская Тишина ,д.14А']
> AssertionError: assert 'ул. Курчатова 10а' in []
> AssertionError: assert 'vana-lõuna 39' in []
> AssertionError: assert [] == ['74956456601']
> ['4310373'] != ['89104310373']
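In the first failure the two strings differ only in the whitespace around punctuation. The sketch below is a diagnostic aid, not a proposed fix: `normalize_punct_spacing` is a hypothetical helper that strips whitespace around commas and periods, so it can confirm that a mismatch is spacing-only. It does not explain the failures where the result is an empty list or a truncated phone number.

```python
import re


def normalize_punct_spacing(text: str) -> str:
    """Remove whitespace around commas and periods.

    Two strings that differ only in tokenizer spacing around this
    punctuation compare equal after normalization.
    """
    return re.sub(r"\s*([.,])\s*", r"\1", text).strip()


# The two sides of the first assertion above.
left = "ул. Матросская Тишина , д. 14А"
right = "ул. Матросская Тишина ,д.14А"

# Raw comparison fails; the normalized comparison succeeds.
print(left == right)  # False
print(normalize_punct_spacing(left) == normalize_punct_spacing(right))  # True
```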