Evaluation of the detection and correction properties of the reference dictionary of the system for checking and correcting orthography
DOI: https://doi.org/10.20535/SRIT.2308-8893.2019.2.05
Keywords: typing errors, spell checking, spelling dictionary, detecting properties, correcting properties
Abstract
This paper considers models for evaluating the properties of the reference orthographic dictionary (ROD) of a spell checking and correction system. A ROD's detecting properties are determined by two probabilities: that of failing to detect a typical error and that of a false error notification. The task of Pareto-optimizing a ROD is formulated, a step-by-step algorithm for solving it is proposed, and the results of an experimental evaluation of the algorithm's effectiveness are given. A ROD's correcting properties are determined by the probabilities of correct and erroneous correction of typical errors. Models for estimating these probabilities are proposed, and simulation results are given for selected dictionaries. The results show that a ROD optimized for its detecting properties also has better correcting properties. Overall, the results can serve as the basis for a tool for comparatively assessing, selecting, and improving the potential properties of a specific ROD for a given subject area.
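To make the two detecting properties named in the abstract concrete, here is a minimal Python sketch. It is not the authors' model. It assumes that a "typical error" is a single insertion, deletion, substitution, or adjacent transposition, that all such errors are equally likely, and that an error goes undetected exactly when the mistyped string is itself a dictionary word. The function names (`single_edit_variants`, `undetected_error_rate`, `false_alarm_rate`) and the Latin alphabet are illustrative choices only.

```python
import string

# Assumption: lowercase Latin alphabet; swap in the alphabet of the target language.
ALPHABET = string.ascii_lowercase


def single_edit_variants(word, alphabet=ALPHABET):
    """Return all strings one typical typing error away from `word`."""
    variants = set()
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        for ch in alphabet:
            variants.add(head + ch + tail)  # insertion
        if tail:
            variants.add(head + tail[1:])  # deletion
            for ch in alphabet:
                variants.add(head + ch + tail[1:])  # substitution
        if len(tail) > 1:
            variants.add(head + tail[1] + tail[0] + tail[2:])  # transposition
    variants.discard(word)  # an unchanged word is not an error
    variants.discard("")    # an empty string is not a word
    return variants


def undetected_error_rate(dictionary):
    """Fraction of single-edit errors that land on another dictionary word.

    Assumption: every error variant is weighted equally.
    """
    words = set(dictionary)
    total = missed = 0
    for w in words:
        variants = single_edit_variants(w)
        total += len(variants)
        missed += len(variants & words)
    return missed / total if total else 0.0


def false_alarm_rate(dictionary, correct_text_words):
    """Fraction of correctly spelled text words the dictionary would flag."""
    words = set(dictionary)
    text = list(correct_text_words)
    flagged = sum(1 for w in text if w not in words)
    return flagged / len(text) if text else 0.0
```

Under these assumptions the trade-off behind the Pareto formulation is visible directly: adding words to the dictionary can only lower `false_alarm_rate`, but each added word may be one edit away from existing words and so raise `undetected_error_rate`.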