A parallel search algorithm for formal grammar data types

Authors

DOI:

https://doi.org/10.20535/SRIT.2308-8893.2018.4.05

Keywords:

grammar, search, parallelism, concurrency, heuristics

Abstract

In this paper, we developed a concurrent, generic heuristic algorithm for parallel parsing and searching in structured text datasets. The main objective of the algorithm is to increase the efficiency of CPU-bound operations when parsing large-scale datasets by using a parallel approach. The algorithm uses heuristics to find the requested data without processing the whole file and without building a syntax tree. It can be applied to any data format. An efficiency gain was observed when input-output operations take significantly less time than the search itself, when the file is loaded into random-access memory, or when efficient non-sequential access to the file is possible. We also developed a prototype implementation of the algorithm for performance comparisons. The prototype supports searching in large-scale XML datasets, using a subset of XPath expressions to specify the search request. Our experimental results show that the algorithm is faster than classical algorithms when all of these requirements are met and the desired data is located close to the beginning of the dataset. In the worst case, our algorithm performs about as well as the others but consumes more memory.
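The general idea of searching an in-memory XML text in parallel without building a syntax tree can be illustrated with a minimal sketch. It is not the prototype described in the paper: the function names `parallel_find` and `_scan_chunk`, the chunking scheme, and the single-tag query (instead of an XPath subset) are all assumptions made for illustration. The heuristic is a plain substring scan for an opening tag, so it ignores comments and CDATA sections, and CPython threads only show the structure of the approach, not a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def _scan_chunk(text, start, end, needle):
    """Return offsets in [start, end) where the opening tag `needle` begins."""
    hits = []
    # Allow a match that starts inside the chunk to run past its end,
    # so tags straddling a chunk boundary are found exactly once.
    limit = end + len(needle) - 1
    pos = text.find(needle, start, limit)
    while pos != -1:
        after = text[pos + len(needle):pos + len(needle) + 1]
        # Heuristic check: reject longer names, e.g. <items when looking for <item.
        if after in (">", " ", "/", "\t", "\n"):
            hits.append(pos)
        pos = text.find(needle, pos + 1, limit)
    return hits


def parallel_find(text, tag, workers=4):
    """Find offsets of opening tags named `tag` by scanning chunks concurrently.

    Hypothetical helper: no syntax tree is built and the text is assumed
    to be fully loaded into memory.
    """
    needle = "<" + tag
    size = max(1, -(-len(text) // workers))  # ceiling division
    bounds = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so offsets stay in document order.
        results = pool.map(lambda b: _scan_chunk(text, b[0], b[1], needle), bounds)
        return [pos for hits in results for pos in hits]
```

Because every worker reads from the same in-memory string at arbitrary offsets, the sketch relies on the condition stated in the abstract: the file is already in random-access memory or supports efficient non-sequential access.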

Author Biography

Anastasiia O. Prodan, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Kyiv

Anastasiia Olegivna Prodan, a student of the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Kyiv, Ukraine.

Published

2018-12-18

Section

Progressive information technologies, high-efficiency computer systems