Result: Deterministic Regular Expressions in Linear Time

Title:
Deterministic Regular Expressions in Linear Time
Contributors:
Modeling Tree Structures, Machine Learning, and Information Extraction (MOSTRARE), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Centre Inria de l'Université de Lille, Institut National de Recherche en Informatique et en Automatique (Inria), National ICT Australia [Sydney] (NICTA), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)
Source:
PODS-31th ACM Symposium on Principles of Database Systems. :12-12
Publisher Information:
CCSD, 2012.
Publication Year:
2012
Collection:
collection:UNIV-LILLE3
collection:CNRS
collection:INRIA
collection:INRIA-LILLE
collection:LIFL
collection:INRIA_TEST
collection:TESTALAIN1
collection:INRIA2
collection:INRIA-AUSTRALIE
Subject Geographic:
Original Identifier:
HAL:
Document Type:
Conference conferenceObject<br />Conference papers
Language:
English
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.inria.00618451v1
Database:
HAL

Further Information

Deterministic regular expressions are widely used in XML processing. For instance, all regular expressions in DTDs and XML Schemas are required to be deterministic. In this paper we show that determinism of a regular expression e can be tested in linear time. The best known algorithms, based on the Glushkov automaton, require O(σ|e|) time, where σ is the number of distinct symbols in e. We fur- ther show that matching a word w against an expression e can be achieved in combined linear time O(|e| + |w|), for a wide range of deterministic regular expressions: (i) star-free (for multiple input words), (ii) bounded-occurrence, i.e., ex- pressions in which each symbol appears a bounded number of times, and (iii) bounded plus-depth, i.e., expressions in which the nesting depth of alternating plus (union) and con- catenation symbols is bounded. Our algorithms use a new structural decomposition of the parse tree of e. For match- ing arbitrary deterministic regular expressions we present an O(|e| + |w| log log |e|) time algorithm.