Ontology Driven Information Extraction from Tables Using Connectivity Analysis

Ashwin Bahulkar and Sreedhar Reddy
Tata Consultancy Services Ltd, 54 B Hadapsar Industrial Estate, Pune, India
ashwin.bahulkar@tcs.com, sreedhar.reddy@tcs.com

Abstract. Tables are one of the most common mechanisms for presenting structured information on the web. A table presents information on a set of related concepts in a domain. A column typically represents a concept, or an attribute of a concept, identified by the column header. A row contains the corresponding instances and attribute values. However, column headers are usually quite noisy and are sometimes missing altogether. While a human reader can work out the required domain mappings relatively easily by using domain knowledge and the surrounding context, discovering them algorithmically is challenging. In this paper we present an algorithm that exploits the observation that a table presents information only on connected entities of a domain ontology. The algorithm works in two phases. In the first phase, it uses local optimization criteria, such as lexical matching and instance matching, to find an initial set of mappings. In the second phase, it takes these mappings and constructs all possible connected subgraphs of the ontology that can be formed from them. Among the largest of these subgraphs, the one with the highest local mapping score is then selected as the underlying domain mapping of the table. We present experimental results demonstrating the effectiveness of the algorithm.

Keywords: Ontology, information extraction, web tables

LNCS 8185, p. 642 ff.
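The first phase can be sketched as follows. The abstract names lexical matching and instance matching as local criteria but does not give their exact form, so this is a minimal sketch under stated assumptions: lexical similarity is approximated with the ratio from Python's difflib.SequenceMatcher between a header and a concept name, instance matching is the fraction of column cells found among known instances of a concept, and the two are combined with a hypothetical weight alpha. The function names, the weight, the top_k cutoff, and the toy ontology are illustrative only, not taken from the paper.

```python
from difflib import SequenceMatcher


def lexical_score(header: str, concept: str) -> float:
    """Assumed lexical criterion: string similarity in [0, 1] between header and concept name."""
    if not header:
        return 0.0  # missing header: no lexical evidence at all
    return SequenceMatcher(None, header.lower(), concept.lower()).ratio()


def instance_score(cells: list[str], known: set[str]) -> float:
    """Assumed instance criterion: fraction of non-empty cells that are known instances."""
    values = [c.strip().lower() for c in cells if c.strip()]
    if not values:
        return 0.0
    known_lower = {k.lower() for k in known}
    return sum(v in known_lower for v in values) / len(values)


def candidate_mappings(header, cells, ontology, alpha=0.5, top_k=3):
    """Rank ontology concepts for one column by a weighted local score (hypothetical weighting)."""
    scored = []
    for concept, instances in ontology.items():
        score = alpha * lexical_score(header, concept) + (1 - alpha) * instance_score(cells, instances)
        scored.append((concept, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # keep only the best few candidates for the second phase


if __name__ == "__main__":
    # Toy ontology: concept name -> set of known instances (illustrative data).
    ontology = {
        "Country": {"India", "France", "Japan"},
        "City": {"Pune", "Paris", "Tokyo"},
        "Population": set(),
    }
    top = candidate_mappings("Cntry", ["India", "Japan"], ontology)
    print(top[0][0])  # Country
```

Even with a noisy header such as "Cntry", or with no header at all, the instance evidence can still push the right concept to the top of the candidate list.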
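The second phase can be illustrated with a brute-force sketch. It assumes that each column already has a short list of (concept, score) candidates, as a first phase would produce, and that the ontology is an undirected graph given as a list of related concept pairs. A mapping is kept only if its chosen concepts form a connected subgraph using edges among themselves (an induced-subgraph assumption). "Largest" is read here as the number of mapped columns, with ties broken by total local score. Enumerating every combination is exponential in the number of columns, so this shows the selection criterion rather than an efficient implementation; all names and the example data are hypothetical.

```python
from collections import deque
from itertools import product


def build_graph(relations):
    """Undirected adjacency sets from (concept, concept) relation pairs."""
    graph = {}
    for a, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def is_connected(nodes, graph):
    """True if `nodes` are connected using only edges among themselves (BFS)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def select_mapping(candidates, graph):
    """Pick the largest connected mapping, breaking ties by total local score."""
    columns = list(candidates)
    # Each column may take one of its candidates or stay unmapped (None).
    options = [candidates[c] + [(None, 0.0)] for c in columns]
    best, best_key = None, (-1, float("-inf"))
    for combo in product(*options):
        mapping = {col: (concept, score)
                   for col, (concept, score) in zip(columns, combo) if concept is not None}
        if not mapping or not is_connected([c for c, _ in mapping.values()], graph):
            continue
        key = (len(mapping), sum(s for _, s in mapping.values()))
        if key > best_key:
            best, best_key = mapping, key
    return best


if __name__ == "__main__":
    graph = build_graph([("City", "Country"), ("Country", "Population"), ("Person", "Company")])
    candidates = {
        "A": [("City", 0.9), ("Person", 0.8)],
        "B": [("Country", 0.7), ("Company", 0.75)],
        "C": [("Population", 0.6)],
    }
    result = select_mapping(candidates, graph)
    print({col: concept for col, (concept, _) in result.items()})
    # {'A': 'City', 'B': 'Country', 'C': 'Population'}
```

In the example, column B's best local candidate is Company, but only Country connects to the concepts chosen for the other columns, so the connectivity requirement overrides the purely local score.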