This study addresses the identification of semantic relationships in unstructured web documents on river water pollution and proposes a lexical pattern technique for acquiring instances. It identified 10 types of concepts (entities), 10 object properties (semantic relations) and 20 lexico-syntactic patterns; the patterns were identified manually and include one from the Hearst hyponym rules. The lexical patterns linked 45 terms that are potential instances. The findings suggest that determining the lexical patterns at an early stage helps in selecting relevant terms from the wide collection of terms in the corpus. However, the relations and lexico-syntactic patterns (rules) must be verified by a domain expert before the rules are applied to a wider collection in an attempt to find more rules. The study also shows that background knowledge of the domain is essential for developing the TBox ontology diagram that serves as the backbone of the domain ontology. This diagram is an essential guideline for discovering lexico-syntactic patterns and therefore expedites the knowledge extraction process.
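As an illustration of how a single lexico-syntactic pattern can link terms, the following is a minimal sketch of the classic Hearst "such as" hyponym rule written as a regular expression. It is not the implementation used in this study: the function name `extract_hyponyms`, the restriction of the hypernym to at most two words, and the example sentence are all assumptions made for illustration, and the sentence is not taken from the study's corpus.

```python
import re

# Assumed form of the Hearst rule: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# The hypernym is limited to one or two words here, which is a simplification.
HEARST_SUCH_AS = re.compile(
    r"\b(?P<hypernym>\w+(?: \w+)?) such as (?P<hyponyms>[^.;]+)"
)

# Separators between listed hyponyms: commas (optionally followed by and/or), or a bare and/or.
LIST_SEPARATOR = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in HEARST_SUCH_AS.finditer(sentence):
        hypernym = match.group("hypernym")
        for item in LIST_SEPARATOR.split(match.group("hyponyms")):
            item = item.strip()
            if item:
                pairs.append((item, hypernym))
    return pairs


# Invented example sentence, for illustration only.
example = "The river contains heavy metals such as lead, mercury and cadmium."
pairs = extract_hyponyms(example)
for hyponym, hypernym in pairs:
    print(f"{hyponym} is-a {hypernym}")
```

Each extracted pair is only a candidate instance. As noted above, such candidates and the rules that produce them would still need verification by a domain expert before being applied to a wider collection.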
Copyright: © 2018 The Author(s)
Published by Human Resource Management Academic Research Society (www.hrmars.com)
This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at: http://creativecommons.org/licences/by/4.0/legalcode