Abstract: Given the sheer volume of news available through electronic media, extracting consolidated information from raw data has become important. Extracting relevant textual content from e-news portals is increasingly challenging because of content management systems (CMSs) and the dynamic nature of web pages. This paper presents a technique that provides condensed news information based on user preferences. Information retrieval from the articles is based on TF-IDF, augmented by a novel transitive closure algorithm that adds context to the search. The impact of techniques such as article tagging and coreference resolution on the retrieval of semantically and contextually related content is studied. The proposed system reduces search latency by separating text processing from information extraction. Once information is extracted, content from different source portals is summarized according to the user's query and presented. The system achieves high precision and recall for both generic and domain-specific user queries.
Keywords: Information retrieval, natural language processing, text summarisation, text mining
DOI: 10.17148/IARJSET.2018.5610
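
As a rough illustration of the TF-IDF retrieval step named in the abstract, the following minimal sketch ranks a handful of articles against a user query. It shows plain TF-IDF only; the transitive closure augmentation, article tagging, and coreference resolution are not specified in the abstract and are not reproduced here. The function names (`tokenize`, `tf_idf_rank`), the whitespace tokenizer, and the smoothed IDF formula are assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only (assumed tokenizer)."""
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf_rank(docs, query):
    """Return indices of docs with a non-zero TF-IDF score for the query, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scored = []
    for index, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Smoothed IDF (assumption): log(N / df) + 1 keeps common terms non-zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(tokens)) * idf
        if score > 0:
            scored.append((score, index))

    # Highest score first; ties broken by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]


articles = [
    "the election results were announced today",
    "the football match ended in a draw",
    "election officials counted votes after the election",
]
ranking = tf_idf_rank(articles, "election votes")
```

Here `ranking` lists the third article ahead of the first, since it matches both query terms, and omits the unrelated football article. Computing the document frequencies once and reusing them across queries would be one simple way to keep text processing separate from query-time extraction, in the spirit of the latency claim above.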