Jure Leskovec, Natasa Milic-Frayling, and Marko Grobelnik
We present a method for extracting sentences from an individual document to serve as a document summary or as a precursor to creating a generic document abstract. We apply a syntactic analysis to the text that produces a logical form for each sentence. We use subject–object–predicate (SOP) triples from individual sentences to create a semantic graph of the original document and of the corresponding human-extracted summary. Using the Support Vector Machine (SVM) learning algorithm, we train a classifier to identify the SOP triples in the document semantic graph that belong to the summary. The classifier is then used to extract summaries automatically from test documents. Our experiments with the DUC 2002 and CAST datasets show that including semantic properties and topological graph properties of logical triples yields a statistically significant improvement in the micro-averaged F1 measure, both for the extraction of SOP triples that correspond to the semantic structure of extracts and for the extraction of summary sentences. Evaluation based on ROUGE shows similar results for the extracted summary sentences.
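To make two of the ingredients above concrete, here is a minimal sketch, assuming that the subjects and objects of SOP triples become graph nodes and that each triple contributes one undirected edge between them. It derives simple per-triple topological features (node degrees) of the kind an SVM could consume, and computes a micro-averaged F1 by pooling true positives, false positives, and false negatives across documents before computing precision and recall. The function names and the degree-only feature set are hypothetical illustrations, not the authors' implementation or their actual feature set; the SVM training step itself is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (subject, predicate, object)
Triple = Tuple[str, str, str]


def build_semantic_graph(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Undirected adjacency: each triple links its subject and object nodes.

    Assumption for illustration only; predicates are not stored as edge labels here.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for subj, _pred, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return dict(adj)


def triple_features(triple: Triple, adj: Dict[str, Set[str]]) -> List[float]:
    """Hypothetical topological features: degrees of the subject and object nodes."""
    subj, _pred, obj = triple
    return [float(len(adj.get(subj, ()))), float(len(adj.get(obj, ())))]


def micro_f1(docs: List[Tuple[Set[Triple], Set[Triple]]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all (predicted, gold) pairs first."""
    tp = fp = fn = 0
    for predicted, gold in docs:
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    triples = [
        ("svm", "classify", "triple"),
        ("triple", "form", "graph"),
        ("graph", "represent", "document"),
    ]
    adj = build_semantic_graph(triples)
    print(triple_features(triples[1], adj))  # [2.0, 2.0]
```

Micro-averaging weights every triple equally regardless of which document it comes from, so documents with many triples influence the score more than they would under a per-document (macro) average.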