Text Mining Series 3: Automatic Summarization

The key technique we use here is PageRank.

First of all, we split an article into sentences.
Second, we treat each sentence as a document: we clean the data, apply a tf-idf transformation, and build a document-term matrix (DTM). Multiplying the DTM by its transpose gives the similarity matrix between sentences.
Third, using this similarity matrix as input, we call the page.rank function in the igraph package to compute a PageRank value for each sentence, which serves as its TextRank score. We sort the sentences by this score in descending order and output the top five as our summary.
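
The three steps above can be sketched roughly as follows. This is not the original code: the article used R with igraph's page.rank, while this sketch is in Python with only the standard library and substitutes a hand-written power iteration for page.rank. The function names, the naive sentence splitter, the smoothed idf formula, the zeroed diagonal of the similarity matrix, and the damping factor of 0.85 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Step 1: naive split after '.', '!' or '?' followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_rows(sentences):
    """Step 2a: treat each sentence as a document and return one
    {term: tf-idf weight} dict per sentence (the rows of the DTM)."""
    # Minimal cleaning: lowercase and keep alphanumeric tokens only.
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf (assumption): log((1 + n) / (1 + df)) + 1.
        rows.append({t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
                     for t, c in counts.items()})
    return rows


def similarity_matrix(rows):
    """Step 2b: DTM times its transpose, i.e. dot products between sentence rows.
    The diagonal is left at zero so a sentence does not vote for itself (assumption)."""
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(w * rows[j].get(t, 0.0) for t, w in rows[i].items())
            sim[i][j] = sim[j][i] = dot
    return sim


def pagerank(sim, damping=0.85, iterations=100, tol=1e-9):
    """Step 3a: weighted PageRank by power iteration on the similarity matrix."""
    n = len(sim)
    if n == 0:
        return []
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by sentences with no edges is spread evenly over all sentences.
        dangling = sum(scores[j] for j in range(n) if out_weight[j] == 0)
        new_scores = []
        for i in range(n):
            incoming = sum(scores[j] * sim[j][i] / out_weight[j]
                           for j in range(n) if out_weight[j] > 0)
            new_scores.append((1 - damping) / n
                              + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def summarize(text, top_n=5):
    """Step 3b: sort sentences by score and return the top_n, highest first."""
    sentences = split_sentences(text)
    scores = pagerank(similarity_matrix(tfidf_rows(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]
```

Calling summarize(article_text) returns up to five sentences ordered by score, which matches the output described above. A real pipeline would likely add stop-word removal and a more robust sentence splitter during the cleaning step.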

Your advice and suggestions are welcome!

For the record, this article was posted on LinkedIn and had 36 views as of November 2021.