Identifying the most important sentences of a text

I am working on a tool that will allow users to summarize selected text.

I want to do this by finding the x most important sentences in the text (x being user-defined or calculated from the length of the text), and then pairing each of these "main sentences" with the x sentences most related or similar to it. That way, I hope to cover several important parts of the text with a few lines each, rather than one large part (topic) of the text. I know that not every text covers enough topics to yield several core sentences; the number of key sentences and related sentences will depend on the text itself.

To identify these important sentences, I am currently building on the example in this guide, which uses cross-sentence intersection scores to rank each sentence of the text. This has given decent results so far, but in some cases the ranking is noticeably worse.
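For context, here is a minimal sketch of the kind of intersection ranking described above. It is not the guide's code: the regex sentence splitter, the word-set overlap normalized by average sentence length, and all function names are assumptions made for illustration.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    # Lowercased set of word tokens in the sentence.
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

def intersection_score(a, b):
    # Shared words divided by the average number of distinct words.
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / ((len(wa) + len(wb)) / 2)

def rank_by_intersection(sentences):
    # Score each sentence by its summed overlap with every other sentence,
    # then return the sentences from highest to lowest total score.
    scored = []
    for i, s in enumerate(sentences):
        total = sum(intersection_score(s, t)
                    for j, t in enumerate(sentences) if j != i)
        scored.append((total, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sentences[i] for _, i in scored]
```

Ties are broken by original position, so earlier sentences win when scores are equal.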

Therefore, I am looking for other methods to extract the most important sentences. After a little searching, Levenshtein distance came up several times as a way to compare strings.

Can I use Levenshtein distance to compute the LD between each pair of sentences, sum the LD for each sentence, and return the x sentences with the lowest total Levenshtein distance? Would this result in a representative ranking of the most important sentences of the text?
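To make the idea concrete, this is roughly the procedure in question, as a sketch only. It uses a plain dynamic-programming edit distance on raw characters; the function names are hypothetical and no normalization is applied.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one row at a time.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def rank_by_total_distance(sentences, x):
    # Sum each sentence's distance to every other sentence and
    # return the x sentences with the lowest totals.
    totals = []
    for i, s in enumerate(sentences):
        total = sum(levenshtein(s, t)
                    for j, t in enumerate(sentences) if j != i)
        totals.append((total, i))
    totals.sort()
    return [sentences[i] for _, i in totals[:x]]
```

One property worth keeping in mind with this sketch: the distance between two strings is never smaller than the difference in their lengths, so the raw totals depend on sentence length as well as on wording.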

If not, should I stick with the intersection method, or should I consider an alternative?

I am also considering using tf-idf as a preprocessing step, to keep only the valuable words in each sentence.
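A rough sketch of that preprocessing step, assuming each sentence is treated as its own "document", with term frequency taken as count divided by sentence length and idf as log(N / df). The tokenizer, the function names, and the choice to keep a fixed number k of words per sentence are all assumptions for illustration.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    # Lowercased word tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9']+", sentence.lower())

def tfidf_per_sentence(sentences):
    # Returns one {word: tf-idf weight} dict per sentence.
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each word once per sentence
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({w: (c / len(tokens)) * math.log(n / df[w])
                        for w, c in counts.items()})
    return weights

def keep_top_words(sentences, k):
    # For each sentence, keep the k words with the highest tf-idf weight
    # (ties broken alphabetically).
    kept = []
    for w in tfidf_per_sentence(sentences):
        kept.append(sorted(w, key=lambda word: (-w[word], word))[:k])
    return kept
```

With this weighting, a word that occurs in every sentence gets a weight of zero, so it is the first to be dropped.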
