Keyword-based summarization

I am wondering if there are any automatic summarization algorithms out there that handle extraction based on a custom dictionary. I have been using TextRank-based algorithms for a while, but I want to influence the ranking of phrases that the algorithm calculates.

Example

"Thomas A. Anderson is a man living two lives. By day he is an average computer programmer, and by night a hacker known as Neo. Neo has always questioned his reality, but the truth is far beyond his imagination. Neo is targeted by the police when he is contacted by Morpheus, a legendary computer hacker branded a terrorist by the government. Morpheus awakens Neo to the real world, a ravaged wasteland where most of humanity has been captured by a race of machines that live off human body heat and electrochemical energy and that imprison human minds within an artificial reality known as the Matrix. As a rebel against the machines, Neo must return to the Matrix. He must confront the agents: super-powerful computer programs devoted to snuffing out Neo and the entire human rebellion."

My custom dictionary will look something like this:

super-powerful: [important]
Thomas A. Anderson: [important]

      

My summary should contain the following sentences, even if they are ranked lower than some of the other sentences in the paragraph:

  • "Thomas A. Anderson is a man living two lives."
  • "He must confront the agents: super-powerful computer programs devoted to snuffing out Neo and the entire human rebellion."

I tried to achieve this by adding additional tags to my POS-tagged sentences. The result looks like this:

[[('Thomas A. Anderson', 'Thomas A. Anderson', ['important']), ('is', 'is', ['VBZ']), ('a', 'a', ['DT']), ('man', 'man', ['NN']), ('living', 'living', ['VBG']), ('two', 'two', ['CD']), ('lives', 'lives', ['NNS'])]]

[[('He', 'He', ['PRP']), ('must', 'must', ['MD']), ('confront', 'confront', ['VB']), ('the', 'the', ['DT']), ('agents', 'agents', ['NNS']), (':', ':', [':']), ('super-powerful', 'super-powerful', ['important', 'JJ']), ('computer', 'computer', ['NN']), ('programs', 'programs', ['NNS']), ('devoted', 'devoted', ['VBD']), ('to', 'to', ['TO']), ('snuffing', 'snuffing', ['VBG']), ('out', 'out', ['RP']), ('Neo', 'Neo', ['NNP']), ('and', 'and', ['CC']), ('the', 'the', ['DT']), ('entire', 'entire', ['JJ']), ('human', 'human', ['JJ']), ('rebellion', 'rebellion', ['NN']), ('.', '.', ['.'])]]
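For illustration, output of this shape could be produced along the following lines. This is only a sketch: the function name `add_custom_tags` and the input format (a list of `(word, pos)` pairs such as `nltk.pos_tag` returns, plus a dict loaded from the YAML dictionary) are assumptions, not the code actually used above. Multi-word dictionary entries are merged into one token that carries only the custom tags, while single-token entries keep their POS tag.

```python
def add_custom_tags(tagged, dictionary):
    """Merge custom dictionary tags into one POS-tagged sentence.

    tagged: list of (word, pos) pairs, e.g. the output of nltk.pos_tag
    dictionary: maps a phrase to a list of custom tags
    """
    # Pre-split phrases so multi-word entries can be matched token by token;
    # longer phrases are tried first, empty entries are skipped.
    phrases = sorted(
        ((p.split(), tags) for p, tags in dictionary.items() if p.split()),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    words = [w for w, _ in tagged]
    out, i = [], 0
    while i < len(tagged):
        for parts, tags in phrases:
            n = len(parts)
            if words[i:i + n] == parts:
                text = " ".join(parts)
                # Single tokens keep their POS tag; merged phrases do not.
                pos = [tagged[i][1]] if n == 1 else []
                out.append((text, text, list(tags) + pos))
                i += n
                break
        else:
            # No dictionary phrase starts here: keep the token as-is.
            word, pos = tagged[i]
            out.append((word, word, [pos]))
            i += 1
    return out


dictionary = {"super-powerful": ["important"], "Thomas A. Anderson": ["important"]}
tagged = [("Thomas", "NNP"), ("A.", "NNP"), ("Anderson", "NNP"), ("is", "VBZ")]
result = add_custom_tags(tagged, dictionary)
```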

      

But I really don't know how to tell the TextRank algorithm to give priority to sentences with these tags. I used Python with nltk and yaml to produce this output.
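One direction that might work is a biased (personalized) PageRank over the sentence graph, where the teleport vector gives extra weight to sentences containing a dictionary term. The sketch below is an untested idea in plain Python, not an existing TextRank API: the names `similarity` and `biased_textrank`, the `boost` parameter, and the substring matching of terms are all assumptions, and the similarity is a simplified word-overlap measure in the spirit of TextRank.

```python
import math
import re


def similarity(a, b):
    """Word-overlap similarity between two token lists, normalized by log sizes."""
    wa, wb = set(a), set(b)
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(0) and a zero denominator for very short sentences
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def biased_textrank(sentences, important_terms, boost=5.0, damping=0.85, iterations=50):
    """Score sentences with a PageRank whose teleport vector favors dictionary hits."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [re.findall(r"[\w-]+", s.lower()) for s in sentences]

    # Teleport weights: 1 for every sentence, plus `boost` per dictionary term found.
    bias = [1.0 + boost * sum(t.lower() in s.lower() for t in important_terms)
            for s in sentences]
    total = sum(bias)
    bias = [b / total for b in bias]

    # Symmetric similarity graph without self-loops.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    # Power iteration; sentences with no edges only receive teleport mass.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) * bias[i]
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores
```

Biasing only raises the scores of flagged sentences; it does not guarantee that they end up in the summary. To force them in, the flagged sentences could be selected first and the remaining slots filled by score.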

Help would be greatly appreciated!

+3
python nltk summarization


