Summarization based on keywords

I am wondering whether there are any automatic summarization algorithms that handle extraction based on a custom dictionary. I have been using TextRank-based algorithms for a while, but I want to influence the phrase ranking that the algorithm computes.

Example

"Thomas A. Anderson is a man living two lives. By day he is an average computer programmer, and by night a hacker known as Neo. Neo has always questioned his reality, but the truth is far beyond his imagination. Neo finds himself targeted by the police when he is contacted by Morpheus, a legendary computer hacker branded a terrorist by the government. Morpheus awakens Neo to the real world, a ravaged wasteland where most of humanity has been captured by a race of machines that live off human body heat and electrochemical energy and that imprison human minds within an artificial reality known as the Matrix. As a rebel against the machines, Neo must return to the Matrix. He must confront the agents: super-powerful computer programs devoted to snuffing out Neo and the entire human rebellion."

My custom dictionary will look something like this:

super-powerful: [important]
Thomas A. Anderson: [important]

      

My summary should contain the following sentences, even if they are ranked lower than some of the other sentences in the paragraph:

  • "Thomas A. Anderson is a man living two lives."
  • "He must confront the agents: super-powerful computer programs devoted to snuffing out Neo and the entire human rebellion."

I tried to achieve this by adding extra tags to my POS-tagged sentences. The result looks like this:

[[('Thomas A. Anderson', 'Thomas A. Anderson', ['important']), ('is', 'is', ['VBZ']), ('a', 'a', ['DT']), ('man', 'man', ['NN']), ('living', 'living', ['VBG']), ('two', 'two', ['CD']), ('lives', 'lives', ['NNS'])]]

[[('He', 'He', ['PRP']), ('must', 'must', ['MD']), ('confront', 'confront', ['VB']), ('the', 'the', ['DT']), ('agents', 'agents', ['NNS']), (':', ':', [':']), ('super-powerful', 'super-powerful', ['important', 'JJ']), ('computer', 'computer', ['NN']), ('programs', 'programs', ['NNS']), ('devoted', 'devoted', ['VBD']), ('to', 'to', ['TO']), ('snuffing', 'snuffing', ['VBG']), ('out', 'out', ['RP']), ('Neo', 'Neo', ['NNP']), ('and', 'and', ['CC']), ('the', 'the', ['DT']), ('entire', 'entire', ['JJ']), ('human', 'human', ['JJ']), ('rebellion', 'rebellion', ['NN']), ('.', '.', ['.'])]]
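For context, the tagging step can be sketched roughly as below. This is a simplified stand-in rather than the exact code used: `apply_dictionary` and `custom_dictionary` are hypothetical names, the input is assumed to be a list of (word, tag) pairs such as `nltk.pos_tag` returns, and the second element of each triple simply repeats the word. The rule that merged multi-word phrases keep only the custom tags, while single words keep their POS tag as well, is inferred from the output above.

```python
def apply_dictionary(tagged, dictionary):
    """Merge dictionary phrases in a POS-tagged sentence and attach custom tags.

    tagged:     list of (word, pos) pairs, e.g. the output of nltk.pos_tag
    dictionary: maps a phrase to a list of custom tags
    """
    # Split each phrase on whitespace and try longer phrases first.
    phrases = sorted(
        ((phrase.split(), tags) for phrase, tags in dictionary.items()
         if phrase.split()),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    words = [word for word, _ in tagged]
    result, i = [], 0
    while i < len(tagged):
        for parts, tags in phrases:
            n = len(parts)
            if words[i:i + n] == parts:
                text = " ".join(parts)
                # Single words keep their POS tag; merged phrases carry
                # only the custom tags (assumption based on the output above).
                pos = [tagged[i][1]] if n == 1 else []
                result.append((text, text, list(tags) + pos))
                i += n
                break
        else:
            # No dictionary phrase starts here: keep the token unchanged.
            word, pos = tagged[i]
            result.append((word, word, [pos]))
            i += 1
    return result


custom_dictionary = {
    "super-powerful": ["important"],
    "Thomas A. Anderson": ["important"],
}
sentence = [("Thomas", "NNP"), ("A.", "NNP"), ("Anderson", "NNP"),
            ("is", "VBZ"), ("a", "DT"), ("man", "NN"),
            ("living", "VBG"), ("two", "CD"), ("lives", "NNS")]
tagged_sentence = apply_dictionary(sentence, custom_dictionary)
```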

      

But I really don't know how to tell the TextRank algorithm to give priority to sentences with these tags. I used Python with nltk and yaml to produce this output.
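To make the goal concrete, here is a minimal sketch of one way such a priority could be expressed: a personalized (biased) PageRank in which sentences containing a dictionary phrase receive a larger share of the teleport probability. It is plain Python with no nltk dependency. The names `biased_textrank` and `summarize`, the `boost` value, and the simple substring match are all assumptions made for illustration, and the similarity measure is the word-overlap formula from the original TextRank paper, not necessarily what a given library uses.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def similarity(a, b):
    """TextRank-style overlap: shared words over the sum of log lengths."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return len(a & b) / denom if denom > 0 else 0.0


def biased_textrank(sentences, phrases, boost=5.0, damping=0.85, iterations=100):
    """Score sentences with PageRank whose teleport step favors dictionary hits."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out = [sum(row) for row in weights]

    # Teleport distribution: sentences with a dictionary phrase get `boost`
    # times the weight of the others (hypothetical choice of bias).
    bias = [boost if any(p.lower() in s.lower() for p in phrases) else 1.0
            for s in sentences]
    total = sum(bias)
    bias = [b / total for b in bias]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Mass from sentences without any edges is spread according to the bias.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        new_scores = []
        for i in range(n):
            rank = dangling * bias[i]
            for j in range(n):
                if out[j] > 0:
                    rank += weights[j][i] / out[j] * scores[j]
            new_scores.append((1 - damping) * bias[i] + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, phrases, k=2, boost=5.0):
    """Return the k highest-scoring sentences in their original order."""
    scores = biased_textrank(sentences, phrases, boost=boost)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A bias like this only raises the score of the tagged sentences; it does not guarantee they end up in the summary. If inclusion must be guaranteed, selecting the tagged sentences first and filling the remaining slots by rank would be the simpler route.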

Help would be greatly appreciated!
