Find and print quoted text from a text file using python

I am a Python beginner and want to grab all quoted text from a text file. I've tried the following:

filename = raw_input("Enter the full path of the file to be used: ")
input = open(filename, 'r')
import re
quotes = re.findall(ur'"[\^u201d]*["\u201d]', input)
print quotes

      

I am getting the error:

Traceback (most recent call last):
  File "/Users/nithin/Documents/Python/Capture Quotes", line 5, in <module>
    quotes = re.findall(ur'"[\^u201d]*["\u201d]', input)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

      

Can anyone help me?



2 answers


As Bakuriu pointed out, you need to add .read() like this:

quotes = re.findall(ur'[^\u201d]*[\u201d]', input.read())
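To see why the missing .read() causes the TypeError, here is a minimal sketch. It assumes io.StringIO as a stand-in for the object that open() returns, so no file needs to exist on disk; the sample sentence is made up for illustration.

```python
import io
import re

# io.StringIO stands in for the file object that open() would return
# (an assumption for illustration; a real file behaves the same way here).
fake_file = io.StringIO(u'She said "hello" and left.')

try:
    # Passing the file-like object itself: re.findall wants a string.
    re.findall(r'"[^"]*"', fake_file)
    raised = False
except TypeError:
    raised = True

# Reading the contents first gives re.findall the string it expects.
text = fake_file.read()
quotes = re.findall(r'"[^"]*"', text)
```

Here raised ends up True, and quotes holds the one quoted span from the sample text.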

      

open() just returns a file object, and f.read() returns a string. Also, I'm assuming that you want to get everything between two quotes, rather than zero or more occurrences of [\^u201d] before the quote. So I would try this:

quotes = re.findall(ur'[\u201d][^\u201d]*[\u201d]', input.read(), re.U)

      

re.U makes the pattern take unicode into account. Or, if your file uses plain straight double quotes rather than curly ones and you don't need unicode:



quotes = re.findall(r'"[^"]*"', input.read(), re.U)
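If the file might contain both straight and curly quotes, the two patterns can be combined. The following is only a sketch: the names read_text, find_quotes and QUOTE_PATTERN are hypothetical, the UTF-8 encoding is an assumption about the file, and it treats U+201C as the opening curly quote and U+201D as the closing one.

```python
import io
import re

# Straight-quoted text, or curly-quoted text (U+201C ... U+201D).
# Hypothetical combined pattern, not taken from the question.
QUOTE_PATTERN = re.compile(u'"[^"]*"|\u201c[^\u201d]*\u201d')


def read_text(path, encoding='utf-8'):
    """Return the file's contents as a unicode string.

    The UTF-8 default is an assumption; adjust it to the real file.
    """
    with io.open(path, 'r', encoding=encoding) as handle:
        return handle.read()


def find_quotes(text):
    """Return every quoted span in text, quote marks included."""
    return QUOTE_PATTERN.findall(text)


sample = u'He said "hi" and then \u201cbye\u201d.'
quotes = find_quotes(sample)
```

With a real file you would call find_quotes(read_text(filename)) instead of using the sample string.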

      

Finally, you may want to choose a different variable name than input, since input is a built-in function in Python and assigning to it shadows the built-in.

Your result might look something like this:

>>> input2 = """
cfrhubecf "ehukl wehunkl echnk
wehukb ewni; wejio;"
"werulih"
"""
>>> quotes = re.findall(r'"[^"]*"', input2, re.U)
>>> print quotes
['"ehukl wehunkl echnk\nwehukb ewni; wejio;"', '"werulih"']

      



Instead of using regular expressions, you can try some of Python's built-in string methods. I'll let you do the hard work:

message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
list_of_single_quote_items = message.split("'")
list_of_double_quote_items = message.split('"')

      



The tricky part will be to interpret what the resulting split lists mean and to deal with all the edge conditions (just one quote on a line, escape sequences, etc.)
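One possible way to read the split result, as a rough sketch: when the quotes are balanced, the pieces at odd positions are the ones that sat between a pair of quote characters. The helper name quoted_segments is hypothetical, and it ignores escape sequences entirely.

```python
def quoted_segments(text, quote='"'):
    """Return the pieces of text that sit between pairs of quote characters."""
    parts = text.split(quote)
    # An even number of parts means an odd number of quote characters,
    # so the last piece follows an unmatched opening quote: drop it.
    if len(parts) % 2 == 0:
        parts = parts[:-1]
    # Pieces at odd indexes were enclosed by a pair of quotes.
    return parts[1::2]


message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
double_quoted = quoted_segments(message, '"')
single_quoted = quoted_segments(message, "'")
```

Note that with single quotes an apostrophe in a word such as "don't" would be counted as a quote character, which is one of the edge conditions mentioned above.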
