Read a byte string from a file in Python 3
The file content is similar to the following and the file encoding is utf-8:
cd232704-a46f-3d9d-97f6-67edb897d65f b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'
Here is my code:
with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        print(tokens[1])
I want to get the decoded text: "this Friday, Gerda Scheuers will be excited — but she’s most excited about the merchandise the movie will bring."
Decoding a bytes literal written directly in code works:
print(b'\xe2\x80\x94'.decode('utf-8'))  # decodes the UTF-8 bytes to a str, not ASCII
But I cannot get the bytes from the file this way. If I open the file in binary mode, I have to decode each line before I can split it.
1 answer
You can use ast.literal_eval to convert the text of a bytes literal into a bytes object, and then decode that object to get a string:
>>> import ast
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'
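import ast

with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        # Uncomment to skip lines that have no second column:
        # if len(tokens) < 2:
        #     continue
        bytes_part = ast.literal_eval(tokens[1])  # text of a bytes literal -> bytes
        s = bytes_part.decode('utf-8')  # decode the bytes to get a str
        print(s)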
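For a version you can run end to end, here is a minimal sketch. It assumes the columns are tab-separated, as in the question's code, and that the second column always holds the text of a Python bytes literal. The names read_decoded_column, column and sep are hypothetical and chosen only for illustration. It writes a one-line sample file, reads it back, and skips lines that have no second column or whose second column is not a valid literal.

```python
import ast
import os
import tempfile


def read_decoded_column(path, column=1, sep="\t"):
    """Yield the decoded text of a bytes-literal column, one item per line."""
    with open(path, "r", encoding="utf-8") as f_in:
        for line in f_in:
            tokens = line.rstrip("\n").split(sep)
            if len(tokens) <= column:
                continue  # the line has no such column
            try:
                value = ast.literal_eval(tokens[column])
            except (ValueError, SyntaxError):
                continue  # the column is not a valid Python literal
            if isinstance(value, bytes):
                yield value.decode("utf-8")


# Build a small sample file. The raw string keeps the backslashes as
# literal characters, which is how they appear in the question's file.
sample = (
    "cd232704-a46f-3d9d-97f6-67edb897d65f"
    + "\t"
    + r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'"
    + "\n"
)
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".tsv", delete=False
) as tmp:
    tmp.write(sample)
    tmp.write("a line without a tab\n")  # skipped by the column check

decoded = list(read_decoded_column(tmp.name))
os.remove(tmp.name)
print(decoded)
```

ast.literal_eval only evaluates literals, so unlike eval it does not execute arbitrary code found in the file. A sequence of bytes that is not valid UTF-8 would still raise UnicodeDecodeError in this sketch.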