Python opening UTF-16 file to read every byte
I am trying to parse a file which I believe is UTF-16 encoded (file magic is 0xFEFF) and I can open the file the way I want with
f = open(file, 'rb')
But when, for example, I do
print f.read(40)
it prints the actual unicode lines of the file where I would like to access the hex data and read it byte by byte. This might be a silly question, but I haven't been able to figure out how to do it.
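About the 0xFEFF: U+FEFF is the Unicode byte-order mark (BOM). In the raw bytes of a UTF-16 file it appears as FF FE when the file is little-endian and FE FF when it is big-endian, so the first two bytes read in 'rb' mode tell you the byte order of the rest of the file. A minimal sketch using the BOM constants from the standard codecs module; the function name utf16_byte_order is only illustrative:

```python
import codecs


def utf16_byte_order(raw):
    """Guess the UTF-16 byte order from the first bytes of a file opened in 'rb' mode."""
    if raw.startswith(codecs.BOM_UTF16_LE):   # the two bytes FF FE
        return 'little-endian'
    if raw.startswith(codecs.BOM_UTF16_BE):   # the two bytes FE FF
        return 'big-endian'
    return None  # no UTF-16 byte-order mark at the start
```

Calling it as utf16_byte_order(f.read(2)) right after open(file, 'rb') should return 'little-endian' for a file that begins with FF FE. The same code runs on 2.7 and 3.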
Also, as a follow-up question. Once I get this working, I would like to parse the file looking for a specific set of bytes, in this case:
0x00 00 00 43 00 00 00
And after this pattern is found, start parsing the entry. What's the best way to do this? I was thinking about using a generator to loop through each byte, and once this pattern comes up, output the bytes up to the next instance of that pattern? Is there a more efficient way to do this?
EDIT: I am using Python 2.7
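For the follow-up, a Python-level loop over every byte will be slow. Searching with the byte string's own find() method, which takes an optional start offset, is usually much faster. A minimal sketch, assuming the whole file fits in memory and that entries do not overlap. MARKER and iter_entries are hypothetical names, and each yielded chunk is the payload after one marker up to (but not including) the next:

```python
MARKER = b'\x00\x00\x00\x43\x00\x00\x00'  # the 7-byte pattern from the question


def iter_entries(data, marker=MARKER):
    """Yield the bytes between each occurrence of `marker` and the next one.

    Bytes before the first marker are skipped; the last entry runs to the
    end of `data`. Each search resumes after the whole marker, so two
    markers that share bytes (e.g. 00 00 00 43 00 00 00 43 00 00 00)
    are treated as a single one.
    """
    start = data.find(marker)
    while start != -1:
        payload_start = start + len(marker)
        nxt = data.find(marker, payload_start)
        if nxt == -1:
            yield data[payload_start:]  # final entry
            return
        yield data[payload_start:nxt]
        start = nxt
```

Usage would be `with open(filename, 'rb') as f: data = f.read()` followed by `for entry in iter_entries(data): ...`. If the file is too large to read at once, an mmap.mmap object should work in place of data, since it also supports find() with a start offset and slicing.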
If you need a string of hexadecimal digits, you can pass the raw bytes through binascii.hexlify():
import binascii

with open(filename, 'rb') as f:
    raw = f.read(40)
hexadecimal = binascii.hexlify(raw)
print(hexadecimal)
(This also works unchanged in Python 3, where hexlify() returns a bytes object.)
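hexlify() produces one unbroken run of hex digits, which gets hard to read at 40 bytes or more. If you want something closer to a hex dump, one option is to hexlify the data in fixed-width rows and split each row into two-digit pairs. A small sketch; hex_dump and its width parameter are illustrative, not part of binascii:

```python
import binascii


def hex_dump(raw, width=16):
    """Return `raw` as lines of: 8-digit hex offset, two spaces, `width` hex byte pairs."""
    lines = []
    for offset in range(0, len(raw), width):
        # .decode('ascii') gives text on both 2.7 and 3
        chunk = binascii.hexlify(raw[offset:offset + width]).decode('ascii')
        pairs = ' '.join(chunk[i:i + 2] for i in range(0, len(chunk), 2))
        lines.append('%08x  %s' % (offset, pairs))
    return '\n'.join(lines)
```

With the 40 bytes read above, print(hex_dump(raw)) prints three rows starting at offsets 00000000, 00000010 and 00000020.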
If you want the numeric value of each byte, you can call ord() on each element or, equivalently, map() the ord function over the string:
with open(filename, 'rb') as f:
    raw = f.read(40)
byte_list = map(ord, raw)  # Python 2: each element of raw is a 1-character str
print byte_list
(This doesn't work on Python 3, but on 3.x you can just iterate over raw directly, because iterating a bytes object already yields integers.)
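If the same script has to run on both 2.7 and 3, bytearray sidesteps the difference: iterating or indexing a bytearray yields integers in both versions, whereas a 2.x str yields one-character strings. A minimal sketch (byte_values is an illustrative name):

```python
def byte_values(raw):
    """Numeric value (0-255) of every byte in `raw`, identically on Python 2.7 and 3."""
    # bytearray iterates as ints on both versions, unlike a Python 2 str
    return list(bytearray(raw))
```

For a file that starts with the FF FE byte-order mark, byte_values(raw)[0] would be 255 on either version.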