Python opening UTF-16 file to read every byte

I am trying to parse a file which I believe is UTF-16 encoded (file magic is 0xFEFF) and I can open the file the way I want with

 f = open(file, 'rb')

      

But when, for example, I do

print f.read(40)

      

it prints the text of the file, whereas I would like to access the raw data as hex and read it byte by byte. This might be a silly question, but I haven't been able to figure out how to do it.

As a follow-up question: once I get this working, I would like to parse the file looking for a specific sequence of bytes, in this case:

0x00 00 00 43 00 00 00

      

After this pattern is found, I want to start parsing the entry. What's the best way to do this? I was thinking about using a generator to loop through each byte and, once this pattern comes up, output the bytes up to the next instance of that pattern. Is there a more efficient way to do this?

EDIT: I am using Python 2.7



2 answers


Can't you just do this?



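>>> string = 'string'
>>> hex(ord(string[1]))
'0x74'

hexString = ''
# Open in binary mode so the bytes are not altered on read
with open(filename, 'rb') as f:
    while True:
        # Use f.read(1) instead to go one byte at a time
        chars = f.read(40)
        if not chars:
            break
        # On Python 2.7 each element is a one-character str, so ord() is needed
        hexString += ''.join(hex(ord(char)) for char in chars)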

      



If you need a string of hexadecimal digits, you can pass the data through binascii.hexlify():

import binascii

with open(filename, 'rb') as f:
    raw = f.read(40)
    # Two hex digits per byte, e.g. 'fffe...'
    hexadecimal = binascii.hexlify(raw)
    print(hexadecimal)

      

(This also works unchanged in Python 3)
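
To make the hexlify() output easier to read byte by byte, the digit string can be cut into two-character pairs. The following is only a minimal sketch: the helper name hex_pairs and the sample bytes are made up for illustration and do not come from the question's file.

```python
import binascii


def hex_pairs(raw):
    """Return a list of two-character hex strings, one per byte of raw."""
    # hexlify() returns bytes on Python 3, so decode to get text on both 2.7 and 3.x
    digits = binascii.hexlify(raw).decode('ascii')
    return [digits[i:i + 2] for i in range(0, len(digits), 2)]


# Hypothetical sample: a byte-order mark followed by "Hi" in 16-bit code units
sample = b'\xff\xfeH\x00i\x00'
pairs = hex_pairs(sample)
print(' '.join(pairs))  # prints: ff fe 48 00 69 00
```

With a real file, raw would be the result of f.read(40) on a file opened with 'rb'.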



If you want the numeric value of each byte, you can call ord() on each element or, equivalently, map() that function over the string:

with open(filename, 'rb') as f:
    raw = f.read(40)
    byte_list = map(ord, raw)
    print byte_list

      

(This doesn't work on Python 3, but on 3.x you can just iterate over raw directly, because iterating over a bytes object already yields integers.)
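
For the follow-up about locating the marker bytes, one possible approach is to let the built-in find() method of the byte string search for the pattern instead of looping over single bytes in Python. The sketch below assumes the whole file fits in memory and that the marker never occurs inside an entry; the names MARKER and iter_entries and the sample data are hypothetical.

```python
# The 7-byte pattern from the question: 00 00 00 43 00 00 00
MARKER = b'\x00\x00\x00\x43\x00\x00\x00'


def iter_entries(data, marker=MARKER):
    """Yield the bytes between each marker and the next marker (or end of data)."""
    start = data.find(marker)
    while start != -1:
        body_start = start + len(marker)
        next_start = data.find(marker, body_start)
        end = next_start if next_start != -1 else len(data)
        yield data[body_start:end]
        start = next_start


# Hypothetical sample; with a real file use: data = open(filename, 'rb').read()
sample = b'\xff\xfe' + MARKER + b'entry one' + MARKER + b'entry two'
entries = list(iter_entries(sample))
print(entries)
```

Anything before the first marker (here the two leading bytes) is skipped. For files too large to read at once, the same idea could be applied to chunks, but a marker that straddles a chunk boundary would then need extra handling.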
