Pyserial formatting - bytes over 127 are returned as 2 bytes, not one

I have a program running on my Arduino that takes serial input and stores it in a variable. It works like a charm: with the serial monitor built into the Arduino IDE, I have successfully sent and received bytes across the full 0-255 range.

When I use pyserial to send any byte above 127 (0b01111111), however, pyserial returns 2: for values above 127, say 0b10000000, two bytes are sent instead of one.

I believe my problem is related to pyserial.

ser.write(chr(int('01000000', base=2)).encode('utf-8'))

      

works fine and is received correctly on the Arduino.

ser.write(chr(int('10000000', base=2)).encode('utf-8')) 

      

returns 2, however, and the Arduino shows it arriving as 0b11000010 followed by 0b10000000.



1 answer


As NPE says, this is the UTF-8 encoding at work: a code point between 128 and 2047 inclusive (8 to 11 bits) is encoded as two bytes. If the original 11 bits are abcdefghijk, the UTF-8 version is 110abcde 10fghijk. In your example, padded on the left with 0s to make 11 bits, 00010000000 becomes 11000010 10000000, or \xc2\x80, which is exactly what you see. See the Wikipedia article on UTF-8 for more.
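The bit layout described above can be checked by hand. The following is a minimal Python 3 sketch, not pyserial's or Python's actual implementation; the helper name utf8_two_byte is made up for illustration.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Hand-encode a code point in 128..2047 as two UTF-8 bytes (illustrative helper)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 only covers code points 128..2047")
    lead = 0b11000000 | (code_point >> 6)           # 110abcde: the top five bits
    trail = 0b10000000 | (code_point & 0b00111111)  # 10fghijk: the low six bits
    return bytes([lead, trail])


encoded = utf8_two_byte(0b10000000)
print(encoded)                              # b'\xc2\x80'
print(encoded == chr(128).encode('utf-8'))  # True
```

The two bytes printed here, 0xc2 and 0x80, are the 0b11000010 and 0b10000000 that the Arduino reports.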

You can see this in Python with the following code (I'm replacing int('10000000', base=2) with 128):

>>> unichr(128).encode('utf-8')
'\xc2\x80' 

      

What confuses me is that you are able to use chr(int('10000000', base=2)).encode('utf-8'), or the equivalent chr(128).encode('utf-8'), at all. When I do this, I get:



>>> chr(int('10000000', base=2)).encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

      

Have you changed the default encoding?

You need an encoding that maps each of the code points 0-255 to a single byte. So try using "latin_1" instead:

>>> unichr(128).encode('latin_1')
'\x80'
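The session above is Python 2. In Python 3, where chr() already returns a Unicode string (which would explain why the question's code runs without the error), the same idea might look like the sketch below. The helper name single_byte is hypothetical, and `ser` is assumed to be an already-open pyserial port, so the write call is left as a comment.

```python
def single_byte(value: int) -> bytes:
    """Return exactly one byte for an integer in 0..255 (hypothetical helper)."""
    return bytes([value])  # raises ValueError if value is outside 0..255


value = int('10000000', base=2)

print(single_byte(value))                # b'\x80'
print(chr(value).encode('latin_1'))      # b'\x80'
print(len(chr(value).encode('utf-8')))   # 2

# With an already-open pyserial port `ser` (assumed, not created here):
# ser.write(single_byte(value))
```

Building the bytes object directly from the integer skips the text-encoding step altogether, which is arguably the clearer choice when the payload is raw binary rather than text.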

      
