Bioformats-Python error: 'ascii' codec can't encode character u'\xb5' when using OMEXML()

Question


I am trying to use bioformats in Python to read in a microscopy image (.lsm, .czi, .lif, you name it), print the metadata, and display the image. The line ome = bf.OMEXML(md) gives me an error (below). I think it is complaining about the information stored in md: apparently it does not like that the content of md is not all ASCII. But how do I solve this problem? Here's what I wrote:

import Tkinter as Tk, tkFileDialog
import os
import javabridge as jv
import bioformats as bf
import matplotlib.pyplot as plt
import numpy as np

jv.start_vm(class_path=bf.JARS, max_heap_size='12G')

The user selects a file to work with

# hiding root allows the file dialog GUI to be shown without any other GUI elements
root = Tk.Tk()
root.withdraw()
file_full_path = tkFileDialog.askopenfilename()
filepath, filename = os.path.split(file_full_path)
os.chdir(os.path.dirname(file_full_path))

print('opening:  %s' %filename)
reader = bf.ImageReader(file_full_path)
md = bf.get_omexml_metadata(file_full_path)
ome = bf.OMEXML(md)

Put the image in a numpy array

raw_data = []
# ome.image(0) is used directly here because iome is only defined further down
for z in range(ome.image(0).Pixels.get_SizeZ()):
    raw_image = reader.read(z=z, series=0, rescale=False)
    raw_data.append(raw_image)
raw_data = np.array(raw_data)

Show desired metadata

iome = ome.image(0) # e.g. first image
print(iome.get_Name())
print(iome.Pixels.get_SizeX())
print(iome.Pixels.get_SizeY())

Here is the error I am getting:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-22-a22c1dbbdd1e> in <module>()
     11 reader = bf.ImageReader(file_full_path)
     12 md = bf.get_omexml_metadata(file_full_path)
---> 13 ome = bf.OMEXML(md)

/anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/omexml.pyc in __init__(self, xml)
    318         if isinstance(xml, str):
    319             xml = xml.encode("utf-8")
--> 320         self.dom = ElementTree.ElementTree(ElementTree.fromstring(xml))
    321 
    322         # determine OME namespaces

<string> in XML(text)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 1623: ordinal not in range(128)

Here is a representative test image in a proprietary microscopy format.


python xml ascii bioinformatics biopython

puifais Apr 24 17 at 22:30


1 answer

BioGeek · Accepted Answer · 2017-04-26T09:59:33+0000

Thanks for adding the sample image. It helped a lot!

First, let's strip out all the unnecessary Tkinter code until we have a Minimal, Complete, and Verifiable Example that reproduces your error message.

import javabridge as jv
import bioformats as bf

jv.start_vm(class_path=bf.JARS, max_heap_size='12G')

file_full_path = '/path/to/Cell1.lsm'

md = bf.get_omexml_metadata(file_full_path)

ome = bf.OMEXML(md)

jv.kill_vm()

At first we get warning messages saying that the 3i SlideBook SlideBook6Reader library was not found, but apparently we can ignore those.

Your error message reads UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 1623: ordinal not in range(128), so let's see what we can find at position 1623.

If you add print md after md = bf.get_omexml_metadata(file_full_path), the whole XML document with the metadata is printed. Let's zoom in:

>>> print md[1604:1627]
PhysicalSizeXUnit="µm"

So the µ character is the culprit: it cannot be encoded with the 'ascii' codec.
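As a rough, self-contained illustration of the same diagnosis, the sketch below reproduces the encoding failure on a short stand-in string and uses the start attribute of the exception to find the offending character. The string md here is a hypothetical, shortened replacement for the real metadata (not output from bioformats), and the snippet is written to run on both Python 2.7 and Python 3.

```python
# Hypothetical, shortened stand-in for the metadata string returned by bioformats.
md = u'<Pixels PhysicalSizeX="0.1" PhysicalSizeXUnit="\xb5m"/>'

position = None
context = None
try:
    # The 'ascii' codec only covers code points 0-127, so the micro sign fails.
    md.encode("ascii")
except UnicodeEncodeError as err:
    # err.start is the index of the first character that could not be encoded.
    position = err.start
    # Slice a little context around it: 19 characters before, 3 from the index on.
    context = md[position - 19:position + 3]

# UTF-8 can represent the micro sign (U+00B5) as the two bytes 0xC2 0xB5.
utf8_bytes = md.encode("utf-8")
```

Printing context shows the attribute that contains the non-ASCII character, much like slicing md around position 1623 above.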

Looking back at the traceback:

/anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/omexml.pyc in __init__(self, xml)
    318         if isinstance(xml, str):
    319             xml = xml.encode("utf-8")
--> 320         self.dom = ElementTree.ElementTree(ElementTree.fromstring(xml))
    321 
    322         # determine OME namespaces

We see that in the lines before the error, our xml is encoded to utf-8, which should solve our problem. So why isn't this happening?

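If we add print type(md), we get back <type 'unicode'> and not the <type 'str'> that the code expects. The isinstance check on line 318 is therefore False, the encoding step is skipped, and the unicode object is passed to ElementTree as-is. So this is a bug in omexml.py!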

To fix this problem, follow these steps (you may need root):

Go to /anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/
Delete omexml.pyc
In omexml.py, change line 318 from if isinstance(xml, str): to if isinstance(xml, basestring):

basestring is the superclass of str and unicode. It is used to check whether an object is an instance of str or unicode.
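To make the effect of that type check concrete, here is a minimal sketch of the same idea outside the library. The helper to_utf8_bytes and the sample XML are hypothetical and are not part of bioformats. Note that it tests for the text type only (unicode on Python 2, str on Python 3) rather than basestring, so that the snippet also runs on Python 3, where basestring does not exist.

```python
import xml.etree.ElementTree as ElementTree

# unicode on Python 2, str on Python 3
text_type = type(u"")


def to_utf8_bytes(xml):
    """Hypothetical helper: encode text to UTF-8 bytes, leave bytes untouched."""
    if isinstance(xml, text_type):
        return xml.encode("utf-8")
    return xml


# Hypothetical sample containing the same micro sign as the real metadata.
sample = u'<Pixels PhysicalSizeXUnit="\xb5m"/>'

# Parsing UTF-8 bytes works because the parser assumes UTF-8 when no
# encoding is declared in the document.
root = ElementTree.fromstring(to_utf8_bytes(sample))
unit = root.get("PhysicalSizeXUnit")
```

After parsing, unit holds the text µm again, which suggests that encoding the unicode string before handing it to ElementTree is enough to avoid the error.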

I wanted to file a bug report for this, but there already seems to be an open issue.
