Bioformats-Python error: 'ascii' codec can't encode character u'\xb5' when using OMEXML()
I am trying to use bioformats in Python to read in a microscopy image (.lsm, .czi, .lif, you name it), print the metadata, and display the image. ome = bf.OMEXML(md) gives me an error (below). I think it is complaining about the information stored in md. It apparently doesn't like that the information in md is not all ASCII. But how do I solve this problem? Here's what I wrote:
import Tkinter as Tk, tkFileDialog
import os
import javabridge as jv
import bioformats as bf
import matplotlib.pyplot as plt
import numpy as np
jv.start_vm(class_path=bf.JARS, max_heap_size='12G')
The user selects a file to work with
# hiding root allows the file dialog GUI to be shown without any other GUI elements
root = Tk.Tk()
root.withdraw()
file_full_path = tkFileDialog.askopenfilename()
filepath, filename = os.path.split(file_full_path)
os.chdir(os.path.dirname(file_full_path))
print('opening: %s' %filename)
reader = bf.ImageReader(file_full_path)
md = bf.get_omexml_metadata(file_full_path)
ome = bf.OMEXML(md)
Put the image in a numpy array
raw_data = []
for z in range(iome.Pixels.get_SizeZ()):
    raw_image = reader.read(z=z, series=0, rescale=False)
    raw_data.append(raw_image)
raw_data = np.array(raw_data)
Show desired metadata
iome = ome.image(0) # e.g. first image
print(iome.get_Name())
print(iome.Pixels.get_SizeX())
print(iome.Pixels.get_SizeY())
Here is the error I am getting:
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-22-a22c1dbbdd1e> in <module>()
11 reader = bf.ImageReader(file_full_path)
12 md = bf.get_omexml_metadata(file_full_path)
---> 13 ome = bf.OMEXML(md)
/anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/omexml.pyc in __init__(self, xml)
318 if isinstance(xml, str):
319 xml = xml.encode("utf-8")
--> 320 self.dom = ElementTree.ElementTree(ElementTree.fromstring(xml))
321
322 # determine OME namespaces
<string> in XML(text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 1623: ordinal not in range(128)
Here is a representative test image in a proprietary microscopy format
Thanks for adding the sample image. It helped a lot!
First, let's remove all the unnecessary Tkinter code until we have a Minimal, Complete, and Verifiable Example that allows us to reproduce your error message.
import javabridge as jv
import bioformats as bf
jv.start_vm(class_path=bf.JARS, max_heap_size='12G')
file_full_path = '/path/to/Cell1.lsm'
md = bf.get_omexml_metadata(file_full_path)
ome = bf.OMEXML(md)
jv.kill_vm()
At first we get some warning messages about 3i SlideBook SlideBook6Reader library not found, but we can apparently ignore those.
Your error message reads UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 1623: ordinal not in range(128), so let's see what we can find at position 1623.
If you add print md after md = bf.get_omexml_metadata(file_full_path), the whole metadata XML is printed. Let's zoom in:
>>> print md[1604:1627]
PhysicalSizeXUnit="µm"
So the symbol µ is the culprit: it cannot be encoded with the 'ascii' codec.
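To see the failure in isolation, here is a minimal sketch that uses only built-in string methods. The variable names are mine and nothing here comes from bioformats; the code should behave the same on Python 2.7 and Python 3.

```python
# u'\xb5' is the MICRO SIGN that appears in PhysicalSizeXUnit="µm"
micro = u'\xb5'

# ASCII only covers code points 0..127, so encoding the micro sign fails
try:
    micro.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent it, using two bytes
utf8_bytes = micro.encode('utf-8')

print(ascii_ok)
print(repr(utf8_bytes))
```

The same character that breaks the 'ascii' codec encodes without trouble as UTF-8, which is why the missing encode step matters.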
Looking back at the traceback:
/anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/omexml.pyc in __init__(self, xml)
318 if isinstance(xml, str):
319 xml = xml.encode("utf-8")
--> 320 self.dom = ElementTree.ElementTree(ElementTree.fromstring(xml))
321
322 # determine OME namespaces
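We see that in the lines before the error, the xml is encoded to utf-8, which should solve our problem. So why isn't this happening?
If we add print type(md), we get back <type 'unicode'> and not <type 'str'> as the code expects. Because md is a unicode object, the isinstance(xml, str) check is False, the encode step is skipped, and ElementTree.fromstring receives the unencoded text. So this is a bug in omexml.py!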
To fix this problem, follow these steps (you may need root):
- Switch to /anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/
- Delete omexml.pyc
- In omexml.py, change line 318 from if isinstance(xml, str): to if isinstance(xml, basestring):
basestring is the superclass of str and unicode. It is used to check whether an object is an instance of str or unicode.
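The patched check can be sketched outside the library as follows. This is an illustration under assumptions, not the actual omexml.py code: parse_metadata and text_types are hypothetical names, and the sample XML is a made-up one-element fragment rather than real OME-XML. The try/except is only there so the sketch also runs on Python 3, where basestring no longer exists.

```python
import xml.etree.ElementTree as ElementTree

try:
    text_types = basestring  # Python 2: matches both str and unicode
except NameError:
    text_types = str  # Python 3: str is already Unicode text


def parse_metadata(xml):
    """Hypothetical helper mirroring the patched check from line 318."""
    if isinstance(xml, text_types):
        # Encode text to UTF-8 bytes before handing it to the XML parser
        xml = xml.encode("utf-8")
    return ElementTree.fromstring(xml)


# Made-up fragment containing the same micro sign as the real metadata
sample = u'<Pixels PhysicalSizeXUnit="\xb5m"/>'
root = parse_metadata(sample)
unit = root.get("PhysicalSizeXUnit")
```

With the broader type check, the unicode input is encoded before parsing and the attribute value comes back with the micro sign intact.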
I wanted to log a bug for this, but there seems to be an open issue already.