Unicode for URL in Python 3 from command line argument
If I build a bare-bones version, it works nicely, but in my actual code I get a Unicode-related error:
    temp_url = "http://search.jd.com/Search?keyword=" + quote(self.keywords)
  File "/usr/lib/python3.5/urllib/parse.py", line 706, in quote
    string = string.encode(encoding, errors)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 0: surrogates not allowed
I am using an argument to pass my search string to Scrapy (1.4):
scrapy crawl jdspider -a keywords="电灯"
and the corresponding code looks like this:
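# -*- coding: utf-8 -*-
import scrapy
from urllib.parse import quote


class JdSpider(scrapy.Spider):  # class name assumed; the question shows only __init__
    name = "jdspider"

    def __init__(self, keywords='', *args, **kwargs):
        super().__init__(*args, **kwargs)  # let Scrapy finish setting up the spider
        self.keywords = keywords.strip()
        temp_url = "http://search.jd.com/Search?keyword=" + quote(self.keywords)
        print(temp_url)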
So the print() call is never reached; something goes wrong inside quote().
Python 3.5.2, Scrapy 1.4.0, Kubuntu 16.04
What am I doing wrong?
Such problems are common with Chinese characters and other non-ASCII characters or symbols.
Try encoding the string with a codec other than UTF-8; the standard encodings are listed here:
https://docs.python.org/3/library/codecs.html#standard-encodings
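A minimal sketch of the "encode differently" idea, under one assumption: the surrogates come from Python decoding the command-line bytes under a non-UTF-8 locale (for example LANG=C), where each undecodable byte becomes a lone surrogate such as '\udce8'. Instead of switching codecs, it keeps UTF-8 but passes errors='surrogateescape' to urllib.parse.quote(), which maps each surrogate back to its original byte. The helpers simulate_c_locale_argv and build_url are hypothetical names, and 电灯 is only a sample keyword.

```python
from urllib.parse import quote

BASE = "http://search.jd.com/Search?keyword="  # URL from the question


def simulate_c_locale_argv(text):
    """Roughly mimic how Python decodes a command-line argument under LANG=C:
    every non-ASCII byte becomes a lone surrogate (U+DC80..U+DCFF)."""
    return text.encode("utf-8").decode("ascii", errors="surrogateescape")


def build_url(keywords):
    # errors='surrogateescape' turns each lone surrogate back into the byte it
    # came from, so the percent-encoding reproduces the raw bytes from the shell.
    return BASE + quote(keywords, errors="surrogateescape")


if __name__ == "__main__":
    keywords = simulate_c_locale_argv("电灯")
    try:
        quote(keywords)  # default for str input: encoding='utf-8', errors='strict'
    except UnicodeEncodeError as exc:
        print("strict UTF-8 fails:", exc.reason)  # surrogates not allowed
    print(build_url(keywords))
```

This only gives the right URL if the terminal actually sent UTF-8 bytes; running the crawl under a UTF-8 locale (for example LANG=en_US.UTF-8) should keep the surrogates from appearing at all.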
The other option is to remove the offending character, but first ask yourself whether removing it would make the information useless or merely less useful.
If that is acceptable, remove it. Judging by "position 0" in the traceback, it is the first character of the string.
You can wrap the call in try/except and drop the first character when the exception is raised,
or, better, loop over the characters and remove every one that cannot be encoded.
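The per-character loop suggested above might look like this minimal sketch (drop_unencodable is a hypothetical helper name). It keeps each character that the target encoding can encode strictly and skips the rest, which includes lone surrogates such as '\udce8'.

```python
from urllib.parse import quote


def drop_unencodable(text, encoding="utf-8"):
    """Return `text` without the characters that `encoding` cannot encode strictly."""
    kept = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # e.g. a lone surrogate: skip it
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    keywords = "\udce8abc"  # hypothetical argument with one mangled character
    print(quote(drop_unencodable(keywords)))
```

If the whole argument was turned into surrogates, every character is dropped and the keyword ends up empty, which is the loss of information the answer warns about. For UTF-8, text.encode("utf-8", "ignore").decode("utf-8") has the same effect in one line.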