Unicode for URL in Python 3 from command line argument

A bare-bones test works fine, but in my actual code I get a Unicode-related error.

temp_url = "http://search.jd.com/Search?keyword=" + quote(self.keywords)

File "/usr/lib/python3.5/urllib/parse.py", line 706, in quote
    string = string.encode(encoding, errors)

UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 0: surrogates not allowed

I am using an argument to pass my search string to Scrapy (1.4):

scrapy crawl jdspider -a keywords="电灯"

and the corresponding code looks like this:

# -*- coding: utf-8 -*-
import scrapy, re
from urllib.parse import quote

def __init__(self, keywords=''):
    self.keywords = keywords.strip()

    temp_url = "http://search.jd.com/Search?keyword=" + quote(self.keywords)
    print(temp_url)

So the print is never even reached; something goes wrong inside the quote() call.

Python 3.5.2, Scrapy 1.4.0, Kubuntu 16.04

What am I doing wrong?

1 answer


Problems like this are common when the input contains Chinese characters or other non-ASCII characters or symbols.
Try encoding the string with a codec other than utf-8, or with a non-default error handler. The standard encodings are listed here:
https://docs.python.org/3/library/codecs.html#standard-encodings
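
One possible reading of "surrogates not allowed" is that Python decoded the command-line bytes with the surrogateescape error handler because the shell locale was not UTF-8. The question does not confirm this, so treat it as an assumption. Under that assumption, a minimal sketch of re-encoding with the same handler to get the original bytes back, and then decoding them as UTF-8, could look like this (recover_text is a hypothetical helper name, not part of Scrapy or urllib):

```python
from urllib.parse import quote


def recover_text(arg, encoding="utf-8"):
    """Hypothetical helper: undo a surrogateescape decoding of a CLI argument.

    Assumes the lone surrogates in `arg` stand for raw bytes that were
    originally valid text in `encoding` (UTF-8 by default).
    """
    # surrogateescape turns each escaped surrogate back into its original byte.
    raw = arg.encode(encoding, errors="surrogateescape")
    # Decoding strictly raises UnicodeDecodeError if the bytes were not
    # really `encoding`, which is safer than silently guessing.
    return raw.decode(encoding)


# Simulated argument: the UTF-8 bytes of "电灯" as they would appear after
# being decoded under an ASCII locale with surrogateescape.
broken = b"\xe7\x94\xb5\xe7\x81\xaf".decode("ascii", errors="surrogateescape")
fixed = recover_text(broken)
url = "http://search.jd.com/Search?keyword=" + quote(fixed)
```

If the argument is already ordinary text, recover_text returns it unchanged, so it should be harmless to call in both cases.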

But the first question is whether removing this symbol would make the information useless, or at least less useful.

If that is not a problem, try removing the symbol. It seems to be the first character in the string.

Wrap the call in try/except and, when the exception is raised, remove the first character;
or, better, use a for loop to check each character and remove the ones that cannot be encoded.
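
The try/except and per-character loop described above could be sketched roughly as follows. The names strip_unencodable and safe_quote are made up for illustration, and dropping characters loses information, so this only makes sense if the remaining text is still a useful search keyword:

```python
from urllib.parse import quote


def strip_unencodable(text, encoding="utf-8"):
    """Hypothetical helper: drop every character that `encoding` rejects."""
    kept = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Lone surrogates (such as '\udce8') end up here; skip them.
            continue
        kept.append(ch)
    return "".join(kept)


def safe_quote(text):
    """Hypothetical wrapper: quote() first, filter only if it fails."""
    try:
        return quote(text)
    except UnicodeEncodeError:
        return quote(strip_unencodable(text))
```

With this sketch, safe_quote("\udce8abc") falls back to the filtered string, while ordinary text goes straight through quote().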
