Use the code below to remove non-ASCII characters from your corpus (Python 3). The input is read in binary mode, so its encoding does not matter:
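# Copy nonascii.txt to ascii.txt, dropping every non-ASCII byte
with open('nonascii.txt', 'rb') as ip, open('ascii.txt', 'w') as op:
    for raw in ip:
        # decode as ASCII, silently discarding any byte outside 0-127
        line = raw.strip().decode('ascii', 'ignore')
        if line == "":
            continue  # skip lines that are empty after cleaning
        op.write(line + '\n')  # strip() removed the newline, so put it back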
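If you want to reuse the same cleaning step elsewhere, for example on an in-memory list of byte strings, it can be pulled into a small function. This is a minimal sketch; the name `strip_non_ascii` is hypothetical and not from any library:

```python
def strip_non_ascii(lines):
    """Return the stripped, ASCII-only, non-empty versions of the given byte lines."""
    cleaned = []
    for raw in lines:
        # decode with errors='ignore' drops every byte >= 0x80
        line = raw.strip().decode("ascii", "ignore")
        if line:  # skip lines that became empty
            cleaned.append(line)
    return cleaned


sample = ["café\n".encode("utf-8"), b"   \n", "naïve text\n".encode("utf-8")]
print(strip_non_ascii(sample))  # ['caf', 'nave text']
```

Note that this removes accented letters entirely ("café" becomes "caf") rather than replacing them with a plain-ASCII equivalent. If you would rather keep the base letter, one option is to run the text through `unicodedata.normalize('NFKD', ...)` before dropping the non-ASCII characters.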