Why does PyCharm print less than it writes to a file?

I am testing the following code and found that the output of "print" does not match the contents of the text file. I have set the encoding to "UTF-8". Is this a mistake on my part? How can I fix it?

import requests

url = "http://www.aastocks.com/tc/stocks/analysis/company-fundamental/financial-ratios?symbol=0001&period=4"
r = requests.get(url)
print r.content
f = open("test.txt","w")
f.write(r.content)

      



2 answers


I don't know the exact version of Python you are using, but I would guess that it is not 3.x, given the use of print statements.

The problem is not with print as such. Displaying such a long line (r.content here has a length of 175765) can often be a serious problem. Python (especially on Windows) can get cranky when dealing with lines that are many KB long (about 176 KB in this case). Instead of displaying the whole thing in one statement, try breaking it into several parts and displaying those. You will see that there is no difference between what r.content shows on the screen and what f.write stores in the file.
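The idea of printing in parts can be sketched roughly as follows. This is only an illustration in Python 3 syntax: `iter_chunks` and `print_in_chunks` are hypothetical helper names, the chunk size of 4096 is an arbitrary assumption, and the sample string stands in for the decoded page body.

```python
import sys


def iter_chunks(data, size=4096):
    """Yield successive slices of `data`, each at most `size` items long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def print_in_chunks(text, size=4096, stream=None):
    """Write `text` to `stream` piece by piece, flushing after each piece."""
    if stream is None:
        stream = sys.stdout
    for chunk in iter_chunks(text, size):
        stream.write(chunk)
        stream.flush()
    stream.write("\n")


if __name__ == "__main__":
    # Small stand-in for the page text; the real body is far longer.
    sample = "abcdefghij" * 3
    print_in_chunks(sample, size=8)
```

With the real response you would pass the decoded text, for example `print_in_chunks(r.content.decode("utf-8"))`, assuming the body really is UTF-8.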

Just for your confirmation, you can do this after your code:

fh = open("test.txt","r")
print fh.read()
fh.close()

      

You will notice that there is no difference between this and what the previous print statement showed. In the meantime, there is nothing to worry about.
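Rather than comparing the two outputs by eye, the same check can be done programmatically. The sketch below (Python 3 syntax) writes a payload in binary mode and compares the file with the original bytes. The helper names and the sample payload are assumptions made for illustration; in the real script the payload would be `r.content`.

```python
import os
import tempfile


def write_bytes(path, data):
    """Write `data` unchanged; binary mode avoids newline translation."""
    with open(path, "wb") as fh:
        fh.write(data)


def file_matches(path, data):
    """Return True if the file at `path` holds exactly the bytes in `data`."""
    with open(path, "rb") as fh:
        return fh.read() == data


def demo():
    # Stand-in for r.content: UTF-8 text with a bare carriage return in it.
    payload = "<html>\r公司資料</html>".encode("utf-8")
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_bytes(path, payload)
        same_content = file_matches(path, payload)
        same_size = os.path.getsize(path) == len(payload)
        return same_content, same_size
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If both values are true, the file holds everything that was downloaded, and any difference lies in how the console displays it.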

I've tried this with Python 3.4.x on Linux, but the behavior you mention does not appear with that combination of Python version and platform.

EDIT 1

Here's what I've tried:

import requests
url = "http://www.aastocks.com/tc/stocks/analysis/company-fundamental/financial-ratios?symbol=0001&period=4"
r = requests.get(url)
print(str(r.content))
f = open("test.txt","w")
f.write(str(r.content))
f.close()
f = open("test.txt","r")
print(f.read())
f.close()

      

and here is the result: http://pastebin.com/R0j0mYe5

EDIT 2

I didn't notice the title at first. I tried it in 2.x and saw the behavior. This does look like a problem. Apparently something goes wrong when the HTML is scanned and decoded for printing.

Here's what I saw:



print r.content[0:500]
print "*****"
print r.content[0:1000]

      

This gives the following output:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力

*****

</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="teonal.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力, 投資回報, 盈利能力, 營運能力, 投資收益, 綜合全年, 綜合中期" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <script type="text/javascript">

      

As we can see, when only the first 500 characters are printed, the output is as expected, but it goes wrong when we try to print more.

Something strange happens when it tries to decode the entire document.

However, in python 3.4.x I see the following:

print(con[0:500]) #con = r.content
print(con[0:1000])

      

output:

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe4\xb8\xbb\xe8\xa6\x81\xe8\xb2\xa1\xe7\xb6\x93\xe6\xaf\x94\xe7\x8e\x87, \xe6\xb5\x81\xe5\x8b\x95\xe6\xaf\x94\xe7\x8e\x87, \xe8\x82\xa1\xe6\x9d\xb1\xe6\xac\x8a\xe7\x9b\x8a\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe7\xb8\xbd\xe8\xb3\x87\xe7\x94\xa2\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe9\x82\x8a\xe9\x9a\x9b\xe5\x88\xa9\xe6\xbd\xa4\xe7\x8e\x87, \xe6\xb4\xbe\xe6\x81\xaf\xe6\xaf\x94\xe7\x8e\x87" /><meta name="description" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe8\xb2\xa1\xe5\x8b\x99\xe6\xaf\x94\xe7\x8e\x87, \xe8\xae\x8a\xe7\x8f\xbe\xe8\x83\xbd\xe5\x8a\x9b, \xe5\x84\x9f\xe5\x82\xb5\xe8\x83\xbd\xe5\x8a\x9b'

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe4\xb8\xbb\xe8\xa6\x81\xe8\xb2\xa1\xe7\xb6\x93\xe6\xaf\x94\xe7\x8e\x87, \xe6\xb5\x81\xe5\x8b\x95\xe6\xaf\x94\xe7\x8e\x87, \xe8\x82\xa1\xe6\x9d\xb1\xe6\xac\x8a\xe7\x9b\x8a\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe7\xb8\xbd\xe8\xb3\x87\xe7\x94\xa2\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe9\x82\x8a\xe9\x9a\x9b\xe5\x88\xa9\xe6\xbd\xa4\xe7\x8e\x87, \xe6\xb4\xbe\xe6\x81\xaf\xe6\xaf\x94\xe7\x8e\x87" /><meta name="description" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe8\xb2\xa1\xe5\x8b\x99\xe6\xaf\x94\xe7\x8e\x87, \xe8\xae\x8a\xe7\x8f\xbe\xe8\x83\xbd\xe5\x8a\x9b, \xe5\x84\x9f\xe5\x82\xb5\xe8\x83\xbd\xe5\x8a\x9b, \xe6\x8a\x95\xe8\xb3\x87\xe5\x9b\x9e\xe5\xa0\xb1, \xe7\x9b\x88\xe5\x88\xa9\xe8\x83\xbd\xe5\x8a\x9b, \xe7\x87\x9f\xe9\x81\x8b\xe8\x83\xbd\xe5\x8a\x9b, \xe6\x8a\x95\xe8\xb3\x87\xe6\x94\xb6\xe7\x9b\x8a, \xe7\xb6\x9c\xe5\x90\x88\xe5\x85\xa8\xe5\xb9\xb4, \xe7\xb6\x9c\xe5\x90\x88\xe4\xb8\xad\xe6\x9c\x9f" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <script type="text/javascript">\rvar _gaq = _gaq || [];\r_gaq.push([\'_setAccount\', \'UA-20790503-3\']);\r_gaq.push([\'_setDomainName\', \'www.aastocks.com\']);\r_gaq.push([\'_trackPageview\']);\r_gaq.push([\'_trackPageLoadTime\']);\rfunction OA_show(name) {\r} \r</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="te'

      

But 3.x gives a result similar to 2.x if I decode the bytes as UTF-8 before printing:

print(con[0:500].decode('utf-8'))
print(con[0:1000].decode('utf-8'))

      

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力

</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="te
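One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: the raw bytes shown earlier contain bare `\r` characters inside the script block, and many consoles treat a carriage return as "move the cursor back to the start of the line", so later text is drawn over earlier text. If that is what happens here, replacing the carriage returns before printing should make the whole text visible. A minimal sketch in Python 3 syntax, where `normalize_newlines` is a hypothetical helper and the sample is a shortened stand-in for the page:

```python
def normalize_newlines(data):
    """Replace CRLF pairs and bare CR bytes with LF in a bytes object."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Shortened stand-in for the script block visible in the bytes output.
sample = b"<script>\rvar _gaq = _gaq || [];\r</script> <meta />"

cleaned = normalize_newlines(sample)
print(cleaned.decode("utf-8"))
```

With the real response this would be `print(normalize_newlines(r.content).decode("utf-8"))`, again assuming the body is valid UTF-8.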

      



There is an internal limit on how much output the run console buffer can hold. It is limited to about 15k lines.

To increase this limit, edit the idea.properties file, add the key idea.cyclic.buffer.size (the value is in kilobytes), and set it to a larger value.



See this bug report for a detailed solution.
