Ctrl-C ends my script, but it isn't caught by my KeyboardInterrupt handler
I have a Python script containing a large loop that reads a file and processes each line (I use several packages such as urllib2, httplib2 and BeautifulSoup).
It looks like this:
try:
    with open(fileName, 'r') as file:
        for i, line in enumerate(file):
            try:
                # a lot of code
                # ....
                # ....
            except urllib2.HTTPError:
                print "\n >>> HTTPError"
            # a lot of other exceptions
            # ....
            except (KeyboardInterrupt, SystemExit):
                print "Process manually stopped"
                raise
            except Exception, e:
                print(repr(e))
except (KeyboardInterrupt, SystemExit):
    print "Process manually stopped"
    # some stuff
The problem is that the program stops when I press Ctrl-C, but it doesn't reach either of my two KeyboardInterrupt handlers, although I'm pretty sure it is inside the loop at that moment (and at least inside the outer try / except).
How is this possible? At first I thought one of the packages I use was handling exceptions incorrectly (for example, with a bare "except:"), but if that were the case, my script would not stop at all. The script REALLY does stop, though, so the interrupt should be caught by at least one of my two except clauses, right?
Where am I going wrong?
Thanks in advance!
EDIT:
After adding a finally: clause after the try-except and printing the traceback in both try-except blocks, I usually get None when I press Ctrl-C. I did manage to capture a traceback once, though (it seems to come from urllib2, but I don't know whether that is why I can't catch KeyboardInterrupt):
Traceback (most recent call last):
  File "/home/darcot/code/Crawler/crawler.py", line 294, in get_articles_from_file
    content = Extractor(extractor='ArticleExtractor', url=url).getText()
  File "/usr/local/lib/python2.7/site-packages/boilerpipe/extract/__init__.py", line 36, in __init__
    connection = urllib2.urlopen(request)
  File "/usr/local/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/local/lib/python2.7/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/local/lib/python2.7/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/local/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7/urllib2.py", line 1173, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/lib/python2.7/urllib2.py", line 1148, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 4] Interrupted system call>
As I suggested in my comments on the question, this problem is most likely caused by a part of the code that the question does not show. The exact code should not matter much, though, because Python normally raises a KeyboardInterrupt exception when running Python code is interrupted by Ctrl-C.
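For reference, that normal behaviour can be checked in isolation. The sketch below is only an illustration under some assumptions: it uses Python 3 syntax (the question runs Python 2.7), it assumes a POSIX system and that it runs in the main thread, and the function name interrupt_self_and_catch is made up for this example. It sends the process the same SIGINT that Ctrl-C delivers and reports whether KeyboardInterrupt arrived.

```python
import os
import signal
import time


def interrupt_self_and_catch():
    """Send SIGINT to this process and report whether KeyboardInterrupt was caught."""
    # Install Python's stock SIGINT handler, which raises KeyboardInterrupt.
    previous = signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        os.kill(os.getpid(), signal.SIGINT)  # the signal Ctrl-C would deliver
        # Give the interpreter a few chances to run the pending handler.
        for _ in range(100):
            time.sleep(0.01)
        return False
    except KeyboardInterrupt:
        return True
    finally:
        # Restore whatever handler was registered before (None means it
        # was not installed from Python, so there is nothing to restore).
        if previous is not None:
            signal.signal(signal.SIGINT, previous)


if __name__ == "__main__":
    print(interrupt_self_and_catch())
```

If a check like this reports that the exception was caught in a plain interpreter but stops working once a particular package is imported, that package is a likely suspect.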
You mentioned in the comments that you are using the boilerpipe Python package. This package uses JPype to create a Java language binding. I can reproduce your problem with the following Python program:
from boilerpipe.extract import Extractor
import time

try:
    for i in range(10):
        time.sleep(1)
except KeyboardInterrupt:
    print "Keyboard Interrupt Exception"
If you interrupt this program with Ctrl-C, no exception is raised. The program appears to exit immediately, without giving the Python interpreter a chance to raise the exception. When the boilerpipe import is removed, the problem goes away.
A debug session with gdb shows that a large number of threads are started in the Python process when boilerpipe is imported:
gdb --args python boilerpipe_test.py
[...]
(gdb) run
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffef62b700 (LWP 3840)]
[New Thread 0x7fffef52a700 (LWP 3841)]
[New Thread 0x7fffef429700 (LWP 3842)]
[New Thread 0x7fffef328700 (LWP 3843)]
[New Thread 0x7fffed99a700 (LWP 3844)]
[New Thread 0x7fffed899700 (LWP 3845)]
[New Thread 0x7fffed798700 (LWP 3846)]
[New Thread 0x7fffed697700 (LWP 3847)]
[New Thread 0x7fffed596700 (LWP 3848)]
[New Thread 0x7fffed495700 (LWP 3849)]
[New Thread 0x7fffed394700 (LWP 3850)]
[New Thread 0x7fffed293700 (LWP 3851)]
[New Thread 0x7fffed192700 (LWP 3852)]
The gdb session without the boilerpipe import:
gdb --args python boilerpipe_test.py
[...]
(gdb) r
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7529533 in __select_nocancel () from /usr/lib/libc.so.6
(gdb) signal 2
Continuing with signal SIGINT.
Keyboard Interrupt Exception
[Inferior 1 (process 3904) exited normally]
So my guess is that your Ctrl-C signal is being handled on a different thread, or that JPype is doing other odd things that break the handling of Ctrl-C.
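The traceback in the question's edit is at least consistent with this guess. Errno 4 is EINTR on Linux, which means a blocking system call inside urllib2 was interrupted by a signal, and the error surfaced as a URLError instead of a KeyboardInterrupt. As a rough sketch, such an error could be recognised as shown below. The helper name is_interrupted_call is hypothetical and not part of urllib2 or boilerpipe, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
import errno

try:
    from urllib2 import URLError  # Python 2, as used in the question
except ImportError:
    from urllib.error import URLError  # Python 3


def is_interrupted_call(exc):
    """Return True if exc is a URLError that wraps EINTR (errno 4 on Linux)."""
    if not isinstance(exc, URLError):
        return False
    # URLError stores the underlying error (or a message string) in .reason.
    reason = getattr(exc, "reason", None)
    return getattr(reason, "errno", None) == errno.EINTR
```

A caller could use a check like this inside an except clause for URLError to decide whether the error should be treated as a manual stop rather than as a network failure.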
EDIT: As a possible workaround, you can register a signal handler for the SIGINT signal that the process receives when you press Ctrl-C. The signal handler runs even if boilerpipe and JPype are imported. This way you are notified when the user presses Ctrl-C, and you can handle the event at a central point in your program. You can terminate the script in this handler if you like. If you don't, the script continues where it left off after the handler function returns. See the example below:
from boilerpipe.extract import Extractor
import time
import signal
import sys

def interrupt_handler(signum, frame):
    print "Signal handler!!!"
    # Terminate the process here, as catching the signal removes
    # the default "close process" behaviour of Ctrl-C.
    sys.exit(-2)

# Register the handler for SIGINT (sent when the user presses Ctrl-C).
signal.signal(signal.SIGINT, interrupt_handler)

try:
    for i in range(10):
        time.sleep(1)
        # your_url = "http://www.zeit.de"
        # extractor = Extractor(extractor='ArticleExtractor', url=your_url)
except KeyboardInterrupt:
    print "Keyboard Interrupt Exception"
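If you would rather not exit from inside the handler, a variation is to let the handler only record the request and have the main loop check for it between lines. The following is a minimal sketch of that idea, not a drop-in replacement: it uses Python 3 syntax, the names StopFlag and process_lines are invented for illustration, and the per-line work is just a placeholder.

```python
import signal


class StopFlag(object):
    """Remembers that SIGINT arrived so the main loop can stop cleanly."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Only record the request; the loop decides when to stop.
        self.requested = True


def process_lines(lines, stop_flag):
    """Process lines until the input ends or a stop has been requested."""
    done = []
    for line in lines:
        if stop_flag.requested:
            break
        done.append(line.strip())  # placeholder for the real per-line work
    return done


if __name__ == "__main__":
    flag = StopFlag()
    # Handlers can only be registered from the main thread.
    signal.signal(signal.SIGINT, flag.handler)
    print(process_lines(["first line\n", "second line\n"], flag))
```

With this approach the current line is always finished before the loop ends, which may be convenient if each iteration writes results that should not be left half done.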