How do I determine if a thread has died and then restart it?
I have an application that starts a series of threads. Sometimes one of these threads dies (usually due to a network problem). How can I correctly detect a thread crashing and restart only that thread? Here's some sample code:
import random
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, pass_value):
        super(MyThread, self).__init__()
        self.running = False
        self.value = pass_value

    def run(self):
        self.running = True
        while self.running:
            time.sleep(0.25)
            rand = random.randint(0,10)
            print threading.current_thread().name, rand, self.value
            if rand == 4:
                raise ValueError('Returned 4!')

if __name__ == '__main__':
    group1 = []
    group2 = []
    for g in range(4):
        group1.append(MyThread(g))
        group2.append(MyThread(g+20))

    for m in group1:
        m.start()

    print "Now start second wave..."

    for p in group2:
        p.start()
In this example, I start 4 threads, then I start 4 more threads. Each thread randomly generates an int between 0 and 10. If this int is equal to 4, it raises an exception. Please note that I am not joining the threads. I want both the group1 and group2 lists of threads to run. I found that if I join the threads, it waits for the thread to terminate. My thread is supposed to be a daemon process, so it should rarely (if ever) hit the ValueError exception that this example code shows, and it should run continuously. If I join it, the next set of threads does not start.
How can I detect that a specific thread has died and restart only that one thread?
I tried the following loop right after my for p in group2 loop.
while True:
    # Create a copy of our groups to iterate over,
    # so that we can delete dead threads if needed
    for m in group1[:]:
        if not m.isAlive():
            group1.remove(m)
            group1.append(MyThread(1))

    for m in group2[:]:
        if not m.isAlive():
            group2.remove(m)
            group2.append(MyThread(500))

    time.sleep(5.0)
I took this method from this question.
The problem with this is that isAlive() always seems to return True, because the threads are never restarted.
Edit
Would it be more appropriate to use multiprocessing in this situation? I found this tutorial. Is it more appropriate to have separate processes if I need to restart a process? It seems like restarting the thread is difficult.
It was pointed out in the comments that I should check the thread's is_active(). I don't see this in the documentation, but I do see isAlive, which I am currently using. As I mentioned above, though, this returns True, so I can never see that a thread has died.
You could put a try/except around the place where you expect it to crash (if it could be anywhere, you can put it around the whole run function) and keep an indicator variable that holds the thread's status.
So something like the following:
class MyThread(threading.Thread):
    def __init__(self, pass_value):
        super(MyThread, self).__init__()
        self.running = False
        self.value = pass_value
        # Possible values for self.status
        self.RUNNING = 0
        self.FINISHED_OK = 1
        self.STOPPED = 2
        self.CRASHED = 3
        self.status = self.STOPPED

    def run(self):
        self.running = True
        self.status = self.RUNNING

        while self.running:
            time.sleep(0.25)
            rand = random.randint(0,10)
            print threading.current_thread().name, rand, self.value

            try:
                if rand == 4:
                    raise ValueError('Returned 4!')
            except Exception:
                self.status = self.CRASHED
                # Leave the loop so the crashed thread actually ends
                self.running = False
Then you can use your loop:
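while True:
    # Create a copy of our groups to iterate over,
    # so that we can delete dead threads if needed
    for m in group1[:]:
        if m.status == m.CRASHED:
            value = m.value
            group1.remove(m)
            replacement = MyThread(value)
            replacement.start()  # a new thread must be started explicitly
            group1.append(replacement)

    for m in group2[:]:
        if m.status == m.CRASHED:
            value = m.value
            group2.remove(m)
            replacement = MyThread(value)
            replacement.start()  # a new thread must be started explicitly
            group2.append(replacement)

    time.sleep(5.0)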
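The status-flag idea above could also be sketched in Python 3 roughly as follows. This is only an illustration under some assumptions: the names Worker, restart_crashed, and the task callable are hypothetical and not part of the original code, and the try/except is placed around the whole task rather than a single statement. A Thread object can only be started once, so "restarting" here means creating and starting a fresh object with the same value.

```python
import threading


class Worker(threading.Thread):
    """Thread that records whether its task crashed (hypothetical helper)."""

    def __init__(self, value, task):
        super().__init__(daemon=True)
        self.value = value
        self.task = task      # callable doing the real work (an assumption)
        self.crashed = False

    def run(self):
        try:
            self.task(self.value)
        except Exception:
            # Record the failure so a supervisor loop can see it.
            self.crashed = True


def restart_crashed(group, task):
    """Replace each crashed, finished worker in place; return how many were replaced."""
    restarted = 0
    for i, worker in enumerate(group):
        if worker.crashed and not worker.is_alive():
            replacement = Worker(worker.value, task)
            replacement.start()  # a Thread object can only be started once
            group[i] = replacement
            restarted += 1
    return restarted
```

A supervising loop would then call restart_crashed(group1, task) and restart_crashed(group2, task) on each pass, sleeping between passes as in the loop above.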
I had a similar problem and came across this question. I found that join takes a timeout argument, and that is_alive returns False once a finished thread has been joined. So my check for each thread is:
def check_thread_alive(thr):
    # Non-blocking join: returns immediately whether or not the thread is done
    thr.join(timeout=0.0)
    return thr.is_alive()
This detects a thread's death for me.
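For what it's worth, here is a small Python 3 sketch of how that check behaves before and after a thread ends. The demo function and the Event used to hold the thread open are assumptions added purely to make the behaviour deterministic; they are not part of the original check.

```python
import threading


def check_thread_alive(thr):
    """Return True if the thread is still running, without blocking."""
    thr.join(timeout=0.0)
    return thr.is_alive()


def demo():
    # The thread blocks on this event until we release it.
    release = threading.Event()
    thr = threading.Thread(target=release.wait, daemon=True)
    thr.start()

    before = check_thread_alive(thr)  # thread is still waiting on the event

    release.set()
    thr.join()  # blocking join, used here only to make the demo deterministic
    after = check_thread_alive(thr)   # thread has finished

    return before, after
```

Calling demo() should give True for the first check and False for the second, so a monitoring loop can poll check_thread_alive on each thread without ever blocking on a join.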