Python SimpleHTTPServer running on a thread won't close port

I have the following code:

import os
from ghost import Ghost
import urlparse, urllib

import SimpleHTTPServer
import SocketServer

import sys, traceback

from threading import Thread, Event
from time import sleep

please_die = Event() # this is my enemy

httpd = None
PORT = 8001
address = 'http://localhost:'+str(PORT)+'/'
search_dir = './category'

def main():
    """
      basic run script routine, 
      FIXME: is supossed to exits gracefully
    """
    thread = Thread(target = simpleServe)
    try:
      thread.start()
      run()
    except KeyboardInterrupt:
      print "Shutdown requested"
    except Exception:
      traceback.print_exc(file=sys.stdout)

    shutdown()
    sys.exit(0)

def shutdown():
  global httpd
  global please_die
  print "Shutting down"
  # A try - except for the shutdown routine
  try:
    please_die.wait() # how do you do? 

    httpd.shutdown() # Please! I whant to run you multiple times. 
    print "Have you died?"
  except Exception:
    traceback.print_exc(file=sys.stdout)

def path2url(path):
  """
  constructs an url from a relative path / concatenates the global address
  variable with the path given
  """
  global address
  return urlparse.urljoin(address, urllib.pathname2url(path))

def simpleServe():
  global httpd, PORT

  please_die.set() # Attaching the event to this thread

  # Start the service
  Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

  httpd = SocketServer.TCPServer(("", PORT), Handler)

  print "serving at port", PORT
  # And loop infinetly in the hope that I can stop you later
  httpd.serve_forever()

def run():
  global search_dir;

  ghost = Ghost() # the webkit facade

  with ghost.start() as session:

    session.set_viewport_size(2560, 1600) # "retina" size

    for directory, subdirectories, files in os.walk(search_dir):
        for file in files:
            path = os.path.join(directory, file)
            urlPath = path2url(path)
            process(session, urlPath);

def process(session, urlPath):
  page, resources = session.open(urlPath)
  assert page.http_status == 200
  # ... other asserts here 


if __name__ == '__main__':
  main()

      

The idea is to create a script that starts a "simple HTTP server", makes some requests, and then exits.

The first time it starts up without problems:

...
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /static/img/glyphicons-halflings.png HTTP/1.1" 200 -
Shutting down
Have you died?

      

Running it a second time results in a crash in that:

The address is already in use

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "download-images.py", line 51, in simpleServe
    httpd = SocketServer.TCPServer(("", PORT), Handler)
  File "/usr/lib/python2.7/SocketServer.py", line 420, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/SocketServer.py", line 434, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use

      

If I kill all python processes than the script starts up again, and because of this, I assume I was using this thread incorrectly, but cannot find where.

Update

Forgot to mention that

my OS:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.04
Release:        15.04
Codename:       vivid

      

The pit I am using is:

$ python --version
Python 2.7.9

      

$ netstat -putelan | grep 8001 prints:

$ netstat -putelan | grep 8001
(Not all processes could be identified, non-owned process info
    cp        0      0 127.0.0.1:34691         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34866         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34798         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34588         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34647         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34915         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34674         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34451         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34930         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34606         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34505         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34717         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34670         0      0 127.0.0.1:8001          127.0.0.1:34626         
...

      

I cannot post the whole sequence (from outside of post stackoverflow). The rest is the same with port 34 *** mixed with port 8001 in a uniform sequence.

+3


source to share


3 answers


As @LFJ says, this is probably due to the attribute allow_reuse_address

TCPServer

.

httpd = SocketServer.TCPServer(("", PORT), Handler, bind_and_activate=False)
httpd.allow_reuse_address = True

try:
    httpd.server_bind()
    httpd.server_activate()
except:
    httpd.server_close()
    raise

      

Equivalent code:

SocketServer.TCPServer.allow_reuse_address = True
https = SocketServer.TCPServer(("", PORT), Handler)

      

Let's explain a little why.

When you include it TCPServer.allow_reuse_address

, it adds an option to the socket:



class TCPServer:
    [...]
    def server_bind(self):
        if self.allow_reuse_address:
            self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        [...]

      

What is it socket.SO_REUSEADDR

?

This socket option tells the kernel that even if this port is busy (in
the TIME_WAIT state), go ahead and reuse it anyway.  If it is busy,
but with another state, you will still get an address already in use
error.  It is useful if your server has been shut down, and then
restarted right away while sockets are still active on its port.  You
should be aware that if any unexpected data comes in, it may confuse
your server, but while this is possible, it is not likely.       

      

In fact, it allows you to reuse the binding address of the socket socket. If another process tries to bind while the socket is not listening, the process will be allowed to use that socket bind address.

The reason you need to enable this is because you are not closing yours correctly TCPServer

. To close it correctly, you have to run a method shutdown

that will close the thread that was running server_forever

and then close the socket correctly by calling the method server_close

.

def shutdown():
    global httpd
    global please_die
    print "Shutting down"

    try:
        please_die.wait() # how do you do? 
        httpd.shutdown() # Stop the serve_forever
        httpd.server_close() # Close also the socket.
    except Exception:
        traceback.print_exc(file=sys.stdout)

      

+9


source


I saw the TCPServer source code:

def server_bind(self):
        """Called by constructor to bind the socket.

        May be overridden.

        """
        if self.allow_reuse_address:
            self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.socket.bind(self.server_address)
    self.server_address = self.socket.getsockname()

      



allow_reuse_address must be set for binding. so try this:

SocketServer.TCPServer.allow_reuse_address=True
httpd = SocketServer.TCPServer(("", PORT), Handler)

      

+2


source


You do not clean up the server after it is closed. This means that you leave dead socket resources lying around that the OS does not clean up immediately after the process ends.

You need to call httpd.server_close()

the finally block after your call to httpd.serve_forever()

. This call tells the OS to release all resources that might have been associated with this server instance.

try:
    httpd.serve_forever()
finally:
    httpd.server_close()

      

+2


source







All Articles