Get the full URL from a shortened URL using Python
I have a list of urls like
l=['bit.ly/1bdDlXc','bit.ly/1bdDlXc',.......,'bit.ly/1bdDlXc']
I just want to get the full URL behind the short one, for each item in this list.
Here is my approach:
import urllib2
for i in l:
    print urllib2.urlopen(i).url
But when the list contains thousands of URLs, the program takes a long time.
My question is: is there a way to reduce the execution time, or is there another approach I should follow?
Method one
As suggested, one way to accomplish this would be to use the official bit.ly shortening API, which, however, has limitations (e.g., no more than 15 shortUrl parameters
per request).
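Given that 15-per-request limit, a list of thousands of URLs has to be expanded in batches. Here is a minimal sketch, not the official client: it assumes the v3 `/expand` endpoint and takes a placeholder `access_token` argument; the `chunks` helper is generic batching logic.

```python
def chunks(seq, size):
    """Split seq into consecutive batches of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def expand_all(short_urls, access_token):
    """Resolve short URLs via bit.ly's /v3/expand, 15 per request (sketch)."""
    import requests  # imported lazily; the batching above is stdlib-only
    api = "https://api-ssl.bitly.com/v3/expand"
    mapping = {}
    for batch in chunks(short_urls, 15):  # API limit: 15 shortUrl per call
        params = [("access_token", access_token)]
        params += [("shortUrl", "http://" + u) for u in batch]
        data = requests.get(api, params=params).json()
        for entry in data["data"]["expand"]:
            mapping[entry["short_url"]] = entry.get("long_url")
    return mapping
```

With 15 URLs per call, a list of thousands still needs only a few hundred requests instead of one per URL.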
Method two
Alternatively, one could simply avoid fetching the response body at all, e.g. by using the HEAD
HTTP method instead of GET
. Here is some sample code that uses the excellent requests package:
import requests
l=['bit.ly/1bdDlXc','bit.ly/1bdDlXc',.......,'bit.ly/1bdDlXc']
for i in l:
    print requests.head("http://"+i).headers['location']
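Each HEAD request still blocks on a network round trip, so for thousands of URLs most of the time is spent waiting. Issuing the requests concurrently helps a lot. Below is a hedged sketch using a thread pool; it assumes Python 3 (or the `futures` backport on Python 2), and the `resolver` parameter is a hypothetical hook added here so the HTTP call can be swapped out for testing or for a different client.

```python
from concurrent.futures import ThreadPoolExecutor

def resolve_head(short_url):
    """One HEAD request; returns the Location header (or None)."""
    import requests
    resp = requests.head("http://" + short_url, allow_redirects=False)
    return resp.headers.get("location")

def resolve_all(short_urls, resolver=resolve_head, workers=20):
    """Resolve many short URLs concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so zip pairs each URL with its result
        return dict(zip(short_urls, pool.map(resolver, short_urls)))
```

Twenty workers is an arbitrary starting point; raise or lower it depending on how aggressively you want to hit bit.ly.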
I would try Twisted's asynchronous web client. Be careful with this, however: it does not rate-limit at all.
#!/usr/bin/python2.7
from twisted.internet import reactor
from twisted.internet.defer import DeferredList, DeferredLock
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, HTTPConnectionPool
from pprint import pprint
from collections import defaultdict
from urlparse import urlparse
from random import randrange
import fileinput

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = 16
agent = Agent(reactor, pool)

locks = defaultdict(DeferredLock)
locations = {}

def getLock(url, simultaneous=1):
    return locks[urlparse(url).netloc, randrange(simultaneous)]

@inlineCallbacks
def getMapping(url):
    # Limit ourselves to 4 simultaneous connections per host.
    # Tweak this as desired, but make sure that it is no larger than
    # pool.maxPersistentPerHost
    lock = getLock(url, 4)
    yield lock.acquire()
    try:
        resp = yield agent.request('HEAD', url)
        locations[url] = resp.headers.getRawHeaders('location', [None])[0]
    except Exception as e:
        locations[url] = str(e)
    finally:
        lock.release()

dl = DeferredList([getMapping(url.strip()) for url in fileinput.input()])
dl.addCallback(lambda _: reactor.stop())

reactor.run()
pprint(locations)