Quick Way to Check for Url Redirects
Recently I had to go through a bunch of URLs and identify whether or not they redirect to another URL. Since the list was rather long (several hundred entries), I thought I should rather write a script to do this for me.
I used the Requests library to make the initial request for a given URL. If the response code was 3XX
, then I assumed it is being redirected. There are a couple of other edge cases (timeouts and generic connection errors, either due to non-existent domain, or network error) that the script attempts to cover for.
The script accepts a single parameter, which should be a filename containing URL strings. If the filename is not provided, the script will look for domains.txt
in the current directory.
#!/usr/bin/env python
import sys
import requests
def check_for_redirects(url):
try:
r = requests.get(url, allow_redirects=False, timeout=0.5)
if 300 <= r.status_code < 400:
return r.headers['location']
else:
return '[no redirect]'
except requests.exceptions.Timeout:
return '[timeout]'
except requests.exceptions.ConnectionError:
return '[connection error]'
def check_domains(urls):
for url in urls:
url_to_check = url if url.startswith('http') else "http://%s" % url
redirect_url = check_for_redirects(url_to_check)
print("%s => %s" % (url_to_check, redirect_url))
if __name__ == '__main__':
fname = 'domains.txt'
try:
fname = sys.argv[1]
except IndexError:
pass
urls = (l.strip() for l in open(fname).readlines())
check_domains(urls)
Below is an example of URLs. If protocol is not defined, http
will be assumed.
google.com
http://gmail.com
www.ibm.com
redhat.com
example.com
i-believe-this-domain-does-not-exist-123abc.com
And the results:
http://google.com => http://www.google.com/
http://gmail.com => http://mail.google.com/mail/
http://www.ibm.com => http://www.ibm.com/us/en/
http://redhat.com => http://www.redhat.com/
http://example.com => http://www.iana.org/domains/example/
http://i-believe-this-domain-does-not-exist-123abc.com => [connection error]