How to resolve URL redirects?


I have a text file with many short URLs, one per line. I want to resolve each URL to its final link. Some of the URLs are redirected twice. How can I automate this so that the output contains the final URLs, one per line?

Update: Input text file:

http://www.example.com/go/post-page-1 
http://www.example.com/go/post-page-2 
http://www.example.com/go/post-page-3 

Output format needed in txt file:

http://www.example.org/post-page-name
http://www.example.org/post-page-name
http://www.example.org/post-page-name

Here is how the links are redirected:

Initial URL: http://www.example.com/go/post-page
==> 301 Permanent Redirect

Intermediate URL: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
==> 302 Temporary Redirect

Final URL: http://www.example.org/post-page-name

Here is the code I tried, but it resolves the URLs only to the intermediate link, not to the final one.

#!/bin/bash
rm resolved_urls.txt
for url in $(cat url.txt); do
        wget -S "$url" 2>&1 | grep ^Location >> resolved_urls.txt
done

There are 2 best solutions below


It's not entirely clear what you're asking for, but going by what you posted, I think this will do it:

#!/bin/bash
# Read the URLs from urls.txt and write wget's log, including the
# server response headers (-S), to url-redirect.txt.

wget -S -i urls.txt -o url-redirect.txt

# Grep the log for "Final URL" lines and keep the third space-separated
#   field (the URL). This assumes the log really contains lines in the
#   format you posted; if it does not, adjust the pattern to match
#   what url-redirect.txt actually shows.

grep 'Final URL' url-redirect.txt | cut -d ' ' -f3 > resolved_urls.txt

# Remove the temporary log file.
rm url-redirect.txt

This could probably be done faster without the temporary file and the shell redirections, but I think it gives you what you're looking for.
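If the log holds the response headers for every hop of one URL's redirect chain, the final destination is the value of the last Location header. Here is a minimal sketch of that idea. The helper name last_location is made up for illustration, and the sample headers are modelled on the chain in the question, not on real output:

```shell
#!/bin/bash

# Hypothetical helper: read response headers for one redirect chain on
# stdin and print the target of the last Location header, if any.
last_location() {
    # Strip carriage returns, then remember the second field of every line
    # whose first field is "Location:" (matched case-insensitively).
    tr -d '\r' | awk 'tolower($1) == "location:" { loc = $2 }
                      END { if (loc != "") print loc }'
}

# Sample headers shaped like the 301 -> 302 -> 200 chain in the question.
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name\r\n\r\nHTTP/1.1 302 Found\r\nLocation: http://www.example.org/post-page-name\r\n\r\nHTTP/1.1 200 OK\r\n\r\n' | last_location
# prints: http://www.example.org/post-page-name
```

Something like wget -S --spider "$url" 2>&1 | last_location should feed it the headers of each hop for a single URL, though the exact log layout can differ between wget versions, so check the log before relying on it.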


Try something like this:

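#!/bin/bash

# Follow Location headers one hop at a time and print the last URL reached.
# Note: relative Location values are not resolved here.
getFinalRedirect() {
    local url=$1 nextloc hops=0
    # Cap the number of hops so a redirect loop cannot run forever.
    while [ "$hops" -lt 20 ]; do
        # HEAD request; strip the trailing CR that HTTP headers carry and
        # match the header name case-insensitively.
        nextloc=$(curl -s -I "$url" | tr -d '\r' | grep -i '^location:' | head -n 1)
        [ -n "$nextloc" ] || break
        # Drop the header name and the whitespace after it.
        url=$(printf '%s\n' "$nextloc" | sed 's/^[^:]*:[[:space:]]*//')
        hops=$((hops + 1))
    done

    echo "$url"
}

url="http://stackoverflow.com/q/25485374/1563512"
getFinalRedirect "$url"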

Beware of infinite redirects. This produces:

$ ./test.bash 
http://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects

Then, to call the function on your file:

# read -r keeps backslashes literal and trims surrounding whitespace.
while read -r url; do
    getFinalRedirect "$url"
done < urls.txt > finalurls.txt
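
As an alternative to following each hop by hand, curl can follow the chain itself and report where it ended up. This is a rough sketch, not a drop-in replacement: the function name resolve_final_url and the cap of 10 redirects are my own choices, and the file names are the ones used above. Note that it issues GET requests, not HEAD requests.

```shell
#!/bin/bash

# Hypothetical helper: let curl follow the whole chain and print the last URL.
resolve_final_url() {
    # -L follows redirects, --max-redirs caps the number of hops,
    # -o /dev/null discards the body, and -w '%{url_effective}' prints
    # the URL that curl fetched last.
    curl -s -L --max-redirs 10 -o /dev/null -w '%{url_effective}\n' "$1"
}

# Assumed file names: urls.txt as input, finalurls.txt as output,
# one URL per line. Blank lines are skipped.
if [ -r urls.txt ]; then
    while read -r url; do
        [ -n "$url" ] && resolve_final_url "$url"
    done < urls.txt > finalurls.txt
fi
```

If a chain is longer than the cap, curl stops with an error and the printed URL is simply the last one it tried, so it is worth checking the output for leftover short links.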