Unscramble words Challenge - improve my bash solution


There is a Capture the Flag challenge.

I have two files; the first contains scrambled words like this, with about 550 entries:

dnaoyt
cinuertdso
bda
haey
tolpap
...

The second file is a dictionary with about 9,000 entries:

radar
ccd
gcc
fcc
historical
...

The goal is to find, for each scrambled word, the correct unscrambled version, which is contained in the dictionary file.

My approach is to sort the characters of a word from the first file, then check whether a word from the second file has the same length. If so, I sort its characters too and compare the two sorted strings.

This is my fully functional bash script, but it is very slow:

#!/bin/bash

while IFS="" read -r p || [ -n "$p" ]
do
    var=0
    ro=$(echo $p | perl -F -lane 'print sort @F')
    len_ro=${#ro}
    while IFS="" read -r o || [ -n "$o" ]
    do
        ro2=$(echo $o | perl -F -lane 'print sort @F')
        len_ro2=${#ro2}
        let "var+=1"
        if [ $len_ro == $len_ro2 ]; then
            if  [ $ro == $ro2 ]; then
                echo $o >> new.txt
                echo $var >> whichline.txt
            fi
        fi
    done < dictionary.txt
done < scrambled-words.txt

I have also tried converting all characters to their ASCII integer values and summing each word, but while comparing I realized that different character patterns can produce the same sum.
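
For example (just a quick sketch to show the collision), "ad" and "bc" are different strings, but 97+100 and 98+99 both give 197:

char_sum() {
    local word=$1 sum=0 i
    for (( i = 0; i < ${#word}; i++ )); do
        # add the ASCII code of each character
        sum=$(( sum + $(printf '%d' "'${word:i:1}") ))
    done
    echo "$sum"
}

char_sum ad   # prints 197
char_sum bc   # prints 197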

[edit] For the record:

- no anagrams are contained in the dictionary
- to get the flag, you need to export the unscrambled words as one blob and make a SHA hash out of it (that's the flag)
- link to the CTF for the person who wanted the files: https://challenges.reply.com/tamtamy/user/login.action

There are 4 answers below.

BEST ANSWER

You're better off creating a lookup dictionary (keyed by the sorted word) from the dictionary file.

Your loop body is executed 550 * 9,000 = 4,950,000 times (O(N*M)).

The solution I propose executes two loops of at most 9,000 passes each (O(N+M)).

Bonus: It finds all possible solutions at no cost.

#!/usr/bin/perl

use strict;
use warnings qw( all );
use feature qw( say );

my $dict_qfn      = "dictionary.txt";
my $scrambled_qfn = "scrambled-words.txt";

# Canonical key for a word: its letters, sorted (e.g. "hello" -> "ehllo").
sub key { join "", sort split //, $_[0] }

# Build the lookup: sorted letters => list of matching dictionary words.
my %dict;
{
   open(my $fh, "<", $dict_qfn)
      or die("Can't open \"$dict_qfn\": $!\n");

   while (<$fh>) {
      chomp;
      push @{ $dict{key($_)} }, $_;
   }
}

# Look up each scrambled word by the same key.
{
   open(my $fh, "<", $scrambled_qfn)
      or die("Can't open \"$scrambled_qfn\": $!\n");

   while (<$fh>) {
      chomp;
      my $matches = $dict{key($_)};
      say "$_ matches @$matches" if $matches;
   }
}

I wouldn't be surprised if this takes only one millionth of the time of your solution for the sizes you provided (and it scales much better than yours if you were to increase the sizes).
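
For reference, the same idea can also be sketched in pure bash (untested against the actual challenge files; it assumes bash 4+ for associative arrays and the file names from the question):

#!/bin/bash
# Sketch only: index the dictionary by the sorted letters of each word,
# then look every scrambled word up in O(1).

declare -A dict

sort_letters() {
    # "hello" -> "ehllo"
    fold -w1 <<< "$1" | sort | tr -d '\n'
}

while IFS= read -r word || [ -n "$word" ]; do
    key=$(sort_letters "$word")
    dict[$key]+="$word "
done < dictionary.txt

while IFS= read -r word || [ -n "$word" ]; do
    key=$(sort_letters "$word")
    [ -n "${dict[$key]}" ] && echo "$word matches ${dict[$key]}"
done < scrambled-words.txt

This still forks fold/sort/tr once per word, so the Perl version will be far faster in practice, but the number of comparisons drops from N*M to N+M.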

ANSWER

I would do something like this with gawk:

gawk '
# First file (dictionary.txt): index each word by its sorted letters.
NR == FNR {
    dict[csort()] = $0
    next
}

# Second file (scrambled-words.txt): look each word up by its sorted letters.
{
    print dict[csort()]
}

# Return the characters of $0 in sorted order, e.g. "hello" -> "ehllo".
function csort(    chars, n, i, sorted) {
    n = split($0, chars, "")
    asort(chars)
    for (i = 1; i <= n; i++)
        sorted = sorted chars[i]

    return sorted
}' dictionary.txt scrambled-words.txt

ANSWER

I tried something very similar, but a bit different.

#!/bin/bash

exec 3<scrambled-words.txt
while read -r line <&3; do
   printf "%s" ${line} | perl -F -lane 'print sort @F'
done>scrambled-words_sorted.txt
exec 3>&-

exec 3<dictionary.txt
while read -r line <&3; do
   printf "%s" ${line} | perl -F -lane 'print sort @F'
done>dictionary_sorted.txt
exec 3>&-

printf "" > whichline.txt
exec 3<scrambled-words_sorted.txt
while read -r line <&3; do
   counter="$((++counter))"
   grep -n -e "^${line}$" dictionary_sorted.txt | cut -d ':' -f 1 | tr -d '\n' >>whichline.txt
   printf "\n" >>whichline.txt
done
exec 3>&-

As you can see, I don't create a new.txt file; instead I only create whichline.txt, with a blank line where the word doesn't match. You can easily use it to build new.txt, as sketched below.
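
For instance, something like this (just a sketch; it assumes at most one line number per line of whichline.txt, which holds since the edit says the dictionary contains no anagrams) turns the line numbers back into words:

while read -r n; do
   if [ -n "${n}" ]; then
      # print that line of the original dictionary
      sed -n "${n}p" dictionary.txt
   else
      # keep a blank line where there was no match
      printf "\n"
   fi
done <whichline.txt >new.txt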

The logic behind the script is nearly the same as yours, except that I call perl fewer times and save two support files. I think (but I am not sure) that creating them and then cycling through only one file is better than ~5 million calls of perl; this way perl is called "only" ~10,000 times.

Finally, I decided to use grep because it is (maybe) the fastest regex matcher, and by anchoring the whole line (^...$) the length check is implicit in the regex.

Please note that what @benjamin-w said is still valid; in that case grep would behave badly, and I have not handled it!

I hope this could help [:

0
On

Here's a perl-free solution I came up with using sort and join:

sort_letters() {
    # Splits each letter onto a line, sorts the letters, then joins them
    #   e.g. "hello" becomes "ehllo"
    echo "${1}" | fold-b1 | sort | tr -d '\n'
}


# For each input file...
for input in "dict.txt" "words.txt"; do
    # Convert each line to [sorted] [original]
    #  then sort and save the results with a .sorted extension
    while read -r original; do
        sorted=$(sort_letters "${original}")
        echo "${sorted} ${original}"
    done < "${input}" | sort > "${input}.sorted"
done

# Join the two files on the [sorted] word
#   outputting the scrambled and unscrambled words
join -j 1 -o 1.2,2.2 "words.txt.sorted" "dict.txt.sorted"
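
To make the format concrete, here is a tiny self-contained illustration (made-up data, not the challenge files): each .sorted file holds "[sorted letters] [original word]" lines, and join matches on the first field.

printf 'ehllo lolhe\n' > words.txt.sorted
printf 'ehllo hello\n' > dict.txt.sorted
join -j 1 -o 1.2,2.2 "words.txt.sorted" "dict.txt.sorted"
# prints: lolhe hello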