Newbie: unix bash, nested if statement, results from a loop results from sql


Newbie here, please pardon any confusing wording that I use. A common task I have is to take a list of names and do a MySQL query to look the names up in a table and see if they are "live" on our site.

Doing this one at a time, my SQL query works fine. I then wanted to do the query using a loop from a file listing multiple names. This works fine, too.

I added this query loop to my bash profile so that I can quickly do the task by typing this:

$ validOnSite fileName

This works fine, and I even added a usage statement to remind myself of the syntax. Here is the version that works:

validOnSite() {
        # Show the usage text when no file is given or help is requested
        if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
                echo "Usage:"
                echo " $ validOnSite [filename]"
                echo " Where validOnSite uses specified file as variables in sql query:"
                echo " SELECT name, active FROM dbDb WHERE name=lines in file"
        else
                # Run one query per line of the file
                while IFS= read -r line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" ; done < "$1"
        fi
}

Using a file "list.txt" which contains:

nameA
nameB

I would then type:

validOnSite list.txt

and both entries in list.txt meet my query criteria and are found in sql. My results will be:

nameA 1
nameB 1

Note the "1" after each result. I assume this is some sort of "yes" status.

Now, I add a third name to my list.txt, one that I know is not a match in sql. Now list.txt contains:

nameA
nameB
foo

When I again run this command for my list with 3 rows:

validOnSite list.txt

My results are the same as with the first version of list.txt. I cannot see which lines failed; I still see only the lines that succeeded:

nameA 1
nameB 1

I have been trying all kinds of things to add a nested if statement, something that says, "If $line is a match, echo pass, else echo fail."

I do not want to see a "1" in my results. Using list.txt with 2 matches and 1 non-match, I would like my results to be:

nameA pass
nameB pass
foo fail

Or even better, color code a pass with green and a fail with red.

As I said, newbie here... :)

Any pointers in the right direction would help. Here is my latest sad attempt, but I realize I may be going in a wrong direction entirely:

validOnSite() {


        if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
                echo "Usage:"
                echo " $ validOnSite [filename]"
                echo " Where validOnSite uses specified file as variables in sql query:"
                echo " SELECT name, active FROM dbDb WHERE name=lines in file"
        else

                cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" > /dev/null ; done
 if ( "status") then
      echo $line "failed"
      echo $line "failed" >> outfile
else
     echo $line "ok"
     echo $line "ok" >>outfile
     clear
     cat outfile
     fi
fi

If something looks crazy in my last attempt, it's because it is - I am just googling around and trying as many things as I can while trying to learn. Any help appreciated, I feel stuck after working on this for a long time, but I am excited to move forward and find a solution! I think there is something I'm missing about understanding stdout, and also confusion about nested if's.

Note: I do not need an outfile, but it's ok if one is needed to accomplish the goal. stdout result alone would suffice, and is preferred.

Note: hgsql is just the name of our MySQL server. The MySQL part works fine; I am looking for a better way to deal with my bash output, and I think there is something about stderr that I'm missing. I'm looking for a fairly simple answer as I'm a newbie!


There are 2 best solutions below

0
On

I found a way to get to my solution by piecing together the few basic things that I know. Not elegant, but it works well enough for now. I created a file "[filename]Results" with the output:

nameA 1
nameB 1

I then cut out the "1"s and made a new file. I then compared "[filename]Results" with list.txt to see which lines exist in list.txt but do not exist in the results.
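One thing to watch in the comparison step: comm expects both files to be sorted, so an unsorted list.txt can give misleading output. Here is a minimal sketch that sidesteps this by using grep instead of comm, assuming one name per line in both files. The function name missing_names is made up for illustration and is not part of the function in my .zshrc.

```shell
# missing_names LIST RESULTS  (hypothetical helper, for illustration only)
# Print every line of LIST that does not appear as a whole line in RESULTS.
#   -F  treat the patterns as fixed strings, not regular expressions
#   -x  match whole lines only
#   -v  invert the match: keep the lines that did NOT match
#   -f  read the patterns from a file
missing_names() {
  grep -F -x -v -f "$2" "$1"
}
```

For example, missing_names list.txt list.txtPass would print the names from list.txt that are absent from the results file, in their original order.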

Note: I have the following in my .zshrc file.

validOnSite() {


        if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
                echo "Usage:"
                echo " $ validOnSite [filename]"
                echo " Where validOnSite uses specified file as variables in sql query:"
                echo " SELECT name, active FROM dbDb WHERE name=lines in file"
        else

        cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name from dbDb where name='$line' and active='1'" >> $1"Pass"; done

autoload -U colors
colors
echo $fg_bold[magenta]Assemblies active on site${reset_color}
echo
cat $1"Pass"
echo
echo $fg_bold[red]Not active or not found on site${reset_color}
comm -23 $1 $1"Pass" 2> /dev/null
echo
echo

mv $1"Pass" ~cath/myFiles/validOnSiteResults
echo "Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults"

fi
}

list.txt:

nameA
nameB
foo

My input:

validOnSite list.txt

My output:

Assemblies active on site (<--this font is magenta)

nameA
nameB

Not active or not found on site (<--this font is red)
foo


Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults
2
On

I guess that by hgsql you mean some Mercurial extension that lets you run MySQL queries. I don't know how hgsql works, but I do know that MySQL returns only the matching rows. In terms of shell scripting, though, the result is a string that may contain extra information even when the number of matched rows is zero. For example, some MySQL clients may print a header or a string like "No rows found", although that is unlikely.

I'll show how it is done with the official mysql client. I'm sure you will manage to adapt the following example to hgsql with the help of its documentation.

if [ -t 1 ]; then
  red_color=$(tput setaf 1)
  green_color=$(tput setaf 2)
  reset_color=$(tput sgr0)
else
  red_color=
  green_color=
  reset_color=
fi


colorize_flag() {
  local color

  if [ "$1" = 'fail' ]; then
    color="$red_color"
  else
    color="$green_color"
  fi

  printf '%s' "${color}${1}${reset_color}"
}


sql_fmt='SELECT IF(active, "pass", "fail") AS flag FROM dbDb WHERE name = "%s"'

while IFS= read -r line; do
  sql=$(printf "$sql_fmt" "$line")

  flag=$(mysql --skip-column-names dbname -e "$sql")
  [ -z "$flag" ] && flag='fail'

  printf '%-20s%s\n' "$line" "$(colorize_flag "$flag")"
done < file

The first block detects whether the script is running interactively by checking whether file descriptor 1 (standard output) is open on a terminal (see help test). If it is, the script assumes that standard output is connected directly to the user's terminal rather than to a pipe, for example. In that case it sets the color variables to the terminal's color codes with the help of the tput command; otherwise it leaves them empty.

The colorize_flag function accepts a string ($1) and outputs it wrapped in the color codes that match its value: red for 'fail', green for anything else.

The last block reads file line by line. For each line it builds an SQL query string (sql) and invokes the mysql command with the column names stripped from the output. The output of the mysql command is assigned to flag by means of command substitution. If "$flag" is empty, meaning no row matched, it is set to 'fail'. Then $line and the colorized flag are printed to standard output.


You can test the non-interactive mode by piping the output through another command, e.g.:

./script | cat

I must warn you that it is generally a bad idea to pass shell variables into SQL queries unless the values are properly escaped, and the popular shells do not provide any tools for escaping MySQL strings. So consider running the queries in Perl, PHP, or any other programming language that is capable of building and running the queries safely.
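If you do stay in the shell, one partial safeguard is to reject any name that contains characters outside a small allowed set before it reaches the query. This is only a sketch, and it is not a substitute for real escaping. It assumes that valid names consist solely of ASCII letters, digits, and underscores, which is my assumption about your data; is_safe_name is a made-up helper name.

```shell
# is_safe_name NAME  (hypothetical helper)
# Succeeds only if NAME is non-empty and contains nothing but
# letters, digits, and underscores (an assumed rule for valid names).
is_safe_name() {
  case $1 in
    '' | *[!A-Za-z0-9_]*) return 1 ;;   # empty, or has a disallowed character
    *) return 0 ;;
  esac
}

# Possible use inside the read loop (kept as a comment; it needs your database):
#   if is_safe_name "$line"; then
#     flag=$(mysql --skip-column-names dbname -e "$sql")
#   else
#     flag='fail'
#   fi
```

With this check, a line such as foo' OR '1'='1 is reported as a failure without ever being placed into an SQL string.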

Also note that in terms of performance it is better to run a single query and then parse the result set in a loop instead of running multiple queries in a loop, with the exception of prepared statements.
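The single-query idea could be sketched like this: fetch the name and active columns for all names at once, then match the rows against the list locally. The helper name report_status is made up, and the sketch assumes the client prints one tab-separated "name, active" row per line, as the mysql client does when its output goes to a pipe.

```shell
# report_status LIST  (hypothetical helper)
# Reads tab-separated "name<TAB>active" rows on standard input (the output
# of ONE query) and prints pass/fail for every name in the file LIST.
report_status() {
  awk -F '\t' -v list="$1" '
    # Load the names to check, keeping their original order
    BEGIN { while ((getline line < list) > 0) names[++n] = line; close(list) }
    # Remember the active flag of every row the query returned
    { active[$1] = $2 }
    # A name passes only if a row came back for it with active equal to 1
    END {
      for (i = 1; i <= n; i++)
        printf "%-20s%s\n", names[i], (active[names[i]] == 1 ? "pass" : "fail")
    }
  '
}

# Possible use (kept as a comment; it needs your database, and it assumes the
# names in the file have already been checked for unsafe characters):
#   in_list=$(sed "s/.*/'&'/" file | paste -sd, -)
#   mysql --skip-column-names dbname \
#     -e "SELECT name, active FROM dbDb WHERE name IN ($in_list)" | report_status file
```

Names that are missing from the result set, and names whose active value is not 1, are both reported as fail, which mirrors what the loop above does.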