Removing only part of each line in a txt file: the "chr" string

I would like to remove the string "chr" from the following txt file, using bash:

    FO538757.1      chr1:183937
    AL669831.3      chr1:601436
    AL669831.3      chr1:601667
    AL669831.3      chr1:609395
    AL669831.3      chr1:609407
    AL669831.3      chr1:611317

So that the resulting file looks like:

    FO538757.1      1:183937
    AL669831.3      1:601436
    AL669831.3      1:601667
    AL669831.3      1:609395
    AL669831.3      1:609407
    AL669831.3      1:611317

I checked previous threads and tried:

    sed 's/^chr//'
    awk 'BEGIN {OFS=FS="\t"} {gsub(/chr1/,"1",$2)}2'

Neither of them worked. Is there a better option than awk?

Thank you!

There are 4 best solutions below

0
On

Using the bash shell with parameter expansion and mapfile (aka readarray):

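    #!/usr/bin/env bash

    # extglob enables the +(...) pattern used below
    shopt -s extglob

    # read the file into an array, one line per element
    mapfile -t array < file.txt

    # strip leading whitespace from every element
    array=("${array[@]##+([[:space:]])}")

    # print each line with the first "chr" removed
    printf '%s\n' "${array[@]/chr}"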

Inside a script, extglob must be enabled explicitly, but in an interactive shell it might already be enabled, so as a one-liner:

    mapfile -t array < file.txt; array=("${array[@]##+([[:space:]])}"); printf '%s\n' "${array[@]/chr}"

Note that this will be very slow for large files or data sets.
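To make the two expansions easier to follow, here is a minimal sketch that applies them to a single line instead of a whole array. The sample line is copied from the question; the variable names `line`, `trimmed`, and `result` are only illustrative.

```bash
#!/usr/bin/env bash
shopt -s extglob   # required for the +(...) extended pattern

# one sample line from the question, including its leading spaces
line='    FO538757.1      chr1:183937'

# ## removes the longest leading match; +([[:space:]]) is one or more whitespace characters
trimmed="${line##+([[:space:]])}"

# ${var/pattern} with no replacement deletes the first occurrence of the pattern
result="${trimmed/chr}"

# brackets make any remaining leading whitespace visible
printf '[%s]\n' "$trimmed"
printf '[%s]\n' "$result"
```

The array form in the answer does the same thing to every element at once.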

0
On

I suspect all you really need is:

    sed 's/chr//' file
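A likely reason the `sed 's/^chr//'` attempt from the question had no effect is the `^` anchor: it only matches at the very start of a line, and in the sample "chr" sits in the second column. A small sketch on one sample line (variable names are illustrative):

```bash
#!/usr/bin/env bash

# one sample line from the question
line='    FO538757.1      chr1:183937'

# anchored: "chr" would have to be the first three characters of the line
anchored=$(printf '%s\n' "$line" | sed 's/^chr//')

# unanchored: the first "chr" anywhere in the line is removed
unanchored=$(printf '%s\n' "$line" | sed 's/chr//')

printf '[%s]\n' "$anchored" "$unanchored"
```

The anchored command leaves the line unchanged, while the unanchored one removes "chr" and keeps the leading spaces.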
0
On

You can do that quite easily with sed and two expressions: the first removes chr and the second removes leading whitespace, e.g.

    sed -e 's/chr//' -e 's/^[[:blank:]]*//' file

Example Use/Output

With your input in a file named file, you would have:

    $ sed -e 's/chr//' -e 's/^[[:blank:]]*//' file
    FO538757.1      1:183937
    AL669831.3      1:601436
    AL669831.3      1:601667
    AL669831.3      1:609395
    AL669831.3      1:609407
    AL669831.3      1:611317
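One caveat: `s/chr//` removes the first "chr" on the line, wherever it is. If an ID in the first column could itself contain "chr", a variant that only touches the start of the second column may be safer. This is a sketch, assuming a sed that supports `-E` (GNU and BSD sed do); the sample ID below is made up for illustration.

```bash
#!/usr/bin/env bash

# hypothetical line whose first column also contains "chr"
line='    chrUn_KI270.1      chr1:183937'

# skip leading blanks, capture column 1 plus the blanks after it as \1,
# then drop a "chr" that immediately follows
result=$(printf '%s\n' "$line" |
    sed -E 's/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+)chr/\1/')

printf '[%s]\n' "$result"
```

Because the leading blanks are matched outside the capture group, they are removed in the same substitution.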
0
On

With your shown samples, please try the following. A simple explanation: it removes the leading chr from the 2nd field and then prints the line. Modifying a field makes awk rebuild the current line, which also removes the leading spaces.

    awk '{sub(/^chr/,"",$2)} 1' Input_file

If your Input_file is tab-delimited and its lines start with a tab (so that the chr column becomes the 3rd field), try the following:

    awk 'BEGIN{FS=OFS="\t"} {sub(/^chr/,"",$3);sub(/^\t+/,"")} 1' Input_file
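To check which variant fits your file, both commands can be tried on a single line. The sketch below feeds one space-separated and one tab-separated sample line through them; the tab-separated line is an assumption about what such a file might look like, not data from the question.

```bash
#!/usr/bin/env bash

# space-separated sample from the question: default field splitting
spaces=$(printf '    FO538757.1      chr1:183937\n' |
    awk '{sub(/^chr/,"",$2)} 1')

# assumed tab-separated sample with one leading tab: field 1 is empty,
# field 2 is the ID, field 3 is the chr column
tabs=$(printf '\tFO538757.1\tchr1:183937\n' |
    awk 'BEGIN{FS=OFS="\t"} {sub(/^chr/,"",$3);sub(/^\t+/,"")} 1')

printf '[%s]\n' "$spaces" "$tabs"
```

Note that in the first command the rebuilt line joins the fields with a single space, so the original column spacing is not preserved; the second keeps the tab between the columns.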