Prepend leading zeros to each line of a file


I have a file that looks like this:

1:line1
14:line2
135:line3
15:line4

I need to prepend leading zeros to each line to make it look like this:

00001:line1
00014:line2
00135:line3
00015:line4

Is there an easy way to do this in Linux?

I have tried using

awk '{printf "%05d:%s\n", FNR, $0}' file

but this outputted:

00001:1:line1
00002:14:line2
00003:135:line3
00004:15:line4

I should note that I did not write this command. I got it from Google and don't really understand how it works.


There are 10 best solutions below

wayofthepie On BEST ANSWER

There are many ways; one is to use awk:

awk -F":" '{OFS=FS; $1 = sprintf("%05d", $1); print}' "${filename}"

To break it down:

  • -F":" sets the field separator to ":", so awk splits each line into fields at every ":".
  • OFS=FS sets the output field separator to the field separator, which puts ":" back between the fields when the line is printed.
  • $1 = sprintf("%05d", $1) replaces the first field, $1, with itself zero-padded to a width of 5.
  • print prints the modified line.
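
For comparison, here is a rough Python sketch of the same three steps (split, pad the first field, rejoin). The function name pad_first_field and its parameters are made up for illustration, and it assumes the first field is always a plain decimal integer; awk would silently treat a non-numeric field as 0, whereas this sketch raises an error.

```python
def pad_first_field(line: str, width: int = 5, sep: str = ":") -> str:
    """Zero-pad the numeric first field of one line."""
    fields = line.split(sep)                      # like -F":"  : split into fields
    fields[0] = f"{int(fields[0]):0{width}d}"     # like $1 = sprintf("%05d", $1)
    return sep.join(fields)                       # like OFS=FS : rejoin with the same separator


if __name__ == "__main__":
    for line in ["1:line1", "14:line2", "135:line3", "15:line4"]:
        print(pad_first_field(line))
```

Rejoining with the same separator is what keeps any additional ":" characters later in the line intact.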
dawg On

You can do:

awk 'BEGIN{FS=OFS=":"} 
{$1=sprintf("%05d", $1)} 1' file 

Prints:

00001:line1
00014:line2
00135:line3
00015:line4

From the comments, a shorter version:

awk '$1=sprintf("%05d", $1)' FS=: OFS=: file
# same output
Thor On

A coreutils alternative:

paste -d: <(printf "%05d\n" $(cut -d: -f1 infile)) <(cut -d: -f2- infile)
Freeman On

Here is another approach (thanks to @EdMorton):

awk -F':' '{
    p = index($0, ":")
    tag = substr($0, 1, p-1)
    val = substr($0, p+1) 
    # or tag=val=$0; sub(/:.*/,"",tag); sub(/[^:]+:/,"",val)
    printf "%05d:%s\n", tag, val
}' input.txt

Any time you have tag:value pairs, if you don't know and completely control exactly which characters the value can contain, it's safer to use p=index($0,":"); tag=substr($0,1,p-1); val=substr($0,p+1), or tag=val=$0; sub(/:.*/,"",tag); sub(/[^:]+:/,"",val), or similar, rather than tag=$1; val=$2.
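
To illustrate the point about values that contain ":", here is a small Python sketch. The helper names and the sample line are hypothetical; the idea is only that splitting at the first ":" keeps the whole value, while taking "field 2" silently drops part of it.

```python
from typing import Tuple


def split_tag_value(line: str) -> Tuple[str, str]:
    """Split at the first ':' only; everything after it is the value."""
    tag, _sep, value = line.partition(":")
    return tag, value


def pad_tag(line: str, width: int = 5) -> str:
    """Zero-pad the tag and keep the value exactly as it was."""
    tag, value = split_tag_value(line)
    return f"{int(tag):0{width}d}:{value}"


if __name__ == "__main__":
    line = "14:time=12:30:00"      # hypothetical value that itself contains ':'
    print(line.split(":")[1])      # field-based access keeps only: time=12
    print(pad_tag(line))           # 00014:time=12:30:00
```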

You can also solve this with Ruby:

ruby -ne 'puts "%05d:%s" % $_.split(":", 2)' input.txt

or Perl:

perl -pe 's/^(\d+):/sprintf "%05d:", $1/e' input.txt

Output:

00001:line1
00014:line2
00135:line3
00015:line4
Timur Shtatland On

Use this Perl one-liner:

 perl -lpe 's{^\d+}{sprintf "%05d", $&}e;' infile > outfile

To change the file in-place:

 perl -i.bak -lpe 's{^\d+}{sprintf "%05d", $&}e;' infile

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.

The regex uses this modifier:
/e : Evaluate REPLACEMENT as an expression in s/PATTERN/REPLACEMENT/

^ : The beginning of the line.
\d+ : One or more digits. The match is captured into variable $&, which we use later inside sprintf.
sprintf "%05d", $& : returns a string where the digits captured in $& are padded with 0s to give a number of length 5.
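
For readers more familiar with Python, the same substitution can be sketched with re.sub, where a function passed as the replacement plays the role of the /e modifier. The name pad_leading_digits is an assumption for illustration, not part of the one-liner above.

```python
import re


def pad_leading_digits(line: str, width: int = 5) -> str:
    """Zero-pad the run of digits at the start of a line, if there is one."""
    # ^\d+ matches one or more digits at the beginning of the line.
    # The lambda receives the match object and returns the replacement text,
    # much like the expression evaluated by s{...}{...}e in Perl.
    return re.sub(
        r"^\d+",
        lambda m: f"{int(m.group(0)):0{width}d}",
        line,
        count=1,
    )


if __name__ == "__main__":
    for line in ["1:line1", "14:line2", "135:line3", "15:line4"]:
        print(pad_leading_digits(line))
```

Lines that do not start with a digit are returned unchanged, because the pattern simply does not match.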

Ed Morton On

Using any awk:

$ awk '{p=index($0,":"); printf "%05d%s\n", substr($0,1,p-1), substr($0,p)}' file
00001:line1
00014:line2
00135:line3
00015:line4
RavinderSingh13 On

Here is one more way to do it in awk. Set both the field separator and the output field separator to ":", right-justify the first field with spaces to a width of 5, then replace each space with 0.

awk -F':' -v OFS=':' '{$1=sprintf("%5s",$1);gsub(/ /,"0",$1)} 1' Input_file
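
The pad-with-spaces-then-substitute idea can be sketched in Python as well. This is only an illustration under the assumption that the first field contains no spaces of its own (any such space would also become a 0, as it would in the awk version); pad_via_spaces is a made-up name.

```python
def pad_via_spaces(line: str, width: int = 5) -> str:
    """Right-justify the first field with spaces, then turn the spaces into zeros."""
    tag, sep, rest = line.partition(":")
    padded = tag.rjust(width)            # like sprintf("%5s", $1)
    padded = padded.replace(" ", "0")    # like gsub(/ /, "0", $1)
    return padded + sep + rest


if __name__ == "__main__":
    print(pad_via_spaces("14:line2"))    # 00014:line2
```

Because the field is never converted to a number, this approach leaves fields longer than the width untouched.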
KingWealth On

Here is a Python script. It splits each line at ":" and zero-fills the number to the required length with zfill.

import os


def format_data(filename: str, int_length: int):
    new_lines = []
    if not os.path.exists(filename):
        print("No file found!")
        return 1
    with open(filename, "r") as fileobj:
        for line in fileobj:
            # Drop the trailing newline; it is added back when writing.
            line = line.rstrip("\n")
            try:
                # Split at the first ":" only, so the rest of the line stays intact.
                pre, post = line.split(":", 1)
                new_lines.append(f"{pre.zfill(int_length)}:{post}")
            except ValueError:
                # The line has no ":"; keep it unchanged.
                print(f"No ':' found in line: {line!r}")
                new_lines.append(line)
    write_filename = f"updated_{filename}"
    with open(write_filename, "w") as fileobj:
        fileobj.writelines([string + '\n' for string in new_lines])

    print(f"Saved updated file {write_filename}")
    return 0

Call the function with the filename and the total length of the number as needed:

    format_data("sample.txt", 5)

The output file is written to updated_<filename> in the same directory.

Gustavo Castro On

The error in your code is that it prints FNR, the number of the current record (line), followed by $0, the whole line. If you set the field separator to ":" (-F":"), $1 is the number that needs the leading zeros and $2 is the second field, holding the line value. So pass $1 (formatted with %05d) and $2 (the line value) to the printf statement:

 awk -F":" '{printf "%05d:%s\n", $1, $2}' file

There is no need to escape the ":" with "\", because ":" is not a regular expression metacharacter.
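
The difference between the two commands can be sketched in Python: one numbers the lines, the other pads the number that is already there. The function names are invented for this illustration, and the sample data is the input from the question.

```python
lines = ["1:line1", "14:line2", "135:line3", "15:line4"]


def number_lines(lines):
    """What the command in the question does: prefix each whole line ($0)
    with its zero-padded line number (FNR)."""
    return [f"{n:05d}:{line}" for n, line in enumerate(lines, start=1)]


def pad_existing_number(lines):
    """What was wanted: zero-pad the number that is already the first field ($1)."""
    out = []
    for line in lines:
        first, _sep, rest = line.partition(":")
        out.append(f"{int(first):05d}:{rest}")
    return out


if __name__ == "__main__":
    print(number_lines(lines)[1])         # 00002:14:line2
    print(pad_existing_number(lines)[1])  # 00014:line2
```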

RARE Kpop Manifesto On

UPDATED with a streamlined version that retains chains of ":" while preserving precision for any input length:

echo '
2:line1
00123:line2
::::::::line3
00000000000005923555555555555555555555555877777777:line9' |  

gtee >( gcat -b >&2; ) | 

mawk '$!NF = substr("00000",__ = index($(NF += OFS = _),
                     ":"), (__ < 6) * 5) $_'    FS='^0+' 

 1  2:line1
 2  00123:line2
 3  ::::::::line3
 4  00000000000005923555555555555555555555555877777777:line9
00002:line1
00123:line2
00000::::::::line3
5923555555555555555555555555877777777:line9

======================================================

It should be split into 3 scenarios:

  • Skip the substring if it is already exactly 5 characters long.
  • If it is longer, trim the excess leading zeros, but in a precision-preserving manner.
  • Pad to 5 when it is shorter.

echo '
1:line1
14:line2
135:line3
15:line4
00000003523532:line5' | 

 mawk '{ gsub(/::+/, ":") } ! (_ = 6-index($__, ":")) || 
       $!NF = _<-_ ? substr($__, match($__, /[^0]*.....:/)) \
                   : sprintf("%.*d",_,__) $__' 

00001:line1
00014:line2
00135:line3
00015:line4
3523532:line5
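
The three scenarios can also be written out as a plain Python sketch that works on the prefix as text only. This is an illustration rather than a translation of the mawk code: normalize_prefix is a made-up name, and unlike the second mawk command it does not collapse runs of ":".

```python
def normalize_prefix(line: str, width: int = 5) -> str:
    """Bring the part before the first ':' to at least `width` characters,
    using string operations only so that no digit is ever changed."""
    prefix, sep, rest = line.partition(":")
    if len(prefix) == width:
        fixed = prefix                            # scenario 1: already the right length
    elif len(prefix) > width:
        fixed = prefix.lstrip("0").zfill(width)   # scenario 2: trim excess leading zeros
    else:
        fixed = prefix.zfill(width)               # scenario 3: pad on the left with zeros
    return fixed + sep + rest


if __name__ == "__main__":
    for line in ["1:line1", "14:line2", "00123:line2", "00000003523532:line5"]:
        print(normalize_prefix(line))
```

In scenario 2, zfill restores zeros if stripping them left fewer than 5 characters, so the prefix never ends up shorter than the target width.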
  • Solutions that use %05d perform a string-to-number conversion followed by a number-to-string conversion when neither is needed.

Using %05d on longer inputs risks altering the input itself:

echo '9992315351235323253252317:line9' | 

gawk '{p=index($0,":"); printf "%05d%s\n", substr($0,1,p-1), substr($0,p)}'

               |
9992315351235322445824000:line9 
               |->
9992315351235323253252317:line9
               |             

All digits at or to the right of the marked line have been corrupted by %05d. bigint support can certainly mitigate this risk, but why require risk mitigation when there was no risk to begin with?
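
The same effect can be reproduced in Python by forcing the prefix through a double, which is roughly what awk does without bigint support. This is a sketch under that assumption; the function names are hypothetical, and the exact corrupted digits printed may differ from the gawk output above.

```python
def pad_numeric(prefix: str, width: int = 5) -> str:
    """Mimic %05d on a double: string -> float -> integer -> string."""
    return f"{int(float(prefix)):0{width}d}"


def pad_textual(prefix: str, width: int = 5) -> str:
    """Pure string padding: the digits are never converted to a number."""
    return prefix.zfill(width)


if __name__ == "__main__":
    big = "9992315351235323253252317"
    print(pad_numeric("14"), pad_textual("14"))   # both give 00014
    print(pad_numeric(big) == big)                # False: low-order digits were lost
    print(pad_textual(big) == big)                # True: the input is untouched
```

A double has 53 bits of mantissa, so an integer of this size cannot be stored exactly; padding the text avoids the conversion altogether.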