How do I tell tail -f it has finished - cleanly?


I am copying a LOGFILE to a remote server as it is being created.

tail -f LOGFILE | gzip -c >> /faraway/log.gz

However, when the original LOGFILE is closed, and moved to a storage directory, my tail -f seems to get some odd data.

How can I ensure that tail -f stops cleanly and that the compressed file /faraway/log.gz is a true copy of LOGFILE?

EDIT 1

I did a bit more digging.

/faraway/log.gz terminated badly, halfway through a FIX message. This must be because I pressed Ctrl-C on the whole piped command above.

If I ignore this last line, then the original LOGFILE and log.gz match EXACTLY! That's for a 40G file transmitted across the Atlantic.

I am pretty impressed by that as it does exactly what I want. Does any reader think I was just "lucky" in this case - is this likely NOT to work in future?

Now I just need to get a clean close of gzip. Perhaps sending a kill -9 to the tail PID, as suggested below, may allow gzip to finish its compression properly.


There are 2 best solutions below

9
On

To get a full copy, use

tail -n +1 -f yourLogFile

If you don't use the -n +1 option, you only get the tail of the file (the last 10 lines by default).
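As a quick check of that difference, here is a minimal sketch. The 25-line sample file is a throwaway created with mktemp purely for illustration; it is not part of the original setup.

```shell
# Build a 25-line sample file (hypothetical temp file, for illustration only).
sample=$(mktemp)
seq 1 25 > "$sample"

# Plain tail prints only the last 10 lines.
default_count=$(tail "$sample" | wc -l | tr -d ' ')

# -n +1 starts the output at line 1, so the whole file is printed.
full_count=$(tail -n +1 "$sample" | wc -l | tr -d ' ')

echo "default: $default_count lines, -n +1: $full_count lines"
rm -f "$sample"
```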

Yet this does not solve the deleted/moved file problem. That problem is really one of inter-process communication (IPC), or inter-process co-operation: unless you know exactly how the other process(es) behave, you cannot resolve it.

For example, if the other program COPIES the log file somewhere else, deletes the current one, and then writes its output to a new log file, your tail cannot read that new output.

A related feature of Unix (and Unix-like systems) is worth mentioning:

When a file is opened for reading by process A and then deleted by process B, its contents are not removed immediately, because the file's reference count is not zero (process A is still using it). Process A can keep reading the file until it closes it.

Moving a file is a different matter. If process B moves the file within the same physical file system (note that a system may have many file systems mounted), process A can still access the file, even while it keeps growing. Such a move only changes the name (path name + file name), nothing more; the identity of the file (its "i-node" in Unix terms) does not change. If the file is moved to another file system, local or remote, it is effectively copied and then removed, so the deletion rule above applies.
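The rename and delete behaviour described above can be observed with a small experiment. This is only a sketch: the directory and file names are made up, and file descriptor 3 stands in for a reader such as tail.

```shell
# Throwaway directory and file names, used only for this demonstration.
dir=$(mktemp -d)
echo "first line" > "$dir/app.log"

# Open the file for reading on descriptor 3, as a reader like tail would.
exec 3< "$dir/app.log"

# Rename within the same file system: the i-node is unchanged,
# so data appended under the new name lands in the file we already hold open.
mv "$dir/app.log" "$dir/app.log.1"
echo "second line" >> "$dir/app.log.1"

# Delete the last remaining name: the data is still reachable via descriptor 3.
rm "$dir/app.log.1"

# Read everything through the descriptor opened before the rename.
contents=$(cat <&3)
printf '%s\n' "$contents"

exec 3<&-        # closing the last descriptor lets the kernel free the data
rmdir "$dir"
```

Both lines are still readable after the rm, which is why a tail -f that follows the descriptor keeps working after the log has been moved or deleted.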

The missing lines problem you mentioned is interesting, and may need more analysis of the behavior of the programs/processes which generate and move/delete the log file.

--update--

Happy to see you made some progress. Like I said, on a Unix-like system a process like tail can still access the data after the file is deleted.

You can use ( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) | gzip -c - > yourZipFile.gz

to gzip your log file, and kill the tail program by

kill -TERM `cat /tmp/PID_tail`

gzip is the reading end of the pipe, so when tail exits it simply sees end-of-file and should finish by itself without error; it does not receive a broken-pipe signal. If you also want the left-hand side of the pipe to end with exit status 0 after tail has been killed, you can use this alternative:

 (  ( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) ; true ) | gzip -c - > yourZipFile.gz

The trailing true prints nothing and exits with status 0, so the subshell feeding the pipe ends cleanly even though tail was terminated by a signal.
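To check that this approach leaves a complete archive, here is a small end-to-end sketch under stated assumptions. All paths are throwaway temp files, the sleep durations are arbitrary, and sh -c with $$ is swapped in for $BASHPID so the example does not depend on bash; none of this is from the original setup.

```shell
# Hypothetical scratch area standing in for the real log and remote file.
work=$(mktemp -d)
log="$work/app.log"
pidfile="$work/tail.pid"
out="$work/log.gz"

printf 'line 1\nline 2\n' > "$log"

# Record the PID, then exec tail so that tail keeps that same PID.
sh -c 'echo $$ > "$1"; exec tail -n +1 -f "$2"' sh "$pidfile" "$log" \
  | gzip -c > "$out" &

# Wait until the PID file has been written.
while [ ! -s "$pidfile" ]; do sleep 1; done

# Simulate the logger appending more data while tail is following.
printf 'line 3\n' >> "$log"
sleep 3          # arbitrary pause so tail can pick up the new line

# Stop tail only; gzip then sees end-of-file and writes its trailer.
kill -TERM "$(cat "$pidfile")"
wait             # wait for the background pipeline to finish

# gzip -t checks archive integrity; cmp compares the content with the log.
gzip -t "$out" && echo "archive is complete"
if gzip -dc "$out" | cmp -s - "$log"; then
  verdict=identical
else
  verdict=different
fi
echo "$verdict"
rm -rf "$work"
```

The key point is that only tail is signalled; gzip is left alone to drain the pipe and close the archive properly, unlike Ctrl-C, which interrupts the whole pipeline.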

0
On

From the tail man page (emphasis mine):

With --follow (-f), tail defaults to following the file descriptor, which means that even if a tail'ed file is renamed, tail will continue to track its end. This default behavior is not desirable when you really want to track the actual name of the file, not the file descriptor (e.g., log rotation). Use --follow=name in that case. That causes tail to track the named file in a way that accommodates renaming, removal and creation.

Therefore the solution to the problem you proposed is to use:

tail --follow=name LOGFILE | gzip -c >> /faraway/log.gz

This way, when the file is deleted, tail stops reading it.