I'm trying to, I think, replicate the cat functionality of the Linux shell in a platform-agnostic way, such that I can take two text files and merge their contents in the following manner:
file_1 contains:
42 bottles of beer on the wall
file_2 contains:
Beer is clearly the answer
Merged file should contain:
42 bottles of beer on the wall
Beer is clearly the answer
Most of the techniques I've read about, however, end up producing:
42 bottles of beer on the wallBeer is clearly the answer
Another issue is that the actual files I'd like to work with are incredibly large text files (FASTA-formatted protein sequence files), so I think most methods that read line by line would be inefficient. Hence, I have been trying to figure out a solution using shutil, as below:
    def concatenate_fasta(file1, file2, newfile):
        destination = open(newfile,'wb')
        shutil.copyfileobj(open(file1,'rb'), destination)
        destination.write('\n...\n')
        shutil.copyfileobj(open(file2,'rb'), destination)
        destination.close()
However, this produces the same problem as earlier, except with "..." in between. Clearly, the newlines are being ignored, but I'm at a loss as to how to manage this properly.
Any help would be most appreciated.
EDIT:
I tried Martijn's suggestion, but the line_sep value returned is None, which throws an error when the function attempts to write that to the output file. I have gotten this working now via the os.linesep method, mentioned as less optimal, as follows:
    with open(newfile,'wb') as destination:
        with open(file_1,'rb') as source:
            shutil.copyfileobj(source, destination)
        destination.write(os.linesep*2)
        with open(file_2,'rb') as source:
            shutil.copyfileobj(source, destination)
This gives me the functionality I need, but I'm still at a bit of a loss as to why the (seemingly more elegant) solution is failing.
You have opened the files in binary mode, so no newline translation will take place. Different platforms use different line endings, and if you are on Windows, \n is not enough.
The simplest method would be to write os.linesep between the two files, but this could violate the actual newline convention used in the files.
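A rough sketch of the os.linesep approach, assuming Python 3 (where a file opened in binary mode only accepts bytes, so the separator has to be encoded first). The function name concatenate_with_linesep is made up for illustration and is not from the question:

```python
import os
import shutil


def concatenate_with_linesep(file1, file2, newfile):
    """Copy file1, the platform line separator, then file2 into newfile."""
    with open(newfile, "wb") as destination:
        with open(file1, "rb") as source:
            shutil.copyfileobj(source, destination)
        # Binary mode does no newline translation, so the platform's
        # separator is written explicitly (as bytes under Python 3).
        destination.write(os.linesep.encode("ascii"))
        with open(file2, "rb") as source:
            shutil.copyfileobj(source, destination)
```

Since copyfileobj copies in chunks rather than line by line, this should stay reasonably cheap even for very large files.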
The better approach would be to open the text files in text mode, read a line or two, then inspect the file.newlines attribute to see what the convention is for that file.
You may want to test file_2 as well, perhaps raising an exception if the newline convention used doesn't match the first file.
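The detection idea could look something like the sketch below, assuming Python 3. The helper names detect_newline and concatenate_matching_newlines are hypothetical, and the fallback to os.linesep is my own assumption for the case where newlines is still None (for example, a one-line file with no trailing newline, where no line ending has been read yet):

```python
import os
import shutil


def detect_newline(path):
    """Return the newline convention seen near the start of a file, or None."""
    # newline="" leaves line endings untranslated but still records them;
    # latin-1 is used only because it can decode any byte sequence.
    with open(path, "r", newline="", encoding="latin-1") as handle:
        handle.readline()
        return handle.newlines


def concatenate_matching_newlines(file1, file2, newfile):
    """Join two files with the newline convention they already use."""
    seps = [detect_newline(path) for path in (file1, file2)]
    found = {sep for sep in seps if sep is not None}
    # newlines is a tuple when one file mixes several conventions.
    if len(found) > 1 or any(isinstance(sep, tuple) for sep in found):
        raise ValueError("inconsistent newline conventions: %r" % (seps,))
    # Assumed fallback: no newline seen in either file, use the platform's.
    line_sep = found.pop() if found else os.linesep
    with open(newfile, "wb") as destination:
        with open(file1, "rb") as source:
            shutil.copyfileobj(source, destination)
        destination.write(line_sep.encode("ascii"))
        with open(file2, "rb") as source:
            shutil.copyfileobj(source, destination)
```

Only the first line of each file is read in text mode, so the bulk of the data still goes through the chunked binary copy.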