Python - Split a text file into multiple files with a character length limit

I am trying to split a large text file in Python into multiple sub-files, subject to the following conditions:

  1. Each sub-file must contain at most 1024 characters.
  2. Complete English sentences (i.e. from one full stop to the next) must end up in the same file.
  3. If a sentence does not fit in one file (e.g. 1.txt), it must go into the next file (2.txt), and the length of 2.txt must be recalculated so that it also does not exceed 1024 characters.

The code I have tried is below. It satisfies condition 1, but I have been unable to satisfy conditions 2 and 3:

maxChar = len(doc_text) #doc_text is the string containing the large text
excesstext = []
times = [1024 * i for i in range(0,int(maxChar/1024))]

for i in range(0, len(times)-1):
  tempchar = ''
  tempchar = tempchar + doc_text[times[i]:times[i+1]]
  tempchar = tempchar.rsplit('.',1)
  excesstext.append(tempchar[1])
  with open( f'/content/trunc/{i}.txt', encoding='utf-8', mode='w') as f:
    if len(excesstext)>1:
      print(tempchar[0] + excesstext[i-1])
      f.write(tempchar[0] + excesstext[i-1])
    else:
      print(tempchar[0])
      f.write(tempchar[0])

Kindly help me out if possible. Thank you!!
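
One possible approach, offered as a minimal sketch rather than a definitive solution: split the text into sentences first, then greedily pack whole sentences into a chunk until adding the next one would exceed the limit. The function names `split_into_chunks` and `write_chunks` are hypothetical and not from the original post. The sketch assumes that a sentence ends at '.', '!' or '?' followed by whitespace, that a single sentence longer than the limit may be hard-split as a fallback, and that whitespace between sentences (including line breaks) may be collapsed to one space.

```python
import re
from pathlib import Path

MAX_CHARS = 1024  # character limit per sub-file, taken from the question


def split_into_chunks(text, limit=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # the sentence still fits in the current chunk
            continue
        if current:
            chunks.append(current)  # close the current chunk
        # Assumption: a sentence longer than the limit is hard-split,
        # because it cannot fit in any single file.
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        current = sentence  # carry the sentence over to the next chunk
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks, out_dir):
    """Write each chunk to 1.txt, 2.txt, ... inside `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, chunk in enumerate(chunks, start=1):
        (out / f"{i}.txt").write_text(chunk, encoding="utf-8")
```

With the variable from the question, this could be called as `write_chunks(split_into_chunks(doc_text), '/content/trunc')`. Because the length is re-checked every time a sentence is carried over, each file stays within 1024 characters, which is what condition 3 asks for.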
