Error finding the total number of lines in a large text file using the Windows command prompt


I would like to find the total number of lines in a text file (> 60 GB) using the Windows command prompt.

I used:

 findstr /R /N "^" file.txt | find /C ":"

But the returned result is a negative number. Is it an overflow? The file has no more than 5 billion lines, and a 4-byte signed integer ranges from −2,147,483,648 to 2,147,483,647. So do I need a script that keeps the count small, for example by dividing the result by 1000?

If so, please help me design the Windows batch file.


There are 2 best solutions below

10
On BEST ANSWER

You could try a JScript solution. JavaScript numbers are always 64-bit floats, which represent integers exactly up to 2^53 (about 15 digits). It'll take a while, though: it takes me about 15 seconds to count the lines in a 100 MB XML file with this script.

Edit: Since the float datatype wasn't large enough, I modified the script to use an array of decimal digits as a counter and to output the result as a joined string. As long as fso.OpenTextFile().SkipLine() doesn't choke (for which there is no solution but to try a different language, maybe Python or Perl), this should work, and hopefully the performance hit won't be too large. I tested it on a 4.3 GB ISO file and it took about 8 minutes.

@if (@a==@b) @end /*

:: countlines.bat
:: usage: countlines.bat filetocount.log

:: batch portion does nothing remarkable
:: but relaunches itself with jscript interpreter
@echo off

cscript /nologo /e:jscript "%~f0" "%~f1"

goto :EOF

:: end of batch / begin JScript */

var fso, f, file = WSH.Arguments(0), longVal = [0],
ForReading = 1, ForWriting = 2, b4 = new Date();

// uses the global array longVal[] as a decimal counter, one digit per element
// adds 1 to the last digit and carries from right to left
function inc() {
    for (var i=longVal.length - 1; i>=0; i--) {
        if (++longVal[i] == 10) {
            longVal[i] = 0;
            if (!i) {
                longVal.splice(0, 0, 0);
                i++;
            }
            continue;
        }
        else break;
    }
}

fso = new ActiveXObject("Scripting.FileSystemObject");
f = fso.OpenTextFile(file, ForReading);
while (!f.AtEndOfStream) {
    f.SkipLine();
    inc();
}
WSH.Echo(longVal.join(''));
f.Close();

var stopwatch = 'Line count completed in ' + ((new Date() - b4) / 1000.0) + 's';
WSH.StdErr.WriteLine(stopwatch);
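
If SkipLine() does choke, the Python fallback mentioned above could look roughly like the sketch below. It is not part of the original answer: the name count_lines and the 1 MiB chunk size are illustrative choices, and it assumes every line ends in a line feed (so CRLF files are covered, but old Mac CR-only files are not). Python integers do not overflow, so no digit array is needed.

```python
import sys


def count_lines(path, chunk_size=1 << 20):
    # Read the file in binary chunks and count line-feed bytes.
    # `path` and `chunk_size` are illustrative names, not from the original answer.
    total = 0
    last = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
            last = chunk[-1:]
    # A final line without a trailing line feed still counts as a line.
    if last and last != b"\n":
        total += 1
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_lines(sys.argv[1]))
```

Saved as, say, countlines.py, it would be run as: python countlines.py file.txt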
3
On

Here's a bat file to count the lines. Yes, you're hitting the 32-bit int limit, and the same thing would happen with set /a calculations, so some kind of division is certainly a good idea.

@echo off
setlocal ENABLEDELAYEDEXPANSION

set Singles=0
set Thousands=0
for /f "tokens=1,* delims=:" %%a in ('findstr /nr "^" "%~1"') do (
    rem echo %%a   %%b
    set /a Singles+=1
    if !Singles! equ 1000 (
        set /a Thousands+=1
        set Singles=0
    )
)
set Singles=x000%Singles%
echo %Thousands%.%Singles:~-3% thousand lines

I included the rem line so that you can check the output from findstr if you need to.
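
To make the arithmetic concrete, here is a small Python sketch (Python only because it is easy to run; it is not part of the bat file). It assumes the counter behaves like a wrapping signed 32-bit integer, which is the diagnosis given above rather than something verified against find itself. The helper names to_int32 and split_count are made up for illustration.

```python
def to_int32(n):
    # Interpret n modulo 2**32 as a signed 32-bit integer (assumed wrap-around model).
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def split_count(n_lines):
    # Count the way the batch loop does: carry into `thousands` every 1000 lines,
    # so neither variable ever gets close to the 32-bit limit for realistic files.
    singles = 0
    thousands = 0
    for _ in range(n_lines):
        singles += 1
        if singles == 1000:
            thousands += 1
            singles = 0
    return thousands, singles


print(to_int32(2_147_483_647))  # 2147483647, the largest value that still fits
print(to_int32(3_000_000_000))  # -1294967296, a 3-billion count shows up negative
t, s = split_count(12_345)
print(f"{t}.{s:03d} thousand lines")  # 12.345 thousand lines
```

The last line uses the same "thousands.singles" output format as the echo in the bat file.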

--- ok so the bat file solution is pretty slow ---

Here's a vbs that might be as quick as you can get (maybe even close to or better than the findstr/find times?). To run it, either call scriptname.vbs filename directly, which (with the default wscript host) shows the result in a pop-up window, or use cscript //nologo scriptname.vbs filename to print the result at the command prompt.

Short summary of how this works: a TextStream has a Line property, which is a 32-bit signed int. As long as we detect each switch from + to - and from - to +, we can work out the total number of lines from the number of switches and the final file.Line value.

if WScript.Arguments.Count = 0 then
    WScript.Echo "Missing filename parameter"
    WScript.Quit
end if

Const ForReading = 1, ForWriting = 2, ForAppending = 8
Const bytesToSkip = 2000000000
Dim fso, MyFile, count, direction, position, totalSize
Set fso = CreateObject("Scripting.FileSystemObject")

' Open the file to count.
Set MyFile = fso.OpenTextFile(WScript.Arguments(0), ForReading)

totalSize = fso.GetFile(WScript.Arguments(0)).Size
count=0
direction=1
position=0

' Jump through the file in big blocks
Do While position < totalSize
    MyFile.Skip(bytesToSkip) 'Skipping past the end of the file doesn't raise an error the first time
    position = position + bytesToSkip

    if MyFile.Line = 0 and direction=-1 Then
        ' Have wrapped back to 0
        count=count+1
        direction=1
    elseif direction <> abs(MyFile.Line)/MyFile.Line Then
        'Count each change from + to - or - to +
        count=count+1
        direction=direction*(-1)
    end if
    REM WScript.Echo direction & "    " & position & "    " & MyFile.Line & "    " & count
Loop

' Do final calculations
if MyFile.Line = 0 Then
    count = count*(2^31)
elseif direction = 1 Then
    count = count*(2^31) + MyFile.Line
elseif direction = -1 Then
    count = count*(2^31) + (2^31 + MyFile.Line)
end if

MyFile.Close

WScript.Echo "Total Lines = " & count
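
The sign-flip bookkeeping can be checked without a 60 GB file. The Python sketch below is only a model, not a port of the script: it assumes Line wraps like a signed 32-bit integer, as described above, and it samples the wrapped value every 1.5 billion lines (any step below 2^31 works, so that no flip is missed). The helpers to_int32, sample and reconstruct are hypothetical names.

```python
def to_int32(n):
    # Assumed model of the Line property: the true line number wrapped to 32 bits.
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def sample(true_line, step=1_500_000_000):
    # Wrapped readings taken every `step` lines, ending at `true_line`.
    points = list(range(step, true_line, step)) + [true_line]
    return [to_int32(p) for p in points]


def reconstruct(readings):
    # Same bookkeeping as the VBScript loop: count each sign change,
    # then add the final reading to count * 2**31.
    count = 0
    direction = 1
    line = 1
    for line in readings:
        if line == 0 and direction == -1:
            # Wrapped from negative back to exactly 0.
            count += 1
            direction = 1
        elif line != 0 and direction != (1 if line > 0 else -1):
            # Changed from + to - or from - to +.
            count += 1
            direction = -direction
    if line == 0:
        return count * 2**31
    if direction == 1:
        return count * 2**31 + line
    return count * 2**31 + (2**31 + line)


for n in (1_000, 3_000_000_000, 2**32, 5_000_000_000):
    print(n, reconstruct(sample(n)))
```

Each printed pair should show the same number twice, which is the sense in which the final Line value plus the number of flips recovers the full count.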