Can't count '|' symbols in a .c file

83 Views Asked by At

Basically I have to write a program that counts all kinds of different symbols in a .c file. I got it to work with all of the needed symbols except the vertical line '|'. For some reason it just won't count them.

Here's the method I'm using:

int countGreaterLesserEquals(char filename[])
{
    FILE *fp = fopen(filename,"r");
    FILE *f;
    int temp = 0; // ASCII code of the character
    int capital = 0;
    int lesser = 0;
    int numbers = 0;
    int comments = 0;
    int lines = 0;
    int spc = 0;

    if (fp == NULL) {
        printf("File is invalid\\empty.\n");
        return 0;
    }

    while ((temp = fgetc(fp)) != EOF) {

        if (temp >= 'a' && temp <= 'z') {
            capital++;
        }
        else if (temp >= 'A' && temp <= 'Z') {
            lesser++;
        }
        else if( temp == '/') temp = fgetc(fp); {
            if(temp == '/')
                comments++;             
        }

        if (temp >= '0' && temp <= '9') {
            numbers++;
        }
        if (temp == '|') {
            spc++;
        }
        if (temp == '\n') {
            lines++;
        }
    }
}
2

There are 2 best solutions below

0
On

On this line:

else if( temp == '/') temp = fgetc(fp); {

I believe you have a misplaced {. As I understand it should come before temp = fgetc(fp);..

You can easily avoid such an errors if following coding style guidelines placing each expression on it's own line and indenting the code properly.

Update: And this fgetc is a corner case. What if you read past EOF here? You are not checking this error.

11
On

Firstly, some compiler warnings:

  • 'f' : unreferenced local variable
  • not all control paths return a value

So, f can be removed, and the function should return a value on success too. It's always a good idea to set compiler warnings at the highest level.

Then, there is a problem with:

else if( temp == '/') temp = fgetc(fp); {
    if(temp == '/')
        comments++;             
}

Check the ; at the end of the else. This means the block following it, is always executed. Also, for this fgetc() there is no check for EOF or an error.

Also, if temp is a /, but the following character is not, it will be skipped, so we need to put the character back into the stream (easiest solution in this case).

Here is a full example:

int countGreaterLesserEquals(char filename[])
{
    FILE *fp = fopen(filename, "r");
    int temp     = 0; // ASCII code of the character
    int capital  = 0;
    int lesser   = 0;
    int numbers  = 0;
    int comments = 0;
    int lines    = 0;
    int spc      = 0;

    if (fp == NULL) {
        printf("File is invalid\\empty.\n");
        return 0;
    }

    while ((temp = fgetc(fp)) != EOF) {

        // check characters - check most common first
        if      (temp >= 'a' && temp <= 'z') lesser++;
        else if (temp >= 'A' && temp <= 'Z') capital++;
        else if (temp >= '0' && temp <= '9') numbers++;
        else if (temp == '|')                spc++;
        else if (temp == '\n')               lines++;
        else if( temp == '/')
            if ((temp = fgetc(fp)) == EOF)
                break; // handle error/eof
            else
                if(temp == '/')              comments++;
                else ungetc(temp, fp); // put character back into the stream
    }

    fclose (fp); // close as soon as possible

    printf("capital:  %d\nlesser:   %d\ncomments: %d\n"
           "numbers:  %d\nspc:      %d\nlines:    %d\n",
           capital, lesser, comments, numbers, spc, lines
    );

    return 1;
}

While it is usually recommended to put if statements inside curly braces, I think in this case we can place them on the same line for clarity.

Each if can be preceded with an else in this case. That way the program doesn't have to check the remaining cases when one is already found. The checks for the most common characters are best placed first for the same reason (but that was the case).

As an alternative you could use islower(temp), isupper(temp) and isdigit(temp) for the first three cases.


Performance:

For the sake of completeness: while this is probably an exercise on small files, for larger files the data should be read in buffers for better performance (or even using memory mapping on the file).

Update, @SteveSummit's comment on fgetc performance:

Good answer, but I disagree with your note about performance at the end. fgetc already is buffered! So the performance of straightforward code like this should be fine even for large inputs; there's typically no need to complicate the code due to concerns about "efficiency".

Whilst this comment seemed to be valid at first, I really wanted to know what the real difference in performance would be (since I never use fgetc I hadn't tested this before), so I wrote a little test program:

Open a large file and sum every byte into a uint32_t, which is comparable to scanning for certain chars as above. Data was already cached by the OS disk cache (since we're testing performance of the functions/scans, not the reading speed of the hard disk). While the example code above was most likely for small files, I thought I might put the test results for larger files here as well.

Those were the average results:

- using fgetc                                        : 8770
- using a buffer and scan the chars using a pointer  :  188
- use memory mapping and scan chars using a pointer  :  118

Now, I was pretty sure using buffers and memory mapping would be faster (I use those all the time for larger data), the difference in speed is even bigger than expected. Ok, there might be some possible optimizations for fgetc, but even if those would double the speed, the difference would still be high.

Bottom line: Yes, it is worth the effort to optimize this for larger files. For example, if processing the data of a file takes 1 second with buffers/mmap, it would take over a minute with fgetc!