Tokenizing a SIC Assembler source

I've pretty much finished coding a SIC assembler for my systems programming class but I'm stumped on the tokenizing part.

The format (free format) is: {LABEL} OPCODE {OPERAND{,X}} {COMMENT}

The curly braces indicate that the field is optional.

Also, each field must be separated by at least one space or tab.

For example, take this line of source code:

ENDFIL      LDA     EOF         COMMENT GOES HERE

The code above is a bit easier to organize but the following snippet is giving me difficulties.

        RSUB                COMMENT GOES HERE

My code will read in the first word of the comment as if it were an OPERAND.

Here is my code:

//tokenize line
    if(currentLine[0] != ' ' && currentLine[0] != '\t')
    {
        stringstream stream(currentLine);
        stream >> LABEL;
        stream >> OPCODE;
        stream >> OPERAND;
        stream.str("");


        if(LABEL.length() > 6 || isdigit(LABEL[0]) || !alphaNum(LABEL))
        {
            errors[1] = 1;
        }
        else if(LABEL.length() == currentLine.length())
        {
            justLabel = true;
            errors[6] = 1;
            return;
        }
    }
    else
    {
        stringstream stream(currentLine);
        stream >> OPCODE;
        stream >> OPERAND;
        stream.str("");
    }

My professor requires that the assembler be tested with two versions of the source code--one with errors and one without.

The RSUB OPCODE doesn't take an OPERAND, so I understand that everything after RSUB can be treated as a comment. But if the erroneous source code contains a value in the OPERAND field, or if an OPCODE that requires an OPERAND is missing it, how do I handle that? I need to flag these as errors and print out the erroneous OPERAND value (or note that it is missing).

My question is: How do I prevent the comment portion of the code from being considered an OPERAND?

Best answer:

In the assembly languages I've seen (as in other programming languages), a delimiter marks the start of a comment, for example a semicolon:

ENDFIL LDA EOF ;COMMENT GOES HERE
RSUB ;ANOTHER COMMENT GOES HERE
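
If the source format did use such a delimiter, the comment could be stripped before tokenizing. A minimal sketch, assuming a semicolon delimiter (which the format in the question does not have) and a hypothetical helper named stripComment:

```cpp
#include <string>

// Hypothetical helper: returns the line with everything from the first ';' removed.
// Assumes ';' never appears inside an operand (for example in a character constant).
std::string stripComment(const std::string& line)
{
    const std::string::size_type pos = line.find(';');
    return pos == std::string::npos ? line : line.substr(0, pos);
}
```

The remaining text can then be split on whitespace without the comment getting in the way.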

In your syntax however, can you tell whether something is a comment by the amount of whitespace which precedes it on the line, e.g. by the fact that there are two (not just one) whitespace events between the opcode and the comment?

{LABEL}<whitespace>OPCODE<whitespace>{OPERAND{,X}}<whitespace>{COMMENT}

Another answer:

How can you tell whether text on a given line is an operand or a comment? Is it based on context? For example, if the OPCODE is "RSUB", you know that no OPERAND is required. In that case, discard the OPERAND based on which OPCODE was read:

if (OPCODE == "RSUB") OPERAND.clear();
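
The same idea can be generalized by looking up whether each opcode takes an operand before reading one. The sketch below is only one possible approach, not the asker's code: Tokens, needsOperand, and tokenize are hypothetical names, the opcode table is a small illustrative subset, and it assumes an operand never contains whitespace.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type for one tokenized source line.
struct Tokens {
    std::string label, opcode, operand, comment;
    bool missingOperand = false;  // opcode needs an operand but none was found
};

// Illustrative subset of opcodes: true if the opcode takes an operand.
// A real assembler would fill this from its full opcode table.
static const std::map<std::string, bool> needsOperand = {
    {"LDA", true}, {"STA", true}, {"JSUB", true}, {"RSUB", false}
};

Tokens tokenize(const std::string& line)
{
    Tokens t;
    std::istringstream stream(line);

    // A label is present only when the line does not start with a space or tab.
    if (!line.empty() && line[0] != ' ' && line[0] != '\t')
        stream >> t.label;
    stream >> t.opcode;

    // Unknown opcodes are treated as taking an operand (an assumption of this sketch).
    const auto it = needsOperand.find(t.opcode);
    const bool wantsOperand = (it == needsOperand.end()) || it->second;

    // Read an operand only when the opcode calls for one.
    if (wantsOperand && !(stream >> t.operand))
        t.missingOperand = true;

    // Whatever is left on the line is the comment; skip leading whitespace first.
    std::getline(stream >> std::ws, t.comment);
    return t;
}
```

With this, the RSUB line from the question yields an empty operand and the comment "COMMENT GOES HERE". Note the limit of this approach: without a comment delimiter, an opcode that needs an operand but is followed directly by a comment will still have the first comment word read as its operand, so that case would have to be caught later by validating the operand (for example against the symbol table).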