Semgrep determine value of function parameter

272 Views Asked by At

I have this Semgrep rule to detect:

any usage of:

new $X.Classic(..., {..., classicName: $Y, ...})

where: classicName's value matches with this regex: .*hello.*

rules:
  - id: tyq
    languages:
      - ts
    message: Hello
    pattern-either:
      - patterns:
          - pattern-either:
              - pattern-inside: |
                  import * as $X from '@abc/abc';
                  ...
          - pattern-either:
              - patterns:
                  - pattern: |
                      new $X.Classic(..., {..., classicName: $Y, ...})
                  - focus-metavariable: $Y
                  - metavariable-comparison:
                      metavariable: $Y
                      comparison: re.match(".*hello.*", $Y)

and this works if $Y is:

  1. hardcoded string: (classicName: "hello")
  2. a variable assignment within scope

ex:

   #1 var varName = "hello";

   #2 new $X.Classic(..., {..., classicName: varName, ...})

or if #2 is within some function and #1 is in root of file

but DOES-NOT work if:

the value of classicName is coming from a function parameter.

like:

class Class {
  constructor() {
    super();
    const classicName = "hello";
    this.createClassic(classicName);
  }

  private createClassic(classicName: string) {
    new abc.Classic(this, 'MyBucket', {
      //ruleid: tyq
      classicName: classicName,
    });
  }
}

Any way I can catch the values of parameters of function. (As in this case parameter value is resolvable)

2

There are 2 best solutions below

7
On BEST ANSWER

If you want to track data across functions, you likely want to leverage taint mode. Taint mode functions by determining if any data from a "source" reaches a "sink". The idea would be to mark any string matching you regex as a source, and then any instance of the usage you want to detect as a sink. The rule below implements this.

rules:
  - id: tyq
    languages:
      - ts
    message: Hello
    mode: taint
    pattern-sources:
      - patterns:
          - pattern: $X
          - metavariable-regex:
              metavariable: $X
              regex: .*hello.*
    pattern-sinks:
      - patterns:
          - pattern-inside: |
              import * as $X from '@abc/abc';
              ...
          - pattern: "new $X.Classic(..., {..., classicName: $Y, ...})"
          - focus-metavariable: $Y
    severity: WARNING

Playground

The focus-metavariable in the sink restricts it such that the source has to reach that specific point, rather than just anywhere in the sink pattern. If we didn't have that, we'd also match instances where our source reached say, the first parameter. I've also swapped out the metavariable-comparison for metavariable-regex as a stylistic choice. Note also that $X in the sources and $X in the sinks are independent. If you wanted to write a rule where they had to be the same thing (as they would if they were combined under a patterns) you could use the unify_taint_mvars option, but that isn't needed here.

One downside to this rule: it is overbroad, since if anything marked as a source is used to compute the value $Y matches, the rule with match. You could use sanitizers or tweak some other options to cut down on the noise.

Here's an example with an additional finding which I assume is an FP (depends on if you want to forbid this, or if you only care about the final value which reaches the object).

There are some limitations to taint analysis in the OSS version, so you'll have to use the pro engine if you want interprocedural or interfile analysis.

4
On

Try and modify your Semgrep rule to trace variable values when they are passed as arguments in function calls. That will require adding a new pattern to handle function parameter tracing.

I tried:

rules:
  - id: tyq
    languages:
      - ts
    message: Hello
    pattern-either:
      - patterns:
          - pattern-inside: |
              import * as $X from '@abc/abc';
              ...
          - pattern-either:
              - patterns:
                  - pattern: |
                      new $X.Classic(..., {..., classicName: $Y, ...})
                  - focus-metavariable: $Y
                  - metavariable-comparison:
                      metavariable: $Y
                      comparison: re.match(".*hello.*", $Y)
      - patterns:
          - pattern-inside: |
              class $CLASS {
                ...
                private $METHOD($PARAM: string) {
                  ...
                }
                ...
              }
          - pattern: |
              new $X.Classic(..., {..., classicName: $PARAM, ...})
          - focus-metavariable: $PARAM
          - metavariable-comparison:
              metavariable: $PARAM
              comparison: re.match(".*hello.*", $PARAM)

You now have an added pattern block, which targets cases where classicName is a parameter of a method within a class.
It searches for the new abc.Classic call inside a method where classicName is passed as an argument.
The focus-metavariable and metavariable-comparison are used to focus on the parameter and apply the regex comparison.

But, it does not seem to work.
Semgrep operates primarily by pattern matching on the code's syntax tree, rather than executing or deeply analyzing the runtime behavior of the code. That means that while it is excellent for detecting certain types of code patterns or antipatterns, it has limitations in tracing the flow of variable values, especially when they are dynamically determined, such as being passed through function parameters.

Given this limitation, that Semgrep rule may not be able to trace the value of classicName when it is passed as a parameter through multiple function calls, especially if the value is set in a different scope or is dynamically generated.

A potential solution, although not foolproof, is to look for direct assignments or initialization of the parameter within the same function scope. However, this will not catch cases where the value is set elsewhere or passed down through multiple layers of function calls.

For the specific case you mentioned, where the value of classicName is coming from a function parameter, a possible workaround is to broaden the scope of your pattern to catch any instance of classicName being passed as a parameter, and then manually review these instances. That approach increases the likelihood of false positives, but can be useful in scenarios where thoroughness is more critical than precision.

Playground

rules:
  - id: tyq
    languages:
      - ts
    message: Hello
    pattern-either:
      - patterns:
          - pattern-inside: |
              import * as $X from '@abc/abc';
              ...
          - pattern-either:
              - patterns:
                  - pattern: |
                      new $X.Classic(..., {..., classicName: $Y, ...})
                  - focus-metavariable: $Y
                  - metavariable-comparison:
                      metavariable: $Y
                      comparison: re.match(".*hello.*", $Y)
      - patterns:
          - pattern-inside: |
              class $CLASS {
                ...
              }
          - pattern: |
              new $X.Classic(..., {..., classicName: $PARAM, ...})
    severity: WARNING