Semmle QL: TaintTracking hasFlow() Problem with Sources that taint their Arguments

397 Views Asked by At

I want to do TaintTracking with functions that taint their arguments with userinput. Example:

fgets(buf, sizeof(buf), stdin); // buf is tainted
[...]
n = strlen(buf); // tainted argument to strlen
[...]
memcpy(somewhere, buf, n) // tainted call to memcpy

Semmle should be able to spot this with a Query like the following (just with fgets->strlen as example). I am borrowing code from SecurityOptions:

import cpp
import semmle.code.cpp.dataflow.TaintTracking

class IsTaintedArg extends string {
  IsTaintedArg() { this = "IsTaintedArg" }
  predicate userInputArgument(FunctionCall functionCall, int arg) {
    exists(string fname |
      functionCall.getTarget().hasGlobalName(fname) and
      exists(functionCall.getArgument(arg)) and (fname = "fgets" and arg = 0) // argument 0 of fgets is tainted
    )
  }

  predicate isUserInput(Expr expr, string cause) {
    exists(FunctionCall fc, int i |
      this.userInputArgument(fc, i) and
      expr = fc.getArgument(i) and
      cause = fc.getTarget().getName()
    )
  }
}

class TaintedFormatConfig extends TaintTracking::Configuration {
  TaintedFormatConfig() { this = "TaintedFormatConfig" }
  override predicate isSource(DataFlow::Node source) {
    exists (IsTaintedArg opts |
      opts.isUserInput(source.asExpr(), _)
    )
  }
  override predicate isSink(DataFlow::Node sink) { 
    exists (FunctionCall fc | sink.asExpr() = fc.getArgument(0) and fc.getTarget().hasName("strlen")) // give me all calls that land in strlen's first argument
  }
}

from TaintedFormatConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, source

Yet it does not look like it is working.

When I just Query cfg.isSource() or cfg.isSink() however, both source and sink are recognized. But hasFlow() still returns nothing - although a path should definitely exist.

  • I am using libssh2 to test my findings, the example flow exists here.

  • My Query to test around is here.

Does anyone have any idea what I might be doing wrong in the Query above?

1

There are 1 best solutions below

3
On BEST ANSWER

The missing bit is in isSource, where the taint tracking starts from the 0'th argument to fgets. Using asExpr will describe the flow from that argument into fgets. What we want is the flow out of fgets through that argument. You'll get that by replacing asExpr with asDefiningArgument.

Here is a link to the results for your query where I've used both asExpr and asDefiningArgument in isSource. This means that if you extend isUserInput in the future, its expressions will be considered sources both as their value and as output arguments.

The new version of the query has 8 results, and some of them are hard to understand when you only see the source and the sink because they can be in different files. I've made a cleaned-up version of the query that

  • generates path explanations between sources and sinks (@kind path-problem),
  • removes the IsTaintedArg class wrapping of the helper predicates,
  • removes a few unused parameters and checks, and
  • adds the call to asDefiningArgument.

Here is the full query:

/**
 * @kind path-problem
 * @id taint-to-strlen
 */
import cpp
import semmle.code.cpp.dataflow.DataFlow
import DataFlow::PathGraph

predicate userInputArgument(FunctionCall functionCall, int arg) {
  functionCall.getTarget().hasGlobalName("fgets") and
  arg = 0 // argument 0 of fgets is tainted
}

predicate isUserInput(Expr expr) {
  exists(FunctionCall fc, int i |
    userInputArgument(fc, i) and
    expr = fc.getArgument(i)
  )
}

class TaintedFormatConfig extends DataFlow::Configuration {
  TaintedFormatConfig() { this = "TaintedFormatConfig" }
  override predicate isSource(DataFlow::Node source) {
    isUserInput(source.asExpr())
    or
    isUserInput(source.asDefiningArgument())
  }
  override predicate isSink(DataFlow::Node sink) { 
    exists (FunctionCall fc | sink.asExpr() = fc.getArgument(0) and fc.getTarget().hasName("strlen"))
  }
}

from TaintedFormatConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "Taint from fgets call in " + source.getNode().getFunction().getFile().getBaseName()

You can see the results at https://lgtm.com/query/4800800615370766111/.