Beginning

Why do I need to learn CodeQL? Well, it’s all about SRP.

CodeQL is a code analysis engine that lets security researchers query code as if it were data. It is used for code scanning, vulnerability discovery, and so on.

As for the reason, one of the requirements of SRP is to use CodeQL to develop a vulnerability scanning program. Personally, I think it’s a great challenge for me, as I have no prior experience in this field. But I will try my best.

Configuration

I set up the environment in VS Code on Arch Linux. The general configuration is simple:

Install the CodeQL extension in VS Code.
Let the extension download the CodeQL CLI automatically.

According to the official documentation, manually setting the path of the executable in the extension is not recommended, because it can sometimes cause nasty errors (although I have not noticed anything unusual so far).

If you need to use the CodeQL CLI outside VS Code, you may need to install a separate copy manually.

First Demo

There is a repository called vscode-codeql-starter on GitHub, which you can use to run your first CodeQL query locally.

What’s more, there is a learning repo on GitHub called codeql-uboot, which aims to use CodeQL to find 9 vulnerabilities related to memcpy.

Hello World

select "hello world"

Basic Grammar

Similar to SQL, CodeQL has three essential keywords: from, where, select.

Let’s look through examples from codeql-uboot. The code under analysis is written in C and is queried through CodeQL’s C/C++ library (import cpp).

Find Function Definitions

import cpp

from Function f
where f.getName() = "memcpy"
select f, "a function named memcpy"

CodeQL’s libraries offer lots of useful classes and predicates, which are easy to browse through auto-completion in VS Code.

How do you run a query? Press Ctrl+Shift+P, type codeql, and select Run Query. After a while, the results appear in a panel on the right-hand side.

Find Macro Definitions

import cpp

from Macro m
where m.getName() in ["ntohs", "ntohl", "ntohll"]
select m, "a macro named ntohs, ntohl or ntohll"

Find Function Calls

import cpp

from FunctionCall fc
where fc.getTarget().getName() = "memcpy"
select fc

Find Macro Invocations

import cpp

from MacroInvocation inv
where inv.getMacro().getName().regexpMatch("ntoh(s|l|ll)")
select inv

Find Macro Expressions

import cpp

from MacroInvocation inv
where inv.getMacro().getName().regexpMatch("ntoh(s|l|ll)")
select inv.getExpr()
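
To make the difference between the last two queries more concrete, here is a small C sketch of the kind of code they look at. The ntohs macro below is only a hypothetical stand-in that I wrote for illustration (it swaps unconditionally, and the real definition in U-Boot may differ), and read_length is a made-up function name.

```c
#include <stdint.h>

/* Hypothetical stand-in for a byte-swap macro; it always swaps the two
   bytes, which is an assumption made only for this illustration. */
#define ntohs(x) \
    ((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | (((uint16_t)(x) & 0xff00u) >> 8)))

/* Reads a 16-bit length field from the first two bytes of a packet. */
uint16_t read_length(const uint8_t *packet) {
    /* Assemble the two bytes with the first byte in the low position. */
    uint16_t raw = (uint16_t)(packet[0] | (packet[1] << 8));
    /* The text "ntohs(raw)" is what a MacroInvocation refers to; the
       swapped value it expands to is what inv.getExpr() refers to. */
    return ntohs(raw);
}
```

In other words, selecting inv points at the place where the macro is used, while selecting inv.getExpr() points at the expression that the use expands to, which is the value we care about later.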

Create A Class

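Similar to lots of other programming languages, CodeQL supports object-oriented programming. A class describes a set of values, and the predicate with the same name as the class (the characteristic predicate) decides which values belong to it.

import cpp

// An expression produced by expanding one of the ntoh* byte-swap macros.
class NetworkByteSwap extends Expr {
  NetworkByteSwap() {
    exists(MacroInvocation inv |
      inv.getMacro().getName() in ["ntohs", "ntohl", "ntohll"] and
      this = inv.getExpr()
    )
  }
}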

from NetworkByteSwap n
select n, "Network byte swap"

Here exists can be understood as the existential quantifier from discrete mathematics: we first declare the variables, then, after the separator |, list the propositions those variables must satisfy.
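
Written as a set, the class above can be read roughly like this. The function names name, macro and expr are just my informal shorthand for getName(), getMacro() and getExpr(), not CodeQL syntax.

```latex
\mathrm{NetworkByteSwap} = \{\, e \in \mathrm{Expr} \mid
  \exists\, inv \in \mathrm{MacroInvocation} :\;
  \mathrm{name}(\mathrm{macro}(inv)) \in \{\texttt{ntohs}, \texttt{ntohl}, \texttt{ntohll}\}
  \;\wedge\; e = \mathrm{expr}(inv) \,\}
```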

Data Flow in CodeQL

First we introduce two important terms: source and sink.

A source is a place where untrusted (tainted) data enters the program, and a sink is a place where using such data is dangerous (e.g. a pointer or a size parameter of a specific function). Code is vulnerable if tainted data flows from a source to a sink.
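
As a rough illustration of a source and a sink in the code being analyzed, consider the C sketch below. The ntohs macro is a hypothetical stand-in that always swaps bytes, and handle_packet is a made-up, deliberately unsafe function; neither is taken from U-Boot.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a byte-swap macro (always swaps). */
#define ntohs(x) \
    ((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | (((uint16_t)(x) & 0xff00u) >> 8)))

/* Deliberately unsafe: copies as many bytes as the packet claims to have. */
void handle_packet(uint8_t *dst, const uint8_t *payload, uint16_t wire_len) {
    /* Source: the length comes from the network, so an attacker controls it. */
    uint16_t len = ntohs(wire_len);
    /* Sink: the third argument of memcpy decides how many bytes are written. */
    memcpy(dst, payload, len);
}
```

If dst is smaller than the length claimed by the packet, memcpy writes past the end of the buffer, which is exactly the kind of bug this flow is meant to catch.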

CodeQL implements two kinds of data flow.

Local data flow. Tainted data is tracked only within the scope of a single function.
Global data flow. Tainted data is tracked across the whole program, not limited to a single function.

Generally, we need to identify what characterizes the source and the sink, and write a predicate for each.
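
Here is a small C sketch of a case where, as far as I understand, only global data flow can connect the two ends, because the value crosses function boundaries. All function names are made up for illustration, and the ntohs macro is a hypothetical stand-in that always swaps bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a byte-swap macro (always swaps). */
#define ntohs(x) \
    ((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | (((uint16_t)(x) & 0xff00u) >> 8)))

/* The byte swap happens here, in its own function. */
static uint16_t parse_len(uint16_t wire_len) {
    return ntohs(wire_len);
}

/* The dangerous call happens here, in another function. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, uint16_t n) {
    memcpy(dst, src, n);
}

/* The tainted length travels parse_len -> handle_packet_split -> copy_bytes,
   so no single function contains both the swap and the memcpy. */
void handle_packet_split(uint8_t *dst, const uint8_t *payload, uint16_t wire_len) {
    uint16_t len = parse_len(wire_len);
    copy_bytes(dst, payload, len);
}
```

An analysis that stays inside one function sees either the swap or the memcpy, never both, which is why the query in these notes relies on global tracking.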

/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph

class NetworkByteSwap extends Expr {
  NetworkByteSwap() {
    exists (MacroInvocation inv | 
      inv.getMacro().getName() in ["ntohs", "ntohl", "ntohll"] and 
      this = inv.getExpr()
    )
  }
}

class Config extends TaintTracking::Configuration {
  Config() {
    this = "Network2MemFuncLength"
  }

  override predicate isSource(DataFlow::Node source) {
    source.asExpr() instanceof NetworkByteSwap
  }
  override predicate isSink(DataFlow::Node sink) {
    exists (FunctionCall fc |
      sink.asExpr() = fc.getArgument(2) and 
      fc.getTarget().hasQualifiedName("memcpy")
    )
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "Network byte swap flows to memory"

We can extend TaintTracking::Configuration and override specific predicates to customize our data flow. isSource and isSink restrict which DataFlow::Node values count as a source and as a sink, respectively. If we need to remove false positives from the results, we can also override the isSanitizer predicate to make the query more accurate.
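
To show what kind of code a sanitizer is meant to exclude, here is a C sketch of a guarded copy. BUF_SIZE and handle_packet_checked are hypothetical names of mine, and the ntohs macro is again a stand-in that always swaps bytes. My understanding is that a query without a sanitizer would still report this function, because the checked value still reaches memcpy; how exactly to model the check in isSanitizer is something I have not worked out here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a byte-swap macro (always swaps). */
#define ntohs(x) \
    ((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | (((uint16_t)(x) & 0xff00u) >> 8)))

/* Hypothetical size of the destination buffer. */
#define BUF_SIZE 16

/* Returns 0 after copying, or -1 if the claimed length is too large. */
int handle_packet_checked(uint8_t *dst, const uint8_t *payload, uint16_t wire_len) {
    uint16_t len = ntohs(wire_len);
    /* Guard: reject lengths that do not fit into the destination buffer. */
    if (len > BUF_SIZE) {
        return -1;
    }
    /* The length is still derived from the network, but it is now bounded. */
    memcpy(dst, payload, len);
    return 0;
}
```

A result pointing at this memcpy would be a false positive, and that is the situation where overriding isSanitizer helps.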

Garen Wang's Blog

CodeQL Learning Notes