Hunting through code for potential security issues generally requires having a very in-depth understanding of the language being reviewed. Most competent code reviewers will have some kind of background or ability to write complex programs in their chosen lanuages.
However, the world is far from perfect and often you’ll find yourself hunting out bugs in a language which isn’t your first choice. Fortunately, a significant number of vulnerability classes are common across all languages and development frameworks. This blog post will give a brief run-down of things you should pay attention to.
Caveat: serious thorough code review requires highly in-depth understanding of a given programming language, and far more than grepping for snippets or functions to do a drive-by check of.
The basic idea of doing a code review against any arbitrary language is to quickly understand how common functionality is implemented in your given language under test and then go hunt it down.
For example, lets say you’re looking over a Python code base and you want to see whether any attacker-controlled data ends up being used in a command-line call which could result in command injection. What do you do? Do a quick google search for how to execute external commands in Python, then go ahead and search across your whole code base for that string.
You’re going to want to be able to search through vast amounts of text using whatever tools you’re most comfortable with. I personally recommend using a unix-esque operating system and do it all from the command line. But use whatever tooling or IDE you’re most comfortable with. It must be able to do search across multiple files.
I personally use:
- ag for text search -
apt-get install silversearcher-ag
- fzf for file search - https://github.com/junegunn/fzf
- ctags/cscope for C code navigation
- vim and VSCode for viewing files
- tmux for managing my various windows
An easy docker image with all tools is available here:
As you go, keep extensive notes on interesting files and interesting functions related to:
- Crypto operations
- Business Logic
- Data persistence
- Network IO
- File IO
- Attacker-controllable data
You’ll likely end up encountering too many new paths or points of interest to follow manually, so take notes and go back to look at interesting pieces of code.
Sometimes it’s reasonable to take notes inline in the code using comments, although be wary that you’ll end up invalidating code line numbers if you plan to refer to those at a later stage. You can also keep notes in a gDoc, markdown file, or whatever else makes sense to you.
Sources & Sinks
At a very high-level, large numbers of vulnerabilities come from an attacker being able to have their data cross a trust boundary and be accepted by an application, via one of multiple sources. A source is any place an application reads in untrusted data, whether it be from a file, a network connection, an environment variable, IPC, or any other trust boundary.
A sink on the other hand, is a location where the data is used in a potentially dangerous way by the application, in this context. Most of the searchable strings in this post are potentially dangerous sinks; library functions that are particularly susceptible to malicious manipulation if an attacker can take some form of control over the data being used by that function.
Whilst it is a very valid approach to search for sources in a code base and follow the data flow inward, in this post I’m taking an approach from the other end: find dangerous uses of attacker controlled data, and search backward to see if it is indeed attacker controllable.
The Basic Workflow
Create a list of potentially dangerous sinks in your target language
Using ag/grep, find every instance of every word on your list
do a quick analysis to see whether an attacker is likely to be able to influence the dangerous sink to achieve an undesirable outcome
The app is written in Python. We find a list of dangerous sinks such as
ag 'os\.exec\(', we find numerous instances of this code in the target application and walk through each example
Assessing each one, we may see different things:
- Things like
os.exec("touch /tmp/iamhere")are not interesting, as an attacker has no influence here
- Things like
os.exec('ls ' + request.GET.get('dir'))look to be immediately exploitable
- Most examples will involve a variable being passed into the sink, so your job will be to trace backwards to see where that variable originates from, and whether an attacker can influence it
- Things like
Operations on Files
Any place an application does file IO is a possible opportunity to read, write, or modify a file on the underlying operating systems disk. This can quickly lead to stealing SSH keys, overwriting config files or code, or inserting new users into a system in order to hijack the whole box. Anywhere files are read and written needs to be examined to see what influence the attacker has over the file contents or the path and filename.
- “how to open a file in language”
Executing a command or external binary on the underlying operating system is a well-known way of introducing arbitrary code execution to an application, and commonly if a developer does plenty of shelling out to the OS instead of using libraries, they’ve also done no filtering or poor filtering of paths and characters allowed into the shell or new process arguments.
XSS often isn’t the easiest vulnerability to find via code review, espcially not stored XSS where the sink and source can be in completely different components and be called from a persistence mechanism in a particular way. That said, there are some low-hanging targets to look at, such as functions which output directly to HTML without any filtering
- XSS In language/framework
- Write to http response in language/framework
- How to prevent encoding in language/framework
<?php echo GET['arg'] ?>
An attacker who can control significant parts of data used in an SQL query can often perform operations against the database outside what is intended. The general approach you’ll want to take here is to find all SQL queries, then verify they’re using parameterized queries and check that both they’re using them correctly, and the library they’re using to do the parameterization is sane (IE: they didn’t build it themselves, poorly)
- database query in language
Reflection allows a program to examine, introspect, and modify it’s own behaviour at runtime - a very powerful primitive for developer and attacker alike. If an attacker can control or influence arguments into reflection functions, they’re possibly going to be able to instantiate objects or call methods that aren’t intended to be exposed to end-users.
- reflection in language
Serialization & deserialization (also refered to as marshalling & unmarshalling) are the techniques use to take complex representation of programming constructs (typically objects) of complex format and convert them into a serialized form which can be sent across a network or saved to disk as a plain string of bytes and back again. Serialization is a huge problem because most languages can be coerced into running arbitrary code when deserializing data: it’s too hard to reliably defend against an attacker who is able to influence creation of objects.
- how to deserialize data in language
Same-Site Request Forgery is a vulnerability in which an attacker can coerce an application into making web requests on their behalf. This has a number of implications when the attacker can define a host and/or path which the system will make a request based on. At a high level, the concerns typically are:
- Using the system to do internal-network port scanning
- Calling cloud metadata services to retrieve sensitive information
- Calling internal applications to exfiltrate data or achieve exploitation
- Talking to internal services which rely on network-level trust
- HTTP request in language/framework
File include vulnerabilities occur when an application uses untrusted data to fetch and evaluate extra code which it wants to incorporate. This is most common when the application makes use of libraries and either references a local file, or even worse calls out to make a web request to a remote file.
The entirety of the C and C++ languages are full of gotcha’s relating to memory access, so listing out anything that could cause a problem would just entail writing everything out. A few of the more common suspects are listed below. For other languages, specific functions allow lower-level memory access and should be investigated to ensure they haven’t done anything dangerous.
- Allocate memory in language
- memory copy in langauge
- safely zeroing out memory in language
Crypto is a whole discussion and code review guide in itself, but at a minimum you want to check that they haven’t rolled-their-own or used any algorithms with known issues. The following high-level list of things should be investigated:
- Hash functions
- Timing side-channels (for example, security-sensitive comparisons)
- Key randomness
- Certificate validation
- “how to disable certificate checking in language/framework”
Random((not crypto secure)
Other points of interest
The following commonly show up in comments or are general terms that probably warrant some further investigation:
AKIA(prefix for Amazon Access Keys)