My personal preference is to use IDA’s Graph mode when doing the majority of my reverse engineering. It provides a graphical representation of the control flow graph and gives visual cues about the structure of the current function that help me better understand the disassembly.
Graph mode is great until the function becomes complex. IDA is often forced to place adjacent nodes relatively far apart, or have edges in the graph cross and have complex paths. Using the overview graph becomes extremely difficult due to the density of nodes and edges, like in Figure 1.
Figure 1: An annoying function
IDA has a built-in mechanism to help simplify graphs: creating groups of nodes, which replaces all of the selected nodes with a new group node representative. This is done by selecting one or more nodes, right-clicking, and selecting “Group nodes”, shown in Figure 2. Doing this manually is certainly possible, but it becomes tedious to follow edges in complex graphs and correctly select all of the relevant nodes without missing any, and without making mistakes.
Figure 2: Manual group creation
The SimplifyGraph IDA Pro plugin we’re releasing is built to automate IDA’s node grouping capability. The plugin is source-compatible with the legacy IDA SDK in 6.95, and has been ported to the new SDK for IDA 7.0. Pre-built binaries for both are available on the Releases tab of the GitHub project page.
The plugin has several parts, introduced below.
Unique-reachable (UR) nodes are all nodes that are reachable in the graph from a given start node and that are not reachable from any node outside the current UR set. For example, in Figure 3, all of the unique-reachable nodes starting at the green node are highlighted in blue. The grey node is reachable from the green node, but because it is also reachable from nodes not in the current UR set, it is pruned prior to group creation.
Figure 3: Example Unique Reachable selection
The plugin allows you to easily create a new group based on the UR definition. Select a node in IDA's graph view to be the start of the reachable search. Right click and select "SimplifyGraph --> Create unique-reachable group". The plugin performs a graph traversal starting at this node, identifies all reachable nodes, and prunes any nodes (and their reachable nodes) that have predecessor nodes not in the current set. It then prompts you for the node text to appear in the new group node.
If you select more than one node (by holding the Ctrl key when selecting nodes) for the UR algorithm, each additional node acts as a sentry node. Sentry nodes will not be included in the new group, and they halt the graph traversal when searching for reachable nodes. For example in Figure 4, selecting the green node first treats it as the starting node, and selecting the red node second treats it as a sentry node. Running the “Create unique-reachable group” plugin option creates a new group made of the green node and all blue nodes. This can be useful when you are done analyzing a subset of the current graph, and wish to hide the details behind a group node so you can concentrate on the rest of the graph.
Figure 4: Unique reachable with sentry
The UR algorithm operates on the currently visible graph, meaning that you can run the UR algorithm repeatedly and nest groups.
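The traversal and pruning described above can be sketched in a few lines of Python. This is a minimal illustration of the unique-reachable definition, not the plugin's actual code (the plugin is built against the IDA SDK). The graph is assumed to be a plain dict mapping each node to its successors, and the function name `unique_reachable`, the node names, and the example graph are all hypothetical.

```python
def unique_reachable(succs, start, sentries=()):
    """Return the unique-reachable set for `start` (illustrative sketch).

    succs    -- dict mapping every node to a list of its successor nodes
    sentries -- nodes that stop the traversal and are never included
    """
    sentries = set(sentries)

    # Build the predecessor map once so pruning can check incoming edges.
    preds = {node: [] for node in succs}
    for node, outs in succs.items():
        for out in outs:
            preds.setdefault(out, []).append(node)

    # Step 1: collect everything reachable from start, halting at sentries.
    reach, stack = {start}, [start]
    while stack:
        for out in succs.get(stack.pop(), ()):
            if out not in reach and out not in sentries:
                reach.add(out)
                stack.append(out)

    # Step 2: repeatedly drop nodes (other than start) that have a
    # predecessor outside the set. Dropping a node exposes its successors
    # to the same test, so pruning cascades until nothing changes.
    changed = True
    while changed:
        changed = False
        for node in list(reach):
            if node != start and any(p not in reach for p in preds[node]):
                reach.discard(node)
                changed = True
    return reach


# Hypothetical function graph: "exit" is also reachable through "x".
graph = {
    "entry": ["a", "x"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["exit"],
    "x": ["exit"],
    "exit": [],
}

print(sorted(unique_reachable(graph, "a")))                  # ['a', 'b', 'c', 'd']
print(sorted(unique_reachable(graph, "a", sentries=["d"])))  # ['a', 'b', 'c']
```

In the first call "exit" is pruned because "x" can also reach it. In the second call the sentry "d" halts the traversal, so neither "d" nor anything behind it joins the group.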
Switch statements implemented as jump tables appear in the graph as nodes with a large fan-out, as shown in Figure 5. The SimplifyGraph plugin detects when the currently selected node has more than two successor nodes and adds a right-click menu option “SimplifyGraph --> Create switch case subgraphs”. Selecting this runs the Unique-Reachable algorithm on each separate case branch and automatically uses IDA’s branch label as the group node text.
Figure 5: Switch jumptable use
Figure 6 shows a before and after graph overview of the same function when the switch-case grouping is run.
Figure 6: Before and after of switch statement groupings
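The per-case grouping can be pictured as one unique-reachable search per successor of the switch node. The Python sketch below is only an illustration under assumptions: the graph is a dict of successor lists, `labels` stands in for IDA's branch labels, and every name in it is hypothetical. A compact unique-reachable helper is included so the block runs on its own.

```python
def unique_reachable(succs, start):
    """Compact helper: nodes reachable from start with no outside predecessors."""
    reach, stack = {start}, [start]
    while stack:
        for out in succs.get(stack.pop(), ()):
            if out not in reach:
                reach.add(out)
                stack.append(out)
    changed = True
    while changed:
        changed = False
        for node in list(reach - {start}):
            # Prune the node if any edge into it comes from outside the set.
            if any(node in outs and src not in reach for src, outs in succs.items()):
                reach.discard(node)
                changed = True
    return reach


def switch_case_groups(succs, node, labels):
    """Map each case label to the unique-reachable set of that case branch."""
    cases = succs[node]
    if len(cases) <= 2:  # the menu option is offered for more than two successors
        return {}
    return {labels[case]: unique_reachable(succs, case) for case in cases}


# Hypothetical jump-table dispatch with three cases that rejoin at "join".
graph = {
    "switch": ["case0", "case1", "default"],
    "case0": ["case0_body"],
    "case0_body": ["join"],
    "case1": ["join"],
    "default": ["default_body"],
    "default_body": ["join"],
    "join": [],
}
labels = {"case0": "case 0", "case1": "case 1", "default": "default case"}

for label, members in sorted(switch_case_groups(graph, "switch", labels).items()):
    print(label, sorted(members))
```

The shared "join" node is reachable from every case, so it is pruned from each group and stays visible in the simplified graph.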
Running Edit --> Plugins --> SimplifyGraph brings up a new chooser named "SimplifyGraph - Isolated subgraphs" that begins showing what I call isolated subgraphs of the current graph. A full definition appears later in the appendix including how these are calculated, but the gist is that an isolated subgraph in a directed graph is a subset of nodes and edges such that there is a single entrance node, a single exit node, and none of the nodes (other than the subgraph entry node) are reachable by nodes not in the subgraph.
Finding isolated subgraphs was originally researched to help automatically identify inline functions. It does this, but it turns out that this graph construct occurs naturally in code without inline functions. This isn’t a bad thing as it shows a natural grouping of nodes that could be a good candidate to group to help simplify the overall graph and make analysis easier.
Once the chooser is active, you can double click (or press Enter) on a row in the chooser to highlight the nodes that make up the subgraph. You can create a group for an isolated subgraph by doing one of:
Doing either of these prompts you for text for the new graph node to create. If you manually create/delete groups using IDA you may need to refresh the chooser's knowledge of the current function groups (right-click and select "Refresh groups" in the chooser). You can right click in the chooser and select "Clear highlights" to remove the current highlights. As you navigate to new functions the chooser updates to show isolated subgraphs in the current function. Closing the chooser removes any active highlights. Any custom colors you applied prior to running the plugin are preserved and reapplied when the current highlights are removed.
Isolated subgraph calculations operate on the original control flow graph, so isolated subgraphs can't be nested. As you create groups, rows in the chooser turn red to indicate that a group already exists, or can't be created because it overlaps with an existing group.
Another note: this calculation does not currently work on functions that do not return (those with an infinite loop). See appendix for details.
Creating groups to simplify the overall control flow graph is nice, but it doesn’t help you understand the details of a group that you create. To assist with this, the last feature of the plugin helps you view groups in “isolation”. Right-clicking on a collapsed group node, or on a node that belongs to an uncollapsed group (as highlighted by IDA in yellow), brings up the plugin options “Complement & expand group” and “Complement group”, respectively. When either runs, the plugin creates a group of all nodes other than the group you’re interested in. This hides all graph nodes that you aren’t currently examining and lets you focus on analysis of the current group. As you can see, we’re abusing group creation a bit so that we can avoid creating a custom graph viewer and instead stay within the built-in IDA graph disassembly view, which lets you continue to mark up the disassembly as you’re used to.
Complementing the graph gives you a view like the one in Figure 7, where the rest of the graph is grouped into a node named “Complement of group X”. When you’re done analyzing the current group, right click on the complement node and select IDA’s “Ungroup nodes” command.
Figure 7: Group complement
As an example that exercises the plugin, let’s revisit the function in Figure 1. This is a large command-and-control dispatch function for a piece of malware. It contains a large if-else-if series of inlined strcmp comparisons that branch to the logic for each command when the input string matches the expected command.
Figure 8: Grouped strcmp
Figure 9: Grouped command logic
Figure 10: Group complement
You can tweak some of the configuration by entering data in a file named %IDAUSR%/SimplifyGraph.cfg, where %IDAUSR% is typically %APPDATA%/Hex-Rays/IDA Pro/ unless explicitly set to something else. All of the config applies to the isolated subgraph component. Options:
Example SimplifyGraph.cfg contents:
"MINIMUM_SUBGRAPH_NODE_COUNT"=5
"MAXIMUM_SUBGRAPH_NODE_PERCENTAGE"=75
"SUBGRAPH_HIGHLIGHT_COLOR"=0x00aa1111
I came across semi-related work while working on this: GraphSlick from the 2014 Hex-Rays contest (https://www.hex-rays.com/contests/2014/index.shtml and https://github.com/lallousx86/GraphSlick). That plugin had different goals: automatically identifying (nearly) identical inline functions via CFG and basic block analysis, and patching the program to force mock function calls to the explicit function. It had a separate viewer to present information to the user.
SimplifyGraph is focused on automating tasks when doing manual reverse engineering (group creation) to reduce the complexity of disassembly in graph mode. Future work may incorporate the same prime-products calculations to help automatically find identical isolated subgraphs.
Prebuilt Windows binaries are available from the Releases tab of the GitHub project page. The zip file contains both IDA 32 and IDA 64 plugins for the new IDA 7.0 SDK and for the legacy IDA 6.95 SDK. Copy the two plugins for your version of IDA to the %IDADIR%\plugins directory.
This plugin & related files were built using Visual Studio 2013 Update 5.
Environment Variables Referenced by project:
libpaths beneath it.
libpaths beneath it.
libspaths beneath it.
The easiest way is to use the Microsoft command-line build tools:
msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA70_32 /property:Platform=x64
msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA70_64 /property:Platform=x64
msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA695_32 /property:Platform=Win32
msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA695_64 /property:Platform=Win32
This plugin & related files have been built using GCC 6.3.0 and GCC 7.2.0. For x86_64 Linux you must install the multilib GCC packages.
Environment Variables Referenced by project:
libpaths beneath it.
pluginspath beneath it.
Building and installing the plugin are done using GNU make:
IDA_SDK=/home/user/path/to/IdaSdk make all
IDA_DIR=/home/user/path/to/Ida6.95 make install
Finding isolated subgraphs relies on calculating the immediate dominator and immediate post-dominator trees for a given function graph. The following is important to know:
The plugin calculates the immediate dominator tree and immediate post-dominator tree of the function control flow graph and looks for pairs of nodes where (idom[i] == j) and (ipdom[j] == i). This means all paths from the function start to node i must go through node j, and all paths from j to the function terminal must go through i. A candidate isolated subgraph thus starts at node j and ends at node i.
For each candidate isolated subgraph, the plugin further verifies only the entry node has predecessor nodes not in the candidate subgraph. The plugin also filters out candidate subgraphs by making sure they have a minimum node count and cover a maximum percentage of nodes (see MINIMUM_SUBGRAPH_NODE_COUNT and MAXIMUM_SUBGRAPH_NODE_PERCENTAGE in the config section).
One complication is that functions often have more than one terminal node – programmers can arbitrarily return from the current function at any point. The immediate post-dominator tree is calculated for every terminal node, and any inconsistencies are marked as indeterminate and are not possible candidates for use. Functions with infinite loops do not have terminal nodes, and are not currently handled.
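The candidate search, verification, and filtering steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the plugin's implementation: it handles only a graph with a single terminal node, it assumes every node is reachable from the entry and can reach the terminal, and it swaps in a simple iterative set-based dominator computation. All function names and the example graph are hypothetical (the graph is not the one in Figure 11), and the default thresholds merely mirror the example config values.

```python
def reverse(succs):
    """Return the graph with every edge flipped (also the predecessor map)."""
    rev = {node: [] for node in succs}
    for node, outs in succs.items():
        for out in outs:
            rev[out].append(node)
    return rev


def dominator_sets(succs, root):
    """Iteratively compute, for each node, the set of nodes dominating it."""
    preds, nodes = reverse(succs), set(succs)
    dom = {node: set(nodes) for node in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for node in nodes - {root}:
            incoming = [dom[p] for p in preds[node]]
            new = {node} | (set.intersection(*incoming) if incoming else set())
            if new != dom[node]:
                dom[node], changed = new, True
    return dom


def immediate(dom, root):
    """Closest strict dominator: the one with the largest dominator set."""
    return {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in dom if n != root}


def isolated_subgraphs(succs, entry, terminal, min_nodes=5, max_percent=75):
    """Return {(j, i): nodes} for pairs with idom[i] == j and ipdom[j] == i."""
    idom = immediate(dominator_sets(succs, entry), entry)
    ipdom = immediate(dominator_sets(reverse(succs), terminal), terminal)
    preds = reverse(succs)
    found = {}
    for i, j in idom.items():
        if ipdom.get(j) != i:
            continue
        # Collect nodes reachable from j without walking past i.
        body, stack = {j, i}, [j]
        while stack:
            for out in succs[stack.pop()]:
                if out not in body:
                    body.add(out)
                    stack.append(out)
        # Only the entry node j may have predecessors outside the subgraph.
        sealed = all(p in body for n in body - {j} for p in preds[n])
        small_enough = 100 * len(body) <= max_percent * len(succs)
        if sealed and len(body) >= min_nodes and small_enough:
            found[(j, i)] = body
    return found


# Hypothetical graph: two diamonds in sequence, node 8 is the only terminal.
graph = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5],
         5: [6, 7], 6: [8], 7: [8], 8: []}

for (start, end), body in sorted(isolated_subgraphs(graph, 0, 8, min_nodes=4).items()):
    print(start, end, sorted(body))
# 1 4 [1, 2, 3, 4]
# 5 8 [5, 6, 7, 8]
```

The pairs (0, 1) and (4, 5) also satisfy the idom/ipdom condition in this graph, but each covers only two nodes and is dropped by the minimum node count.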
For a simple example consider the graph in Figure 11.
Figure 11: Example graph
It has the following immediate dominator tree:
It has the following immediate post-dominator tree:
Looking for pairs of (idom[i] == j) and (ipdom[j] == i) gives the following: (0, 8) (1, 3) (3, 6) (6, 7)
Figure 12: Example graph with isolated subgraph