DESIGN: - output of analysis is a full-program dataflow graph - nodes are atoms annotated with properties (usage notes) - edges are directed, annotated with - op sequences indicating how the data is transformed or related at the two locations (can be concatenated, simplified, inverted) - control flow indicating how to treat this edge with respect to context sensitivity - graph is traversed through a "relative atom graph" - initialized with respect to a single atom - each node of the RAG corresponds to a single node of the atom graph plus context sensitivity info - nodes are annotated with the relationship between the seed atom and the target atom (the sum of the ops along the path, simplified) - in case of multiple paths to a single node, store all the op-paths. - these paths can traverse reverse edges, the ops are inverted in this case - control flow info prevents meaningless paths from being analyzed - op sequences can be used as functions to transform the node properties of the target atom to be "about" the seed atom - this graph is technically infinite, so it starts off with a single node and a "frontier" of edges that have yet to be explored - methods for expanding the graph from each edge on the frontier - the complexity of this graph is managed through a hierarchical graph, a view on the RAG - arbitrary collapsing of groups of atoms into single "group" nodes, recursively - efficient algorithms passes: + per-block live data flow and manipulations + inter- and intra-block passive register data flow, data flowing across callsites - passive memory flow - given a store, track where live references to it propagate and are loaded - constant propagation - need to re-evaluate ops which are marked as variable-dependant? ideas for reducing chaos: - only allow paths which follow only forward or only backward edges (add edges from referents to references) - discard-with-connection nodes which have no interesting properties - group atoms which refer to the same variable - use sfdp on large graphs - set initial positions based on codeloc position in disassembly graph or something next in the queue: - memcpy procedure - recursive mark layout algorithm - search for node text in current view - start from disassembly atoms - start from function argument atoms - navigate to decompilation TODO - Attempt to expand graph automatically by refusing to recurse - globals... Research directions - Type confusion as weird structures - Pick more applications ReMind Why Johnny Can't reveres malware https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7546501 Dan Voytika Hackers vs testers something ghidra - Michell Mazerik x3 https://dl.acm.org/doi/abs/10.1145/2896499 applications: Type inference decompilation taint tracking targets: rust golang c++ virtual inheritance