67 lines
2.8 KiB
Plaintext
67 lines
2.8 KiB
Plaintext
DESIGN:
|
|
- output of analysis is a full-program dataflow graph
|
|
- nodes are atoms annotated with properties (usage notes)
|
|
- edges are directed, annotated with
|
|
- op sequences indicating how the data is transformed or related at the two locations (can be concatenated, simplified, inverted)
|
|
- control flow indicating how to treat this edge with respect to context sensitivity
|
|
- graph is traversed through a "relative atom graph"
|
|
- initialized with respect to a single atom
|
|
- each node of the RAG corresponds to a single node of the atom graph plus context sensitivity info
|
|
- nodes are annotated with the relationship between the seed atom and the target atom (the sum of the ops along the path, simplified)
|
|
- in case of multiple paths to a single node, store all the op-paths.
|
|
- these paths can traverse reverse edges, the ops are inverted in this case
|
|
- control flow info prevents meaningless paths from being analyzed
|
|
- op sequences can be used as functions to transform the node properties of the target atom to be "about" the seed atom
|
|
- this graph is technically infinite, so it starts off with a single node and a "frontier" of edges that have yet to be explored
|
|
- methods for expanding the graph from each edge on the frontier
|
|
- the complexity of this graph is managed through a hierarchical graph, a view on the RAG
|
|
- arbitrary collapsing of groups of atoms into single "group" nodes, recursively
|
|
- efficient algorithms
|
|
|
|
passes:
|
|
+ per-block live data flow and manipulations
|
|
+ inter- and intra-block passive register data flow, data flowing across callsites
|
|
- passive memory flow - given a store, track where live references to it propagate and are loaded
|
|
- constant propagation - need to re-evaluate ops which are marked as variable-dependant?
|
|
|
|
ideas for reducing chaos:
|
|
- only allow paths which follow only forward or only backward edges (add edges from referents to references)
|
|
- discard-with-connection nodes which have no interesting properties
|
|
- group atoms which refer to the same variable
|
|
- use sfdp on large graphs
|
|
- set initial positions based on codeloc position in disassembly graph or something
|
|
|
|
next in the queue:
|
|
- memcpy procedure
|
|
- recursive mark layout algorithm
|
|
- search for node text in current view
|
|
- start from disassembly atoms
|
|
- start from function argument atoms
|
|
- navigate to decompilation
|
|
|
|
TODO
|
|
- Attempt to expand graph automatically by refusing to recurse
|
|
- globals...
|
|
|
|
Research directions
|
|
- Type confusion as weird structures
|
|
- Pick more applications
|
|
|
|
ReMind
|
|
Why Johnny Can't reveres malware https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7546501
|
|
Dan Voytika
|
|
Hackers vs testers
|
|
something ghidra - Michell Mazerik x3
|
|
|
|
https://dl.acm.org/doi/abs/10.1145/2896499
|
|
|
|
applications:
|
|
Type inference
|
|
decompilation
|
|
taint tracking
|
|
|
|
targets:
|
|
rust
|
|
golang
|
|
c++ virtual inheritance
|