04 — July 2026
Can you train AI to invert AES?
At the core of every https browser session, every wifi connection with WPA2/3, every hardware level device encryption module, sits the AES algorithm. This makes inverting the algorithm - the process of finding the private key or the clear text - one of the most simultaneously dangerous and valuable algorithmic problems today.
As AI eats software, I think it's possible that AI could solve the problem of AES Inversion. Of course, I dont know for sure, but this post lays out the argument for how it might happen. If you want to try and train a model to do what Im talking about, check out the RL env in this github repo.
Why might AES be vulnerable?
In order for a model to be able to solve AES you have to fight two problems - the sparsity of the reward, and complexity theory. Obviously one of these is more fundamental than the other, while a clever researcher can find intermediate rewards for almost any problem, superintelligence doesn't change the mathematical constraints imposed on us by complexity theory. I will address the sparse reward problem in the next section, but for now let us turn our attention to complexity theory.
I wont explain the basics of AES in the post, but if you want a good introduction to it, I recommend this one on the Braincoke blog. The important thing to understand about AES is that it's a substitution-permutation cipher, which means that it works by taking a block of bytes and permuting them according to a set of functions. The goal is to create a something that behaves like a one-way function, which are a hypothesized set of functions that are easy to compute but hard to invert - specifically any probabilistic polynomial time algorithm fails to invert them with probability $p$. They are ideally injections so that if you have a quick algorithm for inversion, you can recover the original text. This means that AES is built from many rounds of relatively simple permutations parameterized by the key(s).
Inverting AES involves inverting a large number of composed permutations and therefore is naturally a combinatorial problem. Now no matter how superintelligence pans out in the long run, no superintelligence can bypass mathematics or complexity theory. For us to believe that AES is efficiently invertible in the first place, I have to make an argument as to why I believe such an efficient inversion likely exists.
I will start by stating that AES can be stated as a satisfiability problem.
Choosing a particular mode of AES
Because AES can actually be run in different modes, we chose a concrete target of AES-XTS, which the mode used for full-disk encryption. It uses two independent AES-256 keys: $K_1$ encrypts the data and $K_2$ derives a per-block tweak. For the block at sector number $s$ and block index $j$. Here $\oplus$ is simply XOR. $\otimes$ is elementwise multiplication in a particular finite field. We first set up our per-block tweak:
$$T_0 = \mathrm{AES}_{K_2}\!\big(\mathrm{LE}_{128}(s)\big), \qquad T_j = T_0 \otimes \alpha^{\,j} \ \text{ in } \mathrm{GF}(2^{128}),$$
And using the per-block tweak we can encrypt our text:$$C = \mathrm{AES}_{K_1}\!\big(P \oplus T_j\big) \oplus T_j,$$
where $\mathrm{LE}_{128}(s)$ is the sector number as a little-endian 128-bit block and $\alpha$ is a multiplicative generator in $\mathrm{GF}(2^{128})$. The data path is ordinary AES-256: an initial AddRoundKey, then $R-1$ rounds of SubBytes, ShiftRows, MixColumns and AddRoundKey, then a final round without MixColumns.
Inversion is the reverse. We observe the ciphertext $C$ and the public indices $s$ and $j$, and we want the plaintext $P$ (and, implicitly, the keys). To attack this with a solver we first have to write the cipher down as a concrete object the solver can pick apart, so before I state the constraints it's worth laying out the model itself.
AES as a circuit
We compile a window of XTS blocks into one flat circuit made of three kinds of record:
- Values are the byte-sized quantities in play. Some are inputs we search over (the plaintext and the key bytes), some are constants (the observed ciphertext), and some are internal wires - the intermediate bytes that appear part-way through an AES computation.
- Ops are the primitive operations that compute one value from others: an S-box lookup, a MixColumns byte, an XOR, the "multiply by $x$" step of the tweak chain, a round-key byte from the key schedule, and so on. The ops are just AES itself, chopped into individual steps.
- Constraints are the predicates that ought to hold at a solution:"this wire equals the op that is supposed to produce it", "this predicted ciphertext byte equals the target". Each constraint is what later turns into a residual if you're doing Lagrangian optimization.
If you were writing a SAT solver to solve this problem, the state the solver actually mutates is just the buffers of plaintext, $K_1$, $K_2$, and (in one of the two encodings below) the wires.
The one real modelling decision is how finely we expose the cipher, and there are two choices that sit at opposite ends.
In the opaque encoding the whole block is a single monolithic op: hand it the plaintext and the keys, it runs the reference cipher end to end, and it returns the predicted ciphertext. There are no wires at all. The only constraints are that the predicted ciphertext equals the target (all 128 bits at once). This is compact and it is a faithful scorer, but as a search landscape it is a black box. The only things you can move are plaintext and key bytes, and because of AES's avalanche a single key-bit flip scrambles all sixteen output bytes unpredictably - there is no meaningful sense of being "partway" through the cipher, so local search has nothing to hold onto.
In the lowered encoding we go towards intermediary rewards. Every AES micro-step becomes its own op writing its own internal wire, and every wire carries a consistency constraint asserting that the wire really does equal the op that defines it, and every ciphertext byte gets its own equality constraint against the target. These intermediate wires are the extra unknowns $W$.
The whole point of lowering is local structure. Instead of a black box whose only handles are the ~64 key and plaintext bytes, the search now has thousands of intermediate wires it can nudge one at a time, re-scoring only the handful of constraints each nudge touches. The price is that all those wires now have to be kept mutually consistent with one another - which is exactly what the consistency constraints measure, and what the solver has to drive to zero alongside everything else.
So we cast inversion as a constraint-satisfaction problem over the unknowns
$$x = \big(\,\underbrace{P}_{\text{plaintext}},\ \underbrace{K_1}_{\text{data key}},\ \underbrace{K_2}_{\text{tweak key}},\ \underbrace{W}_{\text{internal AES wires}}\,\big),$$
whose constraints fall into three classes:
- Consistency (lowered encoding only) - every internal wire equals the op that produces it, i.e. the intermediate bytes really do form one valid AES computation.
- Goal - the predicted ciphertext equals the observed $C$.
- ASCII - Just to make the problem more tractable, we can constrain every plaintext byte to be text ASCII, which is not unreasonable if you have a good sense of what you're decrypting in the AES-XTS setting.
Each constraint $i$ becomes a non-negative residual $r_i(x) \ge 0$ that is zero exactly when the constraint is satisfied. Equality constraints are measured as a Hamming distance in bits rather than a flat right/wrong, so getting "closer" to the target (fewer wrong bits) lowers the residual and gives the solver an energy landscape to follow. The assignment is feasible when
$$H(x) = \sum_i r_i(x) = 0.$$
satisfiability is hard
Immediately if you're at all familiar with complexity theory, this should raise some red flags. The fact that inversion of AES is a satisfiability problem, and satisfiability problems are in general NP-hard, should make you wary of the idea of being able to find an efficient version. Ill make two observations:
- The relevant complexity theoretic fact about about cyphers is not their worst-case complexity - It is their average-case complexity.
- Any individual instance of a class of NP problems being worst-case hard is actually relatively rare
The security of AES is not based on the worst-case complexity class that it lives in. Instead it's based the fact that for any given key-plaintext pair we can be relatively certain that the inversion problem is quite hard. This is due to the fact that the average-case complexity of one-way functions is very high. However, Bogdanov and Trevisan (2005) and Akavia et. al. (2005) show that the average case complexity here is not drawn from the worst case complexity of the class of NP (assuming polynomial hierarchy). In general, it depends quite a bit on which complexity class you're working in, whether the worst-case complexity reduces down to the average-case complexity.
The second fact is more subtle but it's easiest to understand in the case of SAT problems. If you take a satisfiability problem with a given number of clauses $c$ for variables $n$, there is a ratio $q = c/n$ of variables to clauses. What's interesting is that as this ratio changes, two really important things happen: whenever the number of variables is high and the number of clauses is low the problem is relatively easy to satisfy because there are many degrees of freedom. Whenever the ratio is low, the problem is very hard to satisfy, often times impossible.
There's this intermediary point, which is typically described as a phase transition from satisfiable to unsatisfiable, where the problem becomes overdetermined in a way that makes it impossible to satisfy.
±J Ising spin glass with bond frustration set by the hardness at q = 2.00. Near the transition the landscape is glassy and the spins never settle; far from it they order almost immediately.
How the spin-glass animation works
// A 28x28 grid of "spins", each either +1 or -1. Every spin is
// coupled to its four neighbors by a fixed random bond J that is
// either +1 ("these two want to agree") or -1 ("these two want to
// disagree"). The energy of the whole grid is
//
// E = -sum over neighbor pairs of J * spin_a * spin_b
//
// so a bond contributes low energy when it gets what it wants.
const GRID = 28;
const TEMPERATURE = 0.9;
const spin = new Int8Array(GRID * GRID);
const couplingToRight = new Int8Array(GRID * GRID);
const couplingToBelow = new Int8Array(GRID * GRID);
// The lattice wraps around at the edges (a torus), so every site
// has exactly four neighbors.
const wrap = (v) => (v + GRID) % GRID;
const siteAt = (row, col) => wrap(row) * GRID + wrap(col);
function randomizeSpins() {
for (let s = 0; s < spin.length; s++) {
spin[s] = Math.random() < 0.5 ? -1 : 1;
}
}
// "Frustration" is the knob the slider controls. At 0 every bond is
// +1 and the grid is a plain ferromagnet: flipping toward your
// neighbors always helps, so it orders into big domains almost
// immediately. As frustration grows, more bonds are -1 and loops
// appear in which no assignment can satisfy every bond -- the
// defining feature of a spin glass. The energy landscape shatters
// into many competing local minima, which is the same picture as a
// SAT instance near the phase transition.
function rebuildCouplings(frustration) {
const pAntiferro = 0.5 * frustration;
for (let s = 0; s < spin.length; s++) {
couplingToRight[s] = Math.random() < pAntiferro ? -1 : 1;
couplingToBelow[s] = Math.random() < pAntiferro ? -1 : 1;
}
}
// One sweep = one update attempt per site, on average.
function metropolisSweep() {
for (let attempt = 0; attempt < GRID * GRID; attempt++) {
// Pick a random site.
const row = (Math.random() * GRID) | 0;
const col = (Math.random() * GRID) | 0;
const here = siteAt(row, col);
// The "local field" is what the neighbors are voting for:
// each neighbor's spin, weighted by the bond that connects it.
// (Bonds to the left/upper neighbors live on those sites.)
const localField =
couplingToRight[here] * spin[siteAt(row, col + 1)] +
couplingToRight[siteAt(row, col - 1)] * spin[siteAt(row, col - 1)] +
couplingToBelow[here] * spin[siteAt(row + 1, col)] +
couplingToBelow[siteAt(row - 1, col)] * spin[siteAt(row - 1, col)];
// Energy change if we flipped this spin.
const deltaE = 2 * spin[here] * localField;
// Metropolis rule: always take flips that lower the energy;
// take uphill flips with probability exp(-dE/T). The random
// uphill moves are what let the system escape shallow minima.
if (deltaE <= 0 || Math.random() < Math.exp(-deltaE / TEMPERATURE)) {
spin[here] = -spin[here];
}
}
}
// Main loop: two sweeps per animation frame, then paint each spin
// as a red (+1) or cream (-1) cell. Moving the hardness slider
// calls rebuildCouplings(hardness) and randomizeSpins(), restarting
// the relaxation with fresh disorder.Now there's also a second interesting phenomenon that happens, which is that if you were to build a SAT solver, you would find empirically that there are some instances of SAT problems that are relatively easy and some that are quite hard. In particular those that are easy tend to be far away from the phase transition and those that are hard tend to be near the phase transition. There's a lot of work that tries to characterize this more precisely. You can see the Gent and Walsh (1994) and Ding, Sly, and Sun (2014) and many of the papers citing them since it to understand more.
The results I'm describing here actually don't apply to AES because they are for random SAT instances. In our case we don't have random SAT instances. AES is a very particular kind of SAT instance that is actually very overdetermined in the standard formulation. However I want to illustrate the phase transition for random SAT instances because it's the most well known. There's no theoretical reason that the complexity of AES, which largely comes from its algebraic properties not its structure as a satisfiability problem, can't be translated to a regime where the difficulty of solving it is easier.
So now the question is if we find an AES inversion would we be violating some sort of fundamental theorem of complexity theory? My answer is no, based on my reading of the literature, although I'm certainly not a complexity theory expert. Importantly we are looking for an inversion of AES, which is not only one subclass of satisfiability problems. It's actually a single instance of that subclass.
We know that if we state the problem simply as a satisfiability problem as, it falls into that territory of being near the phase transition and therefore being hard for SAT solvers. However if we move beyond thinking purely in terms of SAT and instead think of different ways that we can translate between different NP problems, I think it's possible that we find efficient translations between the SAT instance for AES inversion and other NP problems which exploit the structure of AES. If you think about the hardness of those formulations within the new subclass that we translate to, it's very possible that those don't fall within the hard part of that subclass, and can therefore solve AES inversion efficiently. Whatever translation we do into other NP problems or other formulations (even within the same NP problem) does not make the general SAT problem under that translation simpler. In fact it might not even work for the general SAT problem. Instead it only makes AES easier and exploits the particular structure of AES to move it far from the difficult regime, and make it amenable to structured solvers.
This, of course, is not a new idea and is the basis of a lot of the work on trying to exploit the algebraic structure of AES. Because AES is based off of a set of permutations in two finite fields, there's a lot of mathematical structure which exists, which differentiates it from something that is purely random (the same reason that the phase transition result from random SAT doesn't apply to AES). See Cid et. al (2006) for more details. Nevertheless the fact that no one has successfully done this kind of attack in practice doesn't mean that it doesn't exist. It is very likely that the ability to do this kind of large-scale search over mathematical structure and implement it as part of an optimization solver is something which will be substantially better if done by AI post-trained on both math and coding than by humans.
Importantly, I think AES as a substitution permutation cypher is particularly vulnerable to the kind of attack I'm proposing here, and that is because it relies on many layers of individual combinatorial problems stacked on top of each other. What that means is that I think you can build an incremental attack which beats unstructured search. There are other cyphers, like lattice cyphers, which reduce the average-case complexity to worst-case complexity, which would obviate the argument I have above.
This also implies important things about the post-quantum world where we know that AES at double the key size is equally secure if the best quantum attack we have is Grover's algorithm, which Zalka (1999) has shown to be asymptotically optimal for unstructured search. However, I don't think that that's the case here because if we find AES inversion attacks that work by decomposing it into subproblems, and that decomposition is specific to AES and therefore does not solve the more general set of SAT problems, and those sub-problems are solvable via an oracle or via statistical analysis on many samples from low-energy Ising spin models, then we have an algorithm where using a quantum computer as an oracle for that kind of subproblem can immediately speed up substantially the hardest parts of the problem. This is especially true if the reduction to Ising spin problems is polynomial time and concentrates all the complexity into the spin problems; see Alagic et. al. (2025) for formal post-quantum security foundations of block cipher constructions including XTS, though still under the standard Grover threat model.
I don't think RSA, which is an asymmetric key cypher that relies on factoring near primes, would share the same structure because factoring is best done through search of primes (via sieves). That doesn't mean that you can't set up a reinforcement learning environment for optimal sieves, but it does mean the theoretical basis for why you might expect that to work would be different. My intuition tells me that similar attacks won't work for RSA because all fruitful reductions we have seen are fundamentally number theoretic - even though you can set up factoring as a satisfiability problem (the usual formulation is $a$ and $b$ such that $a*b = c$, where $*$ is an operation expressed in boolean circuit, and $a$ and $b$ are the digits input to the boolean circuit) solving combinatorial subproblems of near prime factorization seems like it would have number theoretic implications for generating or seiving the primes, and so would be no better than sieves. This is in contrast to AES which is by design built from subproblems.
Optimization Problems and RL
Now that I've laid out my case for why I think AES inversion is even theoretically possible, I want to talk about how you might do it. I think you must train a reinforcement learning model to be good at writing heuristic optimization solvers. And even if AES inversion is not the goal, the economic value of having high-quality, state-of-the-art optimization algorithms is very high.
Reinforcement learning has been crucial to training large language models to be able to work on long-horizon coding tasks. I think the same techniques can be brought to bear for creating large language models that are good at building high-quality heuristic optimization algorithms for hard problems, and models that are already post-trained on long-running coding tasks are going to be particularly good as initial models to undergo this post-training for optimization. There are two really big things necessary for this to work:
- A way to determine intermediary rewards
- A good RL environment
In the same way that right now there is so much work around using math as a domain of verifiable rewards, optimization is another domain with many verifiable rewards because solvers have binary yes/no outcomes, that tell you whether the solver was able to solve particular instances of your problems. However the verifiable rewards aren't enough. If they're too sparse, it's impossible for the models to get signal on them. The question is how you build intermediary rewards. I'll first lay out how you build intermediary rewards for general optimization problems and then in the next section we'll talk about the particular intermediary reward for AES.
When building a solver it is important to benchmark it against a wide variety of instances, so that across the entire benchmark suite you cover many different kinds of difficulty because different solvers have different profiles of what kinds of problems they're good at (for example different levels of sparsity and different energy profiles). A good metaheuristic knows how to identify characteristics of problems or sub-problems and is able to accurately switch between different heuristics based off of that. Moreover, the benchmark sets used for optimization solvers will include many different sizes of instances ranging from ones that are trivially solvable on a single CPU within milliseconds to ones that will take hours on large compute clusters. We can use reference solvers, which are relatively strong but not necessarily over-optimized, as a good way to grade the different instances of problems. That means you can translate the sparse reward of a solver solving a general NP problem into the somewhat more dense reward the probability that your solver solves the different instances, weighted by their difficulty.
This means that incrementally better solvers will get higher scores because they will be able to, within a given wall clock time constraint, solve increasingly more instances on a good reference benchmark set.
How to train a model to invert AES
The previous section was about intermediary rewards for optimization in general. For AES the intermediary reward comes from the way we set up the inversion in the first place. Recall that we cast inversion as a constraint-satisfaction problem with non-negative residuals $r_i(x)$ that are zero exactly when each constraint is met. We minimize an augmented Lagrangian for the equality-constrained program:
$$E(x) = \sum_i \Big[\, \lambda_i\, r_i(x) + \tfrac{1}{2}\, \rho_i\, r_i(x)^2 \,\Big].$$
Energy $E(x)$ is zero precisely when $x$ is a valid, ASCII, ciphertext-matching assignment. Each constraint contributes two terms. The first is a linear multiplier term $\lambda_i\, r_i(x)$, where $\lambda_i$ is a Lagrange-multiplier estimate that is adapted over the course of the search. The second is a quadratic penalty term $\tfrac{1}{2}\, \rho_i\, r_i(x)^2$, with penalty coefficient $\rho_i$.
Because every $r_i \ge 0$ and the global target is $r_i = 0$, the energy is non-negative and
$$E(x) = 0 \iff H(x) = 0 \iff x \text{ is feasible.}$$
So minimizing $E$ and hunting for satisfying assignments are the same objective, just reweighted to shape the landscape. In practice the constraints are grouped into a few classes - the ASCII prior, the internal-consistency constraints of the lowered AES model, and the goal constraint that the predicted ciphertext equals $C$ - and each class carries its own weight and multiplier so the optimizer can treat the problem as a true constrained program.
In the the decypher-env github repo there is a reference solver which illustrates how this energy can be used as the objective for an optimization solver. We start by observing that this energy induces a Boltzmann distribution at inverse temperature $\beta$,
$$\pi_\beta(x) \propto e^{-\beta E(x)},$$
which concentrates on the global minima (the feasible decryptions) as $\beta \to \infty$ and is broad and easy to explore at small $\beta$. To get the best of both, the reference solver uses a standard technique in heuristic optimization - parallel tempering/replica exchange: it runs many replicas of the system at different temperatures on a ladder, lets hot replicas roam across energy barriers while cold replicas refine deep minima, and periodically swaps neighboring replicas so that good states discovered by the hot replicas are ferried down to the cold ones. Within each replica the sampler takes Metropolis steps, accepting a proposed move with probability $\min\!\big(1,\, e^{-\beta \Delta E}\big)$.
The multipliers are what make the landscape adapt as the search runs. After each epoch of sweeps they are updated by dual ascent - the outer loop of the method of multipliers - roughly
$$\lambda_i \leftarrow \max\!\big(0,\ \lambda_i + \eta\, \bar r_i\big),$$
where $\bar r_i$ is the mean residual of constraint $i$ across the ensemble and $\eta$ is a step size. Constraints that stay violated on average steadily accrue larger multipliers, which raises the energy cost of violating them and pressures the cold replicas to satisfy them.
One structural trick makes the whole thing tractable. Given a candidate key, the plaintext that reproduces the ciphertext can be written down in closed form by simply decrypting the target under that key. That collapses the goal constraints to zero for free, so feasibility reduces to finding a key whose decryption is ASCII - and the search concentrates on the key space and the language prior rather than on the plaintext directly.
Where the reinforcement learning comes in
Now that I've illustrated how a single solver grinding down the energy $E(x)$ might work, we move on the the reinforcement learning task. The inner loop is ordinary optimization - no learning involved. The reinforcement learning lives in an outer loop wrapped around it, and the two are very easy to conflate, so it is worth separating them cleanly.
The policy is a language model. Its action is not a bit-flip or a swap inside the search - it is an entire solver: a snippet of C++ source implementing a small search interface (run sweeps, report the best assignment found, hand back any feasible states). The environment takes that emitted source, compiles it, and runs it against a curriculum of reduced-round AES-XTS instances. So one action is one whole solver, and scoring an action means actually compiling and running the search that action describes.
What we measure is how well that solver does at the energy-minimization problem above. Each run reports three things: the lowest energy $E_{\min}$ the solver reached, how many consecutive AES rounds it managed to keep internally consistent (a "round staircase"), and whether it ever reached a feasible, zero-residual state. The reward is a deterministic function of those numbers - there is no learned reward model - and its dominant term rewards a low minimum energy, something like
$$r_{\text{energy}} = \exp\!\big(-E_{\min}/s\big),$$
which equals $1$ at zero energy and decays as the weighted residual mass grows, with $s$ normalizing by the circuit size so the scale is comparable across round counts. The round staircase adds dense partial credit - it climbs $1, 2, 3, \dots$ as more rounds fall into place, long before the whole cipher is consistent - and actually reaching feasibility adds a further bonus on top. So the single scalar an action earns is, in effect, a shaped readout of the minimum energy that solver achieved.
The rest of the rubric just orders the failure modes so the gradient always points somewhere useful:
$$\text{doesn't compile} \;<\; \text{compiles but crashes / times out} \;<\; \text{compiles and runs.}$$
The policy is rewarded first simply for emitting a solver that builds and runs at all, then for driving the energy down, and only finally for genuinely inverting the cipher. Because full-round AES is out of reach while reduced-round instances are solvable, the instances are served as a curriculum that ramps the round count, and the very same multiplier updates from the Lagrangian above grow the constraint pressure as the curriculum advances - the residuals a solver leaves behind on one rung shape the difficulty of the next.