Attacking Lattice-based Cryptography with Martin Albrecht

Returning champion Martin Albrecht joins us to help explain how we measure the security of lattice-based cryptosystems like Kyber and Dilithium against attackers. QRAM, BKZ, LLL, oh my!

This transcript has been edited for length and clarity.

Deirdre: Hello. Welcome to Security Cryptography Whatever. I’m Deirdre.

Thomas: I’m Thomas.

Deirdre: And we have a returning champion back on the pod today. We have Martin Albrecht back. Hi, Martin. How are you?

Martin: Happy to be here.

Deirdre: Great, thank you. We promised to send you your N-Timers club jacket in the mail. We have another one that we have to mail out in the future. FYI, myself and Martin both work for SandboxAQ, at least for part of our time. And we invited Martin back to talk about one of his other areas of expertise, which is lattice-based cryptography, and how to analyze their security against attackers, because we wanted to understand more about some of the new NIST schemes that have been selected for post-quantum cryptography. That includes a lattice-based KEM, a key encapsulation mechanism, called Kyber, and at least one of the signature schemes, called Dilithium. Falcon is also lattice based, right?

Martin: Correct.

Deirdre: But it's different for reasons, and it uses floating point things, and I don't like them. But there's been some discussion about how we analyze the security levels of not just lattice schemes, but post-quantum schemes in general, which are supposed to be resilient against a classical attacker with a regular computer, and theoretically against a quantum attacker. There's an acronym for it, the cryptographically relevant quantum computer: one with enough logical qubits, enough error correction, enough handling of the noise that happens with quantum computers in the real world, to be an actual threat against these cryptographic schemes, and to actually run things like Shor's algorithm or some of these other attacks efficiently against a cryptography scheme. So we are here to pick Martin's brain, and maybe he can help us, in the fun medium of not being able to use visual aids to show things like lattices and vectors and bases. So, first off, Martin, can you give us a high level of why we like these schemes like Kyber and Dilithium that are based on lattice assumptions for post-quantum cryptography?

Martin: Right. So after your intro where you say you don't really like them, let me tell you why you should like them. Okay, one thing is they are fairly efficient, right, for post-quantum schemes. So that's one of the reasons. They are based on some problems that we believe to be hard even on a quantum computer, as you just said. And given that, their sizes and their computation times are somewhat good, while being convincing, at least to many people, that they are safe. What we're really here to talk about, I guess, is their security. Right. So what grounds this? And there's the first question of, structurally, why do we think it makes sense to base security on them? And then there's the question of parameters: what is the security level? Which I think is the focus of today's episode, if I gather this correctly.

Deirdre: Sure.

Martin: So first, the structural thing is, roughly speaking: if you can break Kyber, then you can solve a problem that is called module learning with errors.

Deirdre: We’ve heard about that before.

Martin: And we believe that this is a hard problem even for a quantum computer. So let me unpack that, because the module learning with errors problem will also reappear in Dilithium, so it makes sense to spend a bit more time on that. So you do linear algebra mod q. So nothing too fancy there, but you add some small noise. So instead of having a proper solution to a linear system, you have a solution that is offset by some noise. And small here means it's not zero, it doesn't hit zero, but it hits five or it hits minus three or something like that. So something small relative to the size of q. And then the module in there means that we're actually not doing this just over the integers mod q, but we do this with matrices over polynomial rings mod q.
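To make the shape of the problem concrete, here is a minimal NumPy sketch of a plain (non-module) LWE instance. The sizes are toys chosen for illustration, not Kyber parameters, though q = 3329 is borrowed from Kyber.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 16, 32, 3329                 # toy sizes; 3329 happens to be Kyber's q
A = rng.integers(0, q, size=(m, n))    # public, uniformly random matrix
s = rng.integers(0, q, size=n)         # the secret
e = rng.integers(-3, 4, size=m)        # small noise: entries in [-3, 3]
b = (A @ s + e) % q                    # the published noisy linear system

# Without e, recovering s from (A, b) is first-week Gaussian elimination;
# with e, it is the (assumed hard) LWE problem.
```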

Deirdre: ‘kay.

Martin: And that kind of then gives you module learning with errors, and it's assumed, we have some reasons to believe, that this problem is hard also on a quantum computer. And then Dilithium, in addition to being based on this module LWE problem, is also based on the module SIS problem, which is a very simple to state problem. So I give you a wide matrix with many columns but few rows, mod q. It's uniformly random. And all you have to do is find me a short solution that sums the columns to zero. So you sum up the columns to hit zero everywhere, and again, short is something, maybe the solution is between 20 and -20.

These are not Dilithium parameters, but to give you a sense. And then q is a few thousand. Of course, the problem is really easy if you allow something that is quite big, a big solution. But it's considered to be a hard problem if the solution is small. So that's the structural reason. These two problems are believed to be hard on a quantum computer, and if you can break either of those two schemes, you can solve these problems for some parameter selection. So structurally, that's why we believe this is hard. It's the same kind of thing as if we had a reduction that if you could solve Diffie-Hellman, you could compute discrete logs.
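Stripped of the module structure, the SIS condition is easy to state in code. A toy checker, with illustrative sizes and the made-up bound of 20 echoing Martin's example, not Dilithium parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
rows, cols, q = 8, 64, 3329
A = rng.integers(0, q, size=(rows, cols))     # wide: many columns, few rows

def is_sis_solution(A, x, q, bound=20):
    """Nonzero x, entries within [-bound, bound], with A @ x == 0 mod q."""
    x = np.asarray(x)
    return bool(np.any(x != 0)
                and np.max(np.abs(x)) <= bound
                and np.all((A @ x) % q == 0))

# Easy if x may be large (plain linear algebra mod q), believed hard if
# x must stay small. Checking a candidate is trivial; finding one is not.
```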

Thomas: So, like, the basic learning with errors problem. I imagine it as taking the standard problem of using Gaussian elimination to find the solution for a system of linear equations, right? Which is like the first week of the first semester of linear algebra. Easy problem to do, right? But if you mix a small error term in with the solution you're trying to find, it's a hard problem, and that's because when you multiply out that whole matrix, that error term blows up, right? That doesn't involve polynomials, it doesn't involve anything really complicated at all, maybe except for doing everything mod q, right? And you can build a cryptosystem off of just normal integer LWE, but nobody does, right? It's all module LWE. It's all over polynomials at this point. I have sort of a basic intuition for how that kind of module LWE works.

I have no intuition for why we do that. Why do we complexify it in that manner?

Martin: Yeah, so first, I mean, there was Frodo that was submitted, and some people really like Frodo because they're conservative, and that is based just on the plain LWE problem. So it's not quite right to say no one does it. But why do people like module LWE? So before module LWE came ring LWE, where essentially you replace the matrix that you do your linear algebra on by a polynomial. And if you know the trick that a polynomial multiplication modulo another polynomial can be phrased as a matrix multiplication, then you can say: instead of really having a uniformly random matrix, what I have is one that is derived from this polynomial. So now instead of having to store n squared coefficients or entries, I only have to store n. So that is the reason: immediately you go from a quadratic size to a linear size, or a quasi-linear size, so that's nice. And then there's some polynomials that are nice, some degrees that are nice and some that are not so nice. And you might also think, oh, maybe it's a little bit much structure, but mostly it's for performance reasons.

What if: I hear you like matrices, I hear you like polynomials, what if I put matrices on top of your polynomials? So what you have is a three by three matrix in the case of Kyber768, and each of the nine blocks that you have is a little polynomial, and the matrix there is derived from these polynomials. So that still gives you a saving, and some flexibility in parameter choices.
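The trick Martin describes, that multiplying by a fixed polynomial mod x^n + 1 is a linear map and hence a structured matrix, can be checked in a few lines. This is a hedged sketch with toy sizes; real implementations use the NTT discussed below rather than ever building the matrix.

```python
import numpy as np

def negacyclic_matrix(a, q):
    """n x n matrix M with M @ s == a * s in Z_q[x]/(x^n + 1)."""
    n = len(a)
    M = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            # wrapping past x^n picks up a minus sign, since x^n = -1
            M[i, j] = a[i - j] if i >= j else -a[n + i - j]
    return M % q

rng = np.random.default_rng(2)
n, q = 8, 3329
a = rng.integers(0, q, size=n)                 # the "structured" matrix source
s = rng.integers(0, q, size=n)
full = np.convolve(a, s)                       # schoolbook product, degree 2n-2
c = (full[:n] - np.append(full[n:], 0)) % q    # reduce mod x^n + 1
assert np.array_equal(c, (negacyclic_matrix(a, q) @ s) % q)
# n stored coefficients stand in for an n x n matrix: n versus n^2 storage.
```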

Deirdre: And so the nice and not so nice, is that both in terms of security or performance or size, or all of the above?

Martin: So it's very nice in terms of performance. And it might actually be faster, more efficient, to consider this module LWE, at least in the case here, because we like powers of two for the dimensions of these rings, for reasons that are a bit too boring to go into here. And we really like 768 as a dimension. It just hits a security parameter sweet spot. It's not a power of two, is it? And that's why you actually get a nice security versus performance trade-off, because the module structure lets you hit these parameters. The reason why you might not like it is, well, I'm adding more structure, right? And the rules of the game are more or less: the more structure you add to a cryptographic primitive, the more you worry that maybe an adversary can exploit the structure. But I should say, when we think about how we pick parameters, we just ignore the structure, because we know of no way of using this structure to even give us a factor that is linear in n. It seems like we don't know how to do much better even for these more structured instances.

Deirdre: Okay, and so, a bit of a tangent: the parameters that are powers of two work nicely for these kinds of polynomials you use in module LWE. The first thing I hear when we like powers of two is, our regular digital computers are all in binary and they really like computing over powers of two. It seems like a happy accident, not quite an accident, that the cryptosystem based on module LWE also likes those things, and maybe that is partially why they're, quote, so efficient on our modern digital computers.

Martin: Not quite, because the power of two here is about the dimension of your matrix, right? 768 is, like, how big is your matrix? We're not talking about bits there. But there's still the question of, what is q? Everything is mod q, and what's q? And you would typically choose a prime, because that makes sense: it's easy to reason about, things are fields. But Saber was a NIST finalist that said, no, actually, we use powers of two, for the reason that you just mentioned. But if you use that, then actually our computers, or actually our mathematics, doesn't like it so much. Because if you pick the prime right, then you can use the NTT, the number theoretic transform, to do your polynomial multiplications, and that gives you better performance than doing general polynomial multiplication over an arbitrary modulus.

Deirdre: Got it. Okay, so ignore what I said. But picking parameters, of course, is this balance between performance, maybe a little bit of how these parameters affect complexity, both of the structure but maybe also implementation complexity, and what we know, or what the designers know, of the best attacks. And generally, if I say best attacks to people, they may think purely in terms of computational complexity of an attack algorithm, parameterized by some security level parameter like lambda or whatever. But there's more dimensions to this, especially when you're trying to design a scheme that is resilient against a quantum attack. You have to consider adversaries that have classical, modern-day computing capabilities, and quantum adversaries that have access to an efficient, large quantum computer. And apparently, well, not apparently, there is more to consider than just the algorithmic complexity of an attack. So can you tell us a little bit more about what is in the space that you're considering that influences choosing parameters for these LWE-based schemes, module LWE and module SIS, whatever?

Martin: Yeah. So that is then kind of completing the circle, because we know that if you find a magic way of breaking Kyber, then you can also solve some module LWE instance. So what is the best way that we know how to break Kyber? Actually, it's attacking the module LWE instance. So you don't really think about the encryption scheme anymore. You think of this hard computational problem, which is essentially, well, it's a noisy linear system, can you find the solution, please? And then you ask the question, how would you go about solving module LWE? And that is a thing that proceeds in three levels. One is an overall attack strategy, which is known as the primal or the dual attack. Then, in both of these, you run a lattice reduction algorithm, which more or less means people consider the BKZ algorithm.

And this algorithm in turn calls a subroutine, a shortest vector problem solver. And for this shortest vector problem solver, the fastest algorithm is a lattice sieve. And then the key question is, well, how expensive is it to run a lattice sieve in a given dimension? Because we have a pretty good idea of what parameters we need to run this BKZ algorithm with in order to make progress, to have a solution with either the primal or the dual attack. That gives us something called a block size. And the block size essentially tells us the dimension in which we have to solve the actually hard problem.

Deirdre: Okay.

Martin: And so in the case of Kyber512, the block size is something in the ballpark of 400. So you have to find shortest vectors in a lattice of dimension 400. And the question is, how long does this take, with a quantum computer, considering memory and so on? And that's the key datum. And there you want to hit whatever magical threshold you pick for saying, my estimate for this cost needs to be higher than that. And then maybe you take the polynomial overhead of running the BKZ algorithm and so on into account. And then that tells you how small you're allowed to make it.
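As an aside, the arithmetic Martin is gesturing at is often done with the "core-SVP" methodology: cost a single sieve call in dimension beta and ignore memory, polynomial factors, and the number of calls. A back-of-the-envelope sketch; the exponents 0.292 (classical sieving) and 0.265 (quantum sieving) are the standard asymptotic constants, and beta = 406 is an illustrative value in the Kyber512 ballpark he mentions.

```python
def core_svp_bits(beta):
    """Core-SVP cost, in bits, of one sieve call in dimension beta."""
    return {"classical": round(0.292 * beta, 1),
            "quantum": round(0.265 * beta, 1)}

print(core_svp_bits(406))   # -> {'classical': 118.6, 'quantum': 107.6}
```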

Deirdre: Cool. I'm writing down each of these steps. It goes deeper and deeper and deeper, and I'm like, okay, this is the hot path, this is the heavy hitter way down here, parameterized by dimension approximately 400, and you just crank. Now, is this the best known attack, classical and quantum? Where is the advantage for a quantum attacker running, say, a lattice sieve, versus classical?

Martin: Yeah. So maybe it makes sense to talk a little bit about the algorithm that we use to find short vectors in lattices. So what is a lattice in the context here? Think of it really as a matrix, but a matrix over the integers, so not mod q. And then you're saying, I'm going to do integer linear combinations. Say, consider the rows of this matrix, and I'm going to add them up. And what combination of them gives me a vector that is the shortest? Which has the smallest entries, or the shortest Euclidean norm, but more or less just something that gives me short entries. So that's the key computational task.

That’s all that’s required of us. And so how do we go about this? What’s the best known algorithm, the most efficient algorithm that we know of?

Thomas: Before you say that, just so I'm sure of what I think I understand here. On a normal episode of this podcast, I put on a delicate balancing act of play-acting ignorance about all of these things. In this case, I won't need to be acting so much. Actually, I'm never acting. But when we're talking about finding short vectors in a lattice, this is from a starting point of a random lattice, where the basis of that lattice is like the row space of the matrix, it's the vectors that make up the lattice, right? And if it's a random lattice, then those are all going to be of weird random big sizes or whatever. And we're looking for short vectors, which we don't immediately have in front of us. We have to somehow compute them from this random lattice. How much of that was crazy?

Martin: No, this is fine. So, pedantically, it's a bit difficult to define what a random lattice is, because it's an infinite space. But that is maybe not the thing that we should dive into now. What you want to distinguish is the question of which lattice you pick, that is essentially the span of all these vectors, what you can produce, from the question of, what's my input? How do I actually encode it? What's the basis of this lattice?

Deirdre: Right.

Martin: And the key thing is, the input that you get is essentially a random basis of this lattice. And there it is a bit clearer what we mean by random. So you more or less compute the Hermite normal form of your integer matrix, and because that is canonical, that is a perfectly good input. But here is the thing that I had prepared, and let's see if my analogy lands. So how do we go about finding these short vectors? And it's just Wagner's algorithm, and I've just offended all my friends. So how do we go about this? We essentially say, well, I have my input, right, some basis, and then I'm going to produce many more vectors. I'm going to sample many vectors that are just linear combinations of the basis, right? Pick some random linear combinations, and you probably pick small coefficients for your rows to make it not blow up, because you care about small stuff. But you sample a lot of them, and then you just check them pairwise: look, if I subtract these two, do I get something shorter? Oh, yes, I do, let me keep that. Oh, no, I don't, let me not keep that.

And you keep doing this pairwise comparison until you're done, and by the end, you have a new list, but they're all a little bit shorter. You do the same thing again. Oh, yes, these are shorter, let me keep them. And you keep on going, until eventually, under some heuristics, you hit the shortest thing you can make this way, right? It's no different from how you would reduce the Hamming distance of a bunch of vectors.

Well, if I add these, does it reduce the Hamming distance? Oh, yeah, let me keep them.

Thomas: This is amazing. I could code this.

Deirdre: I love this.
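Thomas's instinct is right, the core loop is short. Here is a deliberately naive, runnable sketch of the pairwise-reduction idea, not the real algorithm (no proper sampling, no collision handling, and a dimension far too small to mean anything):

```python
import numpy as np

def reduce_pairwise(vs):
    """Keep replacing v by v +/- w whenever that makes v strictly shorter."""
    improved = True
    while improved:
        improved = False
        for i in range(len(vs)):
            for j in range(len(vs)):
                if i == j:
                    continue
                for cand in (vs[i] - vs[j], vs[i] + vs[j]):
                    if np.any(cand != 0) and cand @ cand < vs[i] @ vs[i]:
                        vs[i] = cand           # keep the shorter vector
                        improved = True
    return vs

rng = np.random.default_rng(3)
B = rng.integers(-5, 6, size=(40, 12))         # rows generate a toy lattice
coeffs = rng.integers(-2, 3, size=(200, 40))   # random small combinations
pool = [c @ B for c in coeffs if np.any(c @ B != 0)]
pool = reduce_pairwise(pool)
print(min(int(v @ v) for v in pool))           # squared norms have shrunk
```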

Martin: Yeah, the whole thing I just described is the Gauss sieve, and that is the first of the efficient heuristic sieves. By now we have something slightly better, and this will maybe be relevant when we talk about memory. But the slightly better idea, I think, is also fairly quickly explained, and again, let me do Hamming distance, because I assume this is what listeners here are familiar with. If I have three strings, and the first string is close in Hamming distance to some string that I shall call the center, and the second string is also close to that string that I call the center, then probably the two strings that are close to the center are also somewhere close to each other. If they share many bits in common with the center, they will also share some number of bits with each other. That is the key trick for the asymptotically faster sieves that we use. So we just dedicate some vectors and say, let's just pick some random ones that are somewhat nicely distributed. And what do you mean by somewhat nicely distributed? Might as well be random.

And then we say, okay, so now we're very generous in how we say, is it close: if I subtract these, is this kind of small, with a very loose definition of small. And then we put them in buckets, and I do this pairwise comparison within these buckets. And now this quadratic search only happens on the things that are close to the center, and I've improved the asymptotic complexity. So that is, roughly speaking, the bgj1-sieve, which is implemented in G6K, which is this open source implementation of sieving. And then if you want to go to the asymptotically fastest of these algorithms, instead of picking these random centers, you pick some centers that essentially come from error correcting codes, so you can find out very easily which of them you're close to. Right. So you specifically chose these centers, and then instead of having to check all of them, am I close to them, there's an algorithm that is somewhat better that tells you if you're close to one of these centers.

Deirdre: Yeah, it’s cool.

Martin: That is the BDGL sieve, and that is the state of the art for classical sieves. So classical meaning, not on a quantum computer.
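A sketch of the bucketing idea, purely illustrative: group vectors by which random center direction they correlate with, and only run the pairwise search within a bucket. Real implementations (bgj1 in G6K, and the BDGL code-based centers) are far more careful about thresholds, bucket sizes, and repetitions; the 0.3 cutoff here is arbitrary.

```python
import numpy as np

def bucket_by_centers(vectors, centers, threshold=0.3):
    """Assign each vector to the centers it correlates with."""
    buckets = [[] for _ in centers]
    for v in vectors:
        vn = v / np.linalg.norm(v)
        for k, c in enumerate(centers):
            if abs(np.dot(vn, c)) >= threshold:   # "close to this center"
                buckets[k].append(v)
    return buckets

rng = np.random.default_rng(4)
dim = 24
vectors = list(rng.normal(size=(500, dim)))
centers = rng.normal(size=(8, dim))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
buckets = bucket_by_centers(vectors, centers)
print([len(b) for b in buckets])   # the pairwise search now runs per bucket
```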

Deirdre: Right. So do we get any speed-ups on a quantum computer for any of it, part of it? I mean, the whole comparing all these little vectors and seeing which one is shortest, with ways of doing bucketing comparisons and stuff like that, it almost smells like you could get quantum speed-up on some of that stuff. But how much do you get?

Martin: So in the quadratic search that I described, I hold on to one string and I compare it against all the other strings. That's an exhaustive search, and for an exhaustive search we have an improvement using Grover's algorithm. So we can just use Grover's algorithm for this quadratic part of the search. That means you can, at best, hope for a quadratic speed-up, a square root. The speed-up is not that impressive, because we've just reduced the quadratic part of the sieve dramatically by using these funky centers and only doing the quadratic pairwise search in these smaller buckets. So you get a speed-up in the bucket search, whose running time is quadratic in the size of the bucket, and Grover more or less turns it into something linear in the size of these buckets. But these buckets are not the whole database, they're smaller.

And so the advantage that you have on a quantum computer is much less than what you would expect from a Grover speed-up. Because you always assume, well, AES-128, you would hope that Grover gives you something like 2^64 quantum operations, whatever that means exactly. And then a quantum operation is much more expensive than a classical operation, but yeah, whatever. We are much further away from this kind of square root speed-up in the lattice setting.

Deirdre: Cool. I know isogenies the best. Before everything got broken in SIDH-land, the best classical attack was, like, a fourth-root speed-up, and then if you applied Grover's as part of what was called a claw attack, you would get sixth-root. And so some of the sizes were scaled for that reason. And, yeah, it sounds like you don't even have to worry about it that much in the lattice world. But then it got broken, based on the construction's torsion points, and everyone had been worried about the torsion points for a long time. So we don't talk about that anymore. Is that analysis thrown in the bin? Not really.

Martin: There's another reason. So, what I've just described in the quantum setting, this Grover speed-up: the database that we're searching over in the Grover setting here is an exponentially large database of actual vectors.

Deirdre: Oh, right.

Martin: So it's different from an AES key search, where it's some uniform space of bit strings. No, it's not all vectors that exist in the world, but it's an exponentially large collection of vectors. So you actually need to produce these vectors first, and then, in your Grover algorithm, you need something called QRAM. So you need quantumly accessible classical RAM. So you have RAM, but you can now query it in superposition.

Deirdre: Do we have that yet? Has anyone built that yet?

Martin: No. So QRAM is more or less equivalent to a full scale quantum computer. And there is some debate, which I'm not an expert on, about what the actual cost of running QRAM is.

Deirdre: Right.

Martin: But roughly speaking, an intuition for why this is much more expensive than in a classical setting: if I have a very large amount of data and I don't need some of it for a while, I could, in principle, put it in stone or put it on some magnetic tape and bury that somewhere, leave it there for 200 years, and then come back to it. I don't need to keep the power on for this thing that whole time. But if I want to have superpositions over all my data, I need to have it powered on the entire time, for my entire computation, because I'm not querying this particular memory cell, I'm querying some superposition of all memory cells. Again, quantum computing people are probably a bit upset with me about how I butchered that analogy. But that is a reason why the question of memory cost matters: in the quantum setting, it would be very surprising if even that small speed-up we get from Grover would hold, because you need this QRAM, and that seems to be a very expensive resource. So in all the costings that we have, we just assume you get QRAM for free, because otherwise there's no advantage.

Deirdre: Now, this kind of goes to the point of, how do you model your adversary? And one kind of nice thing about designing cryptographic schemes that will theoretically be resilient against a quantum adversary is that you ideally want to be resilient against an incredibly powerful quantum adversary. You want to give your adversary all the advantages that you can get away with, because if you can stand up to that, then you've got a lot of margin. But on the other side, if you want your thing to be adopted, you need to make it performant and usable. And that means picking parameters that are resilient against the adversary that you have modeled, but small enough to make it performant and all that sort of stuff. So I'm guessing, okay, in my sort of ideal world: if we modeled the quantum adversary running these attacks against module LWE as having perfect QRAM, how does that totally bone the parameter sizes?

Martin: Not at all.

Deirdre: Hey, then why don’t we just assume they have, like, perfect tons of QRAM and then just move on with our day?

Martin: Yeah. And this is what we do. We wrote a paper where we tried to estimate that, and we had to assume that QRAM is unit cost. And even then, I think, for the largest Kyber parameters, the speed-up was something that you wouldn't care about. I'm trying to open it live while I'm talking to you, but it was very small in the large lattice dimensions you have there. And then, I don't know, maybe I'm leaning out of the window a bit too much, but I don't think that's going to be an issue. And the shouting matches, people are not shouting at each other about quantum adversaries at the moment. So that's not the issue that's at stake.

I think more or less everybody says, at the moment, what we understand they can do is so far from something that we would consider a threat that, for parameter selection, you can more or less ignore quantum adversaries. Because of this kind of weird thing: you need to have this exponentially large database, you need to produce that first, and then the speed-up is in this quadratic part of the sieve, and we learned how to make that a lot smaller using classical techniques.

Deirdre: Cool.

Thomas: So when we're talking about the strength of these algorithms, the security levels of these algorithms, we're talking about classical attacks. We're talking about, like, if you swap out Curve25519 or the NIST curves or RSA for an LWE problem, are we getting comparable levels of security against classical attackers, like the attackers that actually exist in the real world?

Martin: Yeah. That's, I think, what's at stake.

Thomas: So the lowest level of the stack of this attack is the sieving stuff. And it turns out the sieving is both cool and also not as hard to understand as I expected it to be, right? And the sieving that we're doing, we're using that as kind of a plugin for the BKZ algorithm. Am I right so far? All right. So I have a sort of intuition for lattice reduction, because it comes up in some basic cryptanalytic attacks for public key stuff and for GCM and stuff. But, like, lattice reduction, LLL, I sort of get. Like, I get Gram-Schmidt, because, you know, I did undergrad linear algebra with my daughter, right?

I mentally understand LLL as, like, okay, LLL is like Gram-Schmidt if you had Gram-Schmidt to call as a subroutine, right? And it's comparably as complicated as Gram-Schmidt. BKZ, in my head, I have, like, okay, BKZ is a better version of LLL, and that's as much as I know. So the idea of a lattice reduction thing that has an SVP oracle as a plugin, I have no intuition for. How complicated is the BKZ part of this attack?

Martin: I think if you're comfortable with LLL, then it's quite easy. So let me first try to do it through the medium of LLL, and then let me try to do it without the medium of LLL. So first, through the medium of LLL: LLL is BKZ with block size 2. Because what LLL does is Gauss reduction. It looks at two vectors, and those span a little lattice, a little projected sublattice, and you just find the shortest vector there. And so now, instead of looking at two vectors and finding the shortest vector in there, you have some parameter beta, and you look at the little lattice spanned by beta vectors, and there your oracle finds the shortest vector. It proceeds exactly like LLL. Okay, it isn't quite exactly, but it's more or less the same.

Thomas: I’m the one person that’s listening to this that got anything out of that, probably. But that all made perfect sense to me. That’s great. I have at least benefited from this. This is awesome.

Martin: Let me try to do it without LLL. So LLL is a very famous algorithm for doing lattice reduction, and that might not be the most intuitive, but you can think of it essentially as a sandpile. If you know Gram-Schmidt orthogonalization, you can do that for a matrix, and then you can plot the norms of these Gram-Schmidt vectors. You probably want to take the logs of these. And what you get after LLL reduction is essentially a downward sloping line. The model that we use to analyze this is literally called the sandpile model. So that's the analogy I'm going to use.

So essentially what this algorithm does is, you start with a very steep sandpile, and what you're trying to do is flatten it. Ideally, at the end of the day, you would want something that's level, just a flat line. So how do you do that? Well, you pick a little part of your pile and you say, I want to flatten that just locally. I'm going to locally make this flat. So I take the peak and I make it flat. And that means you push the beginning down, and because the sand has to go somewhere, the rest is pushed up. And then let me move on, next to the peak, and now I try to make that flat, and you keep on going until the end.

And it's like, whoops, I have nothing left to do, I started from the peak and I'm at the bottom. Let me go back to the peak and see what's happened there. Right, and then you flatten that again, that again pushes some sand out and modifies some stuff later, and you flatten. You keep on doing that. And the flattening operation is more or less this SVP oracle, right? That does that for you: one little block there, you do the flattening.

You just keep on doing that, going round and round and round, flattening it in little steps. And then suddenly all of these steps together give you something that is a lot flatter. But how much progress you can make depends on how much stuff you look at in each little step. And so the bigger the block size, the more you flatten in one go.
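The sandpile picture can even be simulated directly: treat the log Gram-Schmidt norms as the pile, and let each "SVP call" flatten one window while conserving the total, since the sand has to go somewhere. This is a cartoon of the analogy, not BKZ itself; the window size, strength, and tour count are arbitrary.

```python
import numpy as np

def flatten_block(profile, start, beta, strength=0.5):
    """One 'SVP call': push a window of the pile toward its local mean."""
    end = min(start + beta, len(profile))
    block = profile[start:end]
    profile[start:end] = block + strength * (block.mean() - block)
    return profile

rng = np.random.default_rng(5)
profile = np.linspace(5.0, -5.0, 60) + rng.normal(scale=0.2, size=60)
total = profile.sum()                     # total "sand" in the pile
for tour in range(20):                    # round and round and round
    for start in range(len(profile) - 1):
        profile = flatten_block(profile, start, beta=15)
assert abs(profile.sum() - total) < 1e-6  # sand is conserved, only moved
print(profile.round(2))                   # the slope is now much flatter
```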

Deirdre: Awesome. Thomas, this is your... no, it's not.

Thomas: I agree. That was awesome.

Deirdre: That’s cool.

Thomas: Okay, so I'm back to trying to chart my position in the attack as a whole, right? So I have an idea of what we've done with BKZ to the original lattice. I have an idea of how we use the sieving thing as kind of a plugin to BKZ to locally sort out the shortest vector from a block of vectors or whatever. Right. That all made sense to me. I've lost track of where we are in the overall attack.

Martin: Yeah. Okay, so now let's say we have a BKZ algorithm. How do I use this in the most straightforward way to solve module LWE? And I already mentioned that we just ignore all the polynomial structure. We just pretend it's some matrix, right? Because that's actually how we cost these algorithms, because we don't know how to do better. So then, what is an LWE instance? I essentially have a matrix.

A is a random matrix, and I multiply it by some vector s, and then I add some small error term e, and I call the result c: A times s plus e equals c. It's some vector. And so how would I go about solving this? Well, one thing I can do is try all possible s's, to see if, when I multiply them by A and subtract from c, I get something small. Right. That would be: give me all integer linear combinations of A, and see if, when I subtract them from c, I get something small.

Thomas: We're trying to find the specific error term there, then. And we know the error term is small because it's a parameter of the algorithm, like the rate of whatever the construction is. By design, it's a small band of possible errors.

Martin: Yeah. And the parameters are chosen such that there is no other s that also gives you something small. So if you found it, then it's unique. And that already should smell a little bit like lattice reduction, because I have some vectors and I do some integer combination of them, and if I do the correct integer combination, then suddenly everything collapses and everything is very small. And that is known as the unique shortest vector problem.

So it's a situation where, oh, I have this matrix here over the integers, I do some integer linear combination of its rows, and there's one combination in there that is really quite small. And this BKZ algorithm is really good at finding these short vectors. So I more or less take my LWE instance and apply my BKZ algorithm to it. And I have the primal attack. So the primal attack is, in a way, a very direct way of tackling the problem head on: there exists a linear combination of the rows of this matrix that gives something very small.

Well, we know a class of algorithms that do exactly that. So let’s call that. That’s it.
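For the curious, here is a hedged sketch of one standard way to set up the primal attack, a Kannan or Bai-Galbraith-style embedding: build a basis whose lattice contains the unusually short vector (e, -s, 1), then hand it to your favorite reduction library. Toy sizes, and the assert only checks that the planted vector really lies in the lattice; actually finding it is BKZ's job.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, q = 8, 16, 3329
A = rng.integers(0, q, size=(m, n))
s = rng.integers(-2, 3, size=n)              # small secret (toy choice)
e = rng.integers(-2, 3, size=m)              # small error
b = (A @ s + e) % q

dim = m + n + 1
B = np.zeros((dim, dim), dtype=np.int64)
B[:m, :m] = q * np.eye(m, dtype=np.int64)    # the "mod q" rows
B[m:m + n, :m] = A.T                         # one row per secret coordinate
B[m:m + n, m:m + n] = np.eye(n, dtype=np.int64)
B[m + n, :m] = b                             # embed the target b
B[m + n, m + n] = 1

# The planted short vector (e, -s, 1) really is in the row span:
k = (e - b + A @ s) // q                     # exact: entries are multiples of q
coeffs = np.concatenate([k, -s, [1]])
assert np.array_equal(coeffs @ B, np.concatenate([e, -s, [1]]))
# Lattice reduction (BKZ with a sieve inside) hunts for this short vector.
```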

Deirdre: Cool.

Thomas: Got it. Okay. And that feels kind of similar to the way that LLL gets applied to existing crypto problems, right? It’s set up so that the smallest vector is like the likely candidate solution or whatever. Right.

Martin: So one intuition: maybe people have also heard of Bleichenbacher's attack, where you solve the hidden number problem for poor randomness in ECDSA. The hidden number problem is one-dimensional LWE. So if you understand the hidden number problem, and you have an idea how LLL solves that, it's literally the same thing as the primal attack on LWE.

Deirdre: Okay, I really love this. Just sort of, hey, this other attack that you know from 30 years of classical asymmetric cryptography is in fact a small-dimension instance of these bigger ones. That's going to help me a lot to learn these a little bit more in depth.

Thomas: I’m assuming that the reason everyone talks about the dual attack is because the primal attack is not the way you would really do this.

Martin: Yeah, I would say morally the primal attack should be the way to do this, because it really solves the underlying problem. The dual attack, and that is the segue, I guess, actually solves the SIS problem. So that's what we mentioned earlier, the module SIS problem: finding a short solution that sums to zero. That is how the dual attack proceeds. So we know that A times s plus e equals c. And now say I have a lot of rows of A, so it's not just a square matrix, I have quite a few rows. That means there exists a short linear combination of the rows that sums to zero, because these are linearly dependent.

And so all I have to do is find a short linear combination that sums to zero. Now, if I apply this short linear combination to the rows of A, well, that makes it zero. And so if I apply it to c, then, because I've just killed A, all that's left is a short linear combination of the error term. But the error term itself is small, so after I've done that, I get an element out that is much smaller than q. So what I do is I do this many times, and then I ask: is what I got out a short element all the time, or is it a uniform element mod q? That's the dual attack. And how do I find short linear combinations of A? Well, the BKZ algorithm finds short vectors in a lattice, and it turns out the kernel of A is a lattice.

So what I do is I compute the kernel of A, and that gives me an integer matrix, and then I say: BKZ, would you please be so kind and give me some short vectors in this lattice? I would really like to multiply them by my c.

And then you actually do something slightly more complex, called the normal form of the dual attack. The reason why I'm mentioning this is that of the lattices you fundamentally reduce in the dual and the primal attack, one has A in it, which is the LWE A, and the other one has A-transpose in it. So more or less, the difference in the BKZ task that you're running is whether you reduce A or A-transpose, and, reminder, A is a random matrix. The only difference is, in one case, at some point you unlock the secret, because you do the linear combination and suddenly everything collapses. In the other case, there is nothing to collapse. You just reduce, and later on take your short vector and multiply it by c, and what's left over, the inner product with the error, is still something small. So the dual attack is, in a way, a quite roundabout way of attacking it, right? You're saying, oh, I find something short related to A that doesn't have anything to do with A times s plus e, and then I multiply it by A times s plus e, and suddenly I can distinguish. It feels like a less direct way. So morally you would say, I don't think this should be better than just attacking the problem head on, and you're even running the same BKZ algorithm on it.

But it turns out, as far as we understand these attacks, you can play some tricks with how you go about it, and that currently makes these attacks a little bit faster than the primal attack, at least as far as our estimates are concerned. So it seems like, yeah, indeed, these algorithms perform slightly better. And whether that's an artifact of us just not having found out how to analyze either of them properly, or whether there's actually something deeper happening that makes this roundabout way more efficient, is open.
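A toy version of the dual distinguisher is also easy to play with, with one big cheat: rather than running BKZ on the kernel lattice of a genuinely random A, we plant short left-kernel vectors by duplicating rows of A, purely so the sketch runs. The statistic behaves as Martin describes: small against LWE samples, around q/4 against uniform ones.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, q = 16, 64, 3329
half = rng.integers(0, q, size=(m // 2, n))
A = np.vstack([half, half])              # duplicated rows: the big cheat,
                                         # planting short left-kernel vectors
s = rng.integers(0, q, size=n)
e = rng.integers(-3, 4, size=m)
c_lwe = (A @ s + e) % q                  # LWE samples
c_uni = rng.integers(0, q, size=m)       # uniform decoys

def centered(x, q):
    """Representative in [-q/2, q/2), so 'small' is visible."""
    return ((int(x) + q // 2) % q) - q // 2

def dual_statistic(c):
    # each planted v = unit_i - unit_{i + m/2} satisfies v @ A == 0 exactly,
    # so v @ c == v @ e mod q for LWE samples: a small number
    vals = [centered(c[i] - c[i + m // 2], q) for i in range(m // 2)]
    return float(np.mean(np.abs(vals)))

print(dual_statistic(c_lwe), dual_statistic(c_uni))   # tiny vs roughly q/4
```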

Thomas: Okay, so that mostly made sense to me. And I won't torture you by having you explain that further. I'll torture you a different way, which is: for the PQC stuff, we have target security levels, right? Which is roughly matching the strength of the asymmetric algorithm, the key exchange or whatever, to whatever bulk encryption we're doing. Like, 128 bits of security, whatever it is. Right. So I guess there was news last year, I think it was 2022, where MATZOV, the Israeli equivalent of the NSA, published a technical report where they had brought the cost of, I think, the dual attack down to below the target security levels. I don't know how meaningful that reduction of security was. It was on the order of, like, 10 or 15 bits or something like that, and the number of bits we had was already higher than AES or something like that.

Martin: I don’t know.

Thomas: Right. But I'm more interested just in, what was the meaning of that technical report? What was the interesting bit of that?

Martin: So they did two things. So let me compare with the bit reductions that they claimed. The target security level was meant to be something in the ballpark of 143, and they reduced it to 138.

Deirdre: So, five bits?

Martin: Something like that for the smallest parameter set, and it's bigger for Kyber1024: it went from 272 to 257. So that's a bit better, right? And what they did is two things. On the one hand, they improved, well, they built on some model of how you cost the sub-exponential part of this algorithm. I mentioned this decoding in the sieving earlier, and that is a sub-exponential part.

So that's great, but it's expensive. In this paper that I put in the meeting notes, we gave some costs for that, and they said, actually, you can do better than this. And so they improved the sub-exponential factor. That is one of the big savings. The other one is they are building on, or maybe, I think, it might have been independent work, but there was a paper slightly earlier by Guo and Johansson at Asiacrypt, and they essentially introduced an FFT-based approach to do a part of the guessing in the dual attack. So let me unpack that. The way I've described the dual attack, it was a distinguishing attack.

Like, do I get something small or large? But what you can do is say: how about I guess some components of the secret? And then, if I guess incorrectly, I assume that the distribution of what I'm left with is uniform. If I guess correctly, I get a smaller-dimensional LWE instance. If I can distinguish those two, I can confirm or deny my guess.

Deirdre: Yeah.

Martin: And this is useful because now the lattice problem that you have to solve is smaller, right? And a nice thing about this attack, from an attacker perspective, is that the costs are more additive. So you can run the lattice part of the attack once, and then you do the guessing stage on this preprocessed data.

Deirdre: Cool.

Martin: And they showed a way of making this guessing step cheaper using an FFT-based approach, by essentially targeting only some lower-order bits, and that allowed them to increase the dimension of the stuff that you guess, which decreased the dimension of the lattice problem. Then all the attack parameters get a bit smaller and everything is a bit faster. And in this MATZOV paper, they have a slightly different approach to using an FFT, which also allows them to generalize this to not just bits, but any small prime. And that flexibility allows them to reduce this further still. And this then also opens the door for the back and forth, because there's a paper that is joyously titled 'Does the Dual-Sieve Attack on Learning with Errors even Work?' by Léo Ducas and Ludo Pulles. And the problem that you might run into is, at some point you have to distinguish between, is this thing random, or is this thing from an LWE distribution? But now, if we guess a lot of secrets, the correct secret has to compete with a potentially exponential number of false solutions.

Deirdre: Okay?

Martin: And then you have to take into account that maybe some of them randomly behave even better than your correct solution. So there's this paper by Léo and Ludo, as I mentioned. And then there's a paper by Yiheng and her co-author, where they said, okay, where's the regime where we can prove this? It's essentially the question of: we're doing some statistics, we're more or less making some independence assumptions and saying this is fine, but these things are not necessarily independent. Is it actually sound? So there's a bit of a back and forth of, do you actually get the claimed complexity or not?

Thomas: But here we're talking about not whether the particular MATZOV attack worked, but whether the dual attack approach works at all.

Martin: So the dual attack has been known to work. So what the question really means is: does the attack work when you do this key guessing?

Thomas: Gotcha

Martin: So, can you do this step, and what is the regime in which you can do this step and still have some guarantees that, no, my correct solution will be the one that I identify? And what they showed in this paper is that the analysis is heuristic, like, we have some counterexamples. And so whether the attack has this complexity or not is, at this point, a bit up in the air, is my understanding. I haven't followed the latest on this in detail, but we're talking about a ballpark of, do you gain five bits or not for the lowest parameter set, right? So we're not talking about, is the whole principle doomed, but about, do you really get this complexity that pushes you below this magical number of 143?

Thomas: Gotcha.

Deirdre: Awesome. So all of these, including the MATZOV attack and some of these improved sieving attacks, are classical. We talked a little bit about how, in the quantum attack model, you have to consider how good or how much QRAM you have. But it seems that for a lot of schemes, you don't even have to consider that, because even if you give them all the QRAM all the time, and it's perfect and never has errors, it doesn't really affect the security margins for those attacks. What about for these best-in-class attacks, if the dual attack actually gets the speed-up from MATZOV and all these things: what do we have to worry about in terms of RAM and storage and all that sort of stuff? Is that a consideration for classical? It seems like it's not so much a consideration for quantum attacks.

Martin: So it definitely is a consideration for classical. The sieving algorithm, the key routine that we're arguing about here, is an algorithm that runs in exponential time and requires exponential memory.

Thomas: It sounded so easy!

Martin: But the key thing is, you need exponentially many of your strings in order for this pairwise comparison to actually give a reduction, right? That's the reason. So the numbers here: the memory that you need for a sieve is roughly 2^(0.2 d) vectors, where d is the dimension of your sieve. So if you have block size beta, it's 2^(0.2 beta).

Deirdre: Okay.

Martin: So we had block size roughly 400 for the smallest Kyber parameters. So you're looking at memory of roughly 2^80 vectors. And I had already mentioned that the way we actually do this attack is that we put the vectors in these small buckets, and it's only within these buckets that you do a pairwise comparison.

Deirdre: Within each bucket.

Martin: Yeah, within each bucket you do this pairwise comparison. And a bucket is roughly the square root of your entire database, that's how you parameterize it. So you're looking at something like 2^40 vectors, and that's where you do the quadratic search.
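Plugging Martin's constants together, as a quick sanity check (the 0.2 exponent is the usual heuristic ballpark, not a precise cost model):

```python
beta = 400                         # rough block size for the smallest Kyber
log2_database = 0.2 * beta         # ~2^80 vectors in the sieve database
log2_bucket = log2_database / 2    # a bucket is ~sqrt(database): ~2^40 vectors
print(log2_database, log2_bucket)  # -> 80.0 40.0
```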

Deirdre: Okay.

Martin: And so some work that we have recently done, which will hopefully be put on ePrint soon, asks: okay, can you sieve on multiple machines?

Deirdre: Yeah.

Martin: And it turns out you can. Because the place where you really need memory to be fast is when you're looping through your database to find something that is close to the vector you're currently holding onto, right? There you need memory to be fast, and that is something of size 2^(0.1 d), so it's quite a bit smaller. And so if you have local RAM that is in that ballpark, then you can essentially put one bucket per machine and enjoy all the benefits of fast memory there. So that was the thing we were focusing on, and there have been some designs where you can even arrange computers in a ring structure and so on, to minimize this.

But more or less, yes, you need exponential memory. But it's not like you're jumping around randomly in this exponential memory all the time; you can hide a lot of load latency by essentially running through it in a circle and just touching the next vector. So in that sense, there is some cost associated with accessing all of these exponentially many vectors.

But it's not like each call means I now have to go out into an exponentially large database and find a random element in there; you can use caches and so on. So it makes sense.

Deirdre: Yeah. This smacks of, not quite MapReduce, but you can break it up into chunks and farm it out, pretty much bucket it into parallel processes, get some results, compare them, and then keep working on it, as opposed to every sieving bucket having to see all the other data that the other sieving processes are also working on, or something like that.

Martin: There's of course some synchronization that you need to do, but in our somewhat limited experiments between three servers, we saw a pretty tidy linear speed-up. So it seems like, yeah, you can scale this across multiple servers. And then, of course, really what you want to do is have networks of FPGAs doing this sort of thing, but we leave that for future work.

Deirdre: So does this affect any of the current estimates of the hardness of the different security parameters? Because it's supposed to be, well, level one is approximately AES-128, level three is approximately AES-192, level five AES-256, and so on. Do you think it affects that?

Martin: At the moment, all of these estimates that we're talking about assume that RAM is free. So if anything, taking memory into account pushes the cost up, not down.

Deirdre: Cool. All right.

Martin: And so the dispute is over: does it push it up enough to regain the security level that was originally claimed, or not? So say we find out the MATZOV attack behaves exactly as they claimed in their paper, despite the problems identified in the analysis. Somehow it turns out, yeah, we made some independence assumption, it's not true, but it seems to work in practice, sort of thing. Let's say that. And if you're happy with that, like, yeah, it's not quite AES-256, it's a little bit lower than that, but I accept that, then taking RAM costs into account will only push that up.

The whole dispute is over: if you are uncomfortable with these numbers right now, does taking memory into account allow you to be comfortable again?

Deirdre: I see. Okay.

Thomas: And you're going to have to take memory into account somehow. But you might not take the worst-case assumption about how much complexity, or how much time and effort, it's going to take, having every bit be connected to every other bit, because if we can bucket things, then blah, blah, blah.

Martin: Okay, yes. At the moment, we're just assuming this is free, right? Every cell I can just address immediately, because it's a no-op, right? So if I want to compare a vector, then I can just do that. And that is, of course, unrealistic. But then, well, there are different levels of cache.

How much do they mask the true cost? And how much should you actually care? That, I think, is an open research question.

Deirdre: Sure. On the flip side, has anything changed with the analysis of how hard AES-128 or AES-256 are against quantum attackers with perfect QRAM, or classical attackers where all access is perfect and free? Has anything evolved on that? Or is it just purely, model Grover's with perfect QRAM, with everything, everywhere, all at once?

Martin: No, there has been a change. So all the lattice schemes got, let me try to phrase this the right way around, more secure, because there was a revised cost estimate for breaking AES on a quantum computer.

Deirdre: Neat.

Martin: So there's a paper, I can put it in the meeting notes, by Fernando Virdia and co-authors. I apologize to the co-authors, but Fernando was my student, so I know his name best. And they revised the costs for AES on a quantum computer. And these circuits still get optimized, so the NIST target of "as hard as AES on a quantum computer" was a moving target. It makes sense why they picked it, but it's not settled how many operations it takes to break AES on a quantum computer. And even what the right cost model for a quantum computer is isn't clear. What are the operations that you should count? Because we don't really know yet which of the architectures will be dominant, and what the relative costs of the various quantum gates are.

Deirdre: Yeah, cool. Thank you. Because I know that there's discussion, when you're looking at specific new crypto schemes, about all these attacks and how to measure them and that sort of stuff. But the other side of it is that the thing you're comparing against may also move, because it's also subject to analysis and cryptanalysis and modeling and whatever. And so I was very curious whether these things were moving at all, or whether the complexity of attacks on AES-128 had stayed stable. And I'm glad that you pointed out that paper, which I haven't read. So yeah, cool.

Thomas, do you have anything else?

Thomas: No. My brain is leaking out of both my ears and my nose right now, but that was incredible. That was awesome. I know a little bit more than I did before, but it’s so difficult to get me to learn new things that you should be very impressed with yourself for accomplishing that. That was awesome. Thank you so much.

Martin: Okay. Honored as an educator.

Deirdre: Yeah. Thank you, Martin, for meeting the challenge of, how do I talk about these things with no visual aids, audio only. And I think you did an excellent job because I understood a lot of it. Thank you, Martin. Thank you very much.

Security Cryptography Whatever is a side project from Deirdre Connolly, Thomas Ptacek and David Adrian. Our editor is Nettie Smith. You can find the podcast on BlueSky and Twitter @scwpod, and the hosts on BlueSky and Twitter @durumcrustulum, @tqbf, and @davidcadrian.

You can buy merchandise at https://merch.securitycryptographywhatever.com. Thank you for listening.