The feeling's mutual: mTLS with Colm MacCárthaigh

We recorded this months ago, and now it’s finally up!

Colm MacCárthaigh joined us to chat about all things TLS, S2N, mTLS, SSH, fuzzing, formal verification, implementing state machines, and of course, DNSSEC.

This rough transcript has not been edited and may have errors.

Deirdre: Hello. Welcome to Security Cryptography Whatever. I am Deirdre.

David: I’m David.

Thomas: I’m Thomas.

Colm: And I’m Colm.

Deirdre: Yay! Colm is our special guest today. I’m a cryptographic engineer at the Zcash foundation.

David: I am an engineer at a company called Nametag, but I also did a PhD in almost cryptography at Michigan and co-founded Censys.

Thomas: I’m an engineer at Fly.io, and I have one semester of undergrad college.

Colm: I’m an engineer at Amazon Web Services, where I work on security, cryptography, identity, and virtualization.

Thomas: This is very

David: One of the coveted principal engineers at Amazon. Correct?

Colm: Yes, definitely a member of the Amazon principal engineering

It’s fun.

Deirdre: Welcome Colm.

Thomas: What is that?

Colm: So, at Amazon, you know, when you join as an engineer, typically you come in as an SDE1, right, which is typically a college hire. And then SDE2 is our next level after that; that’s still kind of an early-career position where people learn how to be really great software developers. Then SDE3s are lead developers. And then after that, you become this, you know, blessed principal engineer, who, you know, is supposedly great at stewarding and shepherding projects across teams and across the company and figuring out what the technical direction should be, as well as getting real things done.

Thomas: Does it come with

Colm: Ah, it comes with a title. There’s no intrinsic powers. It comes with responsibilities. Um, I found out the hard way that it, you know, tends to make things harder rather than easier.

Deirdre: but do you have minions?

Colm: I do not have minions.

No. I work with a lot of teams, and I have to help a lot of engineers, but none of them just do my bidding. Everybody always wants to argue and figure out what the right thing to do is.

David: How much bar raising do you do per day?

Colm: Uh, all the time. It’s constant bar raising. It’s more of a life philosophy than anything else.

Thomas: I gave a talk at Amazon a bunch of years ago; I think it was our cryptography for pentesters presentation that we had done at a bunch of places. And when I was there, it was like a BlueHat kind of thing, where they had outside people from a bunch of places just presenting to the Amazon team, and a bunch of the engineers at Amazon were talking about how they would get sponsorships or recommendations from people outside the company.

Like, that was a strange Amazon culture thing I was unaware of. At least at the time there was some process where it mattered if somebody outside of Amazon wrote you a recommendation or said something nice about you. Do you have any recollection of that?

Colm: You’re not crazy. It’s definitely something that comes up. You know, when people are doing annual review processes or promotion processes and all that kind of stuff, it’s not that unusual to look for feedback from folks who don’t work at Amazon, in part because, you know, we try to be really customer obsessed and getting feedback from customers is super valuable, but also partners and vendors and other people we work with, because, you know, their feedback’s really useful. I

do that all

Deirdre: Reminds me of the tenure process. When you’re going up for tenure in academia, you have to get letters from people outside your department or outside your school to be like, yeah, I would totally give them tenure or whatever, but I’m somewhere else. So, no,

Colm: Yeah. And I think when, when part of someone’s role is, you know, interacting with the industry, it becomes particularly important. I can, I can definitely think of some people, like that’s most of their job, so they probably have all sorts of feedback and recommendations.

Thomas: I just want to tell people working at Amazon that I am available for recommendations and that my fees are very reasonable.

Deirdre: Do we want to talk about S2N? Like the AWS cryptographic library?

Colm: Sure. Amazon S2N is an open source project. It’s on GitHub; you can find it there in our AWS repo. It’s short for signal to noise, which I am amazed was a name that was left there, you know, for a cryptographic library, how no one had taken that name before us. Because the core function of cryptography is to turn meaningful signals into useless noise, right, to hide information in plain sight. It started as a library that just implemented TLS, right, and SSL. And we would use, you know, OpenSSL’s libcrypto or other libcryptos under the hood to do the core cryptography, but steadily, bit by bit, we’re kind of taking on more and even implementing that. We’ve got our own libcrypto project as well that kind of plugs into it, and we’ve grown to support QUIC.

Um,

and some other stuff in there, and we’ve got some cryptographic primitives that we use in non-TLS contexts as well. That’s all part of S2N. And I think it’s been going six or seven years now. I can’t remember.

Thomas: I’m trying to remember what was going on with TLS six or seven years ago. Like, what was the impetus for, what was the impetus for starting your own

TLS?

Colm: Yeah. Well,

Deirdre: something other than OpenSSL.

Colm: So, I mean, development on S2N started literally the day after Heartbleed,

just to get directly at that. So that was definitely a trigger. But we had talked about and discussed our own internal need for S2N before that. And we had actually outlined it, and we were planning on starting probably about six months later than we

did, because we saw there were some performance optimizations that were kind of on the table. And we saw that we just had to get into this game of owning this part of the stack ourselves, for a bunch of reasons. But, you know, when Heartbleed happened, it definitely accelerated my timeline. I literally started working on it pretty much full-time right after that. Not quite S2N itself at first; the first thing we wrote was actually a kernel module that would act as a network filter that could block Heartbleed.

Cause we had some customers stranded who couldn’t update, you know, their copies of OpenSSL. So we wrote a little module they could use on their instances.

Deirdre: Hmm.

Colm: And then that kind of grew to become S2N.

Deirdre: So they couldn’t upgrade OpenSSL, but they could install a kernel.

Colm: Yeah.

So, we had a bunch of customers who were in kind of unusual situations. Some, we were able to help with hot patching, right? Heartbleed was a pretty easy issue to hot patch binaries for: you literally just, you know, find the block of code that processes these heartbeat requests and add a jump.

Then add a section at the end of the jump that, you know, adds a condition to defend against Heartbleed, and then jump back to where you were, right? It’s actually not that hard to do. But we had some customers who had, you know, validated binaries that they couldn’t change. They had self-checks and checksums that have to pass, and so you couldn’t modify the applications themselves. So you can’t hot patch it.

So now you’ve got to do something in the network, and that meant, you know, shadowing and parsing the entire kind of SSL/TLS state machine,

detecting a heartbeat record, and rejecting it. So that’s what we did.
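The filtering idea Colm describes can be illustrated with a minimal sketch. This is not the code of the kernel module he mentions; it only assumes the standard TLS record layout (a 5-byte header of content type, version, and length) and the heartbeat content type 24 from RFC 6520. The names `first_heartbeat_offset` and `record` are hypothetical, and a real network filter would also have to reassemble TCP segments and track each connection.

```python
import struct

HEARTBEAT = 24         # TLS ContentType for heartbeat records (RFC 6520)
RECORD_HEADER_LEN = 5  # content type (1) + version (2) + length (2)


def first_heartbeat_offset(stream: bytes):
    """Walk TLS record headers; return the offset of the first heartbeat record, or None."""
    offset = 0
    while offset + RECORD_HEADER_LEN <= len(stream):
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if content_type == HEARTBEAT:
            return offset  # a filter would drop the connection here
        offset += RECORD_HEADER_LEN + length  # skip the record body
    return None


def record(content_type: int, payload: bytes) -> bytes:
    """Build a TLS 1.2-style record (version bytes 0x0303) for the demo."""
    return struct.pack("!BHH", content_type, 0x0303, len(payload)) + payload


# Hypothetical demo traffic: a handshake record, application data, a heartbeat.
handshake = record(22, b"\x01" * 10)
app_data = record(23, b"hello")
heartbeat = record(24, b"\x01\x40\x00")
```

Calling `first_heartbeat_offset(handshake + app_data)` finds nothing to reject, while appending `heartbeat` to the stream makes the walker report where the heartbeat record starts.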

Deirdre: Is that code still alive? Deployed

some.

Colm: I hope not.

Uh, but the module is still public; it’s in my GitHub repository. But, um,

I hope nobody’s still running that. That was definitely, you know, intended as a band-aid to help some people get by who didn’t really have any other option.

Thomas: I’m looking at the introduction for S2N, like your announcement post, and how it played on Hacker News in 2015, cause that’s my lens for how to look at everything. And the top comment there is about a 12,000 line OCaml implementation of TLS. And my question for you is, why did you not implement that in OCaml?

Colm: Well, personally, I’m not OCaml-literate, so I implemented it in C. And I guess the other big reason to implement it in C at the time is that pretty much everywhere we did TLS, you know, every application that did the front-end SSL/TLS processing, was also written in C,

and so we didn’t want to be constrained. And we had some, you know, compilation target environments that couldn’t even compile C++. So we were pretty restricted. We were able to get to C99; pretty much all of even our embedded environments could support that, and that was pretty much our lowest common denominator at the time. Um, you know, it isn’t what I’d do today, but that’s what we did at the time.

Deirdre: What kind of embedded environments? Are these clients talking to AWS, or something else?

Colm: So, as the name says, it’s Amazon S2N, and it’s not just used in, uh,

Deirdre: Yeah.

Colm: In fact, at the time when we wrote it, some of the smallest compilation targets included things like Dash buttons,

which are, you know, I don’t know if you can remember this. Right. So

you think about trying to get, you know, a TLS stack that can run in an environment like

that.

Very, very small, very tiny footprint.

Deirdre: Ooh. Oh, so there’s a little section of post-quantum crypto in the S2N repo. Are you deploying post-quantum crypto to Dash buttons? Please say yes, please say yes.

Colm: Wait. No, I don’t think so. As

fun as

David: Can you still buy Dash buttons? Are those still a thing

that again,

Colm: I know that they still work, and I know that they could still be used. And I actually have a friend, you know, one of my friends, she has Dash buttons, and they definitely still work.

But, uh, I don’t know if you could still.

Thomas: This is like the thing where if you’re out of detergent, you just have a button next to your washing machine and you push it and then detergent comes.

Colm: Yeah, they’re really cool.

Deirdre: they are really

David: yeah, I actually got one that just every time I hit it, I get a new TLS implementation.

Um,

Colm: Yeah. It’s uh, you know, now, now you can just ask your Echo device to do it for you and you don’t even have to press the button. So, but it’s,

Thomas: Like, so there’s like there, the Heartbleed happens. I, I, so I guess you announced S2N in 2015, but Heartbleed was like years before that. Right. Um, but like you, um, like you have a sense of where like OpenSSL is now, but you also have a sense of where it was back then, right? Like I get, like, I get the, it’s easy to sell me an S2N, in 2012 or whatever, right? Like in the, in the bad old days of OpenSSL, right. But like, if you were going to make a sales pitch right now for like, ‘use this different C implementation of, um, of TLS instead of OpenSSL’, I believe that there’s a good pitch there. I’m just wondering what it is.

Colm: Yeah. Well, first, I don’t mean to criticize OpenSSL, and don’t ever take anything I’m saying that way. I think OpenSSL is probably one of the greatest world goods that has ever been achieved in software development. Literally, like, a mostly volunteer team that brought cryptography to the masses.

It’s just an amazing accomplishment, and I don’t ever want to talk down on it. I’m pretty good friends with a lot of the OpenSSL team, so I don’t want to get in trouble with them.

Thomas: We all agree, which frees us to be mean to it.

Colm: Okay.

David: Yeah. And we can put in the standard disclaimer now too, that OpenSSL in 2014 and OpenSSL now are also two very different beasts. Um, I also wouldn’t, you know, even with OpenSSL in 2014, go beat anyone with a stick because of Heartbleed. I think there’s a lot of things that kind of led to that happening.

But,

Colm: Cause

good, good,

David: it’s certainly, what you’re talking about from when you were creating it is not at all what OpenSSL is like now.

Deirdre: yeah.

Colm: Yeah, good context setting. So the big motivation for us was really, well, to have fewer security issues. And there’s two senses of that, right? One sense is, well, we really do have to have a lower risk profile. There are sometimes bad actors coming after our customers, and we’ve got to be able to protect them.

And there’s some real threats there. But the other level of it is, no matter what, anytime an issue comes out, no matter how low the risk is, you’ve got to go do a bunch of updates

and those can be pretty disruptive. You know, at Amazon scale, rolling out a software update in a low-level library or a low-level system, you know, SSL and TLS is used everywhere, can cause a lot of teams to have to pause. And then they, you know, don’t work on their own roadmap for a bit. They go and pivot and they have to do this

update and get it deployed and do all their testing and so on and so forth. And I don’t know a good way to measure the impact of that, but it is certainly tens of millions of. Like, at Amazon’s scale, right, it’s just huge, over many years, whatever that productivity loss is. And the development cost of something like S2N is always going to be less than that. And as long as we can do it in a way where we add the right, you know, defenses in depth and have a much more minimal surface area, the advantage is you get to kind of just sit there and ride out all those updates.

And that’s pretty much worked. You know, a lot of updates or issues have come and we just didn’t have to do anything at all. And all of that productivity is saved, and that’s the main benefit. And that was the main pitch, and that’s how we kind of continue to go at it. Then secondly, the performance: there’s performance enhancements that are still on the table.

You know, we still have ideas for how we think we can do pretty serious performance savings in how this stuff works, and we got some pretty big wins. You know, when you’re looking at a system like Amazon S3, even 2 or 3% performance improvements turn into, you know, big numbers very fast.

Deirdre: Yeah. Especially when you’re like the first stop, a cache for anything on the internet to stick in S3. When you’re effectively a cache for the internet, any of that latency is felt, including in your SSL handshakes or, you know, whatever.

Thomas: I guess I should have asked a second ago, but where does S2N live in the architecture right now? Like, literally everywhere in AWS where there would be TLS, is it S2N now?

Colm: Uh, not quite everywhere. I mean, there’s a few left. We’re very, very careful about how we update them and making sure we don’t break customers, and we preserve backwards compatibility along the way. But pretty much, if you talk to an Amazon web service, if you call an API, that’s S2N. If you hit the CloudFront CDN, that’s S2N.

And if you hit Amazon S3, that’s S2N. If you hit Network Load Balancers, that’s S2N, and

if you hit an Application Load Balancer, that might not be S2N. That’s one of the few things that’s left on our list, but we’re working on that one.

Um, and it’ll, that’ll be fun.

David: In terms of the productivity gains, is that coming because you have to patch less, or because you can patch easier because it’s within your build system and owned by you?

Colm: I think the answer is yes. And,

um, it’s mostly having to do nothing, you know. It’s nice to be able to say, well, this issue came in, but we don’t have to worry about it. We just don’t have to do anything. That is by far the biggest win. And then, because it is in our development and build system, and it’s kind of a first-class project internally, you know, that is easier. Also, you know, if there is a security issue with S2N, we’re generally going to be who any security researcher reports it to, right? So we’re going to have kind of firsthand privileges in the embargo process,

right, and be able to coordinate with them and have everything updated, and the day customers find out about it, everything’s already patched. So that’s a good productivity win too.

Deirdre: That’s an interesting way to be like, is there a value to a, quote-unquote, fully open source project like OpenSSL that’s available to, quote, anyone, but that means that it’s also controlled by no one in particular? So everyone has to coordinate, and you’re at the whim of the project, and you have to try and get your fixes in line.

And if you’re Amazon, you’re just like, well, no, I can’t work in that workflow. I’m just going to go over here and do it on my own. So I guess the value would be to smaller organizations than an Amazon or an AWS.

Colm: Yeah. I mean, for us, we maintain a full Linux distribution too, Amazon Linux. And so we already have to patch everything, you know, well within embargo timelines. And we put a lot of work into being able to respond to any issues that are reported in any kind of project, you know, very, very quickly. But that is a hard thing for a smaller organization. I

can’t even imagine

Deirdre: You have a whole Linux. I forgot.

Thomas: I guess, like, I feel like I know the answer already, but I’ll ask anyways, right? If I was to come at you as the skeptic saying, I simply don’t trust memory-unsafe C software, what is the set of things that you guys have done? And I know there’s a bunch of things that you guys have done to mitigate that concern.

Colm: Yeah. So, well, firstly, I’d point out that almost every memory-safe language, especially the dynamic ones that you can think of, is itself written in C at some level. You know, we have exceptions now that are able to be self-hosting, but at the time, even if you thought about the JVM or huge things like that. And so there’s no magic that they have access to that you can’t also do in a C program. So we essentially wrote in our own dialect of C, a pretty restricted dialect of C that takes its inspiration from functional programming techniques. We structure all of our memory handling and I/O through a very kind of functionally inspired architecture called stuffers, which is like a buffer, right? And it’s a really, really simple data structure that keeps a cursor. Anytime you write to it, it increments the cursor, and anytime you read from it, it makes sure you don’t read past the cursor. But when you use a technique like that and you write code like that, it makes everything beautiful and declarative looking and very functional seeming.

Right. We did another example of that in how we construct our state machine, where we’re literally using a fixed, you know, table of function pointers, and all you do is increment your way through the table. And so it’s kind of writing C like you’d write Lisp. So maybe your OCaml question wasn’t so crazy. And then we did, you know, just an enormous degree of testing and verification, all the way up to formal validation. Um,
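The stuffer idea can be sketched in a few lines. This is an illustration only, not S2N’s actual C API: the class name `Stuffer`, its methods, and the use of separate read and write cursors are assumptions made for the example. The point is that every read and write goes through one small piece of code that checks the cursors, so callers never index into memory directly.

```python
class Stuffer:
    """Fixed-capacity buffer with cursor-checked reads and writes (hypothetical sketch)."""

    def __init__(self, capacity: int):
        self._data = bytearray(capacity)
        self._write = 0  # next position to write; also the limit for reads
        self._read = 0   # next position to read

    def write(self, chunk: bytes) -> None:
        # Refuse any write that would run past the end of the buffer.
        if self._write + len(chunk) > len(self._data):
            raise ValueError("write would overflow the stuffer")
        self._data[self._write:self._write + len(chunk)] = chunk
        self._write += len(chunk)

    def read(self, n: int) -> bytes:
        # Refuse any read that would pass what has actually been written.
        if n < 0 or self._read + n > self._write:
            raise ValueError("read would pass the write cursor")
        out = bytes(self._data[self._read:self._read + n])
        self._read += n
        return out

    def read_uint16(self) -> int:
        """Read a big-endian 16-bit integer, as used for TLS length fields."""
        return int.from_bytes(self.read(2), "big")
```

Parsing code built on top of this reads declaratively: a length field is `stuffer.read_uint16()`, and an over-read surfaces as an error instead of touching memory that was never written.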

Deirdre: Oh, I’ve heard about that. Yeah. So it sounds like the answer is you have to write your own variant of C to do this securely.

Colm: Um, you don’t really have to, but it’s what we did. You know, I always think your starting point, the first layer of defense, is always going to be writing the code and whoever’s reviewing the code, and you have to make that as, you know, mentally untaxing as possible.

And you have to make things as consistent and idiomatic as possible, so that anything unusual will, you know, really stand out. And for me, that means you probably want to go for a pretty restricted, small set of patterns that you’re going to program with and use, and start there,

you know, and then make tests easy to write, and have lots and lots and lots of testing and test cases, and then do fuzzing, cause fuzzing is really, really awesome. And then after you’ve done all those things, if, you know, you can spare it, think about doing some formal validation as well.
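As a rough illustration of the fuzzing step, here is a minimal random-input harness. It is a sketch under stated assumptions: the parser `parse_length_prefixed` and the `ParseError` type are hypothetical stand-ins, and the loop uses plain random bytes, whereas production fuzzers are typically coverage-guided. The harness treats a clean `ParseError` as acceptable and lets any other exception, or a broken invariant, surface as a finding.

```python
import random


class ParseError(Exception):
    """The only exception the parser is allowed to raise on bad input."""


def parse_length_prefixed(data: bytes) -> bytes:
    """Parse a 1-byte length followed by exactly that many payload bytes."""
    if len(data) < 1:
        raise ParseError("missing length byte")
    length = data[0]
    payload = data[1:]
    if len(payload) != length:
        raise ParseError("length does not match payload")
    return payload


def fuzz(iterations: int, seed: int = 0) -> int:
    """Feed random inputs to the parser; return how many were accepted."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    accepted = 0
    for _ in range(iterations):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(8)))
        try:
            payload = parse_length_prefixed(blob)
        except ParseError:
            continue  # rejecting malformed input cleanly is fine
        # Invariant: an accepted payload agrees with its length byte.
        assert len(payload) == blob[0]
        accepted += 1
    return accepted
```

Running `fuzz(2000)` either completes quietly or stops at the first input that violates the invariant or raises something other than `ParseError`.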

Deirdre: um,

Thomas: What is the formal validation story there? What are

you guys doing?

Colm: Uh, well, we formally validate quite a bit of S2N. We do things like, we formally validated our state machine, that it can’t get into any states that aren’t allowed by the TLS specifications. We formally validated our memory safety. We have a tool called CBMC that we run on our core stuffer and I/O algorithms and all that stuff,

and we validate that that’s always going to be correct, no matter what the input is.

uh, we formally validated our implementation of

Um,

Deirdre: Yeah. I remember this. Yeah.

Colm: Uh, that was fun because we wanted to see if we were getting better at formally validating things. Because about two years prior, there had been a formal proof of a different HMAC implementation, and we wanted to see if we could make it easier. Um,

and we formally validated a bunch of our algorithms, and most of the post-quantum ones that we’re working on as well.

Deirdre: You have a formally

Colm: So we are, we are formally validating those.

I, um, I think those algorithms aren’t yet locked in to the point where

you can even say that they’re validated, right? They’re still open to tweaks on various parameters.

David: Are you validating that the implementation matches the algorithm, or some other property about the algorithm?

Colm: Yeah. So for most of our validation, what we do is we validate that the actual compiled, like, machine code matches some simple specification, a declarative specification of a protocol or a safety property, you know, some kind of set of invariants that we want to be able to hold it to. We try to go all the way through to the machine code where we can.

Deirdre: got it.

Thomas: I’m Googling right now, like, formal verification and S2N, and Galois comes up.

Deirdre: yup.

Thomas: I’m going to nerd out a little bit here, right, but I’m just curious about what the tooling looked like and what those projects look like.

Colm: Yeah. So Galois was one of the companies we partnered with. Super early, actually not long after we started S2N, we had Byron Cook join AWS, who’s now a distinguished scientist at Amazon,

and he’s an awesome guy, a real leader in the field of formal verification. And he connected us with Galois so we could get going before he was able to hire a full team. And we still work with Galois; they’re awesome. They have these tools called Cryptol and SAW that are specifically designed for verifying cryptographic code and cryptographic algorithms. You know, some traditional verification tools from the safety-critical world don’t really apply,

because there’s so much entropy and randomness in cryptography, and you have to be able to kind of abstract that out. And they were able to come up with, you know, Cryptol and SAW specifications for the HMAC algorithm and for the core TLS state machine. And they were able to find issues. It was really cool. They

actually found that there were certain invalid combinations of TLS extensions that could cause us to abort the TLS state machine early.

Now, it wasn’t a security issue, but it was still a pretty good find. I can’t think of any other way we would’ve found that.

Deirdre: So is that, the specification allows this? You can be conformant with TLS 1.3 or whatever, and you have all these extensions, and that is allowed by the spec, but if you implemented the spec as written, it would get into this weird state that you really should not be allowed to be in?

Colm: Kind of. So TLS supports session resumption

in its state machine,

right? So normally when you connect to something over SSL, there’s a handshake, and there’s a bit of a back and forth, and you

eventually negotiate a key, and then you encrypt stuff over it.

Right. But you can skip most of the handshake by doing resumption.

And what they found was, if you showed up with a weird combination of extensions in which you shouldn’t resume, it could try to resume, and then the resumption would fail

when,

and what it should do is just carry on and do a full normal handshake. But it instead tried to resume, didn’t resume, and kind of gave up and aborted the handshake early when it shouldn’t. That was the issue they found, which would be really hard to find

Deirdre: So that sounds like a section needs to be added under session resumption in the TLS spec to be like, well, make sure that if you’re, you know, blah, blah, blah. So did anything get updated after you

Colm: Yeah. So that was in the state machine that’s part of, you know, SSLv3, TLS 1.0,

1.1, and 1.2. The TLS 1.3 handshake is radically simplified

and does not suffer from any issues like that.

There’s just nowhere near as much kind of parametrization

of the TLS handshake in 1.3. It’s way, way simpler. And that was inspired by issues like that, you know, like, oh, I found

Deirdre: That’s awesome.

David: And the 1.3 handshake, or specification itself, has been symbolically, formally verified.

Deirdre: so.

David: The spec doesn’t have invalid states with.

Deirdre: Yeah. Yeah. Okay.

I don’t know

Thomas: I guess, having been through the process of taking a relatively complicated state machine like TLS and then seeing it formally verified, and you’re a software developer like the rest of us, right? Are you still comfortable building state machines, building protocols, just building software without formal verification?

Like, I have friends that do this and they never come out from behind the looking glass. They’re just permanently formal verification people, and from that point on, they just don’t take seriously anything that isn’t formally verified. Have you fully drunk the Kool-Aid on

Colm: Um, maybe not fully. So what I’d say is, first, if you have to build a state machine, right, the first rule is: in your code and your functions, do not mix input parsing and state changing.

Right. Don’t mix those core things; separate those really clearly. Try to have a set of code that really clearly describes how you’re going to go through your states, and have that very separate from the code

that’s going to parse input and do stuff like that. You know, in a lot of projects there’s a mix of those things; there’s these huge, big functions that do both, and it gets just too confusing, gets too spaghetti-like, and is

unreviewable, in my opinion. So you first got to do that.

It’s the biggest thing, and that structure helps. The second thing I tell people to do is linearize all your possible state transitions. Right? So instead of having, you know, conditions in your code that go, well, if this happens, then jump to this state, and so on and so forth,

instead,

lay it all out in these linearized tables of, well, this is one valid set of state transitions in one table,

this is another valid set of transitions in another table. If you have too many tables, if that feels like you won’t be able to program that, give up and redesign the state machine. That’s a big hint that, you know, you’re too flexible in all your state transitions. Those two things, I think, matter more and have probably prevented more issues and bugs for us. And then, you know, formal validation after that kind of gets you to a hundred percent. It’s

like, you do those things, you’ll get to like 80, 90%.

But, you know, formal validation is probably the only way you’re going to get to a hundred.
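Both rules can be sketched together. This is a toy illustration, not S2N’s state machine: the two message orderings below are simplified and hypothetical, and `Msg`, `parse_message`, and `Handshake` are names invented for the example (the one-byte tags do follow the TLS HandshakeType values for these four messages). Parsing turns bytes into a message type and touches no state; the state side is just a chosen table and an index that moves forward.

```python
from enum import Enum, auto


class Msg(Enum):
    CLIENT_HELLO = auto()
    SERVER_HELLO = auto()
    CERTIFICATE = auto()
    FINISHED = auto()


# Each table is one complete, valid ordering of messages (simplified for the demo).
FULL_HANDSHAKE = (Msg.CLIENT_HELLO, Msg.SERVER_HELLO, Msg.CERTIFICATE, Msg.FINISHED)
RESUMED_HANDSHAKE = (Msg.CLIENT_HELLO, Msg.SERVER_HELLO, Msg.FINISHED)

WIRE_TAGS = {1: Msg.CLIENT_HELLO, 2: Msg.SERVER_HELLO, 11: Msg.CERTIFICATE, 20: Msg.FINISHED}


def parse_message(raw: bytes) -> Msg:
    """Input parsing only: bytes in, message type out. Touches no state."""
    if len(raw) < 1 or raw[0] not in WIRE_TAGS:
        raise ValueError("unrecognised message")
    return WIRE_TAGS[raw[0]]


class Handshake:
    """State handling only: a chosen table plus an index into it."""

    def __init__(self, table):
        self._table = table
        self._position = 0

    @property
    def done(self) -> bool:
        return self._position == len(self._table)

    def advance(self, msg: Msg) -> None:
        # The only legal move is the next entry in the table.
        if self.done or self._table[self._position] is not msg:
            raise ValueError("message not allowed at this point")
        self._position += 1
```

With this shape, reviewing the state machine means reading the tables, and a message that arrives out of order, such as a certificate during a resumed handshake, is rejected by the single check in `advance`.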

Deirdre: This gives me a little more confidence about doing Zcash stuff with the network and the this and the that in Rust. We’re basically doing as you described: we have parsing over here, and it throws errors; it doesn’t do anything it doesn’t understand. We have state transitions and how we handle them over here.

Um, but also, shout out to Rust. The type safety in Rust makes it very easy to encode states as enum variants or types, and to say you can only go from one state into another valid state; you cannot go from any to any, or, you know, from valid to invalid or something like that. And you can check it at compile time.

And it’s very nice for that sort of thing. So

Colm: Big fan too.

Thomas: You can use S2N with Rust, right? Like.

Colm: Yeah, there are bindings. You can use S2N from Rust; we do. We have various Rust projects that use S2N, including the new Rust SDK for AWS. And we go the other way: there are now parts of S2N that we’re writing in Rust, like our QUIC implementation is written in Rust. Cause we feel like it’s ready, and it’s a better starting point for all that.

Deirdre: That’s so exciting

Thomas: What are the prospects for Rust fully infecting the S2N project, and you gradually hoisting out most of the C code?

Colm: Um, not imminent. In general we try to, you know, leave what’s working alone and not go rattle it. But, uh, I mean, I’d love

to see

Thomas: That’s what the OpenSSL people said in

Deirdre: in what, 2014.

Thomas: Well, I mean, they can say it now, and it’s much more credible because like lots of stuff has happened.

Colm: yeah. I mean, I it’d be great to see it someday, but we don’t have any of them.

Deirdre: Yeah. I meant to ask: does AWS have a fuzzing cluster, or are you leveraging open, open fuzz or whatever it’s called?

Colm: Uh, so, at Amazon, those kinds of practices are kind of up to each team and what they want to do, but we do have some centralized fuzzing infrastructure. Um, and we do have a compute cloud. It’s called, it’s, it’s got,

Deirdre: heard

Colm: It’s got some compute. It’s got a few instances we can use now and then, and, um, we certainly do use it. Um, I mean, we’ve been fuzzing s2n; it’s been running for years and years and years at this point.

Deirdre: Awesome. Did you write your own management to run and report and correlate, or are you deploying something? Because I tried to deploy, there’s ClusterFuzz. Uh, oh, OSS-Fuzz, the Google-run one. They have a piece of software that’s like, you know, Kubernetes, here’s how you have a web app that deploys your fuzzing infrastructure.

And then you can tie it back to your, you know, your repository or whatever it is. Do you have something like that?

Colm: Well, the first fuzzing tool that I wrote, I just used Elastic MapReduce.

Um, and I’m kind of like true, true without it that way. Um, I’ve, I’ve seen us use, uh, Lambda for it as well.

Um, just as a nice, cool demo. Um, you know, I think Lambda is cool for fuzzing. Obviously you can’t run things for very long, but it’s still useful for, um, integrating it directly into the build process, if you want to get just a really, really quick yes or no on something.

Um, and we do that sometimes, because it’s not feasible to run it on everybody’s desktop for that kind of stuff. But I don’t know, actually. I should ask the team how they’re coordinating and running it these days. Um, there’s a lot of different ways to run something in parallel across, uh, many EC2 instances.

Deirdre: Yeah. But like specifically, a lot of people will write a fuzzer and then they’ll just run the fuzzer for, I don’t know, some amount of time as part of their CI or CD or something like that. And then, you know, you’re not really getting the real benefit of fuzzing. You need a continuously running fuzzing infrastructure, and then you also need it to report when it finds something.

And then you have to correlate it back with the change that you found the thing on, and all of that work, not just the deploying lots of compute in parallel. It turns out to be a only one solved problem, at least as I’ve seen. Actually, that’s not true. Um, there’s ClusterFuzz, if you run it yourself. Someone took ClusterFuzz and they were trying to run it as like a service: basically OSS-Fuzz, but you pay them to do it for you.

And I think they shut down or something like that. And that made me very sad. So if you have software that makes this task easier for people to do, or you know, the people who have it, I am interested and I would like to see more of it in the world, please.

Colm: Sure, I’ll ask them. Maybe we should have a service for it. I always viewed the stuff that’s integrated right into your CI pipeline as really just to give the developer feedback that they haven’t broken the fuzzing infrastructure.

Deirdre: Yeah, but you want to auto-deploy anything you’ve changed to, like, your fuzzing infrastructure or whatever. So either way. But, uh, enough about s2n. Tell us about MTLS.

Colm: oh, wow.

Thomas: I would just say, I’ll put something in the middle there. Right. Which is just like, you’ve now had the experience of being firsthand to a, you know, ground-up implementation of TLS. Um, I think I bring this up every time the subject comes up. Right. But one of my favorite people is Watson Ladd, and Watson Ladd had a comment that has stuck with me forever on the CFRG mailing list, which is the IETF crypto review board, um, where he compared TLS to like an undergraduate secure transport homework assignment, and said that you would have gotten a C on it if you had turned it in.

What’s your general take on TLS?

Colm: Um, well, I think all sorts of internet standards are like that, right? And you can’t be too hard on them because, you know, a lot of them came about through a culture of experimentation and iteration. Right. And then sometimes something takes off and succeeds wildly before maybe, you know, people got another chance to iterate, and then you’re all stuck with it because, you know, it’s just baked into everything.

You know, one of my favorite examples of that is actually TCP and UDP. Even way back, UDP has this crazy design where it does fragmentation at the IP layer, right? Like the header in a UDP packet is only in the first packet, uh, you know, of a fragmented datagram, which complicates things. You would not believe the amount of extra money that makes routers and switches cost. Right. Because they have to be able to reassemble those packets and so on to be able to do flow switching. Right. And you would look at that and you go, well, you’d get a C minus on that design. It would have been trivial to just put the UDP header in every packet. Right. But you can’t see it like that.

It solved their problem, and it did it really well and took off, and TLS is the same. I mean, you can look at it and say, oh my God, look, they got the Diffie-Hellman exchange the wrong way around. I mean, this is clearly a C, C minus. But, you know, they built a really cool protocol that could effectively emulate TCP enough that you could just bolt on existing protocols and get going, you know, and it’s a good example of an MVP succeeding, and maybe we got better.

We have to get better at iterating. And it shouldn’t take 20 years before we’re able to all come back and, you know, arrive at a better, more optimal design.

So I kind of see.

Thomas: I feel like there was a point where you warned me about the UDP fragmentation thing. We did like an all-BPF implementation of UDP, um, for Fly. And you were like, UDP fragmentation, you didn’t use these words, because you don’t use words like this, but you were like, UDP fragmentation, that’s going to screw you, um, with like DNSSEC. And I’m like, oh, it’s going to screw up DNSSEC. Oh no. And then I moved on.

It’s only because you mentioned it now that I remember that you warned me about that. But I guess part of the reason we’re bringing MTLS up is there’s, uh, a sort of brave new world of how, um, like people say microservices, and I hate that term, but like modern application service ensembles. Um, that’s MASE. That’s my new acronym.

Um, anyways, for these MASE applications, right? There’s this notion that you’ve got all these services running now delivering the same application, and they’ll talk to each other. Right. And we now have an opportunity to use TLS to secure the connections between those services.

Right? Like you’ve got all these random things talking to each other, and in the bad old days there’d be no good way to kind of authorize who’s allowed to talk to what. It’s all just kind of like, you’d run tcpdump to see what’s going on. And now what we can do instead is bolt a proxy onto everything and have it talk MTLS. And MTLS is, like, TLS, the TLS protocol that we’re talking about, right, um, but in mutual mode, where you’re presenting both client and server certificates for things. And if you look at that, you can get a long way, um, you know, into the authorization and authentication problem. You can get a long way toward solving those problems just by using certificates on both sides of TLS.

Right. And that’s essentially a, it’s kind of where Kubernetes is going, right. It’s towards something that looks like that. And I gather you’re a great fan of.

Colm: I am not. When I first saw the designs from, like, Istio and SPIFFE and so on, I sent them a very long note with my 56-point detailed critique of why you really don’t want to use MTLS for this. Um, but, uh, you know, at the same time they’re solving a problem, right? And they’re plugging into a layer that they can. But I’ll try to give some more detail. Um, I have a long history with MTLS.

Uh, you know, when I was still in college, one of the ways I was kind of paying the bills was as a member of the Apache HTTPD project, right, writing code for the Apache web server. And, uh, at that time it was still mostly non-US folks who were working on the SSL stuff, because of, you know, silly crypto export restrictions and so on. And so I would help people with the SSL stuff. I would help developers with it, and I would help people who were just running Apache with it. And I would do these workshops, and through that got into this kind of big business of being an amateur auditor of people’s MTLS setups. They would come to me or to somebody else at Apache and say, can you take a look at this and see if we’ve actually done it securely?

And, uh, first, I was not a professional security auditor or reviewer or anything even resembling that. So the fact that they were coming to me, and that I was, uh, you know, pretty close to maybe one of the best people out there to do that, was a very bad sign that this was not a very mature ecosystem. And literally every single case I looked at, I would find unbelievably low-hanging issues, like stuff that just didn’t work at all. And on top of that, it’s a really complex ecosystem when you’re using MTLS. There’s a lot of X.509 flying around, a lot of string parsing flying around. And like I was saying earlier, you’re going to have to respond and update to a lot of security issues when you bake that really deep into your stack, and that’s going to slow you down.

But just some of the top things: literally, revocation. Almost never had they built a working revocation system. And I kind of think about it like, you know, when people tell me you should back up your data, the first question I’m asking is, well, how do I restore it? That’s the important part, right? And when somebody tells me you’ve got to be able to rotate your passwords, I’m like, I don’t really care about rotating them. I want to know how you revoke them. Tell me, how do you make sure something can’t be used again? And almost everyone would be like, well, we’ll build that later. Or, you know, they would use CRLs or some huge lists, and then I’d ask them, well, what if you have to revoke everything? Is that going to scale? And they’d just never really have an answer, you know? And, uh, it was kind of scary.

And then I would find these cases where authentication wasn’t happening at all. People couldn’t tell. They were using the system, they were going to some intranet site in their web browser, and, you know, everything worked, but under the hood the client certificates weren’t even being used. They were just getting regular TLS, and there’s no easy way to tell, you know? And, um, I found cases where people were doing authentication just based on the strings that are in X.509. My favorite example of that was one of the first projects I did: we found that the CTO’s executive assistant could pretty much do anything she wanted. She had God-level power in the system. And at the time I thought, well, we must have put her in the CTO’s group, and the CTO’s group has real power, and so that makes sense. We’ll track it down and figure it out. Um, but it wasn’t quite that. It was literally just that she had the word admin in her job title.

Deirdre: Oh, and they were just pattern matching strings.

Colm: Yeah, they just had a regex that was looking for admin and didn’t have the dollar-sign terminator. It’s stuff like that.
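The failure mode Colm describes, an unanchored pattern matching "admin" anywhere in a certificate string, can be sketched like this. The job title and function names are made up for illustration, and the original system's actual pattern is not known; the safer variant here simply compares a dedicated role field exactly instead of searching free text.

```python
import re

# Unanchored: matches "admin" anywhere in the string, ignoring case.
UNANCHORED = re.compile(r"admin", re.IGNORECASE)


def is_admin_unanchored(job_title: str) -> bool:
    """The buggy check: any title containing 'admin' gets admin rights."""
    return UNANCHORED.search(job_title) is not None


def is_admin_exact(role: str) -> bool:
    """Safer sketch: the whole field must equal the expected role."""
    return re.fullmatch(r"admin", role) is not None


title = "Administrative Assistant to the CTO"  # hypothetical job title
print(is_admin_unanchored(title))  # True: the substring is enough
print(is_admin_exact(title))       # False: the full string must match
```

Anchoring fixes this one bug, but the broader point in the conversation still stands: authorizing on free-text strings parsed out of X.509 fields is fragile.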

And it’s just full of that. Now, these things that are building on it now, you know, these mesh networks and so on, they’re much more professional and they’ve thought about a lot of that, and they’re compensating for a lot of that. But they still mostly have X.509, you know, stuck in there pretty deep. And so you’re going to have to update for every X.509 parsing issue that comes out.

Thomas: I feel like revocation is kind of where you sold me on this. You wrote a long thread on Twitter about your, uh, your MTLS gripes, which I am going to shamelessly plagiarize in a blog post at some point. Right. Um, but we had written a blog post kind of cataloging different inter-service authorization things.

And we said the fun things about MTLS and you said unfun things about MTLS on Twitter. Right. And I think going into it, I might’ve been prepared to put up a fight. And then you talk about the revocation thing and it immediately clicks for me that the revocation stuff in TLS that we’re familiar with and that we talk about is not the same problem as inter-service revocation, or even any kind of API revocation, right? They’re just different problems.

When we talk about internet-scale revocation, we’re talking about generally targeted attacks or specific misissuances and things. And there’s a sense in which that system kind of converges on correctness over time, and you kind of hope things shake themselves out.

And anyways, any attacker that’s going after that system, they’re a passive adversary that has control over traffic anyways and all that. But in API authentication, you have to be able to revoke. There’s no not revoking, right? You lost a credential. If you can’t revoke it, then people can keep using that credential forever.

It’s like the system permanently loses security. And I feel like, um, I look at just how these systems are built. Right. And I get the sense that they think that they’re drafting off of a lot of security that TLS has, that TLS never promised to provide. Right? There was never a notion that TLS was going to solve, you know, fine-grained, immediate, real-time revocation, the way that we expect of, you know, even OAuth or something.

Colm: Yeah, revocation has always been the stickiest problem in TLS on both sides, for client certificates and for server certificates. And, um, there are some genuinely hard problems in there. And, you know, maybe there are better ways to solve it outside the TLS kind of ecosystem. Um, you know, for our inter-API or inter-service auth that we designed for AWS, we decided to do everything at request level, right. Authenticate every specific request.

Uh, so in TLS you’re authenticating the channel, right? And then you’re just blessing the channel, and anything that happens over that channel inherits the auth. And that means you can’t authorize and authenticate specific transactions at the same time, which, you know, we just feel is very weak security. Um, and, uh, so we just go for it at the request level. And then, I mean, we have uncountably large sets of identities and credentials. You know, we issue very ephemeral, uh, identities and very ephemeral, you know, session credentials that can last seconds, hours, you know? And so you could never even try to do something like that with TLS. You can kind of try by baking in epochs. You can kind of mint short-lived credentials and say, well, this expires at a certain time or past a certain epoch, but that doesn’t give you the ability to revoke on demand. And also, if you’re able to isolate something, or if you’re able to influence time, and often things are just using NTP for time, you can overcome that too.

So it’s full of all these little gotchas. So we just kind of went for everything at the request level, and we’re just going to use, you know, HMAC and symmetric keys and go at it.
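To make "authenticate every specific request with HMAC and symmetric keys" concrete, here is a minimal sketch. It is not AWS's actual signing scheme: the canonical string layout, the `sign_request` and `verify_request` names, and the 300-second skew window are all assumptions chosen for illustration.

```python
import hashlib
import hmac


def sign_request(key: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over a simple canonical form of one request (hypothetical layout)."""
    body_hash = hashlib.sha256(body).hexdigest()
    message = "\n".join([method, path, str(timestamp), body_hash]).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(key: bytes, method: str, path: str, body: bytes,
                   timestamp: int, signature: str, now: int, max_skew: int = 300) -> bool:
    """Check freshness, then compare signatures in constant time."""
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_request(key, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

Because the signature covers the method, path, and body, a message slipped into an already-open channel does not inherit anyone's authorization; it has to carry a valid signature of its own. Note that the timestamp check alone has the weakness mentioned above: it depends on the verifier's clock and does not provide revocation on demand.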

David: Um, so let’s say though, just to play the other side a little bit: if you are authenticating on every request and you have revocation, that means that, more or less, you’re doing like a revocation check on each request. Um, and so is there any reason that you couldn’t, uh, just do OCSP on every TLS connection?

Like, you’re paying the check-every-connection cost either way. So why not just pay it in TLS?

Colm: Yeah, so we don’t quite do a revocation check on every request. Instead we kind of do proactive, full life cycle management of every credential. Right? So when you create a session credential, you push it out there. Right. And it gets to the places that can authenticate. And when you want to invalidate or revoke it, you do this. Right. And there are some, you know, fallback safety measures. If something becomes isolated, it knows it’s isolated and stops serving requests and so on. But in general, it’s a live, you know, positively acknowledged feedback system, which is really, really important. Right. Because if you’re making changes, you don’t want to use a new identity or credential until you’re sure everything that could authenticate with it has it. Right. And in the opposite direction, you don’t want to stop using one until you’re sure, um, it’s no longer in use, right? You don’t want to kill it, unless, you know, sometimes there are cases like, uh, let’s say one of our customers has fired an employee in negative circumstances. In that case, they do want to, you know, break their access very, very quickly. Uh, and you can do that. But the system’s pushing things in general; it’s not having to go do checks. So you’ve got efficiency and you’ve got the management too.
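The push-based, positively acknowledged lifecycle described here can be sketched as a toy in-memory model. This is only an illustration of the idea, not a description of how AWS implements it; the `Verifier` and `Issuer` classes and their methods are hypothetical.

```python
class Verifier:
    """A place that authenticates requests; it only knows credentials pushed to it."""

    def __init__(self) -> None:
        self.credentials = set()

    def push(self, cred_id: str) -> bool:
        self.credentials.add(cred_id)
        return True  # positive acknowledgement

    def revoke(self, cred_id: str) -> bool:
        self.credentials.discard(cred_id)
        return True  # positive acknowledgement

    def accepts(self, cred_id: str) -> bool:
        # Local lookup only: no per-request revocation check over the network.
        return cred_id in self.credentials


class Issuer:
    """Pushes credentials out and tracks which ones every verifier has acknowledged."""

    def __init__(self, verifiers: list) -> None:
        self.verifiers = verifiers
        self.usable = set()

    def issue(self, cred_id: str) -> bool:
        # Build a list so every verifier is contacted before checking the acks.
        acks = [v.push(cred_id) for v in self.verifiers]
        if all(acks):
            self.usable.add(cred_id)  # only usable once everyone has it
        return cred_id in self.usable

    def revoke(self, cred_id: str) -> bool:
        self.usable.discard(cred_id)
        acks = [v.revoke(cred_id) for v in self.verifiers]
        return all(acks)
```

The sketch leaves out the fallback behavior mentioned above, where an isolated verifier notices it is isolated and stops serving requests.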

Thomas: I feel like there’s stuff worth thinking about in the channel versus message thing as well. Right? Like, um, you know, probably one of the most important kind of server-side attack vectors right now, or, I’m like a year out of date, because I’ve been doing just pure software engineering for the last year.

But when I stopped doing assessments, one of the major things that was, you know, important for server side, when we were doing assessments and stuff, was things like SSRF, where you have, you know, an existing channel and then just being able to send a message over it.

Um, it’s counterintuitive, right? Because you wouldn’t imagine that just being able to turn a server application into a proxy would be that big of a deal. You can get a proxy anywhere. Right. But if you’ve got a trusted HTTPS channel, um, that it’s already trusting, and anything that goes over it is blessed.

Right. Then you’ve kind of got game over anytime anybody gets a way to slip a message into that channel. And you don’t have that problem with, like, SigV4 message authentication or.

Colm: Yeah, correct. And I think desync is probably even the newer kind of form of that. Right. You can imagine a desync issue, if it occurs at one of these proxy layers that’s, um, using MTLS to bless the channel, will have that issue in a way where an authenticated request will not.

Thomas: Um, I’m guessing that James Kettle hasn’t tested that yet. Because no one’s really testing Envoy and stuff for, like, you know, you can’t write a scanner, or you can’t make a Burp Pro, you know, a Burp plugin that does it. I’m saying this and I’m going to point out it exists. Right. That’s a smart bet, right?

Like, there probably is something there.

Colm: Yeah, that’s potentially a target-rich environment. But, I mean, we came to it from that perspective too, right? There’s all sorts of ways you can end up breaking an HTTP request and putting strings in places they shouldn’t be and, you know, bad escaping and so on. And so let’s be more defensive there.

but the other one was, oh, the one, uh, we were thinking about is, well, we want to be able to target things like, you know, banking apps and, and, and financial customers and so on where they literally want to sign the specific transaction and sometimes even want to sign it offline because they’re, they’re, they’re, they’re very paranoid with their keys. And, um, it’s very hard to do that with MTLS.

David: So,

Thomas: Go.

David: Go ahead. I was going to move us to a different use case.

Thomas: I was gonna do the exact same thing. We should leave this on the edit so people can see how the sausage is made. Di

di di di don’t do that.

David: They can see how the sausage is made, so to speak. Um, great. So let’s say, uh, you’ve sold me on why you shouldn’t use MTLS for like microservice authentication. Um, but what about in a kind of employee or device authentication or zero trust scenario? Let’s say SSH, to make things concrete. Like, let’s say I have servers, and I have employees.

And, um, while you can certainly make the argument that nobody should have SSH permission to anything, um, let’s say you need that type of permission. And you’re like, okay, what I’m going to do is I’m going to build like a key vending machine, um, that’s gonna use my organization’s SSO to give out SSH client certs.

And then people will authenticate to whatever server they want to using this client cert. And that way my config management only needs to push out, you know, the verification side for SSH. Um, this also kind of gives me a little bit of an edge in the argument, since SSH doesn’t use X.509 certs.

And, like, you don’t need to explain to me why X.509 makes things complicated. Uh, but what about a scenario like that, that isn’t API authentication? Are client certs just fundamentally broken, or can they be used in other scenarios?

Colm: Uh, I don’t think client certificates are fundamentally broken. I think PKI, you know, is useful. Um, and SSH is an example where it is useful. Uh, I’ve definitely seen customers build setups like that. Uh, although maybe they’re not as common as they should be. Most people are still just using, you know, either SSH passwords or just generating a public key and dropping it on the box. But, um, you can.

Um, I think a lot of my arguments against MTLS fall away in that case, because you’re no longer just trying to create this, like, TCP-compatible, you know, pipe or tunnel that almost anything can run on, where you’re just going to ignore, you know, the context of that protocol. You know, SSH is very coupled, and they’ve really thought through very carefully, you know, the implications of public and private keys and how that impacts the protocol and how that affects things and so on.

And I think there’s all sorts of other great uses of PKI systems, uh, for building, you know, identity frameworks and giving people, um, you know, long-lived identity. Sometimes it’s really the only way to do it, like with physical cards and so on, if you want to be able to work with systems that work like that. And they definitely have their place.

Thomas: I guess, like, I might stick up for MTLS itself in a couple of scenarios, right? Like, um, we use some of the HashiCorp stack at Fly, right. So there’s Consul in the mix and there’s some Nomad in the mix for us. Right. And, um, HashiCorp is really big on MTLS authentication things.

I think Go programs in general are MTLS positive, because Go makes it pretty straightforward to use, um, client certificates. Right. And in those settings, I’m basically expressing what is sort of, kind of a network topology to begin with, like the relationships that are changed with my network topology.

Um, but there isn’t a whole bunch of issuance going on and stuff like that. Right. Um, I think the thing I kind of like about it, as opposed to fine-grained request authentication, is that if you don’t have the client certificate, like if you don’t have the root secret for talking to the service or whatever, you can’t talk to it at all. Um, you’re just kind of locked out.

Right. And that’s a thing that MTLS does that I do kind of like, right. It’s similar to SSH, I guess, too: if you’ve turned password authentication off, then almost everything that people write about SSH hardening goes out the window as well. Right? Like, unless you don’t trust the SSH key exchange and stuff, which hasn’t been broken in forever.

Right? Like, um, I do sort of like the idea of, you know, really coarse-grained kind of, um, you know, access control rules expressed with MTLS. You can tell me I’m crazy about that. You should, by the way: if I’m crazy about that, tell me that I’m crazy.

Colm: Uh, well, I’ve definitely seen it fail open more times than I’ve seen it fail closed. Where people, you know, just stand up a web server, think they have mutual auth working, and it turns out they don’t. But, you know, the Go ecosystem is much better at that. It’s way more explicit. You know, it’s harder to get wrong than an Apache config or an nginx config, where it’s really a pretty intimidating recipe to make sure this thing’s actually turned on.

Um, so that part I agree with. I mean, at that level, it’s kind of logically identical to a VPN, right. And sometimes VPNs make all the sense in the world to
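The "fail open" problem is easy to reproduce with Python's standard `ssl` module, where a server-side context does not ask for a client certificate unless told to. This is a minimal sketch; `make_server_context` is a hypothetical helper, and a real deployment would also call `load_cert_chain` for the server's own certificate and `load_verify_locations` with the CA that issues client certificates (omitted here because those file paths would be invented).

```python
import ssl


def make_server_context(require_client_cert: bool) -> ssl.SSLContext:
    """Build a server-side TLS context, optionally demanding a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Default for a server context is ssl.CERT_NONE: plain one-way TLS.
    if require_client_cert:
        # Handshakes without a valid client certificate are rejected.
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


plain = make_server_context(False)
mutual = make_server_context(True)
print(plain.verify_mode)   # VerifyMode.CERT_NONE
print(mutual.verify_mode)  # VerifyMode.CERT_REQUIRED
```

A browser talking to the first context sees a working HTTPS site either way, which is why, as described above, people could not tell that client certificates were never being checked. Asserting on `verify_mode` in a deployment test is one cheap way to catch that.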

Deirdre: Um,

Thomas: So we were talking before we started recording about roughly what we were going to talk about, and you slipped in, right before we started recording, that if we put you on the spot, you might try to make an argument that AES-CBC was more secure than AES-GCM, right? So CBC is like old-school AES.

That’s how, like, AES protocols were designed. It’s how block encryption stuff was done since like the 1990s. Right. Where in particular you separate out the encryption part and the authentication part, and you kind of compose them from primitives kind of generically. Right. And GCM is like, you know, it’s like a Formula One car, right.

The whole thing is just hermetically sealed around doing both authentication, um, and, you know, the bulk encryption at the same time. Right. And I think a lot of people liked GCM. I’m not a big fan of GCM. It seems really brittle to me, but I am a fan of AEAD ciphers. Like, you know, I would use ChaCha20-Poly1305 or something like that before I use CBC. I guess I sort of see where you might be coming from on that, but make the argument, convince me that I should use.

Colm: Oh, my God. You give me two things to argue against there too, because I’ll have to get back at you about AEAD as well. But, uh, so TLS kind of famously got AES-CBC the wrong way around, right?

You can’t use AES-CBC on its own. You have to couple it with some kind of authentication algorithm, or a MAC, right. And TLS uses HMAC, but, you know, if you give it a plaintext, it will do the HMAC first and then encrypt the HMAC, which is the wrong way around. That’s not what you want to do. It means the ciphertext is malleable. Somebody can mess with things and try to do experiments. And, uh, we got security issues out of it, most famously the Lucky 13 security issue, right. Which, uh, Kenny Paterson and co. found, which is awesome. And, you know, showed us: definitely don’t do it that way around. Um, now, you can use it the other way around securely, right?

If there was a cipher suite defined that, you know, when you’re decrypting, checks the HMAC first and then decrypts it, you know, I don’t think anybody out there would actually have any standing issues with AES-CBC, except maybe performance. It’s not quite as performant as AES-GCM. Um, but AES-CBC has a lot of positive properties that, if you’re thinking from the perspective of a real-world attacker, somebody who might really try to go at, um, your protocol, it’s really defensive.

Um, you know, the biggest one is, if you screw up in how you generate your initialization vectors or nonces, which are just these things it takes to encrypt with it, it’s much, much more defensive of that. In fact, GCM can break wide open in a way where you’ll be able to, um, forge information. And, um, second, it gave us a measure of length hiding that I think was really important. Uh, and so, you know, the cryptographic research community out there, their job is to advance the art and think of, you know, the next cool break in cryptographic protocols. It’s not to defend real-world systems from real-world attacks.

Right? And, um, it’s staggering to me that the most practical attack on TLS to this day is, well, if somebody is passively tapping traffic, whether that’s, you know, shared wifi or whatever, they can see the length of all the information that’s going by, and the rate of the information that’s going by. Right, just classic traffic analysis stuff. And any measure of length hiding helps protect that.

And there’s, like, real practical attacks here. You know, people have done research; they can see, you know, what map you’re looking at in the browser, because they figured out the mapping tiles, or what video you’re watching on a video streaming service, because they figured out the size of the movie segments. Or, you know, if you’re pushing voice around, you can just watch for the silences and the breaks and figure out what the speech patterns are.

Right. And most ordinary people would be really surprised to learn that. They’d also be really surprised to learn that these encryption things don’t actually protect their information in that way. And at the same time, most cryptographers would be surprised to learn that they are surprised,

you know,

Thomas: you’re you’re you’re

Colm: like, of course you could do that.

and

Thomas: Just to make sure that I’m tracking this, and I think I am, right: you’re getting the length hiding from CBC because CBC is padded, because when you put together a ciphertext with CBC, it’s got to be a multiple of 16 bytes long. And if it’s not, you fill out the balance with garbage.

Colm: it. And it’s a relatively minuscule amount of padding. Like, I’d prefer a much larger amount of padding. And now TLS 1.3 has support for larger amounts of padding, but no one’s using it yet. But even that 16 bytes, it’s actually really effective at hiding the URLs, right? Like, when you think about trying to analyze the HTTP session, you’ve got two attempts, right? You’ve got the size of the request and the size of the response that you can use to try to fingerprint what’s going on. And it really makes the attacker’s job much, much harder on the request side, because a lot of URLs will collapse into that same 16 bytes, and this stuff increases attacker difficulty by many, many factors, you know? And it’s just a shame to lose that. It’s like we’re not thinking through the perspective of, like, what if we sit down and try to do a practical attack on TLS? What would it be? It’d be something like this. And it’s like, well, we actually went backwards a bit on defending against that, which is just kind of reverse.

It’s
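The bucketing effect described above can be sketched in a few lines of Python. This is a minimal illustration, not how any TLS library computes record sizes: the names `cbc_padded_len` and `buckets` are made up for this example, it assumes a 16-byte block with at least one byte of padding always added (as in PKCS#7-style and TLS CBC padding), and it ignores the MAC, IV, and record header.

```python
from collections import defaultdict

BLOCK = 16  # AES block size in bytes


def cbc_padded_len(plaintext_len: int, block: int = BLOCK) -> int:
    """Length after padding up to the next block boundary.

    Assumes at least one padding byte is always added, so an input that
    is already block-aligned grows by a full block.
    """
    return (plaintext_len // block + 1) * block


def buckets(lengths):
    """Group plaintext lengths by the padded length an observer would see."""
    seen = defaultdict(list)
    for n in lengths:
        seen[cbc_padded_len(n)].append(n)
    return dict(seen)


if __name__ == "__main__":
    # Hypothetical request sizes that differ only in URL length.
    for padded, originals in sorted(buckets(range(100, 132)).items()):
        print(padded, originals)
```

Under these assumptions, every plaintext length from 32 through 47 bytes maps to the same 48-byte padded length, so an observer cannot tell those requests apart by size alone.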

Deirdre: w

where, so my instinct there is that, um, the block cipher mode or an authenticated encryption with associated data mode is like, it’s a different level of abstraction than HTTP over TLS length extension, or, you know, length observability attacks, blah, blah, blah. Like, if you are designing your block cipher mode or your AEAD thinking about that, like, are you kind of...

Are you out of your element? Or are you going to try and shove too many sort of things into your thing that you’re designing at a far lower level of abstraction, of the primitive or quote-unquote primitive that you’re designing? Or, basically, should you consider that?

Because I don’t know if CBC mode was ever, I don’t, I doubt it was designed with that sort of security when integrated into a higher level protocol in mind, it just was a happy, happy accident, basically.

Colm: yeah, it was definitely accidental, um, security. So I definitely think padding should be part of the AEAD kind of API, and myself and Shay Gueron designed an AEAD spec two years ago, SCRAM. Yeah. So we put padding right there, front and center, so it’d be in the developer’s face: you need to think about how much padding you should have, right, and here’s why that matters. And you see it ill-considered in a fair amount of applications and protocols out there, where folks really haven’t thought about, you know, just basic blinding of information to attackers. So I think we have some more work to do there, but you kind of can’t put absolutely everything a developer has to think about in the AEAD, you know, footprint either.

Thomas: Hey, David, you’ve got your name on some TLS papers. How convinced are you?

David: Um, I mean, I’ve been generally skeptical of most, like, timing-type attacks. I don’t know about traffic analysis as much, but most anything that falls under side channels. Um, as to, like, what is revealed there that simply the existence of the connection doesn’t reveal. There’s spots where it’s like clearly terrible, right?

Like the old Dawn Song at Berkeley and SSH, with the whole keystrokes-on-the-password thing, paper from back in the day. Right, right. That’s clearly an issue. I’d believe that, like, many audio formats might leak this type of stuff. Um, but a lot of those, like SSH, for example, have specific things in the implementation to avoid the keystroke timings.

Um, and I believe that, like, even if you have the 16 bytes of padding in audio, you might still need to do things like that. So I don’t know. Um, I was never much of a timing person. I agree that, you know, anything you can do to mask lengths and mask stuff is positive. You see this with the censorship resistance stuff a lot more, that just, like, anything that can be used as a side channel on that does get used as a side channel.

Um, and sometimes, like, really silly stuff. Um, or even just the attitude of, like, well, surely they’d never block AWS. And then they, like, block all of AWS

type stuff. So I don’t know.

Thomas: You’re bothering me. How, uh, how seriously do I need to take the SSH keystroke timing thing? Is it to the point where like, if

David: oh, that’s been, that was fixed like 20 years ago.

Thomas: Right,

right. But

like, but it’s implementation fixed, right? Like, but if I’m using some random, you know, SSH library in a memory-safe language where, like, I trust the crypto, but there’s a whole bunch of other things you have to think about when you implement SSH, is that one of them? Like, do I need to look for that?

David: um, that’s probably the first one you should look for. I say this because, like, I’m tangentially involved. I’m involved on the, um, on the transport crypto side, not the SSH side, of, like, uh, kind of an SSH reimplementation,

um, for reasons. Um, but, uh, I don’t know. I haven’t checked this in the Go one, but I assume that the Go implementation does this correctly, because Go has a lot of smart people that worked on that.

Um, and like, I personally wouldn’t touch Dropbear with, like, a 10-foot pole regardless of the keystroke timings. Um, and that’s about all of the SSH implementations: OpenSSH, Dropbear, and

Thomas: Literally

David: one other one that I basically, I mean, I wouldn’t touch anything that’s not OpenSSH or the Go one, is my answer to you.

Deirdre: Okay,

Thomas: as you say this, I’m remembering that I wrote our own SSH implementation for Fly.

I didn’t go look for this. And

it’s also, it’s like,

David: it imports the golang.org/x one, which is like the same thing that

Thomas: Yeah, but it’s an x library. They’re not the same. We need to bring Filippo on, like right now, right

David: Yeah.

Deirdre: The

Colm: You’ve got to watch out for the CLI utilities, though. That’s the problem right there. The way it works in SSH is the second you go into line buffering mode, right, or password input mode where you get the asterisks, it tells your terminal over SSH to, like, not send the input until the carriage return.

Right.

But if somebody writes an application that doesn’t go into line-buffered mode but accepts a secret, like, it’s game over, and you still see those.

Deirdre: You can

Thomas: I’ve seen that. I’ve seen that in ostensibly secure messengers before, where they send real-time updates. Like, who’s talking, like somebody is typing, but every time they type, you get fine-grained messages, like when the keystroke actually happens. And I always wonder, like, how crazy that is.

Like, um, you know, you probably want to do something that, like, debounces or jitters that, or something like that.
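One way to picture the debounce idea: instead of sending a "typing" notification on every keystroke, collapse keystrokes into fixed time slots so that the network sees at most one notification per slot, sent at the slot boundary. The sketch below is a hypothetical illustration, not how any particular messenger works; the function name, the 2-second slot, and the timestamps are all assumptions.

```python
import math


def coalesce_typing_events(keystroke_times, slot_seconds=2.0):
    """Map keystroke timestamps (in seconds) to notification send times.

    Each keystroke is deferred to the end of the fixed slot it falls in,
    and duplicates within a slot are dropped, so an observer learns only
    which slots contained typing, not when each key was pressed.
    """
    send_times = set()
    for t in keystroke_times:
        slot_end = (math.floor(t / slot_seconds) + 1) * slot_seconds
        send_times.add(slot_end)
    return sorted(send_times)


if __name__ == "__main__":
    # Hypothetical keystroke timestamps, in seconds since the chat opened.
    keystrokes = [0.10, 0.31, 0.45, 0.90, 2.20, 2.35, 5.70]
    print(coalesce_typing_events(keystrokes))
```

This still reveals coarse activity (which slots had typing at all), but the inter-keystroke gaps that timing attacks rely on never reach the wire.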

David: Yeah. I mean, I saw stuff back in like 2014 of, like, you know, just figuring out what you’re typing just by listening to it. So like,

Deirdre: yeah,

I hope that would

David: sure it’s not.

Deirdre: Um, coming back to CBC, or other AEADs, I think that if you came up with another AEAD and were like, “Hey, it’s all padded so that you can’t distinguish lengths,” some people would say, “It should be secure without all that extra padding, blah, blah, blah. Why is it so fat?”

Um, and it would be harder for you to, like, get support for your thing if it was, you know, thickly padded, even if it was to mitigate exactly this kind of attack when it’s used in a higher-level protocol.

David: I mean, we saw, like, exactly this with, like, AE the first time around. We can all go and look up that email from, uh, why am I blanking on his name? The, the moral character guy, Phil Rogaway, from like 1996, where he’s like, you mother fuckers are fucking all of this up. Like, he would never say that, he’s like one of the most timid people, but

Thomas: He has a really, he’s a really

David: basically describes everything that goes wrong in the next 20 years. Right then, and then they’re like, eh, but performance or like, and this isn’t realistic

where he’s just saying, like, use the AEAD abstraction, which, like, he had basically just come up

Deirdre: And then like 20 years later, everyone comes back and we’re like, we’re sorry, can you like release the patent on this? We really want

David: Yeah. Or same with, like, you know, we have custom block cipher modes specifically for disk encryption, because that’s a separate problem. It doesn’t seem unreasonable to me to want to have a separate block cipher mode or AEADs that are designed for length hiding, to be used in transport layer encryption. Um, like, that doesn’t, um,

Thomas: I.

David: that seems reasonable to me.

Thomas: I know the thread that you’re talking about, right? That’s the one where, like, people were referring to Phil Rogaway as, like, a so-called cryptographer when he was trying to fix IPsec. All I have to say about this is that obviously the next time we do a recording, one of the things that we need to do is a dramatic reading of that email with you as the anger translator,

David: yeah,

Thomas: not kidding in the least it’ll work.

Deirdre: it would be great.

David: yeah.

We can get, uh, Jason on for WireGuard. Do that.

Thomas: I have just, like, one more question for you, Colm, which is, um, in a day that shall live in infamy on this podcast, um, we had, uh, Ryan Sleevi, who was one of my personal heroes. Um, and I, uh, kind of with some hubris, brought up DNSSEC and asked if it was okay for us to stick a fork in it and declare it dead.

And, uh, he gave a whole speech that has shaken me to my core for the last two weeks about how what they’re doing in Europe with certificate authorities means that there is a future for DNSSEC. And in fact, you know, we may soon be in a DANE world. And I’m not even asking you a question so much as making a request, which is, would you say he’s wrong,

please?

David: and it’s important to note that that request was not individually authenticated. We authenticated as transport connection first. So

Colm: Sure. I think DNSSEC, it’s probably best described as a zombie right now. It is the living dead. Um, and I say that with a twinge of regret, you know. At this point I feel like DNSSEC is mostly being propped up by regulation. At least the folks who I see using it tend to be the folks who have, um, government mandates to use it, or ICANN mandates to use it, which I guess is just another form of government. And it isn’t getting too much commercial adoption, and I’m kinda skeptical of the protocol’s fundamental future, just because it seems really poorly positioned for, like, the next round of updates. You know, it’s really, really hard to update the parameters in DNS, like, very, very slow. Some of that’s, um, due to the nature of the protocol and just how crufty all the things out there that support DNS are. If you think about the challenges that, you know, GREASE had to overcome in the TLS space, they’re 10 times harder in the DNS world, where there’s all sorts of very old DNS implementations out there that really can’t handle any changes.

And, um, you know, things will come along and updates will have to happen. And what are folks going to do? You know? And then the other part of it is there’s just not much of a, uh, you know, deep technical community around it who are able to steward and shepherd all that. You know, TLS has a really deep bench of folks who work on that, who go to work every day and that’s their job, and they keep the world moving on that stuff. And DNSSEC doesn’t really have that. So it’s really hard to see it kind of thriving. Um, you know, at the same time, I don’t want anyone listening to think we don’t stand over the DNSSEC or mTLS implementations we have at AWS. You know, we support these things and we put a lot of work into getting it right for people and making sure that they work really, really reliably. But, you know, these broader ecosystem concerns are real too.

Thomas: I feel like you hear sometimes, like, you know, a thing that gets brought up with DNSSEC a lot is that the ecosystem as deployed right now is mostly RSA-1024. Um, like, there were recommendations to use 1024 for, like, the zone signing keys. Then the longer-lived key signing keys can be, like, you know, 2K RSA keys. But, like, the obvious criticism is just, it shouldn’t be RSA at all.

It should be curves. And then people will point out, well, there is curve, you know, DNSSEC, right? Like, you can use the P-curves with DNSSEC, and Cloudflare has an implementation of P-curve DNSSEC. It’s like, there’s like two problems there. Right. Um, you know, the obvious problem there is that P-curve DNSSEC has, like, you know, 1990s curves.

But, like, the other problem is just, like, none of the, you know, the actual deployed base of DNSSEC is P-curve DNSSEC. Right? Like, you imagine how long it would take to get that stuff deployed. And, you know, it seems like it would take an eternity to get that to actually happen, let alone getting, like, Curve25519 or something like that.

Colm: Yeah, and it’s been at least 10 years just to get 1024 out there. Um, and that’s gone pretty slowly. Um, I don’t even know where you would start if you wanted to really deploy ECC, and you’d have to turn off RSA as well, really, to get the security benefit. So it’s really two steps, and getting to the end of those two steps, I mean, at the current track record, would take over 20 years. So that’s why I’m skeptical of it. But, you know, it’s a protocol, it’s got security in the name, um, so folks are going to want that extra comfort of using it sometimes. And at least we’re getting better as an industry at avoiding outages and, you know, operating it very well.

We’re starting to see more mature implementations show up that, you know, safeguard against that. So there’s less downside to using it than there was before.

Thomas: I feel like when Ryan was talking... by the way, I’ve begun everything I’ve said this time with the words “I feel like.” I feel like a lot of things today. Um, but when Ryan was making his case for this, I think I should have been more forceful. And I think you’re seeing me attempting to be more forceful about my dislike for DNSSEC right now.

So what I’m hearing is it’s a zombie, so you can stick a fork in it, but in the forehead, so it’s dead permanently.

Colm: Well, I personally would love to see a revived effort to have a real, you know, secure DNS protocol. I actually think, if you step back from the internet architecture and you were doing everything afresh, right, you would want encryption to be like a day-one property, right? You’d want it to be right in there, and you’d want it to be part of the name lookup.

Right? You kind of want everything that happens in DNS and everything that happens in the TLS handshake to be one protocol. Right. You start with a name and you get back an IP address and a key, right, and maybe a port, whatever, and then you go connect to it and you’re good. Right. And, uh, doing that well, with, you know, confidentiality and privacy, which DNSSEC doesn’t have, and with full verification and authentication at each stage, would be really, really awesome. Like, there’s definitely a space for that, and hopefully we’ll see it.

Thomas: You’re starting to see it bottom-up with things like DoH. Right. But, like, if you step back and look at how this came to be, right, what you’re really just seeing is, like, sheer bloody-minded path dependence, right? Like, we’re working with the constraints that a small DoD project had in, like, the mid-1990s, where it’s not encrypted because they felt like DNS servers of the time wouldn’t be able to keep up with encryption.

Right. And there’s a notion of offline versus online signing, where the protocol can’t, you know, make any nods to online signers, like, you know, anything that actually has a key and can do cryptography in real time. Because, I mean, you can make a message board argument that that’s a good property, but the reason it’s there is because they felt like the systems of the time that were going to, you know, when DNSSEC was fully deployed in 1997, or whatever the plan was, right,

like, those servers wouldn’t be able to keep up with it, and we needed a protocol that’s designed around 386SX, you know, DNS servers or something like that. Right. I totally, like, I come across as an evangelist for not securing the DNS, but I think DNSSEC gets in the way of securing DNS.

Colm: I agree with that. And I think I’m optimistic that we can solve all those technological problems and challenges. Um,

I think there’s another property too, though, which maybe speaks to your other concerns about DNSSEC and, um, PKI, about how, you know, they can be subverted by single parties sometimes, right?

Whether that’s a DNS operator somewhere in the tree, or a rogue CA, or whatever. Right. And I think stepping back, you kind of look at it and go, well, wouldn’t you love to just make that a multi-party system where, you know, everything has to be signed by N parties, and that’s computationally cheap now.

And we can have many CAs now and so on. But that kind of change, where essentially every one becomes dispensable, right, by design, um, those are harder to do. So the kind of changes you’re ambitious for, um, I’m less optimistic about.

Thomas: You’re saying it’s a, so you’re saying it’s a coin.

Colm: No,

I don’t think blockchain is the answer.

Thomas: you said secure multi-party computation and I’m like, you have Deirdre’s attention. So

Deirdre: yeah. Do I have to write another threshold signing?

Colm: I’m not even talking about threshold signatures here. I’m just talking about, you know, you could say that something in the DNS or a certificate, right, just has to be signed by multiple parties. Right. And then you could just have the software, exactly, as simple as that. Right. That would be another interesting change.

Deirdre: Well, Colm, thank you so much for coming on our little show. This has been great. Anything else?

Colm: no, it’s been

Thomas: nothing

Deirdre: Cool. Awesome.

Thomas: That’s great. It was awesome to finally meet you and talk to you. Thank you so much for

taking the

David: Thank you.

Colm: Thank you all.

Deirdre: I’m going to, I’m going to stop recording.
