Matrix with Martin Albrecht & Dan Jones

No, not the movie: the secure group messaging protocol! Or rather, all the bugs and vulns that a team of researchers found when trying to formalize said protocol. Martin Albrecht and Dan Jones joined us to walk us through “Practically-exploitable Cryptographic Vulnerabilities in Matrix”.


This rough transcript has not been edited and may have errors.

Deirdre: Hello. Welcome to Security Cryptography Whatever. I’m Deirdre.

David: I’m David.

Thomas: I don’t know why I’m awake right now.

Deirdre: And you are?

David: It’s 9:00 AM in Illinois and you’re an adult.

Thomas: I’ve, I’ve built a whole life. I’ve constructed a whole life around not having to be up right now. That’s my adulthood. This is by design. I’m, I’m, I’m excited to be up for this. This is the only reason I would get up for this is this conversation. But yeah, I should not be awake right now.

Martin: You cannot imagine how much it means to me that you are up for us.

Thomas: If, if only you knew.

Deirdre: And that is one of our special guests today, which is Martin Albrecht. Hi, Martin.

Martin: Hi there.

Deirdre: Hi. And Dan Jones. Hi, Dan.

Dan: Hi there. Hi.

Deirdre: Hi. So we’ve, uh, invited our illustrious guests today, uh, video calling in from across the pond, to talk about their new paper analyzing the Matrix, uh, group end-to-end messaging protocol that has been implemented in several clients, especially the Element client.

And they found some issues in this kind of, um, let’s generously say, they started with good bones and then it just kind of grew into this forest, this tangled bramble of end-to-end encrypted thingies. And they did not really interact well with each other. So do we wanna talk about what Matrix is first? Okay.

Alright.

Thomas: A hundred percent. This is

David: A hundred percent.

Thomas: This is, this is super interesting. Like Matrix is one of the more popular secure messaging applications, and this is probably one of the most significant cryptographic results against a secure messenger that’s been published. So, uh, I’m psyched about this.

So, yes. Yeah. So like, bring us up to speed on Matrix. Like if, if people aren’t familiar with Matrix, what should they know?

Deirdre: Dan.

Dan: Oh, cool. Yeah. Um, so Matrix, um, it’s primarily, like, an open standard, um, and a federated kind of protocol for real-time communication. And end-to-end encryption is like quite a big selling point of the specification. But additionally, kind of the second big selling point is that it’s, um, it’s federated.

So there isn’t a single central server. There are a few different servers. You have your own kind of, like, your account on a home server, they chat to each other, and you can have like a big kind of ecosystem.

Martin: So one way of thinking about this is that, uh, Matrix tries to do for messaging what SMTP does for email. But, you know, not SMTP as it was designed in the seventies, eighties, I don’t know who knows this by heart, but, you know, imagine SMTP had end-to-end encryption built in, right?

So you can chat even though you’ll have accounts on different servers, and that’s the federated nature. Uh, and the whole thing is seamless. You can have groups spanning different servers and so on.

Thomas: And there’s a, there’s a whole thing about how nerds love Matrix. Nerds love Signal in theory too, but the big problem that they would have with Signal is that there’s only one Signal server. Um, and there’s no particularly good reason to wanna run your own Signal server because the Signal server doesn’t do much, right?

Like Signal’s designed with most of the smarts in the client. But like, people don’t like the idea that there’s, like, there’s a single Signal ecosystem that you can’t plug things into. Matrix is not like that, right? Matrix, anybody can run a home server, like you can stand up your own Matrix, you know, network the same way you would with IRC.

I’m a little fuzzy on this, I’ve dug into running Matrix before, but home servers talk to each other, right? Like, it’s sort of like the fediverse that way too, like with Mastodon and stuff. Like everything kind of links together in some way if you, like, try?

Martin: Indeed, that’s the key idea, that these servers talk to each other and thus you can talk to users on other servers, uh, in a, in a more or less seamless kind of experience. And maybe this is a good point to kind of mention that it’s not just, uh, open source hippies who like Matrix, right? It also, for example, has quite some traction in the European Union, because it agrees quite well with the federated nature of, like, the European Union. So the French, uh, government adopted Matrix for its internal communication, the German military adopted Matrix for its internal and classified communication.

Deirdre: The German

Martin: And it, yes, and the, so the Bundeswehr, they have a, like, it’s based off, like, I don’t think they use kind of the implementations that are in the public domain.

So I don’t know what the differences are, but it’s based off it.

David: You’re laughing about that, but like the Department of Defense of the US uses like Microsoft Teams for communication, so…

Deirdre: There was, Yeah. Okay. I mean,

Martin: And the German healthcare system and so on. And then, you could argue that maybe for those institutions end-to-end encryption isn’t exactly the selling point, right? So for them, perhaps even having access to all the communications of the civil servants is a legal requirement or a design goal. So we don’t know if, for example, our research would even kind of matter to them as a result; it depends on what you’re designing for. And we haven’t talked to any of these government agencies who are running these kind of Matrix derivatives.

Deirdre: Right.

Thomas: The other impression I have about Matrix is that kind of in terms of the UX and kind of like the way people use it, it’s more like Slack than it is like Signal, or at least for me, right? When I’m talking to people on Signal, most of my Signal use is one-on-one, it’s just like IM style. But Matrix is overwhelmingly group messaging, right?

Like there are rooms and people talking in rooms, and that’s most of what people use Matrix for. Am I wrong about that? I could be totally wrong.

Dan: That seems to be the main focus. I think especially if you look at their user interface, like Element’s user interface, for example. Um, it definitely fits that bill. Um, I know like our university department for example, we kind of, we have a Matrix server set up and we use it as a type of Slack, so it makes a lot of sense.

And similarly, like I probably wouldn’t want to use Signal for my, for my work chats. Um, I quite like the idea of having something that feels a bit more like Slack.

Thomas: And then you mentioned Element.

Dan: Yes, I did. Maybe I should introduce Element. It’s kind of the flagship client I think is how it’s described for Matrix, because Matrix is just like open standard.

There are loads of clients that kind of implement it in different languages, depending on what, like, excites you, you know, in the month. And Element is like the biggest one and it seems to be kind of associated the most with the Matrix Foundation, I would say.

Thomas: So we could kind of like, we could go into the weeds on how Matrix is designed and the Olm protocol and all that stuff, and we inevitably will. But before we do that, we could probably motivate this a little bit by kind of giving people a high level view of what you guys found. Like what, what were you able to do with Matrix?

Bearing in mind the setting here, right? Like this is end-to-end encrypted messaging. So the idea here is that the clients don’t trust the server. That’s the whole point of end-to-end encrypted messaging. If you, if you have to trust the server, you might as well use Slack. When you talk to Slack, you’re TLS to the Slack server, you’re encrypted on the wire.

The only thing you don’t trust in Slack is the server, right? So my impression

Deirdre: The only, the only thing you do trust in Slack is the server. Yeah. Okay.

Thomas: So do I have to trust a Matrix server?

Deirdre: *Should* you be expecting to trust a Matrix server?

Dan: So one of the things we found, I guess in our, in our research is that you do need to trust home servers, or at least you can’t completely not trust a home server. And specifically the, the first two attacks that we talk about in our paper are really about how much do we trust the home server to tell us who to encrypt our messages to.

Because whilst it is end-to-end encrypted, the group membership, like the list of people who are in a room who you’re chatting to is actually controlled by these home servers. So there’s a bit of a kind of, to what extent is that end-to-end encrypted? Can you claim that to be end-to-end encrypted?

Deirdre: Yeah, because it sounds like you have to trust the server to decide what the ends of your end-to-end encryption are, and that doesn’t feel real good, at least compared to what the kind of expectation is when you’re using an end-to-end encrypted service, like Signal, or pairwise Signal, or WhatsApp or whatever.

It’s like I chose who these messages are going to, but if the server can just kind of be like, Oopsy daisy, we added another device, or Oopsy daisy, we added another person to your group and they are gonna see your end-to-end encrypted messages cuz we changed who the ends are. That feels bad.

Martin: Yeah. And so the initial response by the designers, and they’re correct in the statement, is that you get a notification in the Element client that, uh, a new device was added or a new user was added. So essentially outsourcing the problem of verifying group membership lists to the users.

And then if you’re talking about large groups, then this becomes, of course, increasingly more difficult. And also you need to then know what the actions actually are that you need to take in such an event. We looked for a little bit, and I don’t think that the Matrix standard as it stands requires that this notification is displayed, but the Element client does.

So there might also be a little bit of a mismatch, but they consider this acceptable because of this notification. But it should be mentioned that throughout they always said, “We will fix this regardless.” So after we disclosed to them, they shared with us their timeline. It’s not fixed yet.

It’s a bigger change because they need to change the protocol. But they always said, okay, fair enough, they’re gonna change it. And they also announced a few days after the public disclosure, based on the public reaction, that they should prioritize fixing that. So it seems it’s coming in the near future; I don’t have the timeline in front of me right now.

They’re going to fix that in the sense that, because there are notions of admins and so on, authenticating such group membership requests is not something that is completely outside of the spec. They just need to implement the appropriate authentication and cryptography.

Thomas: So this is the first of six attacks in this paper, five of which are exploitable, right? And this is the one with the two variants, one about the user lists and one about the devices. And the fun thing about this particular set of attacks is that we can just dive into it, because you don’t need to know much about how Matrix works under the hood to understand this problem.

And as I understand it, and you can shoot me down, essentially it’s a group secure messaging system. So group membership in a system like this is like key distribution. It’s basically the same concept. And the problem, as I understand it, is that the server just controls group membership. Like there’s, there’s nothing more to it than that, right?

It’s like the, the server can just tell all the members of a Matrix room, “This person is in your room now.” And when that happens, I’m a little fuzzy here cuz this sounds crazy to me. But when that happens, the membership list of your room ticks up. Like there’s a new member, and you might get a popup or something like that that says there’s a new member, but probably not.

And everything else just keeps running. Like everyone syncs up with that new member, they all distribute like their sessions. The keys get distributed. And the only thing that happens to account for the fact that the new member was added is that you got told that there was a new member in the group, but messages were still going back and forth.

Right. They don’t stop and wait for people to click, “Okay, this member is okay.” Right? Like, conceivably a member could like dip in and out real quick. You, you can imagine ways to kind of, you know, refine that attack to make it even less noticeable, like you add a member with a similar username to somebody else or something like that, right?

Like, is that really all there is to it? Like, I, I guess I’m, I’m stuck on this a little bit. In the remediation section of this attack in the, in the paper, there was a bit there about how the Matrix team, or the Element team, has like, decided to accept the risk on this? And I don’t understand how you accept the risk on this.

Martin: The challenge here is that, indeed, what we have is just, it’s insecure by design. Uh, and then that’s the question with all of this work: do you only consider valid attacks that the designers consider valid? Or do you think, I have my own definition of what I consider a valid attack, and where do you get that definition from?

And indeed, initially they said, we think it’s sufficient to alert the users to this. And so that’s what we then reported. But I think going forward, kind of, this is gonna change. They’re gonna fix this. So I think they accepted the argument that this is not a sufficient control.

Dan: And there are some more, like there are a few layers to this outside of just kind of being alerted to new users. Like, they do have this out-of-band verification system, and that will cause different levels of alerts to be shown depending on, like, the trustworthiness of the user that’s been added.

So say the home server makes up a completely new user and then adds it, you’re gonna get alerted quite a bit harder than if the home server takes someone you’ve already done out-of-band verification with and trust and adds them to a room they shouldn’t be in, that’s gonna look a lot more legitimate and it’s gonna have far fewer warnings than if they just create a whole new user out of nowhere.

So there are a few layers to this, and that’s part of the reason why their initial response was kind of, these layers give enough.

Deirdre: But, but they’re like layers of notifications to a human that, that you hope that they notice, which is not my favorite way of ensuring my end-to-end encrypted client is secure, is relying on a human to notice things.

Thomas: You’re also, you’re talking about the user verification, which is cool, because you have another attack that breaks that. But, um, like, if you could kind of bring us up to speed: what does it mean to verify a user? Like, is that everybody on the, on the Matrix server has verified that person? Or like, what does that look like?

Like how do I verify that somebody’s a legitimate user, if I’m in a, you know, in a medium sized community?

Dan: That’s a, that’s a really good point because in a small community or like in a small group chat, effectively what you’re doing is you need a process between you and that person, one to one, and you verify each other’s identities going through this kind of, this little dance. But as the size of the community grows, that’s like less and less practical.

And you’re less and less likely to be doing out-of-band verification with all the members in a thousand-person chat. Of course, the flip side of that is it’s a bit murkier as to, like, your level of expectation of confidentiality of that chat. If you’ve got 10,000 people in it, maybe you’re not gonna do out-of-band verification with all 10,000 people, and you kind of aren’t gonna have the same expectations for privacy as if you had five people.

But yes, you, you, you need to run this between each pair of people.

Deirdre: So the out-of-band verification is like if people remember on, on WhatsApp or Signal, like there, there’s a QR code or a number and like you and your party in some other channel that’s not a Signal chat or a WhatsApp chat or whatever, compare either your QR code or your number and if they look correct or they, you know, match in a way that the app decides, um, you’re able to say this person has been verified out-of-band.

It’s like literally a human ticking a box, or the app doing a QR code scan, to be like, yes, this looks correct. I’m gonna tick this box that this person has been verified out-of-band. But I’m— Signal’s my primary driver, I use WhatsApp for the extended family, and I am on so many chats all over the place. I don’t remember the last time that I did an out-of-band key check with a new person that I was chatting to, like I don’t remember.

And I am like a highly motivated, knowledgeable user. So these things like they fall down all the time.

David: There’s one Signal user I, who will only talk to me if we’ve done out-of-band verification. And, um, the end result of that is we just never talk anymore.

Deirdre: And so what this kind of falls back to in practice, a lot of the time, is trust on first use. It’s like a TOFU thing of an ID key or device key from another user, which is just sort of like, shrug. As long as it doesn’t change, or if it changes, like, I sort of grok it and ack it, I trust it. In practice, in a lot of these end-to-end encrypted protocols.

Okay. So we talked about like users, and now there’s multiple devices in these group chats, not just users or

David: The attack we were talking about, if— so you have a paper called, let’s see here, “Practically Exploitable Cryptographic Vulnerabilities in Matrix”. The home server controlling the device list is vulnerability A.1; the vulnerabilities go to F. Um, and some of them have sub-numbers. I believe there is an A.2, um, which involves home servers and devices.

Could you give us a, a quick overview of, of the other variants of the trusted home server attack?

Martin: It’s essentially, you know, at the high level it’s the same attack, but instead of saying, here’s a new user and all their devices are added to this room, uh, you’re saying this user now has a new device. And every chat in Matrix is a group chat, because, you know, you might have multiple devices, so you’re just adding this device. And then the home server can also play games along the lines of: the person for whom a new device was added doesn’t see this new device being added, but the other parties in the room see this device being added.

Of course, this device will show up as, uh, unverified, unless additional steps are taken, you know, additional attacks are possible. So the same caveat applies, that the user could detect this and then take evasive measures or whatever, but the attack is essentially the same.

It’s like you’re adding, you know, for an existing user in the chat, you’re adding a new device.

Thomas: Can a user detect that the home server is telling other people on that Matrix server that you have an extra device? Cuz it can just not tell you about the device.

Martin: I think for your own list, you can’t. But the other users in the room can see that a new device was added for you, and they could monitor that, and, uh, and so on, right? And then, pretending we live in a world where, for all the other devices, kind of like, you know, everything is verified, then there will be an additional device that hasn’t been verified.

Thomas: At this point, I kind of wanna point out, or I wanna just note for everybody else, that nobody on this recording is in any way affiliated with Signal, but Signal’s gonna be a reference point for a lot of what we talk about. So what I love about this variant of the attack is that, I guess, the reason they have this problem is that one thing users of secure messaging systems really want is the ability to chat with the same identity on a bunch of different devices. They wanna talk to people on their computers and then get up and walk to the car and keep talking as they’re walking to the car on their phone, and just have that work seamlessly. And one of the kind of perennial complaints about Signal is that that doesn’t work great on Signal.

It works better now, but it used to be terrible, you know, trying to use a computer, if you could at all, and your phone. So Matrix kind of started with the premise of: this is a multi device secure messaging system and we’re gonna make that look seamless, right? Like if you have an iPhone and a computer, it’ll just all work, right? They’ll all see the same messages. They’ll be members of the same chats. It’ll be like they’re just permanently synchronized with each other. And you can imagine everyone wanting that feature and wondering why Signal doesn’t have that feature.

And then you look at an actual implementation of that feature, and it turns out the way you get that feature is the server just decides whose devices belong to who. And so you’re, you’re back to like this, you know, forget all the cryptography in Matrix and whether it all fits together properly, it doesn’t matter because the server could just say, you know, a random device that you don’t own is part of your account and gets to read all of your messages.

Which again, it’s hard for me to get my head around. We spend so much time. This podcast is where I rant about all of your

Deirdre: Yes.

Thomas: you know. Going forward it’s gonna be you like introducing a bug and then me ranting about the bug for 10 minutes. But like we spent all this time talking about like ratchet protocols and like, are we using the right curves and things like that, right?

Like, and none of it matters because the server can just decide who gets to read your messages or not. Am I overshooting it at all there, or is that like basically what you guys found?

Martin: The one thing I would like to stress is, it’s not inherent in their design that they leave it to the server to decide room membership for users or devices.

Deirdre: It’s just left unsaid. It’s left unspecified,

Martin: So like, the current specification kind of requires it, but there’s no inherent reason why their specification had to require this.

You can design, you know, and redesign Matrix, as they now have committed to do, so that you indeed use these roles that already exist, of administrators for groups, and they sign group membership messages, and you only accept them when they’re signed by these parties. The same way you will never accept an unverified device if you have verified the user.

You insist you will never talk to users that are not verified, and so on. So what I’m saying is, I wouldn’t quite jump to the conclusion that this shows this is impossible. It’s just that they made a poor design choice, and they have accepted that this was a poor design choice, and so we now have to see what they come up with next and see if that provides a solution to this problem.

Deirdre: And so the primary remediation of this sort of thing is, you cannot make changes to group membership without someone else in the group attesting to them, MACing or signing them, protecting the integrity of the group membership. And then other people have to verify that MAC or that signature against, you know, secret key material that they are able to access.

And that takes it out of fully server control, basically, right? But that’s also keeping it transparent. People can see group members and group devices, but they can’t change it without someone else’s work. Is that basically the, the straightforward

Dan: Yes. Yeah, that’s like, I imagine that’s the sketch of the solution you’d want if, like you say, you don’t care about the privacy of group membership. And I think that’s something that might be a bit harder for someone like Matrix to do compared to Signal. Because Signal do have, um, designs around keeping the members of a group private, as well as having these kind of authenticated messages.

But I imagine that it’s a little bit more difficult for Matrix to implement the privacy part, because they have this federation. I’m sure it’s not impossible, but maybe it is that little bit harder.
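To make the remediation being sketched here concrete: a minimal, hypothetical example of authenticated membership changes. This is not Matrix’s actual fix; the types and the admin-key plumbing are invented, and an Ed25519 signature stands in for whatever MAC or signature a real design would use.

```typescript
import { createPrivateKey, createPublicKey, sign, verify } from "node:crypto";

// Hypothetical sketch: membership changes must be signed by a room admin,
// so a home server cannot add users or devices on its own.
interface MembershipChange {
  roomId: string;
  action: "add" | "remove";
  userId: string;
  deviceIds: string[];
}

function signChange(change: MembershipChange, adminPrivateKeyPem: string): Buffer {
  // NOTE: a real design needs a canonical encoding, not bare JSON.
  const payload = Buffer.from(JSON.stringify(change));
  // `null` lets Node pick the algorithm implied by the Ed25519 key.
  return sign(null, payload, createPrivateKey(adminPrivateKeyPem));
}

function acceptChange(
  change: MembershipChange,
  signature: Buffer,
  verifiedAdminKeysPem: string[] // admin public keys verified out-of-band
): boolean {
  const payload = Buffer.from(JSON.stringify(change));
  // Apply the change only if an already-verified admin signed it.
  return verifiedAdminKeysPem.some((pem) =>
    verify(null, payload, createPublicKey(pem), signature)
  );
}
```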

Deirdre: Yes. And if you are Signal, you’re almost in an Apple model of controlling the full stack: how the service runs, how the clients run, what they’re capable of doing, and how you wanna change things. You don’t have to worry about federation, so you can deploy, on top of what already existed, private group membership with fanciness and anonymous credentials, encrypting your things and storing them on the single service, because you control all the servers that are ferrying messages and, you know, working with clients.

There’s also OpenMLS, the Messaging Layer Security protocol that’s kind of evolving, uh, in the IETF, trying to be end-to-end encrypted but more efficient for thousands and thousands of group members.

But there’s like a whole bunch of other pieces that need to go on top of, like, an MLS group key agreement thing to get something like a Signal service or something like a Matrix service. And there have been proposals, that I don’t know have been implemented, to get private group membership on top of MLS as well.

It can be done, but if you have these things like federation in your design space, it does complicate things a little bit. Yeah.

Okay. So we have like five more things to talk about. So we’re gonna go to the second thing, which is the key/device identifier confusion, which seems to be, like, straightforwardly: you’re able to use one kind of key as another kind of key in the wrong domain, and this leads to an attack. And the way I looked at it, like, I scanned this and I was like, if I was writing this, especially in a strongly typed language, these would be two completely different types that would serialize and deserialize with domain separation.

And everything would be domain separated. Why don’t you just do that? And then I realized that all of this stuff is written in JavaScript. I’m like, huh, that’s harder, I guess, when the language just doesn’t let you do that out of the box. So can you tell me about this one?

Dan: Yeah, so this one, I think, I’m gonna guess a little bit about the history of this, but I think it’s partially the result of their first version of out-of-band verification and then developing it into their second version. Initially, when they implemented out-of-band verification, it was always device-to-device.

So every pair of devices that communicate would have to go through out-of-band verification. And then what they did is they added this user-level kind of key hierarchy that lets a user attest that each device is their own and that they’ve done out-of-band verification with it.

And you only do that once. So you kind of create this link between a user and a device, and then separately two users can do out-of-band verification and ensure that they’re each other. And then you can follow these links along. And they both use the same kind of verification protocol to do that. And in one case, when you’re verifying, like, two of your own devices, you don’t actually exchange a public key.

You exchange the home-server-controlled identifier for the device. So that’s just like a value that the home server decides. It could be like counting 1, 2, 3, 4; it could be, like, maybe a hash; or it could be like a fancy name. But it’s just an identifier. And you exchange that when you’re doing the verification between two devices. But when you’re doing verification between two users, then you pass around a public key of the user’s kind of root cryptographic identity.

And the problem is, both of those things actually exist within the same space.

Deirdre: I didn’t, I didn’t even realize this, like, looking at the section of your paper: they try to coerce just a random ID, that could be just like a fancy string or just like a counter or something else like that, into an Ed25519 key and try to use it. That’s like what happens.

Dan: Oh, that’s not quite what they— that’s not quite, um, yeah. They’re just using them both: in one case, they’re using this Ed25519 key as an identifier for a user’s kind of root cryptographic identity, and in the other case, they’re just using this string that identifies a device.

And then, yeah, so what you can do is, the home server can generate a fake user identity and assign it to a device. And it just so happens, the way the message is structured and processed, that at the end of verification you can trick a device into sending their device ID— well, they’re always gonna send the device ID, but the device ID you’ve tricked them into sending is actually a public key under the attacker’s control.

And then the receiving device will interpret that as a user’s public key

Deirdre: Okay,

Dan: effectively.

Deirdre: Boo

Dan: And yeah, the attack’s quite fun in that, I don’t know if we make this super clear in the paper, but what it means you can actually do is: you can do this at the start, the first time two users ever communicate with one another, and then actively man-in-the-middle their connections forever.

And as soon as they try to do out-of-band verification, that’s when you use this attack to tell them that, oh, the identities we’ve been doing a man-in-the-middle with all this time are actually correct. They’ve gone through out-of-band verification, but they think it’s like the correct identity.
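As an aside, this is the class of bug the strongly-typed-language point above is about. A hypothetical TypeScript sketch using branded types: both values are strings on the wire, but a device ID can no longer flow into a slot that expects a public key.

```typescript
// Hypothetical sketch: branded types make device IDs and Ed25519 keys
// incompatible at compile time, even though both are strings on the wire.
type DeviceId = string & { readonly __brand: "DeviceId" };
type Ed25519PublicKey = string & { readonly __brand: "Ed25519Key" };

// Parsing is the only way to mint a branded value.
function parseDeviceId(raw: string): DeviceId {
  if (!/^[A-Z]+$/.test(raw)) throw new Error("bad device id");
  return raw as DeviceId;
}

function parsePublicKey(raw: string): Ed25519PublicKey {
  // e.g. 32 bytes of unpadded base64 is 43 characters
  if (!/^[A-Za-z0-9+/]{43}$/.test(raw)) throw new Error("bad public key");
  return raw as Ed25519PublicKey;
}

// A verification step that wants a key cannot be handed a device ID:
function markUserVerified(masterKey: Ed25519PublicKey): void {
  /* record the verified cross-signing identity */
}

const id = parseDeviceId("DEVICEID");
// markUserVerified(id); // compile error: DeviceId is not Ed25519PublicKey
```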

Deirdre: So this is an implementation error, not quite a specification or protocol error. But it seems like you could flesh out the specification to explicitly require these different checks, doing these different things with a device key and an ID key and whatever, to be explicitly domain separated, and that would address this issue from a compromised server.

Is that correct?

Martin: Correct.

Deirdre: Okay. Domain separate all the things, like put the context in the transcript that you’re either hashing or MACing or whatever you’re doing. Uh oh boy.

Thomas: Yeah, it’s cool, right? Cuz like, if you read good modern crypto designs, one of the things that you notice is that designs tend to go through a lot of trouble mixing random strings and— not random strings, but mixing readable strings into things. Like, you know, “this is the WireGuard protocol that we’re running” as part of the transcript.

And if you haven’t done a lot of crypto engineering, it might not be clear why you bother to do that, why you bother to inject all these strings, you know, into the cryptography. But the reason you do that is to bind the usage of the key to the key itself, so it can’t be misused. So like, you know, if you have a key that’s intended for device authentication and a key that’s intended for user authentication, you can’t accidentally use the user key for the device key.

The math won’t work, because you haven’t injected the right strings into it. It’s the kind of bug you get when you’re just kind of plugging cryptographic components together like Legos and not thinking about how they join. Um, but it’s also cool because it explains to you why people do that, like why those strings get mixed in.

This is what happens when you don’t have domain separation.

Martin: Coming from the academic side. I guess the reason why I would say we do this is because our proofs don’t go through if we don’t.

Deirdre: Oh yeah, that too. But that’s good

David: Yeah, I was gonna say that’s a sign that the proofs are probably attached to reality and not, right, just off in proof land or something.

Deirdre: I mean, every random oracle in your proof: just put a domain separator in that hash function invocation. Just always do it and you will be happy, because you will never have some SHA-256 result used over here in your protocol getting used over there in your protocol, because they will not compute, because they’re domain separated, because they’re supposed to be different random oracles.

You’ll just save yourself. It’s great.

Martin: Precisely.

Deirdre: And your proofs will work and everyone will be happy. Your reviewers, your implementers, everyone will be happy.

Thomas: So we’re, we’re moving on now to attack C, and I feel like attack C is where this starts to get hairy. For attack C, and then later on, we’re gonna have to introduce some concepts about how Matrix works. So C is an attack on kind of MegOlm, which is a thing built on or around Olm. And I guess at this point, I’m sorry to say, you’re gonna have to explain to people Olm and MegOlm.

Deirdre: Okay?

Martin: So Olm is essentially based off an old version of, uh, Signal, so, the double ratchet. And these are pairwise channels. So essentially every pair of communicating devices will have a pairwise, uh, triple Diffie-Hellman key exchange going on, in a double ratchet. And this is exclusively used for management messages.

So cryptographic key material is exchanged, but user messages are not exchanged over that. So these are pairwise channels between all devices. And then on top of that there’s MegOlm, and MegOlm is, uh, essentially similar to the Sender Key architecture. So you have like some ratchet that you move forward for the sender, and all receivers have like a copy of this ratchet, and ratchet it forward as well.

And that’s where messages are actually sent. And this key material, this ratchet, uh, and, uh, you know, the keys that you need in order to attest who sent this message, are exchanged over these pairwise Olm channels. So think of this as, like, uh, pairwise Signal-like channels between any two devices, and then on top of that, some unidirectional group messaging called MegOlm. And a bunch of those together are a room in Matrix.

Thomas: So a room in Matrix is a set of MegOlm channels. Theoretically, one for each sender.

Martin: Correct. And then maybe you wanna do attack C?
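As a rough mental model before getting into attack C: a toy sender-key ratchet. The real MegOlm ratchet has four parts plus an Ed25519 signing key, so this sketch only shows the one property that matters here: the state steps forward one-way, and handing someone the state at index i lets them read from i onwards but not before.

```typescript
import { createHash, createHmac } from "node:crypto";

// Toy sender-key ratchet in the spirit of MegOlm (not the real thing).
interface RatchetState {
  index: number;
  seed: Buffer;
}

// One-way step: easy to go forward, infeasible to go backward.
function advance(state: RatchetState): RatchetState {
  return {
    index: state.index + 1,
    seed: createHash("sha256").update(state.seed).digest(),
  };
}

// Per-message key, derived separately from the chained seed.
function messageKey(state: RatchetState): Buffer {
  return createHmac("sha256", state.seed).update("message-key").digest();
}

// Key sharing hands a copy of `state` at some index i to another device:
// that device can derive keys for messages i, i+1, ... but nothing
// earlier; that is the "partial forward secrecy" discussed below.
```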

Dan: Oh yeah, that’s probably a good idea. Well, first, yeah, for attack C, we call it a semi-trusted impersonation attack. And what we mean by semi-trusted is that when you complete this attack, you do get a warning message in the user interface next to the messages, because it’s detectable. But it achieves the same level of impersonation as a legitimate feature called key forwarding, or key sharing, in the Matrix protocol.

And the idea of this key sharing feature is that each user has a bunch of devices, and a design goal is that when a user enters a chat, you want all their devices to be able to read messages from that point onwards. So, like, I think in their specification, they call it kind of partial forward secrecy.

Because it’s not like every message; it’s when membership change events happen to a group that you want people not to be able to go backwards. So kind of like, say a new user joins a group: you don’t want them to be able to decrypt old messages. But when that user adds a new device, you want that device to be able to decrypt old messages.

And in the MegOlm system, they have some secret state that’s called the MegOlm ratchet, which is approximately like a hash ratchet that provides forward secrecy on, like, a per-message basis. So to solve this design goal, you can share old versions of your kind of ratchet state among devices of a single user.

So a user can be like, oh, I’ve got an old version of this key material that you can have a copy of for the new device. You can have a copy of that key material and you can decrypt some old messages. And those are still kind of— the user can’t rewind time and get back to messages that were sent before they joined.

But between a user’s devices, they can share this key material among each other. So that’s kind of what this key sharing does. It’s a protocol that, again, is layered on top of Olm. So this is like kind of on the signaling layer: they will send messages over Olm that aren’t user-visible, but that are key share messages.

And it’s kind of just a simple request-response protocol. So say a new device gets added to my user, and it gets given an old ciphertext that it doesn’t have the ability to decrypt. It will then send a message to a bunch of devices that it thinks might have the key, to say, oh, please, can you give me this decryption key?

And according to certain policies, if they believe it should be allowed, they will encrypt that MegOlm session and send it over Olm to the device that requested it. That’s kind of the expected behavior. But the thing you can do in this attack is you can just forcibly send one of those messages over Olm to a client, and they’ll accept it.

So you can kind of forcibly share keys

Deirdre: Ooh,

Dan: at the MegOlm layer.

Deirdre: I don’t like that.

Thomas: Just to be clear about this, the key request protocol is like: every member of a group has a MegOlm session, right? Every sender is a MegOlm session, and one of your devices might not have every single MegOlm session for a room, cuz it’s just joined or whatever, right? And so what happens normally is, when that happens and you get a message, you don’t have the right MegOlm thing for it.

But another one of your devices might. You ask your other device, hey, do you have the crypto data I need to decrypt this message? And the bug is, as you just described it: normally it’s request-response, but you can just do an unprompted response. Just like, here’s a response, take it even though you didn’t ask for it, right?

Like, you’ll still accept

Dan: Yes, that’s exactly it. And they have a bunch of policy around who you should request from. But, at the time, they didn’t implement it on, um, when you accept the response. So they’ll only send the messages to certain people or certain devices asking for the key material, but they’ll accept the key material generally.

Deirdre: An honest server will only do that, but a dishonest server can do whatever.

Martin: No, this is the client.

Dan: Sorry. Yeah, yeah, yeah. Sorry.

Deirdre: Got it. Okay. Okay. Okay. Thanks.

Martin: So the client has controls implemented, yeah: we’re not going to ask people we don’t trust for this key material, right? And we are gonna be careful about who we share key material with. But when the message arrives, essentially the implementation bug was to accept this from anyone.
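The fix amounts to remembering which requests you actually sent. A hypothetical sketch of what the receiving side’s check might look like (all names invented):

```typescript
// Hypothetical sketch: only accept a forwarded session key if it answers
// a request this client actually made, from the device it actually asked.
interface KeyRequest {
  requestId: string;
  sessionId: string;
  askedDeviceId: string;
}

const pendingRequests = new Map<string, KeyRequest>();

declare function importSession(id: string, key: Buffer, forwarded: boolean): void;

function onForwardedKey(
  fromDeviceId: string,
  requestId: string,
  sessionId: string,
  sessionKey: Buffer
): boolean {
  const req = pendingRequests.get(requestId);
  // Reject unsolicited responses: no matching request, wrong session,
  // or an answer from a device we never asked.
  if (!req || req.sessionId !== sessionId || req.askedDeviceId !== fromDeviceId) {
    return false;
  }
  pendingRequests.delete(requestId);
  importSession(sessionId, sessionKey, /* markAsForwarded */ true);
  return true;
}
```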

Thomas: What’s the impact of being able to inject a MegOlm session into another client?

Dan: It’s primarily an impersonation attack, because you can generate your own MegOlm session and pretend it’s another user’s, and then you can forcibly share it to a bunch of devices, and now they’ll start accepting encrypted messages from that session thinking it’s from the person you claimed it was, but it’s actually, it’s actually you.

Martin: But as we shall see later, you can’t have strong confidentiality without authentication, and we’re going to build on this attack in two steps to then also break confidentiality.

David: So, so why don’t we do that? Both attacks D and E, I believe, build on top of attack C, which is the semi-trusted impersonation you just described. So why don’t we quickly run through, you know, what D is.

Dan: Oh, cool. Great. Yeah, so the trusted impersonation attack is kind of just like a little upgrade over the semi-trusted one. And the core implementation problem was that there are certain key sharing messages, the ones used to distribute MegOlm sessions, either initially or through this key sharing protocol, that are expected to be sent over the Olm protocol, not over MegOlm. But that wasn’t quite implemented right in Element, like kind of the flagship client, and instead they would accept those messages, a particular kind of message type, over MegOlm sessions. So you could kind of chain these attacks a little bit, where you could first do the attack we just described, where you forcibly share a MegOlm session using the key sharing protocol.

Now you’re impersonating someone. You can now generate a new MegOlm session, but send it as if you’ve just created a new MegOlm session via the normal MegOlm protocol, not via the key sharing protocol. And clients would accept that, thinking it came over Olm. So that’s kind of maybe a bit confusing in hindsight.

Martin: You establish a semi-trusted channel, and then you can, you know, start negotiating more key material over the semi-trusted channel. But there was a bug that forgot that this was derived from the semi-trusted channel, because there’s a protocol confusion: it was not supposed to arrive via MegOlm, but only via Olm.

But you send it over MegOlm, they accept this, and like, “now I’m upgrading” kind of how much I trust the session: from, “ooh, maybe it’s not them, because that was sent by key share”, to, “that is definitely them”. So this attack achieves a higher level of confidence on the receiving side that the key material is genuine than the genuine key sharing feature, right? So the attacker can outperform a genuine client in convincing another party that this key material is definitely genuine.

Dan: You were kind of previously sending messages, and those messages were flagged as, “oh, they were sent via this key sharing protocol, you should trust them a bit less.” But then you send signaling messages over that protocol, and you can then upgrade it to a version that doesn’t have that flag. That’s

David: No,

Thomas: Because there’s no mechanism in the client to reme— like, it shouldn’t have been accepting these messages over MegOlm in the first place, right? The signaling channel for Matrix is supposed to be this Olm protocol, which is individual point-to-point Signal-style ratchets, right? MegOlm is this thing they built on top of it to do scalable group messaging.

And they’re not supposed to do signaling over MegOlm; they’re just supposed to do messaging over MegOlm. So you can use that C attack to inject a MegOlm session, to like pretend that you’re a member of a group and send messages as that user. But then, in addition to that, they’re accepting signaling messages over the group messaging channel, which is only supposed to be for content, right? But they’ll take signaling messages over that as well, because there’s a bug in the client where it doesn’t notice that it’s taking these signaling messages.

Now, when you inject that signaling message the first time with that C attack, there’s an alert, right? There’s something that pops up. It’s a common alert, doesn’t freak people out, because it’s also what happens when you add new devices or whatever. But there’s a thing that pops up that says this key request protocol just ran.

But the second attack, the thing that builds on it, this is the D attack, right? In this D attack, where they’re taking these signaling messages over MegOlm instead of Olm, over the group thing instead of the signaling thing, that’s just a straight-up bug, right?

They shouldn’t have been doing that to begin with. So there’s no mechanism anywhere to track the fact that they’re taking signaling messages over MegOlm; they just forget. They just pretend that it came over the more secure signaling channel. And so you could do arbitrary things with that, right? And there are no notices, cause the client just didn’t expect that to happen at all.

Am I, do I have that pretty much

Martin: Yeah. And indeed, as far as we know, this bug was only present in their flagship client, Element. The designers kind of then checked their ecosystem and other clients, and as far as we know, they haven’t found this bug elsewhere. But attack C, they found variants of that in three other clients.

So the semi-trusted one seems to be a bit more widespread. It’s not quite the same, but like, details maybe are not so important here. But it seems that this upgrade that we’ve just described is something that was only in the flagship client.
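In code terms, the missing check is on the transport: the client has to look at which channel a decrypted control event arrived over before acting on it. A hedged sketch (the dispatch shape here is invented; m.room_key is the real Matrix event type for distributing sessions):

```typescript
// Hypothetical sketch: session-setup events are only valid over the
// pairwise Olm channel; the same event type arriving over MegOlm (group
// content) must be rejected rather than silently trusted.
type Transport = "olm" | "megolm";

interface DecryptedEvent {
  type: string; // e.g. "m.room_key" (session setup) vs. "m.room.message"
  transport: Transport; // recorded by the decryption layer, not the sender
  content: unknown;
}

declare function handleNewSession(content: unknown): void;
declare function handleContent(event: DecryptedEvent): void;

function dispatch(event: DecryptedEvent): void {
  if (event.type === "m.room_key") {
    if (event.transport !== "olm") {
      // The missing check: signaling arriving over the content channel.
      throw new Error("m.room_key must arrive over Olm; dropping event");
    }
    handleNewSession(event.content);
  } else {
    handleContent(event);
  }
}
```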

Thomas: And the impact of D is similar to the impact of C, it’s impersonation, but now it’s like seamless impersonation. Now there’s no notice, there’s no way you could defend yourself against it. If you have this bug, people could just spoof other users.

Martin: Yes,

Thomas: Now there’s E, which is like the nightmare version of the D attack.

Martin: Yes. And maybe we can keep E very brief, because the idea is: so now I can impersonate arbitrary people to high confidence. And in particular, somebody asks, “I would like to have a backup of my key material”, right? So, as a way of making it easier to have access to the key material, there are encrypted backups.

A device asks, what’s the key I should use to back up all my key material? And I can impersonate whoever I want. And so: here’s your key. Like, I’m your other device, and here’s the key that you should use. And then: thank you very much, I’m now going to encrypt all my secrets with this key that you’ve just given me, and I’ll upload it to the home server. Where, of course, if the key is under the control of the home server, the home server now has full, full access to, to all, uh, all

Deirdre: me so sad because the whole point of end-to-end encrypted backups is to be able to recover and to be able to have multiple devices and, and all this sort of stuff. If like you only have one phone and it falls in the, in the ocean, like you would like to be able to get all those baby photos that you shared over Matrix or whatever, back from the securely end-to-end encrypted cloud. Right?

And so the fact that like these vulnerabilities just kind of chain together to be like, no whoopsy, like some random can just like leverage these signaling channels to get access to your full end-to-end encrypted backup makes me like, I understand why people might be skeptical of an end-to-end encrypted backup because if this is sort of what can happen, it makes you just sort of be like, well, hruargh, but like it sucks.

It sucks

Martin: So this is mostly a bug, right? It’s a bug in the client. But this bug is somewhat encouraged by a design choice, in Element at least: this checking of how much we should trust something happens at display time, not at decryption time. So there’s cryptographic processing that is spread across different sub-libraries.

But if you send signaling messages over this channel, then they’re never displayed, so this verification cannot be triggered, because it’s only triggered at display time. You know, if you want to take away lessons: this thing of why we like small, auditable cryptographic cores, where nothing else touches the cryptography, this is one of the reasons, right?

So that you cannot have this sort of confusion of, yes, it’s fine when we display. Because then you make the assumption that every message, to be harmful, needs to be displayed. And this is, in this case, not true.
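One way to read that lesson as code: the crypto core attaches a trust verdict at decryption time, so every consumer gets it whether or not the message is ever displayed. A hypothetical sketch:

```typescript
// Hypothetical sketch: trust is decided once, in the small auditable
// crypto core, at decryption time; consumers get the verdict alongside
// the plaintext.
type Trust = "verified" | "forwarded" | "unverified";

interface DecryptionResult {
  plaintext: Buffer;
  trust: Trust;
}

declare function lookupSession(id: string): {
  decrypt(ct: Buffer): Buffer;
  wasForwarded: boolean;
  trust: Trust;
};

function decryptInCore(ciphertext: Buffer, sessionId: string): DecryptionResult {
  const session = lookupSession(sessionId);
  const plaintext = session.decrypt(ciphertext);
  // Trust is computed here, once, in the core...
  const trust: Trust = session.wasForwarded ? "forwarded" : session.trust;
  return { plaintext, trust };
}

function handleSignalingMessage(result: DecryptionResult): void {
  // ...so even never-displayed signaling messages enforce the policy.
  if (result.trust !== "verified") {
    throw new Error("refusing signaling message from unverified session");
  }
  // process result.plaintext
}
```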

Thomas: The paper is great, and there’s actually more interesting stuff that you found. If the research had been less profitable, if you had not found five really bad exploitable attacks, I can imagine on a consulting engagement I would’ve flagged some of the other things that you have documented in this paper as vulnerabilities as well.

There’s a sixth vulnerability, which is really simple, right? It’s just like, they’re using CTR mode someplace, and they’re not including the nonce in their MAC.

David: like the straight textbook definition of chosen ciphertext attacks out of like Modern Cryptography,

Thomas: Yeah. But also you can’t exploit it, so I don’t care about it. Right? So,

Deirdre: But don’t do that. Just don’t do that. Even if you can’t exploit it. You never know. It’s bad. It’s bad news.

Thomas: Also just, just don’t use CTR.

Deirdre: That too. That too.
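For reference, the textbook fix here is encrypt-then-MAC where the MAC covers everything the decryptor consumes, nonce included. A minimal sketch (key handling and message layout are hypothetical):

```typescript
import {
  createCipheriv,
  createDecipheriv,
  createHmac,
  randomBytes,
  timingSafeEqual,
} from "node:crypto";

// Minimal encrypt-then-MAC: the HMAC covers the IV as well as the
// ciphertext, so the counter block can't be swapped out from under the
// MAC (the bug discussed above). Layout: iv (16) || ct || tag (32).
function seal(encKey: Buffer, macKey: Buffer, plaintext: Buffer): Buffer {
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-ctr", encKey, iv);
  const ct = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const tag = createHmac("sha256", macKey)
    .update(iv) // <- the part that must not be left out
    .update(ct)
    .digest();
  return Buffer.concat([iv, ct, tag]);
}

function open(encKey: Buffer, macKey: Buffer, sealed: Buffer): Buffer {
  if (sealed.length < 48) throw new Error("truncated message");
  const iv = sealed.subarray(0, 16);
  const ct = sealed.subarray(16, sealed.length - 32);
  const tag = sealed.subarray(sealed.length - 32);
  const expected = createHmac("sha256", macKey).update(iv).update(ct).digest();
  if (!timingSafeEqual(tag, expected)) throw new Error("bad MAC");
  const decipher = createDecipheriv("aes-256-ctr", encKey, iv);
  return Buffer.concat([decipher.update(ct), decipher.final()]);
}
```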

Thomas: So like, the paper is great, but my experience of reading the paper was, before I even got to the vulnerabilities, I’m just reading all of this mechanism that they have, all this stuff that they’ve built. And this E attack that we’re talking about, this is, like, you know, being able to inject the backup key into a client and have them encrypt to a key that you control, right?

This involves the SSSS protocol that runs on top of Matrix, which is this secure backup protocol. There’s another crypto bug that you found in the backup protocol, like some key confusion thing. I forget what it is, but it’s not exploitable, so we don’t care. But just to start with: Signal doesn’t have message backup, right?

This just doesn’t exist in Signal

Deirdre: that I’m aware of. There’s a new WhatsApp end-to-end encrypted backup thingy, but it, it looks nothing like this sort of deal. It’s like a

Thomas: I feel like there’s an academic cryptography thing where it’s like, okay, you start out with this list of things you’re trying to accomplish, and one of the things you’re trying to accomplish is, okay, we’re doing a secure messaging system where there’s backup, right? And so you’re looking at, here’s how you would design such a system as an academic cryptography researcher. You’re like, well, if you’re gonna design a secure messenger with encrypted backup, then you should at the very least have a small, auditable cryptography core where you can, like, you know, prove it and all that stuff. But I’m like a step higher: why do you have this feature?

Like, is it actually necessary to build these things into Matrix? I get how it makes the protocol a little bit more usable. But it also collapsed in practice, right? Like Element is the flagship client for Matrix, right? And Element has a vulnerability where there’s backup and attackers can control the backup and read everybody’s messages.

Like, couldn’t they just not have had this functionality to begin with? And when you reported the vulnerability to them, what was the interval between reporting the vulnerability and the paper coming out?

Martin: Okay, let me respond to the first question, because that is a pet peeve of mine. I want to go, in a way, one level higher. What we are currently doing, and what I think we shouldn’t do, is that a bunch of technologists discuss what is a feature that you should and shouldn’t have. I think there’s a problem here, in that we need to also understand what is actually needed, right?

For the different users that we care about to protect, uh, and who are dependent on these systems: do our designs actually live up to their security needs? And I think that’s a task for good old social science. That’s not a task for us to solve. It’s for us then to address this, or to say, actually, the risks are too high, we can’t do this.

Right. But I think the question of, is message backup a thing that you need, that’s a question that is not for us to decide. We can discuss, these are the risks; and then, you know, how badly people need this, I think I’m not really qualified to say. And it’s similar to, we talked a bit about forward secrecy, and we haven’t mentioned post-compromise security, but it usually doesn’t fall that far behind.

We love designing for these, but then also, uh, some research we have done kind of suggests that the design goals we have there don’t actually serve, uh, users in high-risk environments. They care deeply about forward secrecy and post-compromise security, except not our definitions; they have other ideas about those.

And so I think there’s a question of, yeah, what is actually the correct design goal for a messaging system, not just in its cryptographic core, which is something that we can handle, but also in terms of, yeah, what do we need to live up to? What are the expectations against us when we build them?

Deirdre: I think this also kind of goes into Matrix the protocol, and Olm and MegOlm and the SSSS backup thingy. All these things seem to have kind of grown organically and been tied together, but without very clear or formal definitions of the guarantees that these cryptographic protocols were supposed to be giving.

So when you add Olm and MegOlm, and when you add this backup thing, and you add these different keys, they start interacting in ways that are unclear and unspecified. And then, whoopsy daisy, all your secure encrypted backups are revealed to an adversary who controls a server or whatever. Doing engineering of any kind is hard.

Trying to do software with any level of security is hard. Trying to formalize these security notions as you go is not easy, even for full-time security people who do this as their job. I don’t wanna go back in time and tell the Matrix people, like, you should’ve had a formal model of every single security property, you know, integrity, confidentiality of group membership, indistinguishability, all this sort of stuff, of everything you do before you even write a line of software.

But it seems like a little bit of that would’ve helped to mitigate these weird interactions, as all this stuff started growing on top of each other without checking how they affected each other in terms of security.

Martin: I agree. I guess we can at the very least tell anyone who’s going to try this going forward: yes indeed, you should have formal models and you should have proofs. One of the reactions to the kind of attacks that we presented, and also to previous work where we kind of broke some cryptographic protocols, is to say, “well, crypto’s hard”, and “don’t roll your own crypto.”

But in a way, the thing is, you know, we need some people to roll their own crypto, because that’s how we have crypto. Someone needs to roll it. But we have developed techniques, we have developed formalisms, we have developed methods for making sure it doesn’t have to be hard. It’s not a dark art that only a select few can master; it’s, you know, a science, and you can learn it.

But you need to then indeed employ a cryptographer in formally modeling your protocol, and whenever you make changes, then, you know, they need to look over this and say, yes, my proof still goes through. Um, so that is how you do this. And then the engineering is still hard, and it will remain hard, and, you know, any science is hard, but then at least you have some confidence in what you’re doing.

You might still then look at the attack surface and say, you know, the attack surface is too large and I’m not gonna have an encrypted backup, right? That’s then the problem of a different hard science, social science, right? But then just use the techniques that we have, the methods that we have, to establish what we need.

David: We can leave it to the PMs to figure out whether encrypted backups are a required user journey or not for your chat app.

Thomas: Well, you’re a PM!

Deirdre: Well, yeah, David is saying, leave

David: Yeah. It’s my decision is what I’m saying.

Deirdre: Well, I'll

David: uh,

Deirdre: David.

David: Would you even call your paper a formal analysis? That sounded mean, but it's not what I usually think of as a formal analysis, like: we've done a bunch of stuff with Tamarin, or symbolic proofs, or computational proofs, and here's the model.

Yours has more of the sense of: I looked at the spec and I tried a formal analysis, and instead all I got was a bunch of bugs. Would you say that's accurate? And if so, what was your process for doing this? Were you trying a formal analysis?

Dan: Yeah, that's pretty accurate. Our initial plan was to do a formal, pen-and-paper proof type thing, but during the process of turning the spec into pen and paper, into pseudocode and those kinds of things, the issues weren't necessarily with the spec directly.

A lot of the time, as you can probably tell from what we've discussed so far, converting the spec into pseudocode required a bit of deep diving into an implementation, or just double-checking something rather than assuming it. From the spec you might assume the sensible way to do it is such-and-such; well, if we're gonna do a proof on it, we should probably double-check they're actually doing it the sensible way.

It was a few things like that as we were converting it into pseudocode. During this process we did little deep dives into implementation parts and found some bits where we went, ooh, maybe we should double-check if they do this right. So even though it might not have been directly related to the formal analysis, we were led there by it, and we found those attacks through it. But we are still working on the formal analysis, as a kind of follow-up to this.

Deirdre: In like a symbolic model, or just, are you trying to formalize any of these properties at all? And is that all of Olm, Megolm and the secure backup, or are you keeping it constrained for now? Because there's just a lot of stuff in this Matrix pile of stuff.

Dan: Uh, yeah. So we were originally working on formal modeling, and it was gonna be pen-and-paper proofs of not the whole Matrix protocol, but a subset of it. So, for example, we don't consider the backup.

Deirdre: Right. Okay. 'Cause that feels kind of bolted on.

Dan: Yeah, well, partly just 'cause we needed to reduce it down to something manageable, effectively. But that was the initial plan.

And then it was gonna be: oh, here are a couple of implementation attacks, and a proof once those are fixed. But then we ended up filling out that attack paper a bit more, so we've split out the work now, effectively.

Martin: But I mean, it's essentially the same motivation: okay, yes, we found some attacks, but how do we know there aren't more lurking in there? And of course a pen-and-paper proof will not cover implementation bugs, but at least it would give you some confidence that there can exist a secure instantiation of the protocol.

And for that you need to be more formal. So that is the task, then.

Deirdre: That speaks to the unwieldiness of this whole thing that grew, quote-unquote, organically, and now you're trying to bolt on some sort of formal analysis that it never had in the first place, which seems inherently more difficult. Do you think formalizing group membership in these end-to-end encrypted group messaging systems is different?

Because, for one, I've been working on threshold schemes, and it seems like anything multi-party just gets harder in terms of formal modeling. Do you think there's anything to that, or is it just sort of, nah, we're kind of figuring it out, it's just slightly more novel?

Martin: I think we are getting there. There's not that much work there yet. I think it also depends: my guess is the social structure of the groups you're talking about is gonna help you. Even if you have some distributed consensus protocol, you often only need to reach consensus between a bunch of admins.

So even if your group has a thousand members, once you're talking about trusting a thousand members, then maybe your security guarantees are a bit weak anyway. But if your group membership is controlled by five people, then maybe your quadratic protocol is totally fine, because I can take the square of five and feel good about myself.

So in that sense, even with the solutions we currently have for establishing this kind of consensus, the number of parties involved might be somewhat manageable, even if you're talking about large group chats. So, you know, maybe the impossibility result appears next month and then I look like an idiot.

But like, my guess is no. No, no, I'm not promising. I'm just saying.
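To make Martin's quadratic arithmetic concrete, here is a toy calculation, illustrative only and not tied to any particular consensus protocol, of how an all-to-all message exchange scales with the number of parties:

```python
# Illustrative only: an all-to-all ("quadratic") exchange needs roughly
# n * (n - 1) pairwise messages per round among n parties.
def quadratic_messages(n: int) -> int:
    return n * (n - 1)

# Restricting consensus to a handful of admins keeps this tiny...
print(quadratic_messages(5))     # 20 messages among five admins
# ...while running it across the whole membership would not scale.
print(quadratic_messages(1000))  # 999000 messages among a thousand members
```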

Deirdre: I was like, what? What?

Martin: To me it feels like it's just a thing that we haven't spent as much attention on, and now the community is getting ready to tackle it, so I expect progress on this soon.

Deirdre: And I'm trying to keep up with the analysis of MLS, Messaging Layer Security, and that is just group key agreement, without all this other stuff. There are no end-to-end encrypted backups, there's no private group membership stuff in MLS. MLS is just sort of, everyone agrees on a key in an efficient manner, and they're trying to do it in a way that scales to tens of thousands of members efficiently.

That seems like a lovely little box where formal analysis is ongoing as each version of the specification is updated, and it seems promising: formal analysis of, like, version nine found an issue, and then they fixed it in a future version, and so on. That seems promising for nailing down these formal notions of group key agreement and the guarantees that you're supposed to be able to get from it.

That's nice, but Signal doesn't use MLS or that style of group key agreement. Matrix doesn't use it. WhatsApp doesn't use it. There's a ton of deployed crypto that doesn't use this nice, formally analyzed thing that people are working on. It's only in a couple of very niche places right now.
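For context on Deirdre's point about efficiency at tens of thousands of members: MLS's TreeKEM construction arranges members as leaves of a binary tree, so one member's key update only re-encrypts along its path to the root, about log2(n) nodes instead of n. A minimal sketch of that arithmetic (the function is illustrative, not MLS's actual data structures):

```python
import math

# Illustrative arithmetic: in a balanced binary ratchet tree over n leaves,
# a single member's key update re-keys only its path to the root,
# roughly log2(n) nodes rather than all n members.
def path_updates(n_members: int) -> int:
    return max(1, math.ceil(math.log2(n_members)))

for n in (2, 100, 10_000, 50_000):
    print(f"{n:>6} members -> ~{path_updates(n)} nodes re-keyed per update")
```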

Thomas: But you could still have some of these problems even if you used MLS. So, to the idea

Deirdre: Oh yeah.

Thomas: MLS is a very big deal. It's been an ongoing project for many years, and it's pretty high profile, right? And the grossest problems we have in secure group messaging are problems it basically punts on, right?

Like, you have a

Deirdre: Yeah, yeah. It's completely out of scope.

Thomas: mechanism for distributing keys that we can, you know, potentially formally model and have confidence in. But the real question is just: how do you decide who gets to join your group? Right? And that's not an MLS question.

Deirdre: They punt that to the application layer, which, I understand, you wanna keep your little box cleanly delineated. They've picked their box, and that's great; it solves a very specific problem. But deploying a service, an app that someone can just install on their device and use, has a bunch of other stuff to it that you have to solve well, or else you end up with Matrix.

Martin: There is work ongoing in the academic community saying, we need to reason about group membership and authentication. We are not there yet, but work is ongoing to let us stop saying, "…and somehow magically the application takes care of that," and instead have a more solid foundation for it.

Thomas: It's frustrating, right? Because there's a sense in which you can, in a very technical way, say that the protocol is doing what it's supposed to do even when you have these problems, like insecure group membership. It's like: well, the key distribution system and the encryption of messages work fine.

There's nothing wrong with that, right? And there's a message that comes up when somebody joins the group, right? But the security of the system depends on that decision, about who gets in the group or not, being made properly. So it's weird: it fuzzes up what our security definitions are, and the fact that there aren't clear security definitions for group membership makes it a lot harder even to talk about what it means to have a secure group messaging system. Which is tough, because systems are moving away from point-to-point secure messaging and towards literally all messaging being group messaging, even if it's just a group of two people.

So group membership now always matters, for all of secure messaging.

Deirdre: Especially because you usually have more than one device, and now you're a group of three: even though it's two humans, it's one or two devices per human, and so on.

Going back to what we were talking about earlier: Matrix seems to be motivated by a desire to be federated, like email. Do we think all these discussions about secure group membership, and formal models of these things, just get really hard in a federated system?

Is there any work on formal analysis of a federated system, of servers that talk to each other and share information, and what that looks like? 'Cause I don't think I've seen anything in the literature, but federation principles aren't exactly my professional wheelhouse.

Martin: I don't think federation is a focus. There's some work on trying to do it fully distributed, and that gets really hard, especially when you talk about groups, because how do you even establish a shared view of what the group state is? That's something that, in the case of Signal, the Signal server currently does, right?

It orders messages, for example, and you trust it to do that. And it seems to me, though this is not off the back of some deep dive, that it's not that different in a federated environment. A group is still somewhat associated with a server, the user has an account on a server, and the administrators of that group might be on a different server, but you still essentially trust the servers, in a similar way to how you trust the Signal server, to order messages and deliver them.

And then you can rely on that to establish some ground truth, and you don't have this kind of spiraling-out-of-control state separation and so on. Of course the network can split, but from a security standpoint, the Signal server can also split the network, right?

So when you argue about security, the federated nature doesn't seem to make that much of a difference, at least as far as I can tell.
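As a concrete illustration of the trust assumption Martin describes, here is a toy, entirely hypothetical server (not Matrix or Signal code) that establishes the shared view of group state simply by assigning every event a monotonically increasing sequence number, which clients then treat as ground truth:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: clients trust this server to define the canonical
# order of group events (joins, leaves, messages), much as Signal clients
# trust the Signal server's ordering.
@dataclass
class OrderingServer:
    _next_seq: int = 0
    _log: list[tuple[int, str]] = field(default_factory=list)

    def submit(self, event: str) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._log.append((seq, event))
        return seq

    def history(self) -> list[tuple[int, str]]:
        # Every client reading this sees the same ordered history, which
        # also means the server can reorder, drop, or partition at will.
        return list(self._log)

server = OrderingServer()
for event in ("alice joins", "bob joins", "msg: hello"):
    server.submit(event)
print(server.history())  # the single, server-defined ground truth
```

In a federated deployment, the group's home server plays this role instead of one global server, but the clients' trust in whoever does the ordering looks much the same.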

Deirdre: So it's more like: I live on a server, and I can go talk to my friend who lives on another server. A group lives on that other server, and they're able to import an identity from another server, but the group still lives on one server or whatever, and everything else is the same.

Okay. That seems doable, I guess, but you still have to trust a server.

David: Federation just makes it really hard to rev implementation changes, 'cause you have to get everybody to upgrade across a disparate set of clients, and you can't assume that the clients and the servers all move in anything close to lockstep.

Deirdre: Yeah. And there's a reason certain secure software says: you get to use this version for, like, six weeks, and then if you don't upgrade, it'll turn off. If you have the source, you could recompile it and turn off the senescence. But I think Signal has just pushed an update, at least on their Android client, to be like: you must update.

You must update now, or else we'll stop working. And I'm like, ooh, that's interesting. That sort of thing becomes even harder in a federated model, where the server must be updated or you cannot communicate with another server or another client anymore. And when you are trying to be highly secure, like Signal or, hopefully, Matrix, it gets harder.

So, yeah, I don't know.
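The "senescence" Deirdre mentions works roughly like this: a build carries its own expiry date, and the client refuses to run past it until updated. A minimal sketch of the idea, with a hypothetical six-week window taken from the discussion above (this is not Signal's actual code):

```python
import datetime

# Hypothetical sketch of client senescence: the build embeds an expiry
# date, after which the client refuses to start until it is updated.
BUILD_DATE = datetime.date(2022, 10, 1)            # stamped at build time
EXPIRY = BUILD_DATE + datetime.timedelta(weeks=6)  # six-week window

def check_senescence(today: datetime.date) -> None:
    if today > EXPIRY:
        raise SystemExit("This build has expired. Please update to continue.")

check_senescence(datetime.date(2022, 10, 20))  # still within the window
# check_senescence(datetime.date(2023, 1, 1))  # would exit with the message
```

As the discussion notes, a federated system has no single party who can stamp and enforce such a deadline across every client and server implementation.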

Martin: It's another pitch for hiring those cryptographers early, because then they can design your protocol so that it doesn't need that many updates, from a security perspective at least.

Deirdre: But if you're starting from an already vetted protocol, like Signal's, I can just see the logic: we started with the good stuff, someone else

Thomas: That's how this went. Yeah. That's how this went wrong.

Deirdre: Exactly. So

Thomas: Yeah. This is just awesome. It's awesome work. I was very happy reading the paper, and I was very happy reading the paper again. Everything about this just makes me happy, and thank you guys for bringing this into the world.

This is fantastic.

David: Thank you very much for coming on our silly little podcast.

Martin: Thanks for having us.

Dan: Thanks for having us.

Deirdre: Thank you.

David: Security Cryptography Whatever is a side project from Deirdre Connolly, Thomas Ptacek, and David Adrian. Our editor is Netty Smith. You can find the podcast on Twitter @scwpod, and the hosts on Twitter @durumcrustulum, @tqbf, and @davidcadrian. You can buy merchandise at merch.securitycryptographywhatever.com.

Thank you for listening.