E2EE Storage Done Right with Matilda Backendal, Jonas Hofmann, and Kien Tuong Truong

It seems like everyone who tries to deploy end-to-end encrypted cloud storage messes it up, often in new and creative ways. Our special guests Matilda Backendal, Jonas Hofmann, and Kien Tuong Truong give us a tour through the breakage and discuss a new formal model of how to actually build a secure E2EE storage system.

Watch on YouTube: https://youtu.be/sizLiK_byCw

This rough transcript has not been edited and may have errors.

Deirdre: Hello, welcome to Security Cryptography Whatever. I’m Deirdre.

David: I’m David.

Thomas: I should not be awake right now.

Deirdre: We have three special guests today. We have Matilda Backendal. Hi Matilda.

Matilda: Hi.

Deirdre: We have Jonas Hofmann. Hi Jonas.

Jonas: Hi.

Deirdre: And we have returning champion Kien Tuong Truong. How are you, Kien?

Kien: Doing great. Happy to be back.

Deirdre: We have three special guests today because they’ve put out some very cool attack and construction research, all together, on end to end encrypted cloud storage systems. And I think Thomas is in the background salivating over all the fun little attacks that you found in all of these systems. Kien and Matilda are currently at ETH Zurich and Jonas is at TU Darmstadt. How giddy are you about the attacks in this paper?

Thomas: You know, I think my general feeling is they don’t make attacks like this anymore. Some of the things that you see in this paper are things that this might be your last chance to see some of these attacks ever. So I’m really psyched to talk through this stuff. So these are attacks on cloud drive systems, cloud storage systems. There’s like six different systems that you guys looked at. I guess you would do a better job than I would of kind of introducing what your targets are here and really I think importantly here what the threat model is like, who you’re concerned about actually conducting these attacks. So why don’t you try this instead.

Kien: Maybe I can start at least this discussion, or I would like to, you know, introduce my fellow PhD students here. So I would like to start maybe historically: where does this all start? Actually it starts with Matilda, because Matilda wanted to look at MEGA, and maybe she will say more about that first. But essentially that is the main inspiration for our work. So I don’t know. Matilda, do you want to start by saying how it all started for you?

Matilda: Sure, yeah. I was just looking it up now actually, because it’s been a couple of years. So it started because we wanted to build more advanced security for file sharing systems. Actually we were looking at things like forward secrecy and like, you know, much more advanced security properties really than just end to end encryption. And then at some point it dawned on us that there’s not even really good basic security for cloud storage. And so that’s how this work started. We were just going to look at a few systems to get inspiration and then try to introduce these more advanced properties. And then we started looking at MEGA and quickly realized that it was broken.

So there was not much to do except to just keep going down that rabbit hole. And that was primarily Miro Haller, who’s not here today. But he did really a lot of the work finding the attacks there, together with Kenny Paterson as well.

Deirdre: Yeah, I think they were on previously for attacking MEGA, and that was a lot of fun. How frustrated were you to be like, I wanted to do something new and better, and I have to go and just do the basics first before we can even get to the next stuff, like forward secure or post compromise secure end to end encrypted storage?

Matilda: I think it was actually quite fun. You know, I was sort of excited to have stumbled upon this part of cryptography where we were behind, where, you know, I expected us to have much better security, especially coming from, you know, how end to end encryption has become the default for messaging systems and so on. I really expected data at rest to have better guarantees as well. So it felt kind of like a bit of an aha moment, like, oh, here’s a place where we as a community haven’t looked enough and there’s a lot of low hanging fruit and open problems that need to be tackled. So not so much frustrated as excited, actually.

Deirdre: Okay, you looked into at least four publicly available production end to end encrypted cloud storage systems, kind of like your Google Drive or, you know, Box or whatever. But they weren’t those ones, they were some of these other constructions, because Google Drive is not end to end encrypted. It has this other weird thing where you trust Google a lot more. Can one of you run through what your approach was and generally what you found at a high level, and then we’ll get into each one?

Kien: Yeah. So essentially, I mean, I guess the idea here is that after Miro published his thesis, I looked at the list of cloud storage products and I said, okay, some of these might be interesting. And at the same time there was Jonas coming in, who requested a thesis. And I thought, yeah, maybe that’s a good thing to start with. And we looked through the list, and we saw, at least for a start, Sync. And then the other providers came a little bit later. I think maybe Jonas should talk about what we found on Sync at least.

Jonas: Yeah, sure. So Sync was the first provider that we started with, basically, and the idea was first to just look at the provider and then decide basically where we’re going to go from there, depending on whether we find something that we can prove or something that we can break. But the expectation was rather that, since there were already attacks on other cloud storage systems, it was probably not going to be super secure. So what we did is we looked at Sync. That means we looked at the web application mostly and tried to find out what exactly is going on. And we found a couple of attacks right away, or within a short amount of time.

Thomas: Before we dive into the attacks, you guys should describe, especially for people in the US that might not be familiar with all of these apps. Right. Sync in particular. Let me see if I can do this from memory. You guys did Sync, pCloud, IceDrive, Seafile, Tresorit. Did I miss one?

Jonas: No, we did five. So it’s all of them.

Thomas: What are these things?

Jonas: Basically they’re providers of end to end encrypted cloud storage. And in comparison to regular cloud storage providers like Dropbox, Google Drive and so on, they don’t just offer encryption in transit, when data is sent to the server, and encryption at rest, when data is stored on the server, but actually provide end to end encryption. That means that the server shouldn’t be able to see any of the data, or modify any of the data, that is sent by the client. So the client should be the only party that is in possession of the key material and should be the only party that is able to access their files.

Thomas: Basically all of these systems, essentially the keys are held client side, but really all these systems are password protected.

Jonas: Yes. So in the sense that the user password is used to derive keys, and then to allow the client to encrypt the data when sending it to the server. And because these cloud storage systems should also support multiple devices and so on, all the key material that is used should be accessible via a user password that the user can enter on different devices, eventually.
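
The multi-device pattern Jonas describes, deriving key material from the password with a KDF, can be sketched roughly like this. This is only an illustration: PBKDF2 stands in for whatever KDF a given provider actually uses (Argon2, scrypt, ...), and the salt, iteration count, and function names are hypothetical, not taken from any of the analyzed products.

```python
import hashlib

def derive_kek(password: str, salt: bytes) -> bytes:
    """Derive a key-encryption key (KEK) from the user password.

    PBKDF2-HMAC-SHA256 is a stand-in KDF; the iteration count is
    illustrative, not a recommendation from the paper.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)

# The same password and per-user salt yield the same KEK on every
# device, which is what lets one password unlock stored key material.
kek_laptop = derive_kek("correct horse battery staple", b"per-user-salt")
kek_phone = derive_kek("correct horse battery staple", b"per-user-salt")
assert kek_laptop == kek_phone and len(kek_laptop) == 32
```

Because the KDF is deterministic for a given password and salt, every device that knows the password can re-derive the same key-encryption key and unwrap the account’s stored key material.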

Thomas: Gotcha. And I wasn’t familiar with Sync before I read this paper, but is Sync a big deal?

Jonas: It’s not as big of a deal as MEGA is. It’s not like the biggest provider in the space of end to end encrypted cloud storage, but they have around 2 million users. And there are some institutions that are probably interesting that are using this software, like the Canadian government for example. So it’s definitely like an interesting target.

Thomas: So a pretty big deal. And this is like a SaaS application. Like this is not software that you run yourself, this is. Or you run the client. But like they run the service for this.

Jonas: Exactly. So in the providers that we’ve analyzed, there’s also one provider where you’re able to host your own instance, where you’re able to run your own server, which is Seafile. But for all the other applications, there’s a server that is basically run by the provider and that you can access by running your client locally and then sending information to the server.

Thomas: So in the Sync case, you’re doing some amount of reverse engineering.

Jonas: Basically, you could call it that. We’re looking at the code of the web application, which is nice because it’s already accessible in your browser. But of course there are some measures to prevent understanding what’s going on. Like, there’s some obfuscation, and you need to get into what exactly is going on by looking at the code and also looking at the requests and the responses that you’re sending and receiving from the server, which is the largest chunk of understanding how the protocol works.

Thomas: Cool. So with that in mind, roughly what did you guys find?

Jonas: So for Sync, we found a few different attacks that allow us to attack the confidentiality and the integrity of files and of metadata. In particular, there’s one interesting key replacement attack that we can do for Sync, which basically allows a compromised server to replace key material that the client is storing server side, and then allows the compromised server to look at files that the client is sending to the server. So any file that is uploaded in Sync can be read by a compromised server. But the server is able to do more. They’re actually also able to read a lot of metadata and to modify the metadata. They can, for example, attack the binding between a file name and the file content. So if you have two files, you can basically swap the content of the two files, and it’s not an issue in the protocol. And Sync is also a provider that offers sharing of files, so you can share an end to end encrypted file with someone else.

There’s a problem with the authentication of the public keys of users.

Thomas: You guys looked at like five different systems and some of them did sharing and most of them did not do sharing. Did any of the systems that do sharing do so successfully or securely?

Jonas: So Sync and Tresorit are the two providers that offer sharing. In Tresorit, there is actually a protection for key material, so this issue with the unauthenticated keys only arises partially. So it’s a bit of a matter of definition whether they do it successfully or not. Because the problem is that the public key infrastructure that is used by Tresorit is operated by Tresorit themselves. So since we’re in a setting where we assume that an attacker can potentially compromise provider infrastructure, it’s quite likely that they would also be able to compromise this public key infrastructure, and then the system is attackable. So it depends whether we assume that this public key infrastructure is compromised or not.

And if it’s not compromised, you could say that Tresorit is doing it successfully; otherwise it’s also an issue. And Sync, the other provider, is definitely not really doing it successfully.

Thomas: Gotcha. So from my read of the paper and from what you just said, tell me if this is a bad summary of what you guys came up with. But it seems like, apart from the metadata stuff, where in every one of these systems it looks like they don’t protect metadata, in some cases just straight up plain text for file types and where the file came from and stuff like that. That aside, there’s like two broad kinds of attacks here, right? The first being usually key manipulation things, where from that attack on, the server is going to be able to read the contents of files that were uploaded. That’s the first broad area of attacks: reading files. And then the second, and I think in the paper more represented, attack is servers being able to inject files into people’s file systems, with control over what the content is. So those seem like the two big broad cryptographic areas of attacks that you guys found. Did I miss a thing there?

Jonas: What I would add maybe is that there are problems with integrity protection of files in general. So there are some providers that just don’t have any integrity protection in their encryption schemes, which would allow a compromised server to also change the encrypted files by, well, messing with the ciphertext. That maybe doesn’t fall into this category of injection attacks, but it’s also an interesting attack, and it is possible for multiple different providers. In the same category are attacks using unauthenticated chunking. So in the setting where the client splits the file into chunks before uploading it to the server, there are instances where the server is able to then swap these chunks around and do a mix and match to create a ciphertext that they want, which is also a problem.
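
The ciphertext malleability Jonas is describing can be shown with a toy stream cipher built from a standard-library hash. The keystream here stands in for AES-CTR or any other unauthenticated mode; the key, nonce, and messages are invented for illustration.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream from SHAKE-256; stands in for AES-CTR.
    return hashlib.shake_256(key + nonce).digest(n)

def xor_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; encryption and decryption are identical.
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

key, nonce = b"k" * 32, b"n" * 16
pt = b"PAY ALICE $0100"
ct = xor_encrypt(key, nonce, pt)

# Without an integrity tag, a server that never learns the key can still
# flip plaintext bits by flipping the same bits in the ciphertext
# (assuming known or guessed plaintext).
target = b"PAY MALLY $9999"
delta = bytes(a ^ b for a, b in zip(pt, target))
tampered = bytes(a ^ b for a, b in zip(ct, delta))
assert xor_encrypt(key, nonce, tampered) == target
```

An authenticated mode would make `tampered` fail decryption instead of silently yielding the attacker’s chosen plaintext.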

Thomas: Arguably, the stuff where you’re not doing integrity protection for existing files is scary in a different way than being able to inject files directly, in that if you’ve got executables or things up there, if you can tamper with files that are in some way implicitly trusted, then you’re breaking other security models. But you also have that problem with just uploading a file: the file being in somebody’s encrypted vault online gives it an implicit trust. So that’s basically the big attack that you’re thinking about there, with both the integrity and the injection stuff: the server is going to put things there that people are going to trust are part of those volumes, and are not.

Jonas: Yes. So it could be, for example, executables, but we could also think about a political setting where some adversary, some nation state actor, is trying to put compromising material into the drive of some unwanted party, something like this, to try to incriminate them in a specific way. This would maybe also be covered by this injection attack, which is also a problem, of course.

David: Right.

Thomas: And then simply as cryptography engineers and as connoisseurs of decent designs, right, this seems like table stakes. The server shouldn’t be able to just randomly make up content and have it appear to be authentic and put it there. So in the Sync case, I got you off track earlier. So talk to us about what Sync looked like, how these attacks actually worked, maybe starting with Sync. I think pCloud is the really fun one for me.

But Sync seems like a good starting place.

Jonas: Yeah. So for Sync, we looked at the code, and it was already quite clear very early in the process that there were going to be rather severe vulnerabilities. There were some red flags that we found. For example, this issue with the key replacement, so missing authentication for the key material that is stored on the server. This was something that came up quite early. And then we spent some time looking at Sync and basically then had to take a decision whether we were going to deep dive into Sync a bit more, or whether we were going to spread out and look at different clients. And here the motivation was that since we already had some interesting attacks on Sync, it probably made more sense to also look at other providers and see if we could break them as well.

So we started looking at the other providers basically at the same time, and then we tried to see if the same issues that we found in Sync arose there too. And as it turns out, there are definitely some common failure patterns. So things that people get wrong in products that are developed independently of each other. And that is quite concerning. That is the most concerning thing, at least for me, regarding our paper.

Deirdre: So it looks like what’s very common is that these per-file-system keys, or just key material in general, are unauthenticated. Unauthenticated encryption, unauthenticated sharing keys, unauthenticated metadata, like you said: file names, size, type, date. All this stuff is unauthenticated, which allows, you know, a server that’s been compromised, or that you are theoretically trusting when you shouldn’t have to, to just manipulate whatever. You just said that there’s a common naive construction pattern amongst all of these. What is the general pattern of how these things are being constructed? And your first suggestion is, no, just have a real AEAD, authenticated encryption with additional data, at least, and then we can start building from there, and at least a lot of this data wouldn’t be unauthenticated and manipulatable by the wrong party, AKA the untrusted server.

Kien: Yeah, I think this really is sort of a culture mismatch, let’s say. Because you look at the websites of all these cloud providers, and the thing that they say is something along the lines of, we provide zero knowledge encryption. Which is, you know, what does it mean for you, zero knowledge encryption? And for us, as Thomas also said, it’s this thing about integrity: it’s table stakes, it must be there. Whereas I don’t think this concern has been, you know, absorbed by the larger public. So when they say our cloud storage is secure, what does it really mean? And apparently it doesn’t involve integrity.

Matilda: Can I add to that? Because I think, having looked at some other systems beyond MEGA, my takeaway was that most of them understand that the file data itself needs to be integrity protected, but then they fail higher up in the hierarchy. So for example, it doesn’t seem to be self evident to developers that keys also need to be protected with authenticated encryption, and then of course you have no protection for the file data lower down. This fact that the key wrapping steps in the key hierarchy also need to use AEAD, with associated data binding the context, just doesn’t seem to be obvious.
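
A sketch of what authenticated key wrapping with context binding looks like. This toy encrypt-then-MAC wrap uses only the Python standard library; a real design would use an AEAD such as AES-GCM with the context as associated data, and the `file-id` labels here are hypothetical.

```python
import hashlib
import hmac

def wrap_key(kek: bytes, file_key: bytes, context: bytes):
    """Toy encrypt-then-MAC key wrap; illustrates why binding matters."""
    ks = hashlib.shake_256(b"enc" + kek + context).digest(len(file_key))
    ct = bytes(a ^ b for a, b in zip(file_key, ks))
    tag = hmac.new(kek, context + ct, hashlib.sha256).digest()
    return ct, tag

def unwrap_key(kek: bytes, ct: bytes, tag: bytes, context: bytes) -> bytes:
    expected = hmac.new(kek, context + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrapped key failed authentication")
    ks = hashlib.shake_256(b"enc" + kek + context).digest(len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))

kek, file_key = b"K" * 32, b"F" * 32
ct, tag = wrap_key(kek, file_key, context=b"file-id:42")
assert unwrap_key(kek, ct, tag, context=b"file-id:42") == file_key

# A malicious server that re-attaches this wrapped key to a different
# file (the swap attacks discussed above) now fails loudly:
try:
    unwrap_key(kek, ct, tag, context=b"file-id:99")
    raise AssertionError("swap should have been detected")
except ValueError:
    pass
```

The point is the context argument: the same wrapped blob, moved to a different position in the key hierarchy, no longer verifies.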

Deirdre: Yeah, especially because a key feature of all these systems is you put your encrypted file in the end to end encrypted cloud storage, and you have to be able to share it, either with yourself as a new device or with somebody else, to be like, hey, here’s a link to my file that’s in my encrypted thing. I am wrapping the file key, or a key encryption key, and I’m sending it to you in some way. But that means you have to store encrypted key material, whose plaintext is more key material to decrypt the file, on the server, or you have to store it somewhere to share it. And it seems like that is being treated differently than the actual files that you’re uploading. And I have a hard time understanding. Okay, so are they encrypting it with, like, AES-GCM? The key wrapping is not an AEAD or something like that?

What are they using to actually encrypt these file keys or this key material?

Jonas: So, yeah, maybe I can answer that. It depends on the system, of course, but in a lot of cases they’re using, for example, some asymmetric encryption like RSA-OAEP to encrypt symmetric keys, which is not authenticated. It still protects the confidentiality of the keys, but not their authenticity, and then there’s a problem with that. Now, there are also instances where people actually do it properly and use an AEAD scheme to encrypt another symmetric key, for example. But oftentimes it’s in steps that use asymmetric encryption where this process fails.

Deirdre: Is that Tresorit?

Thomas: I mean, I think, from an aesthetic perspective, let’s say probably the coolest attack, or the most stunt-crypto attack, in the paper is pCloud, right? So there’s a situation where you have an RSA key pair, and the private key is encrypted, but it’s encrypted with CTR mode. They went out of their way to use relatively modern RSA-OAEP and then non-authenticated encryption for the key itself. And pCloud, my understanding from the paper, you’re just gonna correct me here if I’m wrong about this, is they do a consistency check, right? So you can’t just inject an arbitrary public key. And then I guess the encryption of that private key is bound in some way to the user’s password, so you can’t just swap out the key completely, because the private key wouldn’t decrypt properly at that point. And they check to see if the public key matches the private key, but it’s unauthenticated. So you can somehow bit flip that private key to match an arbitrary public key.

Like, can somebody first of all fix my explanation of that attack and then kind of walk us through it because it read really fun in the paper.

Kien: Yeah, that is correct, in the sense that what you have is this encrypted RSA private key somewhere in the storage, and what you retrieve is both your public key and your encrypted private key. Now, the private key, as we said, is encrypted using this weird modification of counter mode, whereas the public key is just there, unauthenticated, just given to you. Now, you would like to give some arbitrary public key to the client, because then the client is going to use the public key to encrypt their own keys. And so if you choose the public key, then you can decrypt whatever you want. Now, the problem there is that you also have to provide a private key which is consistent with whatever public key you provide, and that becomes a little bit harder. But because they use counter mode, you can start bit flipping within this private key and then do some fun things.

Thomas: Yeah, like an RSA private key. It’s mostly random, right? So how would you bit flip that? You don’t know what bits to flip.

Kien: Yeah, except for the fact that thankfully they encode everything in DER there. And this means that we have a nice header, we have a nice version part of the header, there’s the length, there’s even the public key inside. So you know exactly what bytes are in between there. And then clearly there is a private part that you don’t really know. However, it then turns out that you don’t actually need to bit flip that part, because the public part is large enough to encode a private key within it. So now you have to imagine that there’s a DER parser that’s going to. So you’re going to decrypt this private key, you’re going to parse it, and then whatever is at the start of the DER encoding tells you how long the key is going to be.

So you just cut it a little bit short, encode another private key inside which is shorter than the one that you actually have, and then the DER decoder is going to just discard whatever trailing data you have.
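
The DER detail Kien describes hinges on the decoder trusting the declared length and discarding trailing bytes. Here is a minimal sketch of that behavior; the byte values are invented for illustration and this is not pCloud’s actual key format.

```python
def read_der_tlv(buf: bytes):
    """Minimal DER tag-length-value reader (definite lengths only).

    Returns (value, trailing). A lenient decoder that simply ignores
    `trailing` is what makes the embedded-shorter-key trick work.
    """
    length_byte = buf[1]
    if length_byte < 0x80:                       # short-form length
        length, offset = length_byte, 2
    else:                                        # long form: next n bytes hold length
        n = length_byte & 0x7F
        length = int.from_bytes(buf[2:2 + n], "big")
        offset = 2 + n
    value = buf[offset:offset + length]
    return value, buf[offset + length:]

# A short attacker-chosen "key" body (fake DER INTEGERs), wrapped in a
# SEQUENCE whose declared length covers only that body.
short_key_body = b"\x02\x01\x00" * 4
short_key = b"\x30" + bytes([len(short_key_body)]) + short_key_body

# Place the short key where the real (longer) key used to start. All the
# unknown original private-key bytes after the declared length become
# trailing data the lenient parser throws away; no bit flipping of the
# unknown part is needed.
blob = short_key + b"\xAA" * 100
value, trailing = read_der_tlv(blob)
assert value == short_key_body
assert trailing == b"\xAA" * 100
```

A strict decoder would reject `blob` because of the trailing bytes; the lenient behavior sketched here is what the attack relies on.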

Thomas: Reading this is like. Deirdre was wondering why I like this paper so much.

Deirdre: There you go. Yeah.

Kien: Oh boy, this was a fun attack.

David: Go ahead.

Kien: Okay, so I just wanted to say that this was fun, except for the fact that it has a few caveats, because you have to encode a smaller private key. And depending on the libraries that are decoding this private key, some of them will not accept any arbitrary private key. Surely you could put a private key with some arbitrary value for N, and maybe you can put your public exponent e to be one. That would be pretty fun, except that there are some consistency checks when, say, OpenSSL imports this key, and it says, oh, clearly the public exponent can’t be one, and so it’s just going to reject it. This is not consistent across implementations, however. We have seen that there is a CLI client for pCloud that uses a completely different library, and that library does not perform such checks.

Deirdre: Oh, good. They’re all web based except. Which one was it that had a downloadable client? Or, no, that was the one where you could set up your own server. They’re all web based. Okay.

Yeah, so they’re all web based, but pCloud has a separate client that doesn’t do the validation checks.

Kien: A CLI client. I guess it’s useful for automating things. I’m not sure.

Deirdre: Cool. Very cool.

Thomas: So we were talking a second ago about how some of these systems do use authenticated encryption for the content of files. GCM seems to be a common design point across these things. But they don’t do authenticated encryption, or they don’t have a coherent encryption security model, for the keys themselves, and then everything falls apart from there. I’m struck a little bit by how old some of these systems are. I think Seafile is the one I’m really thinking of here. At one point Seafile used ECB for their encryption. And I know this both from your paper and also because if you search for Seafile, you’ll find 2011 threads where well-meaning crypto nerds are trying to explain to them that their crypto is broken. So I also wonder how much of this is just how old these systems are.

I think the same thing about MEGA, by the way: these systems kind of date back to a period where the norms and best practices for building these kinds of systems, at least for general practitioners, were not there. These don’t look like systems that were built by academic cryptographers. Let’s say the norms were not there in 2011 that are there now; I think people would be a little bit more careful with these things today. In answering that question, I also think we should probably give our listeners some sense of what the common design of these systems is, what they actually look like, and what the constructions and protocols basically look like.

Matilda: I think you have a really good point there, Thomas. MEGA was also still using AES-ECB to encrypt keys when we looked at them in 2021. And one thing that we found out when we tried to propose mitigations is that it’s actually really, really difficult for them to move to something else. So you are probably right that some of these flaws are just legacy problems, because it is very difficult, when you’re dealing with persistent data, which is very much the use case of cloud storage, to change your encryption schemes.

Deirdre: So it’s really difficult because basically the only time that they can migrate people off the old format is when they change stuff. And if they’re not changing stuff, you can’t do anything.

Matilda: Exactly. You’d have to force all of your users to come online and download all of their data and re-encrypt it if you wanted to change the mode of encryption that’s used. And at least for MEGA, even at their peak bandwidth, we found that that would take over half a year, given the amount of data that they’re storing. And that’s assuming that all of their users would be able to come online and do this, because it has to be done locally on the client, since it’s end to end encrypted.

Deirdre: This hurts my heart a little bit. But I also vividly remember the similar constraint when I was working on Zcash, which is basically end to end encrypted Bitcoin. And Zcash has at least three different versions of how it does these shielded transactions. The modern one is wonderful, and it has new features and new security properties that are so much better and faster than the previous ones. But there are people who still have money in the very first iteration, and you can’t force them to come on and migrate their money to the latest versions. And so it’s a forever sticking point: we can’t maintain that sort of stuff forever. What do we do about the people who never come online and migrate their money, or migrate their files from the old version of encrypted storage to a newer version? If you want to invest in that, there’s no easy answer. You might abandon people, but that’s a risk you would take.

And you know, some people don’t see the value add of the modern encryption, even if you say, like, look at how I can just break the shit out of this if I wanted to. And they just don’t quite understand it. And if it seems a bit esoteric, they’re like, what do you mean, it’s encrypted? And it’s like, well, it’s not very well encrypted. It’s badly encrypted, anyway.

Jonas: Maybe it’s interesting to add that there are also some exceptions to this rule. Like IceDrive, for example, is one of the providers we analyzed, and they were founded in 2019, I think. So this is rather recent, I would say.

Thomas: And what would you say the industry standard was in 2019? IceDrive is the one where they said that Twofish is more secure than AES, right?

Jonas: Yeah, exactly. So I would say that the industry standard was already better than this when they started the company. So I’m not sure if they, I don’t know, have the same excuse as the other providers.

Deirdre: I wonder if they had a copy of Applied Cryptography and that’s all they looked at in 2019 when they were creating IceDrive, which is a shame. Go buy Serious Cryptography by JP Aumasson.

Thomas: That’s a good book. So IceDrive, not sponsored, uses Twofish. And also they have their own block cipher mode, right? Like they came up with their own mode for doing bulk content encryption too. Maybe IceDrive is a good place to fix on, just to describe what one of these systems looks like. I feel like these systems all kind of look a little similar.

Like there’s differences, obviously, but there are design commonalities here. So if you were to describe roughly how IceDrive was designed, describe that system to us.

Jonas: Should I? So IceDrive is rather simple in design. In comparison to the other providers, they only use symmetric cryptography. They have a user password, and the user password, like with other providers, is used to derive a symmetric key. So you have a key derivation function that you use together with the user password. And then what they do basically is that there’s a master symmetric key in IceDrive, and this master key is used to encrypt all the files that you have. Since IceDrive doesn’t support any sharing, this is a rather simple and sufficient approach, because you never need to share any key material with anyone else. The problem is that there are already some things that go wrong there, like the issue with metadata that I mentioned earlier.

So IceDrive leaks metadata and also allows a malicious server to tamper with that metadata. And then there’s a problem that they use either their custom encryption mode or CBC mode together with Twofish, which doesn’t offer any integrity protection of the files. So you can mess with the ciphertext a bit and then also mess with the content of the files, which is, as we mentioned earlier, especially with executables and stuff, a problem. And they also run into this problem of unauthenticated chunking, where you can basically reorder the chunks in which a file is uploaded and build a file of the server’s choosing. With some caveats, of course.

How this differs from the protocols of other providers is that other providers usually use a lot more different keys. So, for example, depending on the file hierarchy, one key per folder and then one key for each file. Also, a lot of the other providers, especially the ones that offer sharing, use asymmetric cryptography, like some public private key pair that is specific to a user account. In some cases, pCloud, for example, does this. They use asymmetric cryptography in the same way. So there’s a public key and a private key that each client holds, but they don’t offer sharing. So in theory, at least in the current state of the protocol, it would also be sufficient to just use symmetric cryptography because they don’t really need the public key for anything. This is maybe for, I don’t know, compatibility reasons or because they want to add sharing in the future, which I don’t know.

But at the moment this is not needed. So in other providers there are some steps in between, where the keys of files or folders are encrypted under the keys of the folders that are above them in the file hierarchy.

Kien: There's one specifically...

Matilda: I just wanted to mention that there's one other system that stands out a little bit, which is Proton Drive. This is the cloud storage scheme by Proton Mail. They use only asymmetric cryptography, because they're basing their cloud storage system on OpenPGP. Just to mention that there are also some other interesting systems out there.

Kien: Yeah. And specifically for IceDrive, we were talking about their encryption, the mode of operation, and it looks like a series of peculiar decisions. So, I mean, first of all, using raw CBC encryption is already an interesting decision. And then padding your message with only zeros. Okay, sure. Then you have to encode somewhere the length of this padding.

Jonas: Sure.

Kien: Okay. Then you have to provide an IV, and they choose the IV at random. Okay. But then they choose it only from letters and digits, and I don't know where this comes from. And then they take the IV and the length of the padding, and they encrypt it again, this time again with CBC encryption, but with a very specific IV, which is 1, 2, 3, 4, 5, 6, 7.

Deirdre: 8... 8, 7, 6, 5...

Kien: Just a series of peculiar decisions.

Thomas: I think my favorite thing about IceDrive was the fact that their unpadding function doesn't check whether the bytes are zero. It'll just take whatever the header says the padding was and chop that off the end.
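
A sketch of the flawed unpadding Thomas describes (hypothetical code reconstructed from the discussion, not IceDrive's implementation): the unpad step trusts the stated padding length and never checks that the removed bytes are actually zero, so a tampered length silently truncates real plaintext:

```python
def pad(msg: bytes, block: int = 16):
    # Zero-padding to the block size; the pad length is recorded separately.
    padlen = (-len(msg)) % block
    return msg + b"\x00" * padlen, padlen

def unpad(padded: bytes, padlen: int) -> bytes:
    # Flawed: trusts padlen and never verifies the removed bytes are zero.
    return padded[: len(padded) - padlen]

msg = b"transfer $100 to alice!!"   # 24 bytes
padded, padlen = pad(msg)           # 32 bytes, padlen = 8

# If an attacker can influence the stated padding length,
# real plaintext bytes silently vanish on decryption:
assert unpad(padded, padlen + 8) == b"transfer $100 to"
```

A correct unpad would verify that exactly `padlen` trailing bytes are zero and reject the message otherwise, and, more importantly, an authenticated mode would detect the tampering before unpadding ever runs.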

Kien: Absolutely.

Deirdre: This is, like, so far away from the good APIs of cryptographic primitives. But this is yet another reason why APIs that say "put an IV here, put a nonce here" are a problem: a random engineer calling an encryption library ends up with IVs or nonces with the value 1, 2, 3, 4, 5. Don't expose that to people at all.

Kien: That’s a very good point.

Matilda: It's a bit of a difficult balancing act, though, I have to say, because in a recent research project together with Matthias Carlotta and Nicola Deldanis, we were trying to design a secure file sharing system, like, actually implement it. And we ran into the issue that a lot of the APIs are very restricted. For good reason: historically, we've started building more restricted APIs because it turned out they were misused. But when you have very competent users who want to build complex cryptography, it can become limiting instead.

Deirdre: Well, I would make an argument that my first pass at building something like this would involve nice primitives like HPKE for your hybrid encryption. It has an AAD in there, and you can basically use it for your key wrapping, and then you use a real AEAD for your files and your other stuff. And at the top layer, none of that stuff is exposed; it's all kind of wrapped up. If you are sufficiently advanced that you're really trying to get efficiency out of a secure construction, then you might have to get into the guts, because you're trying to break through the nice layers of abstraction in favor of efficiency. But if you're just trying to get something that's nice and secure and maintainable for a general practitioner, and not an elite expert, you can get away with not having to dig into the guts. It is a tough balancing act: if you're trying to push the metal, you're going to have to break through the abstractions and constructions that are there to keep that sort of complexity and detail away from the general practitioner. But I would argue that we, the general cryptography community, are trying to make things like HPKE and other primitives to make it a little bit easier to do this efficiently and securely without having to get your elbows deep into IVs and nonces and cipher modes and things like that.
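
As a sketch of the layering Deirdre is arguing for, here is a toy two-layer design in pure Python. The `seal`/`open_` helpers stand in for a real AEAD (AES-GCM, ChaCha20-Poly1305, or HPKE's seal for the asymmetric case); the SHA-256 keystream plus HMAC construction here is for illustration only and must not be used for real data. The point is the API shape: nonces are generated internally and never exposed to the caller, a random per-file key is wrapped under the master key, and metadata is bound as associated data:

```python
import hashlib
import hmac
import secrets

def _stream(key: bytes, nonce: bytes, n: bytes) -> bytes:
    out, i = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
        i += 1
    return out[:n]

def seal(key: bytes, pt: bytes, aad: bytes = b"") -> bytes:
    # Nonce is chosen internally: callers can never supply "1,2,3,4,5,6,7".
    nonce = secrets.token_bytes(12)
    ct = bytes(a ^ b for a, b in zip(pt, _stream(key, nonce, len(pt))))
    tag = hmac.new(key, nonce + aad + ct, hashlib.sha256).digest()  # toy framing
    return nonce + ct + tag

def open_(key: bytes, blob: bytes, aad: bytes = b"") -> bytes:
    nonce, ct, tag = blob[:12], blob[12:-32], blob[-32:]
    expect = hmac.new(key, nonce + aad + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expect, tag):
        raise ValueError("tampered")
    return bytes(a ^ b for a, b in zip(ct, _stream(key, nonce, len(ct))))

master = secrets.token_bytes(32)
file_key = secrets.token_bytes(32)
wrapped = seal(master, file_key, aad=b"file-id:42")             # key-wrapping layer
blob = seal(file_key, b"report.pdf bytes", aad=b"file-id:42")   # content layer
assert open_(master, wrapped, aad=b"file-id:42") == file_key
assert open_(file_key, blob, aad=b"file-id:42") == b"report.pdf bytes"
```

Binding the file identifier as associated data means the server cannot swap wrapped keys or ciphertexts between files without the tag check failing.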

Deirdre: Anyway. Oh, something about a Merkle tree? Sorry, I might be skipping ahead.

Thomas: We're just, in the background, trying to make sure that we're catching all the details that we have to call out in these papers.

Deirdre: Right.

Thomas: So, yeah, I think we want to get into a little bit of, I mean, probably Tresorit in particular, which is an interesting example of a more sophisticated system. But we're also just dunking: one of these systems went out of its way to do a Merkle tree for authentication that didn't really do much to provide authentication, which is just another fun call-out.

Kien: Yeah, yeah, you can go.

Jonas: Yeah. So basically this was a system that was used in pCloud for the integrity protection of files. This goes in the direction of the unauthenticated chunking issue, where they split the file into chunks and then compute a MAC tag over each chunk. So far, so good. And then they try to combine these MAC tags to build a tag for the entire file. The way they do this is by using a Merkle tree to aggregate all the individual tags into one. But the problem is that instead of including only the root of the Merkle tree in the ciphertext, they include all the intermediate steps of the computation, so all the individual tags of the chunks. And this is an issue because the server can then simply choose the tag that it wants, cut off a part of the ciphertext, and obtain a valid ciphertext that is shorter than the original one. So the problem is basically that they didn't just include the root of the Merkle tree, but also the other nodes of the tree, and then it just breaks down.

Kien: Yeah. And also, usually the way I think of Merkle trees is that you take your chunks, you hash them together, and then maybe you authenticate the root. But for some reason, they decided to use MACs all the way down. And so this is peculiar. But then it also makes you think: that means that any subtree of this tree is also a valid tree, because...

Deirdre: Oh, no.

Kien: Sort of recursive now.

Deirdre: Oh, no.

Kien: So you don’t have the tag only for the root, but you have tags everywhere. So you can just take a small part of the tree and then serve it as the entire tree.
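
A small model of the pCloud flaw as described: HMAC-SHA256 stands in for the actual MAC, and the tree structure is reconstructed from the discussion, so treat the details as assumptions. Tags are MACs all the way up, every level is stored alongside the ciphertext, and therefore any stored intermediate tag verifies a truncated file as if it were complete:

```python
import hashlib
import hmac

def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def tag_tree(key: bytes, leaves: list) -> list:
    # MAC each chunk, then MAC concatenated child tags upward.
    # Assumes a power-of-two number of chunks, for brevity.
    level = [mac(key, c) for c in leaves]
    levels = [level]
    while len(level) > 1:
        level = [mac(key, level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels  # the flaw: ALL levels get stored, not just the root

def verify(key: bytes, leaves: list, root: bytes) -> bool:
    return hmac.compare_digest(tag_tree(key, leaves)[-1][0], root)

key = b"k" * 32
chunks = [b"c0", b"c1", b"c2", b"c3"]
levels = tag_tree(key, chunks)

# The server keeps only the first two chunks and serves the stored
# intermediate tag as if it were the root of the whole file:
assert verify(key, [b"c0", b"c1"], levels[1][0])
```

Storing only the root (and MACing it together with the file length or chunk count) closes this hole: an intermediate node then carries no standalone authority.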

Thomas: I feel bad for them, because it feels like pCloud went out of their way. Like, IceDrive went out of their way in kind of a bad way, and then pCloud, it seemed like they were trying to get things right, and it just didn't work out for them.

Kien: Maybe. I mean, it's hard to say. They also made some very peculiar choices, I'm not sure. The encryption scheme, or, yeah, the encryption method that they use is also quite strange. So, for example, they have some cases in which messages of 16 bytes get encrypted differently from messages which are longer than 16 bytes.

Deirdre: What?

Kien: And the encryption for messages which are 16 bytes includes storing the message with the MAC key as well.

Thomas: Okay, I take it back.

Kien: It's, like... I wouldn't necessarily say that.

Deirdre: Yeah, that’s weird. Yeah, go ahead.

David: I'll say: what about Seafile? So how is Seafile constructed, and how did it go wrong?

Jonas: Maybe. Kien, do you want to have a go?

Kien: I'll try. Well, Seafile is also a little bit peculiar, in the sense that they don't do sharing per se. For example, what you can do is just share the password to one of your drives with somebody else if you need to share it. I mean, that's on them to decide. And the other thing is that this is one of the only self-hostable solutions, as far as we've seen, but it's also been used by some universities. And I think one of the interesting things that we found in Seafile is that, if you do some archaeology, you can really see the layers of limestone, sort of: the various legacy versions that have been stacked on top of each other. And one of the attacks that we found, in fact, is that they have somewhere a switch between the various versions of the encryption protocol, and every version does something different.

So, for example, some of them use ECB and some of them use CBC, but with AES-128, some of them AES-256, some of them use the OpenSSL EVP_BytesToKey function to derive keys, some of them...

Thomas: Which is amazing, by the way.

Kien: Which is amazing. SHA-1, repeated a few times. Okay. So you really have a nice cross-section of all these things that are stacked on top of each other. And then the problem is that the server sort of decides the version. So the client asks the server, hey, which version do you support? And then the server can just say, oh, I support only the weakest version, please use that. And that's not great.

Thomas: And the weakest version here is really, really weak.

Kien: Yeah, exactly. It's the one that uses EVP_BytesToKey, and I think just three iterations of that. So that means it's SHA-1 repeated three times, with no salt. And that means the passwords of the users, which are essentially the root of security here, are very easy to brute force.
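
To see why that legacy mode is so weak, here is a sketch of an unsalted, three-iteration SHA-1 derivation, an approximation of the behavior described rather than Seafile's exact EVP_BytesToKey invocation, together with the dictionary attack it invites:

```python
import hashlib

def weak_kdf(password: bytes, iterations: int = 3) -> bytes:
    # Unsalted, three iterations of SHA-1: a sketch of the weakest
    # legacy mode described above (details approximated).
    d = password
    for _ in range(iterations):
        d = hashlib.sha1(d).digest()
    return d

target = weak_kdf(b"letmein")

# With no salt and almost no stretching, a dictionary sweep is trivial,
# and a single precomputed table works against every user at once:
guesses = [b"password", b"123456", b"letmein", b"qwerty"]
cracked = next(g for g in guesses if weak_kdf(g) == target)
assert cracked == b"letmein"
```

A memory-hard, salted KDF (scrypt or Argon2) with per-user salts makes both the sweep and the precomputation dramatically more expensive.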

Deirdre: What did Tresorit do? Because I'm scanning and I'm like, all right, GCM, some weirdness, they have an HMAC in here, which is good, they have modern RSA, they have scrypt for stretching out that password. How is Tresorit broken?

Jonas: So from what we found, Tresorit is, I would say, a lot less broken than the other providers, and it's really obvious that they put a lot more thought into their cryptographic design, which is a good thing for the security and the privacy of the users. But it's not a very good thing if you're trying to analyze it, because reading through the code is really not easy; it's also obfuscated to a very high degree. And they use a lot of different keys. So they split their file storage up into these tresors, and there's a key object for each one of the tresors, with further objects related to them, combined in a maybe convoluted mix of symmetric and asymmetric keys. Apart from the issues with metadata that arise in all of the providers, the biggest issue we found there was this thing about key replacement when sharing files with other users: you need to query the public key of someone that you're sharing a file with, and the server is potentially able to replace that public key with a public key that it knows or that it generated itself.

Deirdre: Yeah.

Jonas: And this is possible only if the public key infrastructure that Tresorit has fails. But since they're running the public key infrastructure themselves, it's likely that someone who is able to compromise the server can also compromise the PKI.
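
The key-substitution attack applies to any scheme where the server hands out public keys without out-of-band verification. A toy Diffie-Hellman illustration (tiny, insecure parameters, purely to show the structure, not Tresorit's actual RSA-based scheme): if the "PKI" response is attacker-controlled, Alice derives a shared secret with the server's key while believing it is Bob's:

```python
import secrets

# Toy finite-field Diffie-Hellman. P = 2**521 - 1 is a Mersenne prime;
# these parameters are for illustration only and are NOT secure.
P, G = 2**521 - 1, 5

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

alice_sk, alice_pk = keygen()
bob_sk, bob_pk = keygen()
mallory_sk, mallory_pk = keygen()  # the malicious server's own key pair

# Honest PKI: Alice fetches Bob's real public key; they agree on a secret.
honest = pow(bob_pk, alice_sk, P)
assert honest == pow(alice_pk, bob_sk, P)

# Malicious server substitutes its own key in the "PKI" response. Alice
# cannot tell the difference, and the server now shares her secret.
mitm = pow(mallory_pk, alice_sk, P)
assert mitm == pow(alice_pk, mallory_sk, P)
```

Nothing in the protocol distinguishes `bob_pk` from `mallory_pk`, which is exactly why out-of-band verification or key transparency comes up next in the conversation.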

Deirdre: Yeah, that seems to be a running theme, or a running issue, with any end-to-end encrypted system: at some point you have to trust the infrastructure for some services. For a lot of this stuff we seem to trust the service to just carry my bytes. And even then, a compromised server or service might not deliver your bytes from one device to another; that's the ultimate failure mode. But that's availability, not confidentiality or authenticity. The next step is: I'm trusting you to provide the public key material that identifies who I am, or at least my devices. So you say, okay, I want to share this file with David.

And the server is like, cool, here's David's public key, and you can start your handshake to share key material or do whatever you're going to do. And then it's secretly sharing the FBI's ghost public key, and you're just sort of trusting the server. The immediate answer to that is: don't trust the server or service for, quote, PKI, and exchange that data out of band. But how do you do that? You have to have another trusted channel that's not controlled by the service. And if you do that once, it turns into trust on first use. But what about rotation, and so on and so on?

These are not easy things to solve for. So it's kind of understandable that if you're just trying to stand up a service that you have to trust minimally to carry your encrypted bytes, you're just sort of like, yeah, I will just hand you the public key, and you just trust me. And users go, okay, because what's the alternative? Go find your friend in person, get on a call, show each other a QR code or something like that? It's just a hard problem to solve.

Thomas: I'm really struck, in this whole situation, by how similar it looks to the situation that Threema and Matrix were in, where you have these papers that are just these clown fires of vulnerabilities. I mean, I'm constitutionally mean to people who do crypto, so. But there's a lot going wrong here; you're just kind of ping-ponging around the greatest hits of, like, 2010-era crypto vulnerabilities. Right. And the solution for something like Matrix, which had kind of the same situation, this hodgepodge of different protocols put together that would all bounce off of each other in bad ways, with a bunch of vulnerabilities, was something like MLS, which is what Matrix ended up doing.

Right. You take a coherent design that already exists and has been vetted, and then you build your system around that coherent design. And there are a couple of coherent designs that you could use in a messaging system: Threema could take the Signal protocol, or some open version of the Signal protocol, or they could take MLS, and then you wouldn't have to worry about all, or most, of the problems in the constructions they had, because there's a reference to go from there. But that doesn't really seem to exist for these systems.

Deirdre: Right.

Thomas: There isn't a model of, you know, what a good encrypted drive looks like, what constructions we should be using there. Right. So I'm wondering if that's a thing that we should be talking more about.

Deirdre: Yeah. And Matilda, this kind of goes into your work on a formal treatment of end-to-end encrypted cloud storage, which is just like, okay, write up in the formal setting what we expect from a good design of end-to-end encrypted storage. So maybe your work turns into an actual protocol.

Matilda: That's exactly what we were hoping. Right. So we had this realization, which Thomas also made, that there's not really a good reference out there. And that's what started the formal treatment work, where we really wanted to provide a good protocol for end-to-end encrypted cloud storage. Our focus there was first of all to create a security model, because we realized that there weren't even any security definitions. And of course you can't be provably secure if there are no definitions by which you can write such a proof. So that was actually the bulk of the work: figuring out what's the right syntax for this kind of object, and at what level of detail we should view it in order to be able to say something about its provable security. And that turned out to be very difficult.

But after we created that model, we also wanted to say something about how to securely build such a system. So we also included, not a fully fledged cloud storage protocol by any means, but at least sort of the skeleton structure that we would recommend. So that's also part of that research.

Deirdre: Oh yeah. Why was it difficult to really nail down the model of secure end-to-end encrypted file storage? Because my first knee-jerk reaction is, well, you need authenticity on the metadata, and you need CCA-resistant encryption on the key storage and on the file storage. Why was it difficult?

Matilda: I think there are multiple reasons that it's difficult. One is that these systems are very different from each other. They try to provide different functionality, and so, syntactically, it's a difficult object to nail down. It's not just an encryption scheme, where you know which inputs you're getting and what output to expect. It's a system, and it's running interactive protocols between a client and a server. So one of the first things we decided was that we really wanted to capture what happens at the level of messages between client and server, more fine-grained than just looking at "here's an encryption algorithm". And one thing that was difficult there is that the complexity becomes kind of like a supercharged version of key exchange.

So key exchange models are notoriously complicated, and they're just running one protocol, the key exchange protocol. Whereas for cloud storage you're running a bunch of protocols: there's registration, when you first create an account; then there's authentication; and then, once you're signed into your account, you can do all of these things to your files, like upload a new file, maybe change the file, download a file, share a file with another user, receive a file. So we had to first think about what the core functionality is, because the systems differ. When we did this formal treatment work, I had looked at MEGA, Nextcloud, and Proton Drive together with collaborators, and these three systems are vastly different already. MEGA, for example, has this chat functionality for which they use some of the same keys. Proton Drive of course also has their end-to-end encrypted email service. Nextcloud is meant to be self-hosted, so that's really targeting organizations who want to set up their own server, and they do sharing in a very funky way.

So, very different kinds of systems. We had to first think about what the core functionality is and how to capture it in syntax. And once we'd nailed down which interactive protocols to look at, we then had to find a syntax that could actually handle all of these messages going from client to server. One super difficult thing there is that when all of these things are just messages between a client and a server, you don't even have identified ciphertexts. The client might not be saying, oh, here I'm encrypting a file and now I'm going to send you a file ciphertext, right? These are just hidden inside messages. It could be that the client encrypts the file chunk by chunk and sends it over multiple messages. It could be, I don't know, that they combine these file ciphertexts with other things. Abstractly, when you're creating a security model, you don't know this; we don't have access to any of this information.

When you’re analyzing the specific system, it might be possible to identify file ciphertext, for example. But on the abstract level where we wanted to define a security notion like I don’t know, integrity of ciphertexts, this turned out to be impossible because there is no ciphertext which you can replace by random, for example, in a security game.

Deirdre: You describing how diverse and complicated these systems are, and trying to synthesize some sort of game, or system of games, for these different notions of this multi-protocol system, reminds me of all these other end-to-end encrypted messaging apps, where you might start from a good foundation, say you use Signal in your new end-to-end encrypted chat app, but then you need to do file sharing, and then you need secure backup of your end-to-end encrypted chats, and then you need groups, and then, and then, and then. You start from a very nicely modeled, designed cryptographic protocol, but the necessities of a software product and service mean it kind of just starts growing in all these directions as you add features and integrate stuff. And you're coming in at the end point being like, all right, how do I even think about all these things, and how do they fit together? Do you have any sense of how difficult your current model for these end-to-end encrypted systems will be to maintain or update as things start growing organically or get added to a system like this?

Matilda: I think there is a risk that this will be difficult. We tried hard to stick to the core in this model, to not make it overly complicated, and also to make it modular in some sense. For example, the way we treat file sharing in our model is that it's handled through some sort of abstract out-of-band channel, and this is really because we want to be able to plug in whatever way real-world systems are doing this, which could be using a PKI hosted by the cloud service provider, as we were talking about before, but could also be some sort of out-of-band verification over a secure messenger, or meeting in person, or key transparency, for example. I think we realized after the fact that we should have done this even more. We're currently thinking about revising the model to make registration and authentication a separate part from all of the file operations, whereas currently all of these protocols are handled within the same model. It might be nice to do identity management separately, so that we can also treat providers that already have an identity management system in place. If you think about the bigger providers that are now finally starting to add end-to-end encryption to their cloud storage, like Apple and Google and so on, they are already identity providers.

Right. So they might want to handle this part very differently.

Deirdre: Yeah. Oh, I'd be very interested in seeing that. Thomas? Mm, no, sorry.

David: Do you think, just based on the user experience that you get out of, let's call it a centralized cloud storage provider, that it's possible to nail a set of properties in a construction that gives you, in a, just hand-waving, secure end-to-end encrypted system, more or less the same experience that you would get from a centralized, non-encrypted one? Is that something that you think is actually possible to build and we just haven't done it yet? Or is the nature of the problem such that you have to sacrifice somewhere?

Matilda: Is the question whether there is a core set of features that would work for all of these systems? Or is the question what happens when you move trust from the server to the client?

David: Yeah. Can you keep the user experience the same for end users while removing trust in the server? Or do you have to change the experience somewhere?

Matilda: I think there's reason to be hopeful that we will manage this as cryptography research goes forward, but there are certainly some things that are very difficult at the moment. So, for example, real-time collaborative editing is something that most server-side encrypted systems provide today, like Google Docs, where you can edit a document together with your collaborators live. This seems difficult, just from an engineering point of view, to do on encrypted data: where do you host the updates, and who does the merging of them in real time, when you can't rely on the server anymore? But there are systems that are trying to do this, and it's also an open area of research in cryptography at the moment. So I think there's definitely reason to be hopeful. But whether or not the user experience will be identical in the future, I think that's a hard question. Maybe users will have to get used to the experience being slightly different in exchange for better privacy; we just have to educate users about what it means for their data to be confidential and integrity-protected.

Deirdre: Interesting. Although if we have very fast primitives, we may be able to get away with it and not notice very much.

Matilda: Yeah, absolutely. So hopefully latency won't be a problem, and we'll still be able to support features like editing. But, for example, as soon as you have end-to-end encryption, if a user loses all of their devices or their password, there's just nothing the service provider can do to help them get back to their data. This, I think, is not something we will solve in terms of user experience. We can educate users to make sure they have backups of their keys, and maybe second-factor devices and such, that can help them get back into their accounts. But it's never going to be the case that you can email customer service and say, hi, please let me into my account.

Deirdre: Boy. Okay, there's so much great stuff in here. And I totally agree that we love the thought of end-to-end encryption for a lot of these things, but then there's a whole world of people who are like, what do you mean I can't just log into my account? What do you mean I can't just email support and they'll get me back in? It's like, well, that's the point. We can't get in, ever. Only you can get in. And some people are comfortable with that: you lose your device, you lose your data.

But some people are not comfortable with that. Anyway: Matilda, Jonas, and returning champion Kien, thank you so much for this work. This is really cool. I'm really looking forward to any future nice formalized constructions based on your model, and updates to your model, Matilda, because I like to build the good stuff. And all these attacks are just sort of catnip. It's obvious that whenever these systems were built, which is apparently 2019 for at least one of them, there was just a lot that people didn't understand is important about building end-to-end encrypted things. It's not just about "can I read the bytes": there's authentication and integrity that really, really matter, especially when you're storing encrypted key material on an untrusted server.

And this is just a laundry list of why that's true. Thank you so much.

Thomas: Yeah, we've raised the bar on the messaging side of things. I think most practitioners today have a sense that if you're not running some derivative of the Signal protocol or MLS, you're kind of outside the state of the art right now. Right. And I love the idea that you're driving toward that same model for storage at rest, because if you were going to come up with the second most important problem in end-user cryptography after secure messaging, it's this. Right. It's where you're storing your files. And for me, in these messaging systems, there's kind of an original sin of losing track of the connection between group management and key management. When you forget that group management is the equivalent of the key distribution scheme, the whole system falls apart.

Here there's a similar original sin: if you forget that the entire key hierarchy needs to be authenticated, needs a coherent security model that chains back to some root of trust, then no matter what else you do, whether it's a Merkle tree or your own block cipher mode, the whole thing is just going to fall apart, because somewhere in that system there's a key that the server controls, and it can just switch the key up, and the whole thing falls apart. So I love that there are some core notions here, things that every single one of these systems gets wrong in some way, because there isn't yet a popularized, coherent security model for these systems to build on. You guys are driving that forward. That seems awesome. Thank you so much for describing this stuff to us.

Deirdre: Yeah, thank you. All right, I’m going to hit the button.

Matilda: Thanks.

Kien: Thank you.

Jonas: Thank you.

Deirdre: All right.