Research on HTTP privacy and security concerns

First draft: 2024-Apr · First published: 2024-Sept · Last edited: 2024-Sept
This document presents my research and understanding of HTTP privacy and security issues, particularly in the context of my project on recursive downloading from HTTP-based file hosts. While writing this content, I was reminded that most popular messaging services use some combination of HTTP and/or TLS in their implementation for sending and receiving messages and other data, so I touch on that topic too. I also touch lightly on the issue of AI being trained on users' private messages.
If you have any questions, suggestions or critique, or would like to contribute, please drop a note. If you think I've missed something, made a mistake, or have an idea for future content, I'd love to hear from you!

HTTP - the Hypertext Transfer Protocol

Citing Wikipedia:
HTTP is the foundation of data communication for the World Wide Web
HTTP by default is unencrypted, but this is becoming less common as public CAs (Certificate Authorities), such as Let's Encrypt, have made HTTPS (utilising TLS) easily accessible. I would go so far as to say that HTTPS has nearly become a ubiquitous standard—and rightly so.

The benefit of file-level encryption

As a general file-transfer rule, if you want to increase transit security (even on encrypted connections), i.e. to mitigate the possibility of a threat actor monitoring or copying WHAT is being transferred, consider encrypting the data at the file level prior to transmission. If the connection (encrypted or otherwise) is exploited, attackers will only see an encrypted byte stream. Ideally, pick one of the current “strong encryption” algorithms, for example AES-256 or stronger, with a key that maxes out the algorithm's key size (for AES-256 that is 256 bits / 32 bytes).
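To make this concrete, here is a minimal sketch of file-level encryption using AES-256-GCM. It assumes the third-party Python `cryptography` package; the helper function names are my own, purely for illustration:

```python
# Sketch: encrypt data at the file level with AES-256-GCM before transmission,
# so an eavesdropper on the connection only ever sees an opaque byte stream.
# Assumes: pip install cryptography (third-party package).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_bytes(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext. GCM also authenticates (tamper-evidence)."""
    nonce = os.urandom(12)  # 96-bit nonce; must never be reused with the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt_bytes(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = AESGCM.generate_key(bit_length=256)  # maxes out the AES key size (32 bytes)
blob = encrypt_bytes(key, b"sensitive file contents")
assert decrypt_bytes(key, blob) == b"sensitive file contents"
```

In practice you would read the file in binary mode, encrypt, and transmit the resulting blob; the key itself must travel via a separate, secure channel.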
While file-level encryption is a powerful tool for enhancing security and privacy, another key aspect to consider is the use of Transport Layer Security (TLS).

TLS - Transport Layer Security

TLS is great for on-the-fly encryption of HTTP and other web traffic, mitigating eavesdropping on WHAT is being transferred by the vast majority of threat actors. TLS <1.3 does not encrypt the initial TLS handshake and therefore exposes the SNI (Server Name Indication, aka the website hostname); I refer to this as the SNI leak. SNI is how webservers determine how to route requests to the HTTPS websites they host. TLS 1.3 by default also suffers from the SNI leak, but introduces some extensions to mitigate SNI leakage, which will hopefully become the new gold standard in online privacy... BUT states and governments may block such traffic because they cannot censor or monitor it. All current versions of TLS expose the src and dst IP addresses in packet headers. Using a VPN can mitigate SNI leaks.
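As a small, hedged illustration: a client can at least refuse pre-1.3 TLS with Python's stdlib `ssl` module. Note this does not enable ECH (to my knowledge CPython's `ssl` module has no ECH support yet), so the SNI in the ClientHello remains cleartext:

```python
# Sketch: require TLS 1.3 or newer on the client side (Python stdlib only).
import ssl

context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse TLS 1.2 and below

# The hostname passed as server_hostname becomes the SNI value, which is
# still sent in cleartext without ECH:
#   sock = context.wrap_socket(raw_sock, server_hostname="example.com")
```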
The TLS 1.3 extensions I mentioned are:
ESNI: Encrypted Server Name Indication (will likely be deprecated in favour of ECH)
ECH: Encrypted Client Hello
Cloudflare wrote an article in 2018 titled Fixing One of the Core Internet Bugs, where it introduced ESNI and discussed the SNI privacy leak. It's worth a read, but note that ESNI looks like it is being superseded by ECH. Here is the link to the 2018 post. A good follow-up read is their 2020-Dec post titled Good-bye ESNI, hello ECH! Here is the link. In 2023-Sept Cloudflare followed up again with a post titled Encrypted Client Hello - the last puzzle piece to privacy.
Cloudflare hosts a security and privacy checker (for their website) here: . An important takeaway from the page is:
Q: If I pass all four tests, am I secure no matter which site I browse?
A: Not necessarily. Even if you pass all four tests, the domain you are visiting also needs to support these technologies. If the domain you visit doesn't support DNSSEC, TLS 1.3, and Secure SNI, you are still potentially vulnerable, even if your browser has support for these technologies.
This highlights that strong security and privacy is a combination of client and server security. You can have “the most secure” browser available and still be connecting to servers that are vulnerable or leaking sensitive/private data.
I’ve collapsed the following two sections because they go into quite a bit of detail, so feel free to check them out if those topics interest you. I’ve tried to summarise things in the "So what?" section.

Certificate revocation infrastructure and privacy concerns

First, we need to know a little about OCSP. Citing Wikipedia:
The Online Certificate Status Protocol (OCSP) is an Internet protocol used for obtaining the revocation status of an X.509 digital certificate. It was created as an alternative to certificate revocation lists (CRL), specifically addressing certain problems associated with using CRLs in a public key infrastructure (PKI). Messages communicated via OCSP are encoded in ASN.1 and are usually communicated over HTTP.
Some web browsers (e.g., Firefox) use OCSP to validate HTTPS certificates, while others have disabled it.
Certificate authorities (CAs) were previously required by the CA/Browser Forum to provide OCSP service, but this requirement was removed in August 2023, instead making CRLs required again. Let's Encrypt has announced their intention to end OCSP service as soon as possible, citing privacy concerns and operational simplicity.
Then we have OCSP stapling, an extension that aimed to mitigate some of the issues with base OCSP. Citing Wikipedia:
The Online Certificate Status Protocol (OCSP) stapling, formally known as the TLS Certificate Status Request extension, is a standard for checking the revocation status of X.509 digital certificates. It allows the presenter of a certificate to bear the resource cost involved in providing Online Certificate Status Protocol (OCSP) responses by appending ("stapling") a time-stamped OCSP response signed by the CA (certificate authority) to the initial TLS handshake, eliminating the need for clients to contact the CA, with the aim of improving both security and performance.
In principle OCSP was a good idea and stapling was a good extension of the protocol, but as written in a 2024 blog post, Let's Encrypt is planning to end support for OCSP:
Today we are announcing our intent to end support in favour of as soon as possible.
They also wrote:
We plan to end support for OCSP primarily because it represents a considerable risk to privacy on the Internet. When someone visits a website using a browser or other software that checks for certificate revocation via OCSP, the Certificate Authority (CA) operating the OCSP responder immediately becomes aware of which website is being visited from that visitor’s particular IP address. Even when a CA intentionally does not retain this information, as is the case with Let’s Encrypt, CAs could be legally compelled to collect it. CRLs do not have this issue. ... We recommend that anyone relying on OCSP services today start the process of ending that reliance as soon as possible.

CRLs (Certificate Revocation Lists)

Citing Wikipedia:
In , a certificate revocation list (CRL) is "a list of that have been revoked by the issuing (CA) before their scheduled expiration date and should no longer be trusted".
A potential negative aspect of CRL checks is that they are normally performed in cleartext.
In 2022 Let’s Encrypt published a blog post: and closed out the post stating:
We look forward to continuing to work with the rest of the Web PKI community to make revocation checking private, reliable, and efficient for everyone.
Reading the Wikipedia article, one can see that in 2013 Mozilla moved away from CRLs:
As of Firefox 28, Firefox will not fetch CRLs during EV certificate validation
Further reading here
A quick look at the current revocation strategy suggests Mozilla might be walking back their 2013 move away from CRLs: in 2020 they introduced something called CRLite, with an interesting closing comment:
Luckily, CRLite gives us the ability to deliver all the revocation knowledge needed to replace OCSP, and do so quickly, compactly, and accurately.
The behaviour of the CRLite feature in Firefox appears to be controlled via the config security.pki.crlite_mode. As of Firefox 129, there don't appear to be any UX-exposed settings for Firefox's CRL behaviour.
As of 2024-Sept, querying OCSP responders still appears to be the default in Firefox 129. The browser offers a UX-exposed toggle setting called Query OCSP responder servers to confirm the current validity of certificates, which I assume toggles the config security.OCSP.enabled.

So what?

For individuals concerned with InfoSec and privacy, it's crucial to determine whether to configure their HTTP clients to perform checks for certificate revocation, either via Certificate Revocation Lists (CRL) or Online Certificate Status Protocol (OCSP). These checks are a key defence mechanism in identifying and rejecting revoked certificates, which could have been compromised or are no longer trustworthy for various reasons.
CRL requests offer a relative privacy advantage because the details in the requests and responses are less specific, meaning that surveillance of these requests would generally reveal only broad statistics and usage patterns, rather than specific user behaviour.
To further protect privacy during these checks, a VPN can be utilized to obscure the originating IP address of the revocation check requests. This approach makes it more challenging for third parties to track where the checks are coming from, thereby enhancing user anonymity.
However, there's a potential security trade-off involved. Users can configure their clients to skip revocation checks or block them at the network level, but doing so could leave them vulnerable to accepting revoked certificates. This vulnerability creates an attack vector where an adversary might exploit the use of a revoked certificate to perform malicious actions. Thus, while prioritising privacy, it is also important to consider the implications for security and the potential exposure to cyber threats.
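To see where a client learns the revocation endpoints in the first place, here is a sketch (assuming the third-party Python `cryptography` package) that builds a throwaway self-signed certificate carrying hypothetical CRL and OCSP URLs, then reads them back. Real certificates advertise these endpoints in the same two extensions: CRL Distribution Points and Authority Information Access:

```python
# Sketch: where do CRL/OCSP URLs live in a certificate? Build a throwaway
# self-signed cert with hypothetical revocation URLs and read them back.
# Assumes: pip install cryptography (third-party package).
from datetime import datetime, timedelta
from cryptography import x509
from cryptography.x509.oid import (NameOID, ExtensionOID,
                                   AuthorityInformationAccessOID)
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

key = ec.generate_private_key(ec.SECP256R1())
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "example.test")])
cert = (
    x509.CertificateBuilder()
    .subject_name(name).issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.utcnow())
    .not_valid_after(datetime.utcnow() + timedelta(days=1))
    # Hypothetical CRL endpoint (CRL Distribution Points extension):
    .add_extension(x509.CRLDistributionPoints([x509.DistributionPoint(
        full_name=[x509.UniformResourceIdentifier("http://crl.example.test/ca.crl")],
        relative_name=None, reasons=None, crl_issuer=None)]), critical=False)
    # Hypothetical OCSP responder (Authority Information Access extension):
    .add_extension(x509.AuthorityInformationAccess([x509.AccessDescription(
        AuthorityInformationAccessOID.OCSP,
        x509.UniformResourceIdentifier("http://ocsp.example.test"))]), critical=False)
    .sign(key, hashes.SHA256())
)

# A client performing revocation checks extracts the URLs like this:
cdp = cert.extensions.get_extension_for_oid(
    ExtensionOID.CRL_DISTRIBUTION_POINTS).value
aia = cert.extensions.get_extension_for_oid(
    ExtensionOID.AUTHORITY_INFORMATION_ACCESS).value
crl_urls = [uri.value for dp in cdp for uri in dp.full_name]
ocsp_urls = [d.access_location.value for d in aia
             if d.access_method == AuthorityInformationAccessOID.OCSP]
```

Contacting the OCSP URL is exactly the request that reveals the visited site to the CA; fetching the CRL URL reveals far less, which is the privacy distinction at issue here.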
💡 An interesting area of future research might be to look at statistics on how often certificates are actually revoked, and try to determine the probability of being hit by an exploit versus the negative privacy leaks from the revocation checks. Maybe Mozilla's CRLite is the way to go?

TLS >=1.3 with ESNI or ECH

TLS >= 1.3 will help to mitigate tracking of which websites citizens visit by encrypting the initial TLS handshake, but as of 2024-Apr, having looked at the code of major projects like Apache and nginx, support for ECH is experimental at best. It is to be expected that certain states and governments will mandate blocking of TLS using either the ESNI or ECH extensions, because they would lose control and not be able to monitor or censor their citizens. Further reading
.
As ESNI / ECH becomes more widespread over the next few years and servers are upgraded to support it, InfoSec and privacy conscious users will want to consider how they can take advantage of this privacy feature... if their ISPs don't block it.
💡 I haven’t checked the ECH code in detail yet, but I would assume that if ECH fails, TLS will fall back to insecure SNI. Privacy-conscious websites might be able to configure their servers to refuse insecure SNI?

TLS protocol in detail - every byte explained

If you feel like nerding out and exploring the TLS protocol at its lowest level, there are some great resources published by Michael Driscoll over at :

Can HTTPS/TLS traffic be tracked or surveilled?

TL;DR: As of 2024-Sept yes, to a degree. That degree also depends on how sophisticated a threat actor is.
TLS does not prevent an ISP (Internet Service Provider) or a MITM (Man-in-the-Middle) from tracking which nodes on the internet are talking with each other, due to unencrypted packet headers, and TLS <1.3 does not encrypt the initial handshake, thereby exposing the SNI (remote server hostname). This means ISPs and threat actors can monitor and correlate which network nodes are talking to each other and which hostnames are being requested. WHAT is being transferred once TLS is established is much more difficult for eavesdroppers to listen in on; it requires either compromising one end of the secure socket or a much more sophisticated exploit.
As I’ve already mentioned, secure SNI, also known as TLS ECH, should help combat the SNI privacy leak problem in the future.
When researching this subject, I was reminded of the case of Edward Snowden:
2013-June - It became clear that the US National Security Agency operates a complex web of spying programs which allow it to intercept internet and telephone conversations from over a billion users from dozens of countries around the world.

In conclusion

Insecure SNI leaks which websites citizens are using, at least until TLS ECH becomes widely adopted
Server certificate revocation checks, especially OCSP, can leak which websites citizens are using (OCSP can be disabled)
Insecure packet headers leak which nodes are communicating
These leaks can be mitigated to a very large extent by using a VPN
Decrypting WHAT is being transmitted with strong encryption is cost prohibitive, so much of the world's encrypted transmissions will remain secure, at least until today's strong encryption becomes weak due to computational advances (see: Moore's Law and perhaps quantum computing). Stored encrypted streams will become an easier target in the future.
TLS decryption is not impossible, but the probability is low, especially if you are not being explicitly surveilled by a state-level actor. It's much more likely that a threat actor will try to exploit weaker points in the chain, for example certificate authorities, clients, servers and humans.
I’ve collapsed the next section because it's verbose.

🔐 Considerations for maximum privacy and resistance to surveillance

Don’t just rely on TLS if you want to keep a particular piece of data private and highly resistant to surveillance. For such a use case:
Use a VPN during transmission, AND where feasible a protocol/service that implements end-to-end encryption
Have a plan to verify the authenticity of files (anti-tampering and integrity)
Consider whether it is important to be able to verify who sent the file, and choose an appropriate solution
Consider multiple layers of encryption with very strong secret keys
For example, 7-Zip supports AES-256 encrypted archives
Consider adding a layer of GPG encryption with a hardened config, using at least AES-256
Consider using encrypted containers that natively support multi-layer encryption
Consider splitting secret key material into shares to mitigate key compromise (, )
Never transmit the secret keys in their entirety, and consider encrypting the key shares for transmission
Transmit key shares via a different communication channel to the data being protected
Never store the secret keys / key shares insecurely () ()
For enterprise-grade security and above, you would want to appoint multiple key custodians to manage sensitive key material, backed by an InfoSec policy with the objective that no single person can learn a full secret, plus procedures for handling compromised keys.
Avoid transmission to / storage on services that use cloud providers
Especially those that are free of charge and/or incorporated in the UK or US and other countries that are known for surveilling internet communications.
With messenger services, this is hard to avoid these days. Signal, for example, uses US cloud providers to implement its service, but Signal implements end-to-end encryption, meaning that surveillance can only capture ciphertext.
Capturing ciphertext presents a future risk that data could be decrypted at a later date, when techniques or computational resources become available to break the cipher used.
This is why I recommend avoiding cloud providers when aiming for maximum privacy and resistance to surveillance.
If you cannot avoid cloud providers, then take additional measures to increase the complexity of the encryption approach to protect the sensitive data, so that the cost/difficulty of decryption is as high as possible. Secure handling of the secret key material becomes even more sensitive in this case.
Consider using a private CA for TLS server certificates to mitigate public/commercial CA exploitation
For HTTPS services consider using client certificates (mTLS) for stronger authentication
Consider splitting data into pieces and transmitting at different times and through different VPN exit nodes
Investigate whether MLS (RFC 9420, Messaging Layer Security) would be additive to file transmission security; it may well incorporate a number of the aforementioned points.
If you are looking for an “information-theoretically secure” encryption technique, you could study the one-time pad. But be advised, using it in practice has some drawbacks compared to modern strong-encryption ciphers.
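The key-splitting point above can be sketched with a simple XOR-based n-of-n scheme, stdlib Python only. This is an illustration rather than a vetted implementation; for k-of-n thresholds you would use something like Shamir's Secret Sharing:

```python
# Sketch: split a secret into n shares where ALL n are required to rebuild it.
# Any subset of fewer than n shares reveals nothing about the secret.
import secrets

def split_secret(secret: bytes, n: int) -> list:
    # n-1 fully random shares, plus one share that XORs them back to the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = bytes(secret)
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]

def combine_shares(shares: list) -> bytes:
    out = bytes(len(shares[0]))  # all-zero bytes of the right length
    for share in shares:
        out = bytes(a ^ b for a, b in zip(out, share))
    return out

key = secrets.token_bytes(32)         # e.g. an AES-256 key
shares = split_secret(key, 3)         # hand one share to each custodian/channel
assert combine_shares(shares) == key
assert all(s != key for s in shares)  # no single share reveals the key
```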
If you have suggestions or critique, drop a note.

Can HTTPS/TLS traffic be censored?

Yes, it is happening right now in countries such as .ru, .cn, .tr, .uk and .in. For example, there is a detailed Wikipedia article about web content censorship in the UK. In contrast to much of the rest of the world, where ISPs are subject to state mandates, most content regulation in the United States occurs at the private or voluntary level, often because of threats or pressure.
Exploiting the TLS SNI leak is common in this kind of censorship, as is DNS blocking and src/dst IP header blocking. More advanced techniques are also deployed, including deep packet inspection (DPI).
It is remarkable to see the lengths to which people will go to circumvent censorship. Take this project, for example:
A stand-alone (without 3rd party servers) DPI circumvention tool. May allow to bypass http(s) website blocking or speed shaping, resist signature tcp/udp protocol discovery.

Mitigating tracking and censorship using a VPN

As of writing, to mitigate tracking and censorship of this nature, a VPN or similar tunnelling technology is required. A VPN client's traffic is routed through an encrypted tunnel to the VPN server and NAT'ed out to the Internet. In this scenario, it becomes very difficult for eavesdroppers to correlate which nodes are talking to each other and which hostnames are being requested by which clients. The vast majority of threat actors will not be able to trace tunnelled traffic back to the source VPN client.
A VPN helps by:
Hiding client internet traffic within its encrypted tunnel until it reaches the VPN exit node.
Concealing which internet nodes are communicating.
Reducing the leak of TLS SNI hostnames by shifting the leak point to the VPN exit node, separating the originating client IP address from the traffic that follows.
Bypassing internet censorship in many situations, as long as the VPN service itself isn't blocked.
Typically, when monitoring VPN traffic, one can determine client X has connected to VPN service Y and then the trail goes cold. Traffic routed through a VPN is encrypted and a direct decryption or exploitation without the secret key is computationally very expensive.
Threat actors look for the path of least resistance and prioritise exploits with the lowest cost. For example, it is easier to exploit the VPN client or the VPN server than to try to decrypt or monitor what is happening inside the encrypted VPN tunnel.
It is critical to have robust security measures in place and to strengthen defences at both the user end (where the VPN connection begins) and the VPN server end (where the connection ends - the exit node). This involves implementing a combination of security protocols, firewalls, intrusion detection systems and other safeguards to create a secure barrier - known as perimeter security - against unauthorised access and potential breaches. Hardening these points further by applying security updates, tightening configurations and reducing the vulnerability surface is essential to protect the integrity of the VPN connection and the data flowing through it.

no-log VPNs

For privacy-conscious users, it is worth checking whether your VPN provider has been audited for “no-logs”. Restore Privacy, LLC published their latest article on this topic in 2024-May.

DNS leaks

An often overlooked aspect of using a VPN is DNS leakage. Citing Wikipedia
:
The vulnerability allows an ISP, as well as any on-path eavesdroppers, to see what websites a user may be visiting. This is possible because the browser's DNS requests are sent to the ISP DNS server directly, and not sent through the VPN.
It is important for VPN users to pay attention to this aspect and take the required steps to mitigate it.
There is a good write up from ProtonVPN on the topic here
.
There is a DNS leak test hosted here:

WebRTC leaks

Citing Wikipedia: WebRTC stands for Web Real-Time Communication, and it allows audio and video communication and streaming to work inside web pages by allowing direct peer-to-peer communication, eliminating the need to install plugins or download native apps.
Unfortunately, the technology can leak privacy information including a user’s real IP address, even when a VPN is in use and functioning properly. To quote Restore Privacy, LLC:
If you have not protected yourself against WebRTC leaks in your browser, any website you visit could obtain your real (ISP-assigned) IP address through WebRTC STUN requests. This is a serious problem.
While the WebRTC feature may be useful for some users, it poses a threat to those using a VPN and seeking to maintain their online privacy without their IP address being exposed.
Restore Privacy, LLC have published a useful guide on mitigating this leak: .
A useful IP leak test tool is available here: , which will tell you whether the browser has mitigated WebRTC leaks.

VPN performance degradation

A drawback of a VPN is that it can degrade transfer speeds and negatively impact network latency (ping). After all, with a VPN you are sending traffic through an encrypted tunnel, so there will always be a performance cost. A number of factors affect performance, including the technical cost of the VPN technology and protocol used, and bandwidth contention from other VPN users (shared service).
Using modern VPN protocols like WireGuard can do a lot to mitigate performance degradation.
🔐 A peer review of this post by my friend and colleague Alasdair pointed out that WireGuard has some defaults that are not privacy friendly, specifically its default logging configuration and how it handles IP address logging. There is a good article posted by Restore Privacy, LLC on the topic; it's an insightful read. Another WireGuard privacy article by PrivacySavvy Ltd is worth a look too:

Can I trust my CA?

For increased privacy and security, a SysOp may consider issuing client and/or server certificates from a private CA, using the latest and highest encryption configuration standards.
Public CAs like Let's Encrypt offer easy-to-use, high standards of transport security that protect against the large majority of threat actors, BUT there is no way to know who may hold copies of a public CA's private keys, and with said keys be able to issue rogue certificates and mount man-in-the-middle attacks on traffic. 🔐💡 Running a private CA shifts responsibility from the public/commercial CA to the SysOp as the custodian of the primary CA's sensitive keys. One has to weigh up which is more secure and which is more likely to be exploited.

A SysOp could offer client certificates aka mTLS

Client certificates can replace or enhance HTTP basic access authentication and provide mutual transport layer security (mTLS). aria2c supports client certificates via the --certificate option.
Consider the following diagram courtesy of Cloudflare, Inc.:
Client certificates offer a SysOp some security advantages:
An HTTP server can be configured to require the client to provide a certificate issued by a specific CA.
Further access control can be configured to only allow client certificates with a specific certificate fingerprint. This allows a SysOp to restrict access to a resource to authenticated clients with specific certificates.
Client certificates may mitigate some forms of TLS exploits.
Client and server CAs are independent. That is, a server certificate can be issued by one CA and the client certificate by another. Client certs are often issued by a private CA, while server certs are typically issued by commercial or public CAs.
The client/user benefits from:
If the CA is trusted, strong and verifiable authentication to a resource
TODO Research whether clients can/will abort a connection if the server ignores or rejects the client certificate, i.e. can clients force mutual authentication?
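For illustration, the server side of mTLS can be sketched with Python's stdlib `ssl` module. The certificate file names are hypothetical placeholders, which is why the loading calls are commented out:

```python
# Sketch: server-side mTLS configuration (Python stdlib only).
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Placeholder file names -- substitute your real server cert/key and the CA
# that issued your client certificates:
# context.load_cert_chain("server.crt", "server.key")
# context.load_verify_locations(cafile="client-ca.crt")

# CERT_REQUIRED makes the handshake fail unless the client presents a
# certificate that validates against the loaded CA:
context.verify_mode = ssl.CERT_REQUIRED
```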
Security researcher Michael Stepankin () posted an insightful read on mTLS on the GitHub blog: ​

Notes on DNS privacy and security concerns

The phone book of the Internet! DNS provides name-to-address lookup and is a fundamental part of Internet traffic. By default, DNS traffic is unencrypted, unauthenticated and very fast. Unfortunately, ISPs, states, governments and threat actors can eavesdrop on DNS requests to gain insight into what a client is requesting.
Consider this: ISPs keep logs and records of which customers use which IP addresses to connect to the Internet and use their service. So ISPs have the data and can easily profile customers/households/businesses and determine which remote servers and websites are being accessed, just from DNS queries. You can imagine that this information could be valuable to states, governments and threat actors, so they might find ways to obtain it. You can also imagine that this information could come up in a court of law, where an individual or organisation could have their ISP records subpoenaed, or ‘court ordered’ to be provided as evidence in a case.
Given the insecure default nature of DNS, it would be relatively easy for an ISP or threat actor to exploit this weakness and attempt DNS spoofing, which would allow a threat actor to capture network packets for the exploited hostnames, potentially leading to sensitive information or credential leakage, or other exploits. The typical attack vector would be to exploit a DNS resolver used by the victim, changing DNS responses to an address chosen by the attacker. A more complex and sophisticated exploit would be a DNS man-in-the-middle attack to achieve the same result.
OK, let's not use ISP DNS then? Yes, that is good practice, BUT it doesn't fully mitigate the insecure default nature of DNS.
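A quick stdlib-Python sketch makes the cleartext nature of classic DNS concrete: when you hand-build a minimal query packet, the hostname's labels appear verbatim in the bytes that cross the wire, readable by any on-path observer:

```python
# Sketch: build a minimal DNS A-record query by hand (RFC 1035 wire format).
import struct

def build_dns_query(hostname: str) -> bytes:
    # Header: id=0x1234, flags=0x0100 (recursion desired), 1 question.
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte.
    qname = b"".join(bytes([len(l)]) + l.encode() for l in hostname.split("."))
    question = qname + b"\x00" + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

packet = build_dns_query("private-site.example.com")
# The hostname is plainly visible in the raw packet -- no encryption at all:
assert b"private-site" in packet and b"example" in packet
```

Encrypted transports such as DNS over TLS (DoT) or DNS over HTTPS (DoH) wrap exactly this packet format inside an encrypted channel, which is the mitigation.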

Can we trust DNS from Cloudflare and Google et al?

Organisations that offer "free of charge” services typically do so with a profit motive, usually making money through advertising, cross-selling, up-selling, or by monetizing user data.
They must also cooperate with law enforcement when they request user data...
Can you trust them with your privacy? Probably not, especially if their T&Cs and other policies do not prioritise user privacy.
Original Artist: (Oliver Widder) ()

Bill Woodcock of Quad9 is quoted on the SNB forums stating:
Content networks like Google and Cloudflare make money in a lot of ways, some of which depend upon the monetization of Personally Identifiable Information (PII). Whatever you may think about the morality of that, it's flat-out illegal in Europe
Quad9 was started because European privacy regulators asked us (meaning , in this case) to stand up a recursive resolver, as an existence-proof that it was possible to run this critical infrastructure without paying for it by (with) PII. So, unlike others, Quad9 does not collect personal information. Quad9 does not have a concept of a "user" to hang records off of, and does not collect any IP addresses. Quad9 is the only big anycast resolver that doesn't collect personal information, and it's the only free one that's GDPR-compliant.
Citing Wikipedia
, Bill is the (CH), Chairman of the Foundation Council, 2021–present, and (PCH), Director, 1994-2001; Executive Director, 2001–present.

Privacy in online messaging

The shining example of the Signal Foundation

In contrast to services like Facebook Messenger and WhatsApp, I really like how the Signal Foundation has handled their privacy policy. To cite their policy:
Signal is designed to never collect or store any sensitive information. Signal messages and calls cannot be accessed by us or other third parties because they are always end-to-end encrypted, private, and secure.
Privacy of user data. Signal does not sell, rent or monetize your personal data or content in any way – ever.
The Signal Foundation operates a privacy-centric messaging service. They employ end-to-end encryption to ensure that messages are secure between user devices, and the minimal amount of data they retain on central servers - limited to user identifiers and activity timestamps - is the only information available to law enforcement or threat actors. This practice underscores their commitment to user privacy.
Due to the nature of Signal's implementation of end-to-end encryption and its approach to privacy by design, even if Signal wanted to store more user data, it couldn't, because Signal's technical implementation doesn't allow it. The private keys are known only to the end-users and are stored on the end-users' devices. A middleman can only see the encrypted byte streams. This can be, and has been, audited by browsing the open-source code available on Signal's GitHub: .
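As a toy illustration of the end-to-end principle only (this is NOT the Signal protocol, which uses X3DH and the Double Ratchet; just a minimal ECDH-plus-AEAD sketch assuming the third-party Python `cryptography` package):

```python
# Toy E2E sketch: each endpoint keeps its private key; a relay in the middle
# only ever sees ciphertext. Assumes: pip install cryptography.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_key(my_private, their_public) -> bytes:
    """Derive a shared AES-256 key from an X25519 key agreement."""
    shared = my_private.exchange(their_public)
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=None, info=b"toy-e2e-demo").derive(shared)

# Private keys are generated on, and never leave, each user's device.
alice_priv = X25519PrivateKey.generate()
bob_priv = X25519PrivateKey.generate()

# Only public keys are exchanged (via the service); both sides derive the
# same symmetric key, which the server cannot compute.
k_alice = derive_key(alice_priv, bob_priv.public_key())
k_bob = derive_key(bob_priv, alice_priv.public_key())
assert k_alice == k_bob

nonce = os.urandom(12)
ciphertext = AESGCM(k_alice).encrypt(nonce, b"hi Bob", None)
# The relay/server only sees `nonce` and `ciphertext`, never the plaintext.
assert AESGCM(k_bob).decrypt(nonce, ciphertext, None) == b"hi Bob"
```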
Further reading:

Threema is a Swiss-based privacy-centric messaging service (with almost identical functionality to WhatsApp, Telegram and Signal), and as far as I know Threema GmbH shares a similar set of core principles with the Signal Foundation. Depending on how things go with EU legislation and encrypted messaging, Threema's Swiss geography could be an advantage for privacy-conscious users in the future. Their software is also open-source.
I’ve been a Threema user since their founding years, and have always been impressed by their software and development roadmap. Today they have offerings for both individuals and organisations.
Element.io is a privacy-centric messaging service utilising the open-source [matrix] protocol. Element and [matrix] have been adopted by some big names, including the German Ministry of Defence and various government departments in other countries, who have forked the technology to create their own versions of the solution.
Element's main entity appears to be in the UK, with EU subsidiaries in France and Germany.
I’ve used Element and its predecessors in various orgs and can say it's a great collaboration and messaging tool. Element is probably the most mature and feature-rich [matrix] client and can connect to any accessible Matrix server. One can choose to run one's own Matrix network and therefore have full control over data privacy and sovereignty.

Recent events relating to Telegram

In contrast to Signal and Threema, Telegram does not put user privacy first, and end-to-end encryption is not the default messaging mode. Telegram's server side is proprietary, closed-source software. Telegram's central servers are, for argument's sake, a large searchable and exploitable database of user data and messages; the same applies to Facebook Messenger and WhatsApp. Personally, I do not trust WhatsApp's end-to-end encryption because it is not an open implementation. Who knows who has copies of the private keys.
2024-Aug: Telegram's CEO Pavel Durov was arrested in France, and shortly after we saw headlines like:
Telegram will now provide user info to governments in response to legal requests
Here is the related article.

Who can read my messages and data?

The central servers of Telegram, much like those of Facebook Messenger and WhatsApp, can be thought of as vast databases that hold extensive amounts of user data and messages. For the sake of discussion, one could argue that these repositories are not only searchable but also potentially vulnerable to exploitation. This means that, in theory, user information and private communications could be compromised, extracted, and used for various purposes, ranging from targeted advertising to more malicious intents like identity theft or surveillance. A real nightmare scenario is when bad actors gain access to such repositories.
Because these services store user data and tie it to a user's identity, it would be theoretically possible for employees or law enforcement agencies to access and search through it. Such data is typically rich with personally identifiable and private information, making it a gold mine both for companies with weak privacy ethics and for surveillance.
One could argue that Telegram, Facebook and WhatsApp are monetising user data in some way and/or also making it relatively easy for law enforcement to be able to search user data and make correlations.

Conclusion

If you want privacy for a particular service, you need to use a technology/service that puts privacy first and implements open industry standard end-to-end encryption. I would encourage privacy-conscious users to study the examples of Signal.org, Threema.ch and Element.io, and take a look at [matrix] technology. I strongly recommend that the technology you choose should be open source and auditable, and based on very strong cryptographic standards with a mature development methodology.
Messaging Layer Security (MLS) looks really promising. It will be interesting to see how it holds up against the EU Chat Control 2.0 legislation. I noticed that [matrix] has an MLS tracking page. See the section on MLS below.

👿 EU Chat control 2.0

Citing from Patrick Breyer’s info site
on the topic:
On 11 May 2022 the EU Commission presented a proposal which would make chat control searching mandatory for all e-mail and messenger providers, and would even apply to so-far securely end-to-end encrypted communication services.
To many, chat control represents a significant threat to online privacy and anonymity. It will be interesting to see how things develop.
However, as of writing (September 2024), it seems there is still a push for Chat Control 2.0:

Isn’t there a standard for end-to-end encrypted messaging?

Why yes, glad you asked! It's relatively fresh, but it's here and approved by the IESG. It's called MLS, and the spec is detailed in RFC 9420 and managed by the IETF. Wow, that's a lot of new acronyms, let's break them down.
RFC is Request For Comments, IETF is the Internet Engineering Task Force, ​IESG is the Internet Engineering Steering Group, and MLS is Messaging Layer Security 😅👍
Publication of RFCs is part of the process of creating Internet Standards according to the IETF methodology, and some RFCs become Internet Standards. Here is a graphic from the IETF datatracker for RFC 9420 depicting the RFC's evolutionary timeline:
[Image: RFC 9420 evolutionary timeline from the IETF datatracker]
Having listened to the MLS keynote referenced below, it's encouraging to see that the authors have considered the evolution of secure messaging, looking back at OTR and its derivatives, and now MLS.
There is also evidence that the standard introduces some welcome optimisations.
Here is a citation from the MLS homepage 🔗
Messaging Layer Security (MLS) is a security layer for encrypting messages in groups of size two to many. It is being built by the MLS working group and designed to be efficient, practical and secure. The proposed MLS specification is in two parts:
- an architecture document setting out the context, problem domain and security requirements, and
- a protocol specification defining the protocol itself.
MLS already has a number of implementations, which are tracked here.
There is a good keynote from 2023 on MLS by Raphael Robert (RFC author) and Konrad Kohbrok (RFC contributor), available via the CCC media mirror.
As of writing, the openmls project has ~600 stars ⭐ and 33 contributors, so it seems to be gaining some traction. From the keynote, it looks like openmls is the implementation that was developed alongside the RFC and has since taken on a life of its own.
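One of the efficiency optimisations behind MLS is its tree-based group key management (often discussed under the name TreeKEM). As a rough, illustrative back-of-envelope comparison only (a simplification, not the actual MLS cost model): naively encrypting to each group member pairwise costs O(n) operations, while a tree-based key update touches roughly the path from a leaf to the root, i.e. O(log n) nodes.

```python
import math

def pairwise_cost(n: int) -> int:
    """Naive approach: encrypt separately to every other member."""
    return n - 1

def tree_update_cost(n: int) -> int:
    """Tree-based approach: a key update touches roughly the
    leaf-to-root path of a binary tree over n members."""
    return math.ceil(math.log2(n))

for n in (2, 100, 10_000):
    print(f"group of {n:>6}: pairwise={pairwise_cost(n):>5}, tree={tree_update_cost(n)}")
```

For a group of 10,000 members this is the difference between ~10,000 ciphertexts and ~14 node updates, which is why the approach scales to "groups of size two to many".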
Where does MLS fit into the stack?
[Image: diagram showing where MLS sits in the messaging stack]

Coming back to the topic of HTTP security and privacy

If legislation such as Chat Control 2.0 were to be implemented in the EU, one would expect services such as Signal to leave the EU market or face the prospect of operating outside the law. A friend of mine based in the UK drew my attention to the following BBC article from 2023-Feb which deals with this issue in relation to the UK:
Headline: Signal would 'walk' from UK if Online Safety Bill undermined encryption 🔗
If forced to weaken the privacy of its messaging system under the Online Safety Bill, the organisation "would absolutely, 100% walk" Signal president Meredith Whittaker told the BBC.
A related article published by Wired in 2023-Sep entitled: Britain Admits Defeat in Controversial Fight to Break Encryption
The current weaknesses in TLS that I've described here would allow Signal traffic to be detected, profiled and blocked by ISPs (censorship). One could imagine that even if Signal were to implement and/or force support for TLS ECH, such traffic could be dropped by ISPs as being “non-compliant” and/or "too secure" for Chat Control 2.0 to operate on.
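The SNI leak behind this detectability is easy to demonstrate with nothing but the Python standard library: without ECH, the server hostname travels in cleartext inside the TLS ClientHello, so anyone on the path (e.g. an ISP) can read it. A minimal sketch, building the ClientHello in memory without touching the network (the hostname signal.org is just an example):

```python
import ssl

# Drive the handshake against in-memory BIOs instead of a real socket.
ctx = ssl.create_default_context()
incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="signal.org")

try:
    tls.do_handshake()          # no peer, so the handshake cannot complete...
except ssl.SSLWantReadError:
    pass                        # ...but the ClientHello has been written out

client_hello = outgoing.read()
# The SNI extension carries the hostname as plaintext ASCII:
print(b"signal.org" in client_hello)   # True
```

Even in TLS 1.3, where most handshake messages are encrypted, the ClientHello itself is not, which is exactly the gap ECH is designed to close.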
This could lead to a situation where users would have to use a VPN to reach services such as Signal, i.e. exit via a jurisdiction outside the EU (say, a .ch or .no location) before connecting to the service. Who knows, maybe the EU would go after VPN providers too? How long before the legislation goes further and implies that any encrypted service should be controlled? How long before legislation is drafted that says TLS ECH is bad?
It is a shame that such issues exist. Personally, if and when the need arises, I want to be able to hold private conversations about current events and governance, or any topic without the risk of my data being used, surveilled or abused in any way.
💡 TODO: I need to follow up on the flip side of security, where actors use secure messaging to pursue and orchestrate crime, for example a scenario where there is loss of life or abuse because law enforcement cannot monitor online messages or calls. What if a relative is involved? In 2023 the Netflix documentary Cyberbunker: The Criminal Underworld was released; there are a lot of talking points from the revelations in the film. Wikipedia article. Documentary link.

Is AI a big “Uh Oh!” for privacy?

I’m fascinated by AI, especially the underlying technologies. I use AI in some way most days, with DeepL and Coda subscriptions inspiring me to write higher-quality, more engaging content and assisting with proofreading and translations. So I can see positive and moral use cases for AI, but on the flip side there are some real nightmare scenarios.
What are the chances that Meta/Facebook/WhatsApp executives have discussed training AI on their users' private messages? What are the chances that they have the moral compass and ethical code to say no?
I read recently about AI models being trained on user data, data that users would consider private - 😲.
Imagine AI with the knowledge of your past chat history and/or your social media posts? Nice! /s
Imagine AI that can predict your future actions and decisions with some degree of accuracy?
Consider the implications of prompting an AI trained on user data with the following:
Can you create a database containing users who are likely...
... to vote for political party x?
... to be involved in topic x?
... to be sympathetic towards x?
Can you create a network diagram of users who are likely to have had contact with user x?
Related articles on AI training on user data:

TODO and further research

Quad9 might be a privacy-centric DNS alternative?

Research self-hosted recursive DNS

Self-hosted recursive DNS ​

Cover DNS privacy mitigations

DNSSEC

DNS over HTTPS aka DoH

DNS over TLS aka DoT

Verifying DoH setup:
Example of using DoH with curl
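As a starting point for the DoH verification TODO above: RFC 8484 DoH simply transports a standard RFC 1035 DNS query over HTTPS. The wire-format payload can be built with the Python standard library alone; the resolver URL in the comment (Cloudflare's) is just one example of a public DoH endpoint.

```python
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Encode a minimal DNS query in wire format (RFC 1035).
    qtype=1 requests an A record."""
    # Header: ID=0, flags=0x0100 (recursion desired), 1 question, 0 answers
    header = struct.pack(">HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    question = qname + struct.pack(">HH", qtype, 1)  # QCLASS=1 (IN)
    return header + question

query = build_dns_query("example.com")

# This payload would then be POSTed over HTTPS, e.g.:
#   curl -s -H 'content-type: application/dns-message' \
#        --data-binary @query.bin https://cloudflare-dns.com/dns-query
```

Recent curl versions can also do the whole thing in one step with `--doh-url`, which resolves the target hostname via the given DoH server before connecting.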

Further research on TLS & ECH

Research apache2 SSLUseStapling directive and consider SSLProtocol TLSv1.3 and consider SSLStrictSNIVHostCheck on (might cause privacy issues, See notes on OCSP.)
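For reference while researching the directives above, a hedged sketch of what those settings might look like together (the directive names are real mod_ssl directives, but the stapling cache path is an example and the exact placement should be validated against the Apache 2.4 docs):

```apache
# Global (server config) context — SSLUseStapling requires a cache:
SSLStaplingCache "shmcb:/run/apache2/ocsp(128000)"

<VirtualHost *:443>
    SSLEngine on
    SSLProtocol TLSv1.3          # allow only TLS 1.3
    SSLUseStapling on            # staple OCSP responses (see OCSP privacy notes)
    SSLStrictSNIVHostCheck on    # reject clients that send no SNI
</VirtualHost>
```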
Looks like apache2 has SSL_TLS_SNI env var:
Contents of the SNI TLS extension (if supplied with ClientHello). Is this whitelisted by suexec?
cURL added ECH support 2024-04-16
It looks like sftcd is the alias of Stephen Farrell,
the author of the DEfO project. He seems to be going full speed trying to get ECH into popular software.
Apache: Filter client cert by SSL_CLIENT_M_SERIAL
(probably want to combine CA+serial because serial is not a GUID) nginx: Ditto
Info on certificate serial vs. fingerprint

MLS
