Magic Wormhole is a secure file-transfer tool, whose motto is “get things from one computer to another, safely.” It is most useful for ad hoc one-shot transfer situations, such as:

  • You’ve just sat down next to someone at a conference, and you want to give them a tarball of your favorite project from your laptop.

  • You’re talking on the phone with someone and need to give them a picture that you’re looking at on your computer.

  • You’ve just set up a new account for a coworker and need to get their SSH public key from their computer safely.

  • You want to copy your GPG private key from your old computer to your new laptop.

  • A colleague on IRC wants you to send them a logfile from your computer.

One distinctive feature of this tool is the use of a wormhole code: a short phrase like “4-bravado-waffle” that enables the transfer and must be conveyed from the sending client to the receiving one. When Alice sends a file to Bob, Alice’s computer will display this phrase. Alice must somehow get this phrase to Bob: typically, she would speak it to him over the phone, or type it to him over SMS or IRC. The code consists of a number and a few words, and is designed for easy and accurate transcription, even in a noisy environment.

These codes are single use. The security properties are simple: the first recipient who claims the code correctly will get the file, and nobody else. These properties are strong: nobody else can get the file because it is encrypted, and only the first correct claim can compute the decryption key. And they depend only upon the behavior of the client software: no server or internet eavesdropper can violate them. Magic Wormhole is unique in combining strong confidentiality with an easy workflow.

What It Looks Like

Magic Wormhole is currently only available as a Python-based command-line tool, but ports to other languages and runtime environments are underway. The most important projects are to develop a GUI application (where you can drag and drop the files to be transferred), and a mobile app.

  1. Alice runs wormhole send FILENAME on her computer, and it tells her the wormhole code (“4-bravado-waffle”).

  2. She then dictates this to Bob over the phone.

  3. Bob types the wormhole code into his computer.

  4. The two computers connect, then encrypt and transfer the file.

Figure 7-1. Sender Screenshot

Figure 7-2. Receiver Screenshot

Figure 7-3. Magic Wormhole Workflow Diagram

How It Works

Magic Wormhole clients (both sender and receiver) connect to the same Rendezvous Server and exchange a handful of short messages. These messages are used to run a special cryptographic key-agreement protocol named SPAKE2, which is an authenticated version of the basic Diffie-Hellman key-exchange protocol (see the references below for more detail).

Each side starts their half of the SPAKE2 protocol state machine by feeding it a password: the randomly-generated wormhole code. Their half produces a message to deliver to the other side. When that message is delivered, the other side combines it with their own internal state to produce a session key. When both sides used the same wormhole code, their two session keys will be identical. Each time the protocol is run, they’ll get a new random session key. They use this session key to encrypt all subsequent messages, providing a secure connection to figure out the rest of the file transfer details.

Figure 7-4. SPAKE2 Diagram
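
To make the key-agreement step concrete, here is a minimal sketch using the standalone spake2 Python library. Running both halves in one process is purely for illustration; in the real protocol the two messages travel through the rendezvous server, and the exact password encoding differs.

from spake2 import SPAKE2_A, SPAKE2_B

# Both sides feed the same low-entropy password (the wormhole code) into
# their half of the protocol.
alice = SPAKE2_A(b"4-bravado-waffle")
bob = SPAKE2_B(b"4-bravado-waffle")

msg_from_alice = alice.start()   # one short message in each direction
msg_from_bob = bob.start()

# Each side combines the peer's message with its own internal state.
key_a = alice.finish(msg_from_bob)
key_b = bob.finish(msg_from_alice)
assert key_a == key_b   # identical only because both used the same code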

Any attacker who tries to intercept the connection will get only one chance to guess the code correctly. If they’re wrong, the two session keys will be completely different, and the attacker won’t be able to decrypt the rest of the messages. The real clients will notice the mismatch and exit with an error message before trying to send any file data.

Once they establish the secure connection, the magic wormhole clients exchange information about what they want to transfer, and then they work together to establish a Transit connection over which the bulk data transfer will take place. This starts with both sides opening a listening TCP network socket. They figure out all the IP addresses that might refer to this socket (there could be multiple ones) and build a list of connection hints, which they encrypt with the session key and send through the rendezvous server to the other side.

Each side attempts to make a direct connection to every connection hint it receives. The first attempt that succeeds is used for the file transfer. This works if both sides are on the same local network (for example, when both computers are on the same conference WiFi). Since they both try to connect to each other (regardless of which side is sending the file), this also works if at least one of the machines is a server with a public IP address. In practice, this appears to establish a direct connection about two-thirds of the time.

If both machines are behind different NAT firewalls, all the direct connections will fail. In this case, they fall back to using a central transit relay server that basically glues the two inbound TCP connections together.

In all cases, the file data is encrypted by the session key, so neither the rendezvous server nor the transit relay gets to see the contents of the file.

This same protocol can be used in other applications by importing the wormhole library and making API calls. For example, an encrypted instant-messaging application like Signal or Wire could use this to securely add a friend’s public key to your address book: instead of copying a large key string, you would instead tell your friend a wormhole code.

Network Protocols, Transfer Latency, Client Compatibility

The total transfer time, from the moment the sender launches the tool, to the last byte arriving at the receiver, is roughly the sum of three phases:

  • waiting for the receiver to finish typing in the wormhole code;

  • performing key agreement and negotiating a transit connection;

  • transferring the file over the encrypted channel.

The first phase depends upon the humans: the program will cheerfully wait several days for the receiver to finally type in the wormhole code. The last phase depends upon the size of the file and the speed of the network. Only the middle phase is really under the control of the protocol, so we want to make it as fast as possible. We try to minimize the number of messages that must be exchanged, and use a low-latency real-time protocol to accelerate this phase.

The rendezvous server effectively provides a persistent broadcast channel (i.e., a “pubsub” server) for each pair of clients. The sender connects first, leaves a message for the receiver, and waits for a response. Later, when the human on the receiving side finally starts up their wormhole program, the receiver will connect and collect that message, and send a few of its own. If either client has a network problem, their connection might get dropped, and it must be reestablished.

Network Protocols and Client Compatibility

Twisted makes it quite easy to build custom protocols over TCP or UDP, as seen in the first chapter of this book. We could have built a simple TCP-based protocol for the rendezvous connection. But when we think about the future, we’d like to see Magic Wormhole clients in other languages and runtime environments, like web pages or mobile operating systems. The protocol we build for a command-line Twisted application might not be easy to implement in other languages, or it might require network access that’s forbidden to those programs:

  • Web browsers can do WebSockets and WebRTC, but not raw TCP connections.

  • Browser extensions can do everything a web page can, and more, but must be implemented in specialized JavaScript where binary protocols are not very natural.

  • iOS/Android can do HTTP, but power management may prohibit long-lived connections, and non-HTTP requests might not activate the radios.

So, for cross-runtime compatibility, we must stick to things that a web browser can do.

The simplest such protocol would do plain HTTP GETs and POSTs, using the excellent treq package, which provides a requests-like API to Twisted-based programs. However, it isn’t clear how frequently the client ought to poll the server: we might poll once per second, wasting a lot of bandwidth to check for a response that won’t happen for an hour. Or we might save bandwidth by only checking once a minute, at the cost of adding 60 seconds of latency to a utility that should only take a second or two. Even polling once per second adds an unnecessary delay. With a real-time connection, the connection completes as fast as the network can carry the messages.
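
For comparison, a plain-polling client might look like the following sketch. The URL, JSON shape, and handle() callback are illustrative, not part of the real protocol; the LoopingCall interval is exactly the bandwidth-versus-latency knob described above.

import treq
from twisted.internet import defer, task

@defer.inlineCallbacks
def check_for_messages():
    # One poll: ask the server whether anything has arrived for us yet.
    response = yield treq.get("https://relay.example.org/poll")
    messages = yield response.json()
    if messages:
        handle(messages)   # hypothetical application callback

poller = task.LoopingCall(check_for_messages)
poller.start(1.0)   # poll once per second: wasteful, yet still adds delay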

One trick to reduce this latency is “HTTP long polling” (sometimes known as COMET). In this approach, the magic wormhole client would make a GET or a POST as usual, but the relay server would pretend to take a really long time to deliver the response (in fact, the server would just stall the response until the other client connects to receive the file). One limitation is that the server must eventually respond somehow, usually with a “please try again” error, within 30–60 seconds, or the client HTTP library may give up. Also, back-to-back messages (like the second and third messages sent by the clients) aren’t delivered immediately: the time it takes to send a request must be added to the latency of each message.

Another web-compatible real-time technique is called “Server Sent Events,” which is exposed to web content as the EventSource JavaScript object. This is a more principled way to do long polling: the client does a regular GET, but sets the Accept request header to the special value text/event-stream to tell the server that the connection should be kept open. The response is expected to contain a stream of encoded events, each on a single line. This is pretty easy to implement on the server; however, there is no off-the-shelf library for Twisted. The messages only travel in one direction (server to client), but that’s all we need for our protocol because we can use POSTs in the upstream direction. The biggest downside is that some web browsers (in particular IE and Edge) don’t support it.

Our solution is to use WebSockets. This is a well-standardized protocol, implemented in most browsers, and available as a library in many programming languages. It’s easy to use from Python and Twisted, thanks to the excellent Autobahn library (described in the next chapter). The connection looks just like a long-lived HTTP session, which makes it easier to integrate with existing HTTP stacks (and makes it more likely to work through proxies and TLS terminators). Keepalives are handled automatically. And it is a fast, real-time protocol, so messages are delivered as quickly as possible.

If we didn’t have Autobahn, we might reconsider. WebSockets are somewhat complicated to implement because they use a special kind of framing (to prevent confused servers from misinterpreting the traffic as some other protocol: you wouldn’t want an attacker’s web page to make your browser send DELETE commands to your company’s internal FTP server).

In the future, the rendezvous server will probably speak multiple protocols, not just WebSockets. WebRTC is the most compelling, because it includes support for ICE and STUN. These are protocols to perform “NAT hole-punching”, so two clients can make a direct Transit connection despite both of them being behind firewalls. WebRTC is mostly used for audio/videochat, but it includes APIs specifically for ordinary data transfer. And WebRTC is well-supported by most browsers. A browser-to-browser Magic Wormhole would be fairly easy to build and might perform better than the current CLI tool.

The problem is that support outside a browser environment is minimal, partially because of the audio/video focus. Most libraries seem to spend all their energy trying to support the audio codecs and video compression algorithms, leaving them less time for the basic connectivity layer. The most promising ones I’ve seen are written in C++, for which Python bindings are second class, making build and packaging difficult.

One other contender is the libp2p protocol developed for IPFS. This relies upon a swarm of nodes in a large distributed hash table (DHT), rather than a central server, but has been well tested, and has good implementations in at least Go and JavaScript. A Python version of libp2p could be very promising.

Server Architecture

The Rendezvous Server is written as a twisted.application.service.MultiService, with a listening port for the main WebSocket connection.

WebSockets are basically HTTP, and the Autobahn library makes it possible to use the same port for both. In the future this will let us host the pages and other assets of a web-based version of Magic Wormhole from the same origin as the rendezvous service. To set this up, the Rendezvous Server looks like this:

from twisted.application import service
from twisted.web import static, resource
from autobahn.twisted.resource import WebSocketResource
from .rendezvous_websocket import WebSocketRendezvousFactory

class Root(resource.Resource):
    def __init__(self):
        resource.Resource.__init__(self)
        self.putChild(b"", static.Data(b"Wormhole Relay\n", "text/plain"))

class RelayServer(service.MultiService):
    def __init__(self, rendezvous_web_port):
        service.MultiService.__init__(self)
        ...
        root = Root()
        wsrf = WebSocketRendezvousFactory(None, self._rendezvous)
        root.putChild(b"v1", WebSocketResource(wsrf))

self._rendezvous is our Rendezvous object that provides the internal API for the Rendezvous Server actions: adding messages to a channel, subscribing to channels, etc. When we add additional protocols, they will all use this same object.

WebSocketResource is Autobahn’s class for adding a WebSocket handler at any HTTP endpoint. We attach it as the “v1” child of Root, so if our server is on magic-wormhole.io, then the Rendezvous service will live at a URL of ws://magic-wormhole.io/v1. We reserve v2/ and the like for future versions of the protocol.
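
For a sense of what sits on the other end of that URL, here is a rough sketch of a client connecting with Autobahn. The message payload is an assumption (the “ping” type mirrors the handler shown below); a real wormhole client starts with a bind message and then works through the nameplate and mailbox protocol.

from twisted.internet import reactor
from autobahn.twisted.websocket import (WebSocketClientFactory,
                                         WebSocketClientProtocol, connectWS)

class RendezvousClient(WebSocketClientProtocol):
    def onOpen(self):
        # Illustrative only: a "ping" is the simplest thing to send.
        self.sendMessage(b'{"type": "ping", "ping": 1}', isBinary=False)

    def onMessage(self, payload, isBinary):
        print("server said:", payload)

factory = WebSocketClientFactory("ws://magic-wormhole.io/v1")
factory.protocol = RendezvousClient
connectWS(factory)
reactor.run()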

The WebSocketResource must be given a factory: we use our WebSocketRendezvousFactory from a neighboring module. This factory produces Protocol instances of our WebSocketRendezvous class, which has an onMessage method that examines the payload of each message, parses the contents, and invokes the appropriate action:

def onMessage(self, payload, isBinary):
    msg = bytes_to_dict(payload)
    try:
        if "type" not in msg:
            raise Error("missing 'type'")
        self.send("ack", id=msg.get("id"))
        mtype = msg["type"]
        if mtype == "ping":
            return self.handle_ping(msg)
        if mtype == "bind":
            return self.handle_bind(msg)
        ...

Persistent Database

When both clients are connected at the same time, the rendezvous server delivers messages from one to the other right away. But at least the initial message must be buffered while waiting for the second client to connect: sometimes for just a few seconds, but sometimes for hours or days.

Early versions of the rendezvous server held these messages in memory. But then each time the host was rebooted (e.g., to upgrade the operating system), these messages were lost, and any clients waiting at that moment would fail.

To fix this, the server was rewritten to store all messages in an SQLite database. Every time a message arrives, the first thing the server does is to append it to a table. Once the message is safely stored, a copy is forwarded to the other client. The Rendezvous object wraps a database connection, and each method performs SELECTs and INSERTs.
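
The store-then-forward idea is simple enough to sketch. The table name, schema, and method names below are illustrative only; the real server’s schema is more involved.

import sqlite3

class Channel:
    def __init__(self, db, channel_id):
        self._db = db                # sqlite3 connection
        self._id = channel_id
        self._listeners = set()      # currently connected subscribers

    def add_message(self, side, body):
        # 1: persist first, so a reboot between these two steps loses nothing
        self._db.execute(
            "INSERT INTO messages (channel_id, side, body) VALUES (?,?,?)",
            (self._id, side, body))
        self._db.commit()
        # 2: only then forward a copy to whoever is currently connected
        for listener in self._listeners:
            listener.send_message(side, body)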

The clients were also rewritten to tolerate losing a connection, as described in the next section, with state machines that retransmit any message that hasn’t been acknowledged by the server.

An interesting side effect of this work was that it enables an “offline mode”: two clients can exchange messages without ever being connected at the same time. While this doesn’t enable a direct file-exchange operation, it does allow use cases like exchanging public keys for a messaging application.

Transit Client: Cancelable Deferreds

After a session key is computed, the wormhole clients can communicate securely, but all their data is still being relayed by the rendezvous server. This is too slow for the bulk file-transfer phase: every byte must go up to the server, and then back down to the other client. It would be faster (and cheaper) to use a direct connection. However, sometimes the clients cannot make a direct connection (e.g., they are both behind NAT boxes), in which case they must use a “transit relay” server. The Transit Client is responsible for making the best connection that is possible.

As described earlier, the clients each open a listening TCP port, figure out their IP addresses, then send the address+port to the other side (through the encrypted rendezvous channel). To accommodate future connection mechanisms (perhaps WebRTC), this is generalized as a set of “connection hints” of various types. The current client recognizes three kinds of hints: direct TCP, transit-relay TCP, and Tor hidden-service TCP. Each hint includes a priority, so a client can encourage the use of cheaper connections.

Both sides initiate connections to every hint that they can recognize, starting with the high-priority hints first. Any hints that use the transit relay are delayed by a few seconds, to favor a direct connection.

The first connection that completes the negotiation process will win the race, at which point we call cancel() on the losing Deferreds to abandon them. Those might still be waiting to start (sitting in the two-second delay imposed on relay connections), or trying to complete DNS resolution, or connected but waiting for negotiation to finish.

Deferred cancellation neatly handles all of these cases, because it gives the original creator of the Deferred an opportunity to avoid doing some work that’s now going to be ignored anyway. And if the Deferred has chained to another, the cancel() call follows this chain and gets delivered to the first Deferred that has not yet fired. For us, that means canceling a contender that is waiting for a socket to connect will cancel the connection attempt. Or canceling one that is connected but still waiting for a connection handshake will shut down the connection instead.

By structuring each step of the process as another Deferred, we don’t need to keep track of those steps: a single cancel() will do the right thing.
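
A sketch of one contender shows why this works so neatly. The wait_for_handshake() method is hypothetical, not the real Transit API.

from twisted.internet import endpoints

def start_contender(reactor, host, port, factory):
    ep = endpoints.HostnameEndpoint(reactor, host, port)
    d = ep.connect(factory)   # cancelling now aborts the TCP attempt
    # Once connected, chain to a second Deferred that fires when the
    # handshake finishes; cancelling now is routed to that Deferred,
    # whose canceller can drop the established connection instead.
    d.addCallback(lambda proto: proto.wait_for_handshake())
    return d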

We manage this race with a utility function in src/wormhole/transit.py:

class _ThereCanBeOnlyOne:
    """Accept a list of contender Deferreds, and return a summary Deferred.
    When the first contender fires successfully, cancel the rest and fire the
    summary with the winning contender's result. If all error, errback the
    summary.
    """
    def __init__(self, contenders):
        self._remaining = set(contenders)
        self._winner_d = defer.Deferred(self._cancel)
        self._first_success = None
        self._first_failure = None
        self._have_winner = False
        self._fired = False

    def _cancel(self, _):
        for d in list(self._remaining):
            d.cancel()
        # since that will errback everything in _remaining, we'll have
        # hit _maybe_done() and fired self._winner_d by this point

    def run(self):
        for d in list(self._remaining):
            d.addBoth(self._remove, d)
            d.addCallbacks(self._succeeded, self._failed)
            d.addCallback(self._maybe_done)
        return self._winner_d

    def _remove(self, res, d):
        self._remaining.remove(d)
        return res

    def _succeeded(self, res):
        self._have_winner = True
        self._first_success = res
        for d in list(self._remaining):
            d.cancel()

    def _failed(self, f):
        if self._first_failure is None:
            self._first_failure = f

    def _maybe_done(self, _):
        if self._remaining:
            return
        if self._fired:
            return
        self._fired = True
        if self._have_winner:
            self._winner_d.callback(self._first_success)
        else:
            self._winner_d.errback(self._first_failure)

def there_can_be_only_one(contenders):
    return _ThereCanBeOnlyOne(contenders).run()

This is exposed as a function, not a class. We need to turn a collection of Deferreds into a single new Deferred, and a class constructor can only return the new instance (not a Deferred). If we exposed _ThereCanBeOnlyOne as the main API, callers would be forced to use an awkward d = ClassXYZ(args).run() syntax (precisely the syntax we hide inside our function). This would add several opportunities for mistakes:

  • What if they call run() twice?

  • What if they subclass it? What sort of compatibility are we promising?

Note that if all the contender Deferreds fail, the summary Deferred will fail too. In this case, the errback function will receive whatever Failure instance was delivered with the first contender failure. The idea here is to report common-mode failures usefully. Each target will probably behave in one of three ways:

  • successful connection (maybe fast or maybe slow);

  • fail because of something specific to the target: it uses an IP address that we can’t reach, or a network filter blocks the packets;

  • fail because of something not specific to the target, for example, we aren’t even connected to the internet;

If we’re in the last case, all the connection failures will be the same, so it doesn’t matter which one we report. Recording the first should be enough to let the user figure out what went wrong.

Transit Relay Server

The code for the Transit Relay is in the magic-wormhole-transit-relay package. It currently uses a custom TCP protocol, but I hope to add a WebSockets interface to enable browser-based clients to use it too.

The core of the relay is a Protocol for which pairs of instances (one per client) are linked together. Each instance has a “buddy,” and every time data arrives, that same data is written out to the buddy:

class TransitConnection(protocol.Protocol):
    def dataReceived(self, data):
        if self._sent_ok:
            self._total_sent += len(data)
            self._buddy.transport.write(data)
            return
        ...

    def buddy_connected(self, them):
        self._buddy = them
        ...
        # Connect the two as a producer/consumer pair. We use streaming=True,
        # so this expects the IPushProducer interface, and uses
        # pauseProducing() to throttle, and resumeProducing() to unthrottle.
        self._buddy.transport.registerProducer(self.transport, True)
        # The Transit object calls buddy_connected() on both protocols, so
        # there will be two producer/consumer pairs.

    def buddy_disconnected(self):
        self._buddy = None
        self.transport.loseConnection()

    def connectionLost(self, reason):
        if self._buddy:
            self._buddy.buddy_disconnected()
        ...

The rest of the code has to do with identifying exactly which connections should be paired together. Transit clients write a handshake string as soon as they connect, and the relay looks for two clients that wrote the same handshake. The remainder of the dataReceived method implements a state machine that waits for the handshake to arrive, then compares it against other connections to find a match.

When the buddies are linked, we establish a Producer/Consumer relationship between them: Alice’s TCP transport is registered as a producer for Bob’s, and vice versa. When Alice’s upstream link is faster than Bob’s downstream link, the TCP Transport connected to Bob’s TransitConnection will fill up. It will then call pauseProducing() on Alice’s Transport, which will remove her TCP socket from the reactor’s readable list (until resumeProducing() is called). This means the relay won’t read from that socket for a while, causing the kernel’s inbound buffer to fill, at which point the kernel’s TCP stack shrinks the TCP window advertisement, which tells Alice’s computer to stop sending data until it catches up.

The net result is that Alice observes a transfer rate that is no greater than what Bob can handle. Without this Producer/Consumer linkage, Alice would write data to the relay as fast as her connection allows, and the relay would have to buffer all of it until Bob caught up. Before we added this, the relay would occasionally run out of memory when people sent very large files to very slow recipients.

Wormhole Client Architecture

On the client side, the wormhole package provides a Wormhole library to establish wormhole-style connections through the server, a Transit library to make encrypted direct TCP connections (possibly through a relay), and a command-line tool to drive the file-transfer requests. Most of the code is in the Wormhole library.

The Wormhole object is built with a simple factory function, and has a Deferred-based API to allocate a wormhole code, discover what code was selected, and then send/receive messages:

import wormhole
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks

@inlineCallbacks
def run():
    w = wormhole.create(appid, relay_url, reactor)
    w.allocate_code()
    code = yield w.get_code()
    print("wormhole code:", code)
    w.send_message(b"outbound message")
    inbound = yield w.get_message()
    yield w.close()

We use a create factory function, not a class constructor, to build our Wormhole object. This lets us keep the actual class private, so we can change the implementation details without causing compatibility breaks in the future. For example, there are actually two flavors of Wormhole objects. The default has a Deferred-based interface, but if you pass an optional delegate= argument into create, you get an alternate one that makes calls to the delegate object instead of firing a Deferred.

create takes a Reactor, rather than importing one internally, to allow the calling application to control which type of reactor is used. This also makes unit tests easier to write, because we can pass in a fake reactor where, for example, network sockets are stubbed out, or one where we get explicit control over the clock.
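
As a small illustration of that second point, a twisted.internet.task.Clock() quacks enough like a reactor’s timer interface to let a test drive time manually. (A full Wormhole test would also need the network side stubbed out; this sketch only shows the clock half, with a made-up helper function.)

from twisted.internet import task

def remind_later(reactor, seconds, callback):
    # any code that takes "reactor" as an argument can be tested this way
    return reactor.callLater(seconds, callback)

clock = task.Clock()
fired = []
remind_later(clock, 10, lambda: fired.append(True))
clock.advance(10)    # advance fake time; the timer fires synchronously
assert fired == [True]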

Internally, our Wormhole object uses over a dozen small state machines, each of which is responsible for a small part of the connection and key-negotiation process. For example, the short integer at the beginning of a wormhole code (the “4” in 4-bravado-waffle) is called a Nameplate, and these are allocated, used, and released, all by a single dedicated state machine. Likewise, the server hosts a Mailbox where the two clients can exchange messages: each client has a state machine that manages their view of this Mailbox, and knows when they want it to be opened or closed, and ensures that all messages are sent at the right time.

Deferreds vs State Machines, One-Shot Observer

While the basic message flow is pretty simple, the full protocol is fairly complex. This complexity stems from a design goal of tolerating connection failures (and subsequent reconnections), as well as server shutdowns (and subsequent restarts).

Each resource that the client might allocate or reserve must be freed at the right time. So, the process of claiming Nameplates and Mailboxes is carefully designed to always move forward, despite connections coming and going.

It is further complicated by another design goal: applications that use the library can save their state to disk, shut down completely, then restart at a later time and pick up where they left off. This is intended for messaging applications that get started and shut down all the time. For this to work, the application needs to know when a wormhole message has arrived, and how to serialize the protocol’s state (along with everything else in the application). Such applications must use the Delegate API.

Deferreds are a good choice for dataflow-driven systems in which any given action can happen exactly once, but they are hard to serialize. And for states that might roll forward and then roll back, or for events which can occur multiple times (more of a “stream” interface), state machines might be better. Earlier versions of the wormhole code used more Deferreds, and it was harder to handle connections being lost and restarted. In the current version, Deferreds are only used for the top-level API. Everything else is a state machine.

The Wormhole object uses over a dozen interlocking state machines, all of which are implemented with Automat. Automat is not a part of Twisted per se, but it was written by members of the Twisted community, and one of its first use cases was Twisted’s ClientService (this is a utility that maintains a connection to a given endpoint, reconnecting any time the connection is lost, or when the connection process fails; Magic Wormhole uses ClientService for the connection to the Rendezvous server).
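
The ClientService pattern itself is compact. A sketch of maintaining a reconnecting connection looks roughly like this; the hostname, port, and stand-in factory are placeholders rather than the real rendezvous machinery.

from twisted.internet import reactor
from twisted.internet.endpoints import HostnameEndpoint
from twisted.internet.protocol import Factory, Protocol
from twisted.application.internet import ClientService

# A stand-in factory; the real client uses its WebSocket rendezvous factory.
factory = Factory.forProtocol(Protocol)
endpoint = HostnameEndpoint(reactor, "relay.example.org", 4000)
svc = ClientService(endpoint, factory)
svc.startService()        # connects now, reconnects whenever the link drops
d = svc.whenConnected()   # Deferred firing with the current protocol instance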

As a specific example, Figure 7-5 shows the Allocator state machine, which manages the allocation of Nameplates. These are allocated by the rendezvous server upon request by the sending side (unless the sender and receiver have decided upon a code offline, in which case both sides type the code into their clients directly).

At any given moment, the connection to the rendezvous server is either established or not, and the transitions between these two states cause a connected or lost message to be dispatched to most state machines, including the Allocator. The allocator remains in one of the two “idle” states (S0A idle+disconnected, or S0B idle+connected) until/unless it is needed. If the higher-level code decides that a nameplate is required, it sends the allocate event. If the Allocator was connected at that moment, it tells the Rendezvous Connector to transmit an allocate message (the box labelled RC.tx_allocate), then moves to state S1B where it waits for a response. When the response arrives (rx_allocated), it will choose random words that make up the rest of the code, inform the Code state machine that one has been allocated (C.allocated()), and move to the terminal S2: done state.

Until the rx_allocated response is received, we can’t know if the request was delivered successfully or not. So we must 1: make sure to retransmit the request each time the connection is reestablished; and 2: make sure the request is idempotent, so that the server reacts to two or more requests the same way it would react to a single request. This ensures that the server behaves correctly in both cases.

Figure 7-5. Allocator state machine

We might be asked to allocate a nameplate before the connection has been established. The path from S1A to S1B is where the allocate request is transmitted in either case: connecting before discovering the need to allocate, and reconnecting after sending the allocation request but not yet hearing the response.

This pattern appears in most of our state machines. For more complex examples, look at the Nameplate or the Mailbox machines, which create or subscribe to a named channel on the rendezvous server. In both cases, the states line up into two columns: either “disconnected” on the left, or “connected” on the right. The vertical position within the column indicates what we’ve accomplished so far (or what we still need to do). Losing a connection moves us from right to left. Establishing a connection moves us from left to right, and generally sends a new request message (or retransmits an earlier one). Receiving a response moves us downward, as does being instructed to achieve something from a higher-level state machine.
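
To show what this looks like in Automat, here is a heavily trimmed sketch of an Allocator-like machine: only the connected/disconnected columns and the retransmit-on-reconnect behavior survive, and the collaborator objects are simplified stand-ins for the real Rendezvous Connector and Code machines.

from automat import MethodicalMachine

class Allocator(object):
    _machine = MethodicalMachine()

    def __init__(self, rendezvous_connector, code_machine):
        self._rc = rendezvous_connector
        self._code = code_machine

    # states: left column is disconnected, right column is connected
    @_machine.state(initial=True)
    def S0A_idle_disconnected(self): pass
    @_machine.state()
    def S0B_idle_connected(self): pass
    @_machine.state()
    def S1A_allocating_disconnected(self): pass
    @_machine.state()
    def S1B_allocating_connected(self): pass
    @_machine.state()
    def S2_done(self): pass

    # inputs
    @_machine.input()
    def connected(self): pass
    @_machine.input()
    def lost(self): pass
    @_machine.input()
    def allocate(self): pass
    @_machine.input()
    def rx_allocated(self, nameplate): pass

    # outputs
    @_machine.output()
    def tx_allocate(self):
        self._rc.tx_allocate()           # (re)send the request to the server
    @_machine.output()
    def notify_code(self, nameplate):
        self._code.allocated(nameplate)  # let the Code machine build the code

    # transitions: right/left tracks the connection, downward tracks progress;
    # reconnecting while allocating retransmits the (idempotent) request
    S0A_idle_disconnected.upon(connected, enter=S0B_idle_connected, outputs=[])
    S0B_idle_connected.upon(lost, enter=S0A_idle_disconnected, outputs=[])
    S0A_idle_disconnected.upon(allocate, enter=S1A_allocating_disconnected, outputs=[])
    S0B_idle_connected.upon(allocate, enter=S1B_allocating_connected, outputs=[tx_allocate])
    S1A_allocating_disconnected.upon(connected, enter=S1B_allocating_connected, outputs=[tx_allocate])
    S1B_allocating_connected.upon(lost, enter=S1A_allocating_disconnected, outputs=[])
    S1B_allocating_connected.upon(rx_allocated, enter=S2_done, outputs=[notify_code])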

The top-level Boss machine is where the state machines give way to Deferreds. Applications that import the magic wormhole library can ask for a Deferred that will fire when an important event occurs. For example, an application can create a Wormhole object and allocate a code like this:

from twisted.internet import reactor
from wormhole.cli.public_relay import RENDEZVOUS_RELAY
import wormhole

# set APPID to something application-specific
w = wormhole.create(APPID, RENDEZVOUS_RELAY, reactor)
w.allocate_code()
d = w.get_code()

def allocated_code(code):
    print("the wormhole code is: {}".format(code))
d.addCallback(allocated_code)

The Allocator state machine delivers the allocated messages to the Code machine (C.allocated). The Code machine will deliver the code to the Boss (B.got_code), the Boss machine will deliver it to the Wormhole object (W.got_code), and the Wormhole object will deliver it to any waiting Deferreds (which were constructed by calling get_code()).

One-Shot Observers

The following excerpt from src/wormhole/wormhole.py shows the “one-shot observer” pattern used to manage the delivery of wormhole codes, both from allocation (described above) and interactive input:

@implementer(IWormhole, IDeferredWormhole)
class _DeferredWormhole(object):
    def __init__(self):
        self._code = None
        self._code_observers = []
        self._observer_result = None
        ...

    def get_code(self):
        if self._observer_result is not None:
            return defer.fail(self._observer_result)
        if self._code is not None:
            return defer.succeed(self._code)
        d = defer.Deferred()
        self._code_observers.append(d)
        return d

    def got_code(self, code):
        self._code = code
        for d in self._code_observers:
            d.callback(code)
        self._code_observers[:] = []

    def closed(self, result):
        if isinstance(result, Exception):
            self._observer_result = failure.Failure(result)
        else:
            # pending Deferreds get an error
            self._observer_result = WormholeClosed(result)
        ...
        for d in self._code_observers:
            d.errback(self._observer_result)

get_code() might be called any number of times. For the standard CLI file-transfer tool, the sending client allocates the code, and waits for get_code() to fire so it can display the code to the user (who must dictate it to the receiver). The receiving client is told the code (either as an invocation argument, or via interactive input, with tab completion on the words), so it doesn’t bother calling get_code(). Other applications might have reasons to call it multiple times.

We want all these queries to get the same answer (or error). And we want their callback chains to be independent.

Promises/Futures vs. Deferreds

Futures come from the Actor model, by Carl Hewitt, and languages like Joule and E, and other early object-capability systems (in which they’re known as Promises). They represent a value that is not available yet, but which (might) resolve to something eventually, or might “break” and never refer to anything.

This lets programs talk about things that don’t yet exist. This might seem unhelpful, but there are plenty of useful things that can be done with not-yet-existent things. You can schedule work to happen when they do become available, and you can pass them into functions that can themselves schedule this work. In more advanced systems, Promise Pipelining lets you send messages to a Promise, and if that promise actually lives on a different computer entirely, the message will chase the promise to the target system, which can cut out several roundtrips. In general, they help programmers describe their future intentions to the compiler or interpreter, so it can better plan out what to do.

Deferreds are closely related, but are unique to Twisted. They serve more as a callback management tool than a fully fledged Promise. To explore how they differ, we should first explain how real Promises work.

In E, the object-capability language that most fully explored Promises, there is a function named makePromiseResolverPair(), which returns two separate objects: a Promise and a Resolver. The only way to resolve the promise is with the Resolver, and the only way to learn of the resolution is with the Promise. The language provides a special syntax, the “when” block, which lets the programmer write code that will execute only after the promise has been resolved to some concrete value. If Magic Wormhole were written in E, the get_code() method would return a Promise, and it would be displayed to the user like this:

p = w.get_code();
when (p) {
    writeln("The code is:", p);
}

Promises are available in modern JavaScript (ES6), thanks to the sizable overlap between the object-capability community and the TC39 standards organization. These Promises do not have any special syntax to wait for resolution, instead relying upon JavaScript’s convenient anonymous functions (including the arrow function syntax introduced in ES6). The corresponding JavaScript code would look like:

p = w.get_code();
p.then(code => { console.log("The code is:", code); });

A significant difference between E’s Promises, JS Promises, and Twisted’s Deferreds is in how you chain them together. The JavaScript then() method returns a new Promise, which fires if and when the callback function finishes (if the callback returns an intermediate promise, the then() promise won’t fire until the intermediate one fires). So, given a single “parent” promise, you can build two separate processing chains like this:

p = w.get_code();

function format_code(code) {
    return slow_formatter_that_returns_a_promise(code);
}
p.then(format_code).then(formatted => { console.log(formatted); });

function notify_user(code) {
    return display_box_and_wait_for_approval(code);
}
p.then(notify_user).then(approved => { console.log("code delivered!"); });

In JavaScript, these two actions will run “in parallel,” or at least neither will interfere with the other.

Twisted’s Deferreds, on the other hand, build a chain of callbacks without creating additional Deferreds.

d1 = w.get_code()
d = d1.addCallback(format_code)
assert d1 is d  # addCallback returns the same Deferred!

This looks a bit like the JavaScript “attribute construction” pattern, common in web frameworks (e.g., d3.js, jQuery) that build up an object across many attribute-invocation calls:

s = d3.scale()
      .linear()
      .domain([0, 100])
      .range([2, 40]);

This chaining behavior of Deferreds can cause surprises, especially when trying to create parallel lines of execution:

d1 = w.get_code()
d1.addCallback(format_code).addCallback(print_formatted)  # wrong!
d1.addCallback(notify_user).addCallback(log_delivery)

In that example, notify_user is only called after print_formatted finishes, and it won’t be called with the code: instead it will get whatever value print_formatted returned. Our coding pattern (two lines, each of which starts with d1.addCallback) is deceptive. In fact, the code above is exactly equivalent to:

d1 = w.get_code()
d1.addCallback(format_code)
d1.addCallback(print_formatted)
d1.addCallback(notify_user)  # even more obviously wrong!
d1.addCallback(log_delivery)

Instead, we need a new Deferred that will fire with the same value but lets us establish a new chain of execution:

def fanout(parent_deferred, count):
    child_deferreds = [Deferred() for i in range(count)]
    def fire(result):
        for d in child_deferreds:
            d.callback(result)
    parent_deferred.addBoth(fire)
    return child_deferreds

d1 = w.get_code()
d2, d3 = fanout(d1, 2)
d2.addCallback(format_code)
d2.addCallback(print_formatted)
d3.addCallback(notify_user)
d3.addCallback(log_delivery)

This is enough of a nuisance that in my projects, I usually create a utility class named OneShotObserverList. This “observer” has a when_fired() method (that returns a new, independent Deferred), and a fire() method (which fires them all). when_fired() can be called either before or after fire().
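
A minimal sketch of that utility looks like this; the real class in Foolscap differs in details.

from twisted.internet import defer

class OneShotObserverList(object):
    """Sketch of the one-shot observer described above."""
    def __init__(self):
        self._fired = False
        self._result = None
        self._watchers = []

    def when_fired(self):
        # may be called before or after fire(); each caller gets an
        # independent Deferred with its own callback chain
        if self._fired:
            return defer.succeed(self._result)
        d = defer.Deferred()
        self._watchers.append(d)
        return d

    def fire(self, result):
        assert not self._fired
        self._fired = True
        self._result = result
        watchers, self._watchers = self._watchers, []
        for d in watchers:
            d.callback(result)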

The Magic Wormhole code quoted above (get_code() / got_code()) is a subset of the full OneShotObserverList. There are several ways that the connection process might fail, but they all call closed() with a Failure instance (a successful/intentional close will call closed() with a non-Failure, which is then wrapped in a WormholeClosed exception). This code ensures that every Deferred returned by get_code() will be fired exactly once, with either success (and the code), or a Failure.

Eventual-Send, Synchronous Testing

Another aspect of Promises that comes from E and the object-capability community is the eventual send. This is a facility to queue a method invocation for some subsequent turn of the event loop. In Twisted, this is basically a reactor.callLater(0, callable, argument). In E and JavaScript, Promises automatically provide this guarantee for their callbacks.

Eventual send is a simple and robust way to avoid a number of ordering hazards. For example, imagine a general observer pattern (with more functionality than the simple OneShotObserverList described above):

class Observer:
    def __init__(self):
        self.observers = set()

    def subscribe(self, callback):
        self.observers.add(callback)

    def unsubscribe(self, callback):
        self.observers.remove(callback)

    def publish(self, data):
        for ob in self.observers:
            ob(data)

Now, what happens if one of the callback functions invokes subscribe or unsubscribe, modifying the list of observers while in the middle of the loop? Depending upon how iteration works, the newly added callback might receive the current event, or it might not. In Java, the iterator might even throw a ConcurrentModificationException.

Reentrancy is another potential surprise: if some callback publishes a new message to the same observer, then the publish function will be invoked a second time while the first invocation is still running, which can violate many common assumptions the programmer might have made (especially if the function keeps state in instance variables). Finally, if a callback raises an exception, do the remaining observers see the event, or are they bypassed?

These unexpected interactions are collectively known as “plan-coordination hazards,” and the consequences include dropped events, duplicated events, non-deterministic ordering, and infinite loops.

Meticulous programming can avoid many of these failure modes: we could duplicate the observer list before iteration, catch/discard exceptions in the callbacks, and use a flag to detect reentrant calls. But it is far simpler and more robust to use an eventual send with each call:

def publish(self, data):
    for ob in self.observers:
        reactor.callLater(0, ob, data)

I’ve used this with great success in many projects (Foolscap, Tahoe-LAFS), and it removes entire classes of bugs. The downside is that testing becomes more difficult, since the effects of an eventual send cannot be checked synchronously. In addition, the lack of causal stack traces makes debugging tricky: if the callback raises an exception, the traceback doesn’t make it clear why that function was called. Deferreds have similar concerns, for which the defer.setDebugging(True) function can help.

With Magic Wormhole, I’ve been experimenting with using synchronous unit tests instead of eventual send.

Asynchronous Testing with Deferreds

Twisted has a unit test system named Trial, which builds upon the stdlib unittest package by providing specialized methods for handling Deferreds. The most obvious feature is that a test case can return a Deferred, and the test runner will wait for it to fire before declaring success (or allowing the next test to run). When combined with inlineCallbacks, this makes it easy to test that certain things happen in a specific order:

@inlineCallbacks
def test_allocate_default(self):
    w = wormhole.create(APPID, self.relayurl, reactor)
    w.allocate_code()
    code = yield w.get_code()
    mo = re.search(r"^\d+-\w+-\w+$", code)
    self.assert_(mo, code)
    # w.close() fails because we closed before connecting
    yield self.assertFailure(w.close(), LonelyError)

In that test, w.allocate_code() initiates the allocation of a code, and w.get_code() returns a Deferred that will eventually fire with the complete code. In between, the Wormhole object must contact the server and allocate a nameplate (the test launches a local rendezvous server in setUp(), rather than relying upon the real server). The yield w.get_code() takes that Deferred and waits for it to finish, then assigns the result to code so we can test its structure later.

Of course, what really happens is that the test function returns a Deferred and goes back to the event loop, then at some point in the future the server’s response arrives and causes the function to be resumed where it left off. If a bug prevents the get_code() Deferred from being fired, the test will wait quietly for two minutes (the default timeout), then declare an error.

The self.assertFailure() clause takes a Deferred and a list (*args) of exception types. It waits for the Deferred to resolve, then requires that it was errbacked with one of those exceptions: if the Deferred’s .callback() is invoked (i.e., not an error), assertFailure flunks the test. And if the Deferred’s .errback() is invoked with the wrong kind of error, it also flunks the test.

For us, this serves three purposes. The Wormhole API requires that you call w.close() when you’re done, and close returns a Deferred that fires when everything is fully shut down. We use this to avoid moving on to the next test until everything has stopped moving from the previous one (all network sockets are shut down, all timers have been retired), which also avoids triggering an “unclean reactor” error from Trial.

This Deferred also gives applications a way to discover connection errors. In this test, we’re only running a single client, so there’s nobody for it to connect to, and the close Deferred will be errbacked with LonelyError. We use assertFailure to make sure that no other error happened, which catches all the usual coding errors that our unit tests are designed to find, like maybe a NameError because we misspelled a method somewhere.

The third purpose is that it keeps the overall test from being flunked. In other tests, where the wormhole connects successfully, we use a simple yield w.close() at the end of the test. But in this case, the LonelyError errback would look like a problem to Trial, which would mark the test as failed. Using assertFailure tells Trial that it’s ok for this Deferred to fail, as long as it fails in a very specific way.

Synchronous Testing with Deferreds

test_allocate_default is really an integration test, which is exercising multiple pieces of the system at once (including the rendezvous server and the loopback network interface). These tests tend to be thorough but somewhat slow. They also don’t provide predictable coverage.

Tests that wait for a Deferred to happen (either by returning one from the test, yielding one in the middle of an @inlineCallbacks function, or calling assertFailure) imply that you aren’t entirely sure quite when that event will happen. This separation of concerns is fine when an application is waiting for a library to do something: the details of what will trigger the callback are the library’s job, not the application. But during unit tests, you should know exactly what to expect.

Trial offers three Deferred-managing tools that do not wait for the Deferred to fire: successResultOf, failureResultOf, and assertNoResult. These assert that the Deferred is currently in a specific state, rather than waiting for a transition to occur.

They are most commonly used with the Mock class, to reach “into” some code under test, to provoke specific internal transitions at a known time.

As an example, we’ll look at the tests of Magic Wormhole’s tor support. This feature adds an argument to the command-line tools, which causes all connections to be routed through a Tor daemon, so wormhole send --tor won’t reveal your IP address to the rendezvous server (or the recipient). The details of finding (or launching) a suitable Tor daemon are encapsulated in a TorManager class, and depends upon the external txtorcon library. We can replace txtorcon with a Mock, then we exercise everything above it to make sure our TorManager code behaves as expected.

These tests exercise all of our Tor code, without actually talking to a real Tor daemon (which would clearly be slow, unreliable, and unportable). They accomplish this by assuming that txtorcon works as advertised. We don’t assert anything about what txtorcon actually does: instead we record and inspect everything we told txtorcon to do, then we simulate the correct txtorcon responses and examine everything that our own code does in reaction to those responses.

The simplest test checks to see what happens when txtorcon is not installed: normal operation should not be affected, but trying to use --tor should cause an error message. To make this easier to simulate, the tor_manager.py module is written to handle an import error by setting the txtorcon variable to None:

# tor_manager.py
try:
    import txtorcon
except ImportError:
    txtorcon = None

This module has a get_tor() function, which is defined to return a Deferred that either fires with a TorManager object, or with a NoTorError Failure. It returns a Deferred because, in normal use, it must establish a connection to the Tor control port before anything else can happen, and that takes time. But in this specific case, we know it should resolve immediately (with NoTorError), because we discover the ImportError without waiting for anything. So, the test looks like this:

from ..tor_manager import get_tor

class Tor(unittest.TestCase):
    def test_no_txtorcon(self):
        with mock.patch("wormhole.tor_manager.txtorcon", None):
            d = get_tor(None)
        self.failureResultOf(d, NoTorError)

The mock.patch ensures that the txtorcon variable is None, even though the txtorcon package is always importable during tests (our setup.py marks txtorcon as a dependency in the [dev] extra). The Deferred returned by get_tor() is already in the errback state by the time our test regains control. self.failureResultOf(d, *errortypes) asserts that the given Deferred has already failed, with one of the given error classes. And because failureResultOf tests the Deferred immediately, it returns immediately. Our test_no_txtorcon does not return a Deferred, nor does it use @inlineCallbacks.

A similar test exercises the precondition checks inside get_tor() . For each typecheck that this function does, we exercise it with a call. For example, the launch_tor= argument is a Boolean flag that says whether the tor_manager should spawn a new copy of Tor, or try to use a preexisting one. If we pass in a value that isn’t True or False, we should expect the Deferred to fire with a TypeError:

def test_bad_args(self):
    d = get_tor(None, launch_tor="not boolean")
    f = self.failureResultOf(d, TypeError)
    self.assertEqual(str(f.value), "launch_tor= must be boolean")

This entire test runs synchronously, without waiting for any Deferreds. A collection of tests like this exercises every line and every branch in the tor_manager module in 11 milliseconds.

Another common test is to make sure that a Deferred has not fired yet, because we haven’t yet triggered the condition that would allow it to fire. This is usually followed by a line that triggers the event, then an assertion that the Deferred is either resolved successfully (with some specific value), or has failed (with some specific exception).

The magic wormhole Transit class manages the (hopefully direct) client-to-client TCP connections used for bulk data transfer. Each side listens on a port and builds a list of “connection hints” based on every IP address it might possibly have (including several local addresses that are unlikely to be reachable). Each side then initiates connections to all of their peer’s hints at the same time. The first one to connect successfully and perform the right handshake is declared the winner, and all the others are canceled.

A utility function named there_can_be_only_one() (described earlier) is used to manage this race. It takes a number of individual Deferreds, and returns a single Deferred that fires when the first has succeeded. Twisted has some utility functions that do something similar (DeferredList has been around forever), but we needed something that would cancel all the losing contenders.

To test this, we use Trial’s assertNoResult(d) and value = successResultOf(d) features:

class Highlander(unittest.TestCase):
    def test_one_winner(self):
        cancelled = set()
        contenders = [Deferred(lambda d, i=i: cancelled.add(i))
                      for i in range(5)]
        d = transit.there_can_be_only_one(contenders)
        self.assertNoResult(d)
        contenders[0].errback(ValueError())
        self.assertNoResult(d)
        contenders[1].errback(TypeError())
        self.assertNoResult(d)
        contenders[2].callback("yay")
        self.assertEqual(self.successResultOf(d), "yay")
        self.assertEqual(cancelled, set([3, 4]))

In this test, we make sure that the combined Deferred has not fired right away, and also that it does not fire even when some of the component Deferreds have failed. When a component member does succeed, we check that both the combined Deferred has fired with the correct value, and that the remaining contenders have been canceled.

successResultOf() and failureResultOf() have one catch: you can’t call them multiple times on the same Deferred, because internally they add a callback to the Deferred, which interferes with any subsequent callbacks (including additional calls to successResultOf). There’s no good reason to do this, but it might cause you some confusion if you have a subroutine that checks the state of a Deferred, and you use that subroutine multiple times. However, assertNoResult can be called as many times as you like.

Synchronous Testing and Eventual Send

The Twisted community has been moving toward this immediate/mocked style for several years. I’ve only recently started using it, but I’m pleased with the results: my tests are faster, more thorough, and more deterministic. However I’m still torn: there’s a lot of value in using eventual send. In there_can_be_only_one(), the contender Deferreds are mostly independent of the callbacks attached to the result, but I’m still worried about bugs, and I’d feel more comfortable if the callback was executed on a different turn of the event loop.

But anything involving the actual Reactor is difficult to test without waiting for a Deferred to fire. So, I’m looking for ways to combine this immediate test style with an eventual-send utility.

When I first started using eventual send, and Glyph saw what I was doing with reactor.callLater(0, f), he wrote me a better version, which we use in both Foolscap and Tahoe-LAFS. It maintains a separate queue of callbacks, and only has one callLater outstanding at any given moment: this is more efficient if there are thousands of active calls, and avoids depending upon reactor.callLater maintaining the activation order of equal-value timers.

The nice feature of his eventually() is that it comes with a special function named flushEventualQueue() , which repeatedly cycles the queue until it is empty. This should allow tests to be written like this:

class Highlander(unittest.TestCase):
    def test_one_winner(self):
        cancelled = set()
        contenders = [Deferred(lambda d, i=i: cancelled.add(i))
                      for i in range(5)]
        d = transit.there_can_be_only_one(contenders)
        flushEventualQueue()
        self.assertNoResult(d)
        contenders[0].errback(ValueError())
        flushEventualQueue()
        self.assertNoResult(d)
        contenders[1].errback(TypeError())
        flushEventualQueue()
        self.assertNoResult(d)
        contenders[2].callback("yay")
        flushEventualQueue()
        self.assertEqual(self.successResultOf(d), "yay")
        self.assertEqual(cancelled, set([3, 4]))

The downside is that flushEventualQueue lives on a singleton instance of the eventual-send manager, which has all the problems of using an ambient reactor. To handle this cleanly, there_can_be_only_one() should be given this manager as an argument, just like modern Twisted code passes the Reactor into functions that need it, rather than importing one directly. In fact, if we were to rely upon reactor.callLater(0), we could test this code with a Clock() instance and manually cycle the time forward to flush the queue. Future versions of the code will probably use this pattern.
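
That last idea is easy to demonstrate: a task.Clock() will flush delay-zero calls on demand, which plays the role of flushEventualQueue() under the reactor.callLater(0) scheme.

from twisted.internet import task

clock = task.Clock()
delivered = []
clock.callLater(0, delivered.append, "next turn")
assert delivered == []      # not delivered yet: it waits for the next turn
clock.advance(0)            # flush every timer that is currently due
assert delivered == ["next turn"]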

Summary

Magic Wormhole is a file-transfer application with strong security properties that stem from the SPAKE2 cryptographic algorithm at its core, with a library API for embedding into other applications. It uses Twisted to manage multiple simultaneous TCP connections, which usually enables fast direct transfers between the two clients. The Autobahn library provides WebSocket connections that will enable compatibility with future browser-based clients. The test suite uses Twisted utility functions to examine the state of each Deferred as they are cycled through their operating phases, allowing fast synchronous tests.

References