Merge branch 'master' into oping-flm

author: Thijs Paelman <[email protected]> 2022-12-21 17:22:36 +0100
committer: Thijs Paelman <[email protected]> 2022-12-21 17:22:36 +0100
commit: 11979c8b9c88c3bb993350898844561ea55dcd06 (patch)
tree: 4b6bb1174d38a930b1175ce95c29134be7666ee9 /content/en/blog/20221207-loc-id-split.md
parent: 2350da508cc6d2207862d8e4b0282260963a0787 (diff)
parent: f35d305da7728cecb4ea91aed5b9c1fd0ddaebb6 (diff)
download: website-11979c8b9c88c3bb993350898844561ea55dcd06.tar.gz
website-11979c8b9c88c3bb993350898844561ea55dcd06.zip
1 files changed, 216 insertions, 0 deletions
diff --git a/content/en/blog/20221207-loc-id-split.md b/content/en/blog/20221207-loc-id-split.md
new file mode 100644
index 0000000..bad82ac
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-split.md
@@ -0,0 +1,216 @@
+---
+date: 2022-12-07
+title: "Loc/Id split and the Ouroboros network model"
+linkTitle: "On Loc/Id split"
+author: Dimitri Staessens
+---
+
+A few weeks back I had a drink with Thijs who is now doing a master's
+thesis on Loc/Id split, so we dug into the concepts behind Locators
+and Identifiers and see if matches or in any way interferes with the
+Ouroboros network model.
+
+For this, we started from the paper _Locator/Identifier Split
+Networking: A Promising Future Internet Architecture_[^1].
+
+# Loc/Id split?
+
+In a nutshell, Loc/Id split starts from the observation that the
+transport layer (TCP, UDP) is tightly coupled to network (IP)
+addresses via a certain TCP/UDP port.
+
+Assuming our IPv4 local address is 10.10.0.1 /24 and there is an SSH
+server on 10.10.5.253 /24 listening on port 22, after making a
+connection, our client application could be bound to 10.10.0.1 /24 on
+port 25406. If we move our laptop to another room that is on an access
+point in a different subnet, and we receive IP address 10.10.4.7 /24,
+our TCP connection to the SSL server will break.
+
+Loc/Id split suggest to split the "address" into two parts, an
+Identifier that is location-independent and specifies the _who_ at the
+transport layer, and a locator that is location-dependent and
+specifies the _where_ at the network layer. Since an IPv6 address has
+more than enough (128) bits, there's plenty of space to chop it up and
+attach some semantics to the individual pieces.
+
+Of course, after the split, identifiers need to be mapped to locators,
+so there is a mapping system needed to resolve the locator given the
+identifier. This mapping system resides in a Sub-Layer between the
+transport layer and the network layer. If this mapping system sounds a
+lot like DNS to you, then you're right, but then remember that TCP
+doesn't bind to a DNS name + port, but to an IP address + port. That's
+where the issue lies that the Identifier tries to solve.
+
+Resolving the Locator from the Identifier usually happens in the
+end-host, but some Loc/Id split proposals may forward this
+responsibility to other nodes in the network. When only end-hosts
+perfom Id->Loc resolution, it's called a host-based Loc/Id split
+architecture, if some other nodes perform Id->Loc resolution it's
+called a network-based architecture. In a network-based architecture,
+the identifier MUST be part of the packet header (in a host-based
+architecture it's optional), and the network nodes forward towards a
+resolver node based on the identifier and then when the locator is
+known based on the locator towards the end-host. I have my doubts that
+this can ever scale, so in this article, I'll focus on host based
+Loc/Id split. Host-based architectures are summarized in the figure
+below, taken from the survey paper[^1].
+
+{{<figure width="60%" src="/blog/20221207-loc-id.png">}}
+
+My first reaction to seeing that was _sounds about right to me_, it's
+almost identical to what O7s proposes for a fully scalable and
+evolvable architecture. But before I get to that, let's first dig a
+bit deeper into those locators and identifiers. What _are_ these
+beasts?
+
+# Mobility in Loc/Id split
+
+{{<figure width="40%" src="/blog/20221207-loc-id-mobility-1.png">}}
+
+Let's assume the previous example where, from my laptop, I'm connected
+to some SSH server, but this time we're in a Loc/Id split network.  So
+my laptop got a different address for its interface, an identifier,
+say COFF33D00D, and, since I'm in the green network, a locator that is
+conveniently the IPv4 address for my wireless LAN interface,
+10.10.0.1 /24.  The TCP connection in the SSH client is Loc/Id aware,
+and now bound to C0FF33D00D:25406. After connecting to the client at
+008BADF00D, It learns that I'm C0FF33D00D and my locator is 10.10.0.1.
+
+When I move to another floor, the laptop WLAN interface gets a new
+locator, but my identifier stays the same. It's now
+C0FF33D00D:10.10.4.7. The OS is implementing a host-based Loc/Id split
+architecture, so I quickly send a _loc/id update_ message to the
+server at 10.10.5.253 that my locator for C0FF33D00D has changed to
+10.10.4.7, and it updates its mapping. The Loc/Id-aware TCP state
+machine in my laptop had some packet loss to deal with while I was in
+the elevator, but other than that, since it was bound to my identifier
+the connection remains intact.
+
+Nice! Splitting an address into a locator and identifier has a pretty
+elegant solution to mobility.
+
+Notice I didn't give the routers identifiers parts in their
+address? That's on purpose.
+
+Let's take a little thought experiment.
+
+Instead of moving to the other floor, I already have a laptop already
+sitting there. Its WLAN interface has address COFFEEBABE:10.10.4.7.
+
+{{<figure width="40%" src="/blog/20221207-loc-id-mobility-2.png">}}
+
+Now, what I do in this thought experiment, is copy the entire _program
+state_ of my SSH client to that other laptop, _including_ the TCP
+state[^2] and fork it as a new process on the other laptop. What is
+needed to make it work from a network perspective?
+
+Well, like when actually moving with my laptop, I need to update the
+server that my identifier C0FF33D00D has moved to another locator at
+10.10.4.7. That should do the trick, quite easy.
+
+Unless there was already another application connected on port 25406
+on that destination laptop. Then there is no way for the incoming
+laptop to know where to deliver the packets to. Unless the identifier
+is in the packet header. But host-based Loc/Id split had them
+optional?  This seems to hint that host-based Loc/Id split supports
+device mobility but cannot fully support application mobility[^3].
+
+So, what is that identifier actually naming? Well, all that moved was
+the application state, and the identifier seemed to move with
+it... And since the routers in the example don't run "end-host"
+applications, they don't need identifiers.
+
+# What does the Ouroboros model say?
+
+Ouroboros[^4] gives each application process a name, which is mapped
+to an IPCP's address[^5]. The O7s application name basically
+corresponds to the _identifier_, and the IPCPs address maps to the
+_locator_.
+
+{{<figure width="30%" src="/blog/20220228-flm-app.png">}}
+
+Let's compare the architecture of Ouroboros above with the figure at
+the top.
+
+First, the similarities. The Ouroboros model conjectures a split of
+the transport layer into an _application end-to-end layer_ (roughly
+TCP without congestion avoidance) and a network end-to-end layer that
+includes the _flow allocator_.
+
+The _flow allocator_ in O7s performs the name <--> address mapping
+that is similar to id <--> loc mapping. Interesting to note is that in
+O7s, the Flow allocator is present in every IPCP, which is needed for
+Congestion Notifications. Given that identifiers are mapping to
+application names, resolving in name <--> address in other nodes than
+the source, like in network-based Loc/Id split, is not violating the
+O7s architecture. But we haven't considered this as it doesn't look
+feasible from a scalability perspective.
+
+Now, the differences. First, the naming. The "identifier" in Ouroboros
+is a network/globally unique application name[^6]. Processes[^7] can
+be _bound_ to an application name. If a single process binds to an
+application name it's unicast, if multiple processes on the same
+server bind to the same name, it provides per-connection
+load-balancing between these processes. If multiple processes on
+different servers bind to the same name, it provides a form of anycast
+name-based load-balancing.
+
+Second, Ouroboros endpoint identifiers (EIDs) are only known to the
+Flow Allocator at the endpoint and specify the application. The O7s
+EID can be viewed as a combination of the L3 _protocol_ field and the
+L4 _port_ field into a single field that sits in between L3 and L4
+(the Loc/Id proposed sublayer). This allows O7s to allocate a new flow
+(assigning new EIDs) while keeping the connection state in the process
+(FRCP) intact, and thus allowing full application mobility in addition
+to device mobility. Taking another look at the Loc/Id split figure,
+note that Ouroboros splits "network" from "application" just above the
+"Sub-layer", instead of above the "transport layer".
+
+# Wrapping up
+
+The discussions on Loc/Id split were quite interesting. A lot of the
+steps and solutions it proposes are in line with the O7s model. What
+strikes me most is that LoC/Id split is still not very well-defined as
+a _model_. What exactly _are_ identifiers? What exactly _are_
+locators? The thing that sets O7s apart is that the model consists of
+a limited amount of objects (forwarding elements and flooding
+elements, which form Layers[^8], application, process, ...) that have
+well-defined names[^9] that are immutable and exist only for as long
+as the object exists.
+
+
+[^1]: https://doi.org/10.1109/COMST.2017.2728478
+
+[^2]: This is hard to do with TCP state being in the kernel, but let's
+      forget about that and memory addresses and others stuff for a
+      moment and assume the complete application state is a nice
+      containerized package.
+
+[^3]: The Ouroboros model does allow complete application
+      mobility. The problem in this Loc/Id proposal is that the port
+      is still part of the Transport Layer state (see the figure at
+      the start of the post).
+
+[^4]: This, and a lot of other things in O7s, were proposed in the
+      RINA architecture, that's where the credit should go.
+
+[^5]: To be accurate: we hash the application name.
+
+[^6]: At least, for a public Internetwork, they should be globally
+      unique.
+
+[^7]: In O7s, processes are named with a process name (which in the
+      implementation maps to the linux process id (pid). Process names
+      are only local (system) scope.
+
+[^8]: I capitalize Layers, as these Layers that are made up of
+      forwarding elements (unicast Layers) or flooding elements
+      (broadcast Layers) have a different meaning than the layers in
+      the discussion above. Maybe we should call them _strata_ instead
+      of Layers...
+
+[^9]: Synonyms are allowed, but they serve no function in the
+      architecture. As an example, application names are hashed (a
+      synonym) which has practical implications for security and
+      implementation simplicity, but the architecture is theoretically
+      identical without that hash.
+\ No newline at end of file
author	Thijs Paelman <[email protected]>	2022-12-21 17:22:36 +0100
committer	Thijs Paelman <[email protected]>	2022-12-21 17:22:36 +0100
commit	11979c8b9c88c3bb993350898844561ea55dcd06 (patch)
tree	4b6bb1174d38a930b1175ce95c29134be7666ee9 /content/en/blog/20221207-loc-id-split.md
parent	2350da508cc6d2207862d8e4b0282260963a0787 (diff)
parent	f35d305da7728cecb4ea91aed5b9c1fd0ddaebb6 (diff)
download	website-11979c8b9c88c3bb993350898844561ea55dcd06.tar.gz website-11979c8b9c88c3bb993350898844561ea55dcd06.zip