author | Dimitri Staessens <[email protected]> | 2020-01-17 17:43:20 +0100 |
---|---|---|
committer | Dimitri Staessens <[email protected]> | 2020-01-17 17:43:20 +0100 |
commit | 760d148adbce24f42cbaff34a0794aef900d7d6d (patch) | |
tree | 5843f8035f914a8e5d539a76f8f3a3c5e221a798 /content/en/docs/Concepts/fa.md | |
parent | d3fc1fccf000f86e81d93c60d5dca28f3163c0ed (diff) | |
content: Add flow allocator description
Diffstat (limited to 'content/en/docs/Concepts/fa.md')
-rw-r--r-- | content/en/docs/Concepts/fa.md | 221 |
1 files changed, 221 insertions, 0 deletions
diff --git a/content/en/docs/Concepts/fa.md b/content/en/docs/Concepts/fa.md
new file mode 100644
index 0000000..7e94d1c
--- /dev/null
+++ b/content/en/docs/Concepts/fa.md
@@ -0,0 +1,221 @@
+---
+title: "Flow allocation"
+author: "Dimitri Staessens"
+#description: protocols
+date: 2020-01-17
+weight: 30
+draft: false
+description: >
+ The most important concept in Ouroboros
+---
+
+Arguably the most important concept to grasp in Ouroboros is flow
+allocation.[^1] It is the process by which a pair of programs agree to
+start sending and receiving data. A flow is always unicast, thus
+between a source program and a destination program, and is always
+established from the source. Flows are provided by unicast layers, and
+the endpoints of the flows are accessible for reading and writing by
+the requesting processes using an identifier called a _flow
+descriptor_. Think of a file descriptor but just for Ouroboros flows.
+Maybe one important thing to keep in mind: in Ouroboros terminology, a
+flow does not imply ordering or reliable transfer. It just denotes the
+network resources inside a layer that are needed for forwarding
+packets from a source to a destination in a best effort way.
+
+{{<figure width="60%" src="/docs/concepts/fa_1.jpg">}}
+
+The figure above gives an example. There are 2 systems, and each
+system has an Ouroboros IRMd and a unicast IPCP. These IPCPs work
+together to create a logical "layer". System 1 runs a "client"
+program, System 2 runs a "server" program.
+
+We are going to explain in some detail the steps that Ouroboros takes
+to establish a flow between the "client" and "server" program so they
+can communicate.
+
+The three subcomponents inside the IPCP that are of interest to us are
+the Directory (DIR), the Flow Allocator (FA) and the Data Transfer
+component (DT).
+
+The DT component is at the heart of the network functionality in the
+layer. It is a protocol machine that forwards packets and
+maintains a forwarding table that maps destination addresses to lower
+layer flows. [^2] The name of the DT is what is generally considered the
+"address" of the IPCP. [^3] In the example, IPCP 1 has address 720,
+and IPCP 2 address 1000. If DT 720 receives a packet for DT 1000, it
+will know how to forward it to 1000 and vice versa. I will not go into
+the details of how routing information is distributed, suffice it to say
+it's similar in operation to the IS-IS protocol. The only other thing
+that is of current interest is the protocol format of the DT
+component. The DT protocol has [5 fields](../protocols) [^4]:
+
+```
+DST | TTL | QoS | ECN | EID |
+```
+
+To understand the flow allocation procedure, we need to consider only
+2 of these fields, the destination address (DST), and the endpoint ID
+(EID). I will denote the relevant packet header information for DT in
+the format __DST:EID__. So, __1000:78__ would indicate a packet
+destined for IPCP 2 with EID 78. EIDs are a bit like a TCP port, but
+they are not well-known (i.e. there is no IANA in Ouroboros). The flow
+allocation process will assign the EIDs.
+
+The directory (DIR) component keeps a mapping of registered hashes to
+DT names (addresses). For the server application to be reachable over
+a layer, the DIR component in its IPCPs will have to know this
+mapping. In our example, the server, which is named "server", is known by
+the layer to be at location 1000. The interface to register a name is
+actually using hashes, so "server" is hashed (by default an SHA3-256
+hash) to _d19778d2_[^5] and a mapping (_d19778d2_, 1000) is kept in the
+directory. The default implementation for the DIR component in the
+Ouroboros IPCP is a Distributed Hash Table (DHT) based on the Kademlia
+protocol.
+
+The third subcomponent in the IPCP that is relevant here -- the most
+important one -- is the Flow Allocator (FA). This component is
+responsible for implementing the requested flows, in our case between
+"client" and "server". It needs to establish some shared state between
+the two endpoints. A (bidirectional) flow is fully identified in a
+layer by a 4-tuple (A1,X,A2,Y) containing two addresses and two EIDs
+(in our example A1=720 and A2=1000). This 4-tuple needs to be known at
+both endpoints to identify where to send the packets it receives from
+the higher-layer application (the client), and to deliver packets that
+it reads from a lower layer flow. The flow allocation protocol is
+responsible for exchanging this information. It is a request-response
+protocol. The flow allocator is identified by the DT component as EID
+0. So, all packets in the layer with DT header __DST:0__ are delivered
+to the flow allocator inside the destination IPCP.
+
+When the source FA in IPCP 1 receives a request for a flow to
+"server", it will query its DIR for _d19778d2_ and receive 1000 as the
+response and it will generate an EID (X) for the flow. Let's assume
+X=75. The flow allocation request protocol message from FA 1 to FA 2
+looks like __1000:0:REQ:720:75:d19778d2__, and when FA 2 receives this
+message, it will generate its EID, let's say 81 and send the following
+response to FA 1: __720:0:RESP:75:81__. REQ and RESP are internal
+codes to identify a request and response (0 and 1 respectively). From
+this small exchange both flow allocators can now identify the flow.
+
+Finally, there is the IRMd in each system. The IRMd should be seen as
+part of the operating system. One of its tasks is to map process IDs
+(PIDs) of a process to names. In our example above, the IRMd in System
+2 will have a mapping that maps _d19778d2_ to the PID of
+"server". When the "server" program calls the Ouroboros
+_flow\_accept()_ routine, the IRMd knows that when there is an
+incoming flow allocation request, the "server" process can handle
+it. Populating this mapping in the IRMd is a process we call _binding_
+a name to a process.
+
+Let's now go step-by-step through the full flow allocation process in
+the example above.
+
+{{<figure width="60%" src="/docs/concepts/fa_2.jpg">}}
+
+The first few steps are shown in the figure above. The client
+application requests a flow to "server" from the Ouroboros IRMd using
+the _flow\_alloc()_ call __(1)__. Now the IRMd will ask the layers in
+the system if they know the name "server", indirectly by using the
+SHA3-256 hash, _d19778d2_ __(2)__. The hash algorithm that a layer uses
+is configurable, and the IRMd is informed of the hash algorithm to use
+when an IPCP joins a layer (at bootstrap or enrollment). In our case,
+the layer shown will respond to the query with "True" __(3)__ (multiple
+layers can respond true, and then the IRMd will choose one, usually
+the "lowest" in rank). Note that the results of these queries can be
+cached locally in the IRMd to speed up the process.
+
+So, now that the IRMd knows that the layer in the figure knows the
+destination program, it can send a flow allocation request to the
+layer. But first, it will start creating some local resources: the
+flow endpoint, indicated by a flow_id (FID) __(4)__. It contains a set of
+ring-buffers in shared memory that contain pointer information on
+where to read/write the next packet. The FID will be in _PENDING_
+state [^6].
+
+{{<figure width="60%" src="/docs/concepts/fa_3.jpg">}}
+
+When the FID resources are ready, the IRMd sends a _FLOW\_ALLOC_ to
+the IPCP with the pending FID as endpoint __(5)__. The FA in IPCP 1
+will create a _flow descriptor_ for this flow [^7], let's say 75
+__(6)__. All packets that are written by the IPCP to fd 75 can be read
+from FID 9. Now, a couple of paragraphs ago I mentioned that the FA
+will generate an EID for the flow. In the implementation, the EID for
+the flow equals the fd. So packets coming from within the layer with
+EID 75 will be written to this flow.
+
+This is the point where the FA will do the flow allocation protocol
+exchange already described above. The destination hash is resolved
+from the directory to the destination IPCP address, 1000, and the
+following flow allocation request message is sent over the DT
+component to the destination IPCP: __1000:0:REQ:720:75:d19778d2__ __(7)__.
+
+{{<figure width="60%" src="/docs/concepts/fa_4.jpg">}}
+
+We can now turn our attention to System 2, which receives this request
+message on IPCP 2. The DT header contains __1000:0__ which has the
+correct address (1000) and EID 0, which indicates the packet should be
+delivered to the flow allocator. So the FA interprets the following
+information from the received packet: There is a flow allocation
+request for the hash _d19778d2_ coming from source 720 on remote EID
+75.
+
+Now it sends a _FLOW\_ALLOC\_REQ()_ message to the IRMd __(8)__. The
+IRMd has in its process table an entry that says that there is a
+process that listens to this hash. It will create a flow endpoint, for
+instance with FID=16 __(9)__ and respond to the IPCPd that the flow is
+accepted with FID=16 __(10)__. The _flow\_accept()_ call on the server
+side will return with an fd=71 that points to the FID 16. From this
+point on, the server can use the flow __(11)__.
+
+The flow allocator in IPCP 2 can now complete its endpoint
+configuration. It will create a mapping [S_EID -> R_ADDR, R_EID], in
+this case [81 -> 720, 75]. So all packets that it reads from EID 81
+will get a header __720:75__ from the DT component __(12)__. It will
+now complete the flow allocation protocol and send a response message
+that flow allocation succeeded. The contents of this message are
+__720:0:RESP:75:81__ __(13)__. This concludes all operations on the
+server side.
+
+{{<figure width="60%" src="/docs/concepts/fa_5.jpg">}}
+
+Back to the client. The FA in System 1 receives the packet, and from
+EID 0 knows it is for the flow allocator, which gets its last piece of
+information: the remote EID for the flow, 81. It can now create its
+own mapping, [75->1000, 81] __(14)__ and respond to its IRMd that the
+flow is created __(15)__. The IRMd will change the state of the flow
+from _PENDING_ to _ALLOCATED_ __(16)__ and the _flow\_alloc()_ call
+on the client program will return with an FD for the flow. The flow is
+now allocated.
+
+So, from now on, communication between the server and the client is
+pretty straightforward. Data is written to some shared memory in a
+buffer that allows for some space to prepend headers and append
+CRCs. To avoid memory copies, pointers to these locations are passed
+over the ringbuffers in the flow endpoints to the IPCP, which reads
+the pointers, adds headers in the right location, and then uses the
+same procedure to pass it onto the next layer towards the destination.
+The translation of the header is an O(1) lookup on the send side, and
+a nop on the receiver side (since FD == EID and it's passed in the
+packet).
+
+[^1]: This concept is also present in RINA, but there are
+differences. This only applies to Ouroboros.
+
+[^2]: This is a recursive network: adjacencies in layer N are
+implemented as flows in layer N - 1.
+
+[^3]: If there is one DT, it is what is usually considered a "flat"
+address. More complex addressing schemes are accomplished by having
+more of these DT components inside one IPCP. But this would lead us
+too far.
+
+[^4]: I will explain QoS in a different post.
+
+[^5]: In full:
+d19778d2e34a1e3ddfc04b48c94152cced725d741756b131543616d20f250f31.
+
+[^6]: Note that the _flow\_alloc()_ call __(1)__ is currently
+blocking. Asynchronous allocation implementation is on the TODO list.
+
+[^7]: All this mapping of fd's is done by the library that is used by
+all Ouroboros programs.
\ No newline at end of file
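
The request/response exchange that the added `fa.md` walks through can be traced in a few lines of Python. This is only an illustrative sketch, not Ouroboros code: the `FlowAllocator` class, its method names, the tuple layout of the messages and the EID counters (seeded with 75 and 81 to match the example) are all assumptions made for this note. A plain dictionary stands in for the DHT-based directory, and the IRMd steps (binding, FID creation, state changes) are left out.

```python
import hashlib
from dataclasses import dataclass, field

REQ, RESP = 0, 1   # message codes as given in the text
FA_EID = 0         # the flow allocator is reached at EID 0


def name_hash(name: str) -> bytes:
    """Hash a name with SHA3-256, the default hash mentioned in the text."""
    return hashlib.sha3_256(name.encode()).digest()


@dataclass
class FlowAllocator:
    """Hypothetical stand-in for the FA component of one IPCP."""
    addr: int         # DT address of this IPCP
    directory: dict   # stand-in for the DIR: name hash -> address
    next_eid: int     # hypothetical EID counter (the text uses fd == EID)
    flows: dict = field(default_factory=dict)  # local EID -> (remote addr, remote EID)

    def new_eid(self) -> int:
        eid = self.next_eid
        self.next_eid += 1
        return eid

    def request(self, name: str) -> tuple:
        """Source side: resolve the name, build DST:0:REQ:SRC:EID:HASH."""
        h = name_hash(name)
        dst = self.directory[h]
        eid = self.new_eid()
        self.flows[eid] = (dst, None)   # remote EID is unknown until the RESP
        return (dst, FA_EID, REQ, self.addr, eid, h)

    def handle(self, msg: tuple):
        """Process a message addressed to this FA; return a reply or None."""
        dst, eid, code = msg[:3]
        assert dst == self.addr and eid == FA_EID
        if code == REQ:
            src, r_eid, _name_hash = msg[3:]
            s_eid = self.new_eid()
            self.flows[s_eid] = (src, r_eid)       # [S_EID -> R_ADDR, R_EID]
            return (src, FA_EID, RESP, r_eid, s_eid)
        l_eid, r_eid = msg[3:]                     # RESP: fill in the remote EID
        r_addr, _ = self.flows[l_eid]
        self.flows[l_eid] = (r_addr, r_eid)
        return None

    def dt_header(self, eid: int) -> tuple:
        """DST:EID header for packets written to local endpoint `eid`."""
        return self.flows[eid]


# Replay the example: IPCP 1 is 720, IPCP 2 is 1000, "server" lives at 1000.
directory = {name_hash("server"): 1000}
fa1 = FlowAllocator(720, directory, next_eid=75)
fa2 = FlowAllocator(1000, directory, next_eid=81)

req = fa1.request("server")   # 1000:0:REQ:720:75:<hash>
resp = fa2.handle(req)        # 720:0:RESP:75:81
fa1.handle(resp)

print(fa1.dt_header(75))      # header the client side puts on outgoing packets
print(fa2.dt_header(81))      # header the server side puts on outgoing packets
```

After the exchange both sides hold the full 4-tuple (720, 75, 1000, 81), so the send-side header translation is a single dictionary lookup, in line with the O(1) lookup mentioned in the article.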