ouroboros - Ouroboros main repository

	Commit message	Author	Age	Files	Lines
*	build: Update copyright to 2021	Dimitri Staessens	2021-01-03	206	-206/+206
\| \| \| \| \| \| \|	Happy New Year, Ouroboros! Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Pass qoscube to ECN marking function	Dimitri Staessens	2020-12-20	8	-3/+13
\| \| \| \| \| \| \| \|	The ECN marking function should be able to use the packet QoS to allow prioritizing traffic under congestion. Not yet implemented in MB-ECN. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Remove unused variable in MB-ECN policy	Dimitri Staessens	2020-12-12	1	-4/+0
\| \| \| \| \| \| \| \|	The t_sent variable is a remnant from the first version and isn't needed anymore. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Pass previous ECN value in congestion API	Dimitri Staessens	2020-12-12	8	-18/+30
\| \| \| \| \| \| \| \|	The previous value of the ECN field should be passed to the congestion notification function. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Fix flow_accept without openssl	Dimitri Staessens	2020-12-12	2	-5/+7
\| \| \| \| \| \| \| \|	DH key creation was returning -ECRYPT if OpenSSL is not installed, instead of success (0). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Fix return value in function returning void	Dimitri Staessens	2020-12-12	1	-1/+1
\| \| \| \| \| \| \|	This causes builds to fail on systems where OpenSSL is not available. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Use 64-bit flow endpoint IDs for DT	Dimitri Staessens	2020-12-07	6	-54/+117
\| \| \| \| \| \| \| \| \| \| \| \| \|	The EIDs are now 64-bit. This makes it a tad harder to guess them (think of port scanning). The implementation has only the most significant 32 bits random to quickly map EIDs to N+1 flows. While this is equivalent to a random cookie as a check on flows, the rationale is that valid endpoint IDs should be pretty hard to guess (and thus be 64-bit random at least). Ideally one would use content-addressable memory for this kind of mapping. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
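The EID layout described above can be sketched as follows. This is only an illustration, assuming the low 32 bits carry the N+1 flow descriptor and the high 32 bits a random value supplied by the caller; the helper names (eid_make, eid_to_fd, eid_matches) are hypothetical and not taken from the Ouroboros sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: high 32 bits random, low 32 bits flow descriptor. */
static uint64_t eid_make(uint32_t rnd, uint32_t fd)
{
        return ((uint64_t) rnd << 32) | fd;
}

/* The low half maps straight back to the N+1 flow. */
static uint32_t eid_to_fd(uint64_t eid)
{
        return (uint32_t) (eid & 0xFFFFFFFFULL);
}

/* A packet is accepted only if the full 64-bit EID matches the stored one. */
static bool eid_matches(uint64_t stored, uint64_t received)
{
        return stored == received;
}
```

With such a layout the lookup stays a plain array index on the low half, while an attacker who guesses the flow descriptor still has to guess the random half.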
*	ipcpd: Fix slow start in multi-bit (F)ECN policy	Dimitri Staessens	2020-12-07	1	-8/+11
\| \| \| \| \| \| \| \| \| \| \|	There is a check not to rapidly double the window to astronomical sizes when there is no congestion experienced for long periods of time, but the if-else logic was botched and it still grew to astronomical sizes (albeit linearly instead of exponentially). I also lowered the ECN threshold a bit. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Remove DT-FA bypass on receiver side	Dimitri Staessens	2020-12-07	5	-63/+88
	The DT will now post all packets for N+1 flows through the flow allocator component. This means that N+1 flows can be monitored through the flow allocator stats, and N-1 flows through the DT stats. The DT component still keeps stats for the local components (FA and DHT), but this can be removed once the DHT has its own RIB output.

	The flow allocator shows statistics for:

	Sent packets: total packets that were presented for sending on this specific flow
	Send failed: packets that were unable to be sent
	Received packets: total packets that were presented by the DT component on this specific flow
	Received failed: packets that were unable to be delivered

	These stats are presented as both packet counts and byte counts. To know how many were successful, the values for failed need to be subtracted from the values for total.

	Signed-off-by: Dimitri Staessens <[email protected]>
	Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Fix off-by-one in Multi-bit (F)ECN policy	Dimitri Staessens	2020-12-05	1	-2/+1
\| \| \| \| \| \| \| \|	Noticed an off-by-one in the packet counter because it was incremented before and the byte counter after the flow update. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Add RIB statistics for flow allocator	Dimitri Staessens	2020-12-05	8	-12/+255
\| \| \| \| \| \| \| \| \| \|	The RIB will now show some stats for the flow allocator, including congestion avoidance statistics. This is needed before decoupling the data transfer component and the flow allocator, as some stats currently shown in DT will move to FA. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Simplify multi-bit (F)ECN policy	Dimitri Staessens	2020-12-02	1	-54/+69
\| \| \| \| \| \| \| \| \| \| \| \|	The mb-ecn policy has a couple of divisions in the math, which I wanted to avoid. Now it measures the number of bytes sent in a window, and updates the next window with AIMD logic. If the number of bytes in the window is reached, the call blocks. To avoid long packet bursts, the window size continually scales to contain between CA_MINPS (8) and CA_MAXPS (64) packets. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
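The byte-window idea above can be sketched as follows. This is a simplified illustration of AIMD on a byte budget, not the MB-ECN code: the struct, the function names and the constants AI_STEP and MIN_WND are assumptions, and the CA_MINPS/CA_MAXPS window scaling is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define AI_STEP 1500 /* hypothetical additive increase, in bytes */
#define MIN_WND 1500 /* hypothetical window floor, in bytes      */

struct aimd_ctx {
        size_t wnd_max;  /* bytes allowed in the current window */
        size_t wnd_sent; /* bytes sent in the current window    */
};

/* Account for a sent packet; true means the sender should block. */
static bool aimd_snd(struct aimd_ctx * ctx, size_t len)
{
        ctx->wnd_sent += len;

        return ctx->wnd_sent >= ctx->wnd_max;
}

/* Start the next window: halve on congestion, else grow linearly. */
static void aimd_next_wnd(struct aimd_ctx * ctx, bool congested)
{
        if (congested) {
                ctx->wnd_max >>= 1; /* shift instead of a division */
                if (ctx->wnd_max < MIN_WND)
                        ctx->wnd_max = MIN_WND;
        } else {
                ctx->wnd_max += AI_STEP;
        }

        ctx->wnd_sent = 0;
}
```

Counting bytes per window keeps the per-packet path to one addition and one comparison, which fits the stated goal of avoiding divisions.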
*	ipcpd: Don't update for deallocated flows	Dimitri Staessens	2020-12-02	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	The dt component bypasses the flow allocator on the receiver side, and may try to update congestion context when the flow has already been deallocated by the receiver. I will fix this bypass and always pass through the flow allocator sometime soon; for now, I added a check in the flow allocator call to avoid the SEGV. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Fix enrollment for congestion avoidance	Dimitri Staessens	2020-12-02	1	-0/+3
\| \| \| \| \| \| \| \|	The enrollment procedure was not passing the policy for congestion avoidance. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Add congestion avoidance policies	Dimitri Staessens	2020-12-02	21	-140/+957
	This adds congestion avoidance policies to the unicast IPCP. The default policy is a multi-bit explicit congestion avoidance algorithm based on data-center TCP congestion avoidance (DCTCP) to relay information about the maximum queue depth that packets experienced to the receiver. There's also a "nop" policy to disable congestion avoidance for testing and benchmarking purposes.

	The (initial) API for congestion avoidance policies is:

	void * (* ctx_create)(void);
	void (* ctx_destroy)(void * ctx);

	These calls create and destroy a context for congestion control for a specific flow. Thread-safety of the context is the responsibility of the flow allocator (operations on the ctx should be performed under a lock).

	ca_wnd_t (* ctx_update_snd)(void * ctx, size_t len);

	This is the sender call to update the context, and should be called for every packet that is sent on the flow. The len parameter in this API is the packet length, which allows calculating the bandwidth. It returns an opaque union type that is used for the call to check/wait if the congestion window is open or closed (allowing locks to be released before waiting).

	bool (* ctx_update_rcv)(void * ctx, size_t len, uint8_t ecn, uint16_t * ece);

	This is the call to update the flow congestion context on the receiver side. It should be called for every received packet. It gets the ecn value from the packet and its length, and returns the ECE (explicit congestion experienced) value to be sent to the sender in case of congestion. The boolean returned signals whether or not a congestion update needs to be sent.

	void (* ctx_update_ece)(void * ctx, uint16_t ece);

	This is the call for the sending side to update the context when it receives an ECE update from the receiver.

	void (* wnd_wait)(ca_wnd_t wnd);

	This is a (blocking) call that waits for the congestion window to clear. It should be stateless (to avoid waiting under locks). This may change later on if passing the context is needed for different algorithms.

	uint8_t (* calc_ecn)(int fd, size_t len);

	This is the call that intermediate IPCPs (routers) should use to update the ECN field on passing packets.

	The multi-bit ECN policy bases the value for the ECN field on the depth of the rbuff queue packets will be sent on. I created another call to grab the queue depth as fccntl is write-locking the application. We can further optimize this to avoid most locking on the rbuff.

	Signed-off-by: Dimitri Staessens <[email protected]>
	Signed-off-by: Sander Vrijders <[email protected]>
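To show how the calls listed in this commit fit together, here is a minimal sketch of a function-pointer table with a do-nothing policy. The member signatures are copied from the commit message; the struct name ca_ops, the nop_* function names and the contents of ca_wnd_t are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the opaque union mentioned in the commit (contents assumed). */
typedef union {
        uint64_t raw;
} ca_wnd_t;

/* Hypothetical container; the member signatures are from the commit. */
struct ca_ops {
        void *   (* ctx_create)(void);
        void     (* ctx_destroy)(void * ctx);
        ca_wnd_t (* ctx_update_snd)(void * ctx, size_t len);
        bool     (* ctx_update_rcv)(void * ctx, size_t len,
                                    uint8_t ecn, uint16_t * ece);
        void     (* ctx_update_ece)(void * ctx, uint16_t ece);
        void     (* wnd_wait)(ca_wnd_t wnd);
        uint8_t  (* calc_ecn)(int fd, size_t len);
};

/* A "nop" policy keeps no state and never signals congestion. */
static void * nop_create(void) { return NULL; }

static void nop_destroy(void * ctx) { (void) ctx; }

static ca_wnd_t nop_update_snd(void * ctx, size_t len)
{
        ca_wnd_t wnd = { 0 };

        (void) ctx;
        (void) len;

        return wnd;
}

static bool nop_update_rcv(void * ctx, size_t len, uint8_t ecn, uint16_t * ece)
{
        (void) ctx;
        (void) len;
        (void) ecn;

        *ece = 0;

        return false; /* never ask for a congestion update */
}

static void nop_update_ece(void * ctx, uint16_t ece) { (void) ctx; (void) ece; }

static void nop_wnd_wait(ca_wnd_t wnd) { (void) wnd; } /* returns at once */

static uint8_t nop_calc_ecn(int fd, size_t len)
{
        (void) fd;
        (void) len;

        return 0; /* never mark packets */
}

static const struct ca_ops nop_ops = {
        .ctx_create     = nop_create,
        .ctx_destroy    = nop_destroy,
        .ctx_update_snd = nop_update_snd,
        .ctx_update_rcv = nop_update_rcv,
        .ctx_update_ece = nop_update_ece,
        .wnd_wait       = nop_wnd_wait,
        .calc_ecn       = nop_calc_ecn
};
```

A sender would call ctx_update_snd under the flow lock, release the lock, and only then call wnd_wait with the returned value, which is why wnd_wait takes the window by value rather than the context.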
*	tools: Default ocbr to sleep and add --spin option	Dimitri Staessens	2020-12-02	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	The ocbr client was spinning the CPU by default, which made sense on lab servers with dual xeons, but not so much for average users. Now sleeping becomes the default. Busy waiting can be enabled using --spin if needed. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	tools: Use read timeouts in ocbr server	Dimitri Staessens	2020-11-25	1	-1/+3
\| \| \| \| \| \| \| \| \|	The ocbr server was using non-blocking reads (probably because we didn't have read timeouts when we wrote it) and was using a whole CPU core per thread. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Reduce timerwheel CPU consumption	Dimitri Staessens	2020-11-25	2	-1/+10
\| \| \| \| \| \| \| \| \| \| \|	The timerwheel is checked during IPC calls (fevent, flow_read), causing a huge CPU load in IPCPs, since they have a lot of fevent() threads for QoS. The timerwheel will need further optimization, but for now I reduced the default tick time to 5 ms and added a boolean to check that the wheel is actually used. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	irmd: Fix data race in flow allocation	Dimitri Staessens	2020-11-25	1	-73/+100
\| \| \| \| \| \| \| \| \| \| \|	The flow information in the main loop is passed as a direct pointer to an irm_flow object in the flow database. This was (probably) not really an issue due to how the flow allocation operations work, but the thread sanitizer was barfing a lot of (correct) data race errors when running bigger tests, so the main loop now makes a safe copy of the data. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Don't default to lockless rbuff	Dimitri Staessens	2020-11-22	1	-1/+1
\| \| \| \| \| \| \| \|	I mistakenly set the default to the (buggy) lockless rbuff implementation instead of the pthread one in commit 3aec660e. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	Revert "lib: Unmount stale RIB directories"	Sander Vrijders	2020-11-11	1	-10/+1
\| \| \| \| \| \| \| \|	This reverts commit 978266fe4beba21292daad2d341fe5ff22e08aba. We were incorrectly unmounting the directory under normal conditions. Signed-off-by: Sander Vrijders <[email protected]> Signed-off-by: Dimitri Staessens <[email protected]>
*	ipcpd: Refactor DT component	Dimitri Staessens	2020-11-11	1	-76/+38
\| \| \| \| \| \| \|	The flow stats had quite a lot of duplication. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Add Rendez-Vous mechanism for flow control	Dimitri Staessens	2020-10-11	3	-31/+130
\| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the rendez-vous mechanism to handle the case where the sending window is closed and window updates get lost. If the sending window is closed, the sender side will send an RDVS every DELT_RDV time (100ms), and give up after MAX_RDV time (1 second). Upon reception of a RDVS packet, a window update is sent immediately. We can make this much more configurable later on (build options for defaults, fccntl for runtime tuning). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
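The timing rule in this commit (an RDVS every 100 ms, giving up after 1 second) can be sketched as a small decision function. The intervals come from the commit message; the enum, the function name and the millisecond-based interface are hypothetical.

```c
#include <stdint.h>

#define DELT_RDV_MS 100  /* RDVS interval, from the commit message */
#define MAX_RDV_MS  1000 /* give-up time, from the commit message  */

enum rdv_action {
        RDV_WAIT,   /* keep waiting for a window update */
        RDV_SEND,   /* send another RDVS packet         */
        RDV_GIVE_UP /* stop trying                      */
};

/*
 * closed_ms: time since the sending window closed.
 * last_ms:   time since the last RDVS was sent.
 */
static enum rdv_action rdv_check(uint32_t closed_ms, uint32_t last_ms)
{
        if (closed_ms >= MAX_RDV_MS)
                return RDV_GIVE_UP;

        if (last_ms >= DELT_RDV_MS)
                return RDV_SEND;

        return RDV_WAIT;
}
```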
*	lib: Block on closed flow control window	Dimitri Staessens	2020-10-11	2	-18/+129
\| \| \| \| \| \| \| \| \| \| \|	If the sending window for flow control is closed, the sending application will now block until the window opens. Beware that until the rendez-vous mechanism is implemented, shutting down a server while the client is sending (with non-timed-out blocking write) will cause the client to hang indefinitely because its window will close. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Refactor flow_write	Dimitri Staessens	2020-10-11	1	-20/+11
\| \| \| \| \| \| \|	Refactor flow_write cleanup. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Send and receive window updates	Dimitri Staessens	2020-10-11	4	-12/+41
\| \| \| \| \| \| \| \| \|	This adds sending and receiving window updates for flow control. I used the 8 pad bits as part of the window update field, so it's 24 bits, allowing for ~16 million packets in flight. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
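As a quick check on the "~16 million" figure: a 24-bit field holds values up to 2^24 - 1 = 16777215. The sketch below packs such a value into the low 24 bits of a 32-bit word; the position of the field within the word and the helper names are assumptions, not the actual FRCT header layout.

```c
#include <stdint.h>

#define WND_BITS 24
#define WND_MASK ((1u << WND_BITS) - 1) /* 0xFFFFFF = 16777215 */

/* Store wnd in the low 24 bits of word, keeping the top 8 bits. */
static uint32_t wnd_pack(uint32_t word, uint32_t wnd)
{
        return (word & ~WND_MASK) | (wnd & WND_MASK);
}

/* Read the 24-bit window value back out of word. */
static uint32_t wnd_unpack(uint32_t word)
{
        return word & WND_MASK;
}
```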
*	tools: Fix error handling in oping write thread	Dimitri Staessens	2020-10-11	1	-3/+0
\| \| \| \| \| \| \| \| \|	The function was returning under a cleanup handler, which is not allowed. We don't do anything with the return value if the write thread ends, so just stopping the thread is fine. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Fix condition variable in flow allocator	Dimitri Staessens	2020-10-11	1	-2/+13
\| \| \| \| \| \| \| \|	The condition variable was not initialized correctly and was using the wrong clock for pthread_cond_timedwait. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Add compiler configuration for FRCP	Dimitri Staessens	2020-10-11	5	-93/+153
\| \| \| \| \| \| \| \| \| \| \| \| \|	This allows configuring some parameters for FRCP at compile time, such as default values for Delta-t and configuration of the timerwheel. The timerwheel will now reschedule when it fails to create a packet, instead of setting the flow down immediately. Some new things added are options to store packets for retransmission on the heap, and using non-blocking calls for retransmission. The defaults do not change the current behaviour. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Fix locking for FRCT	Dimitri Staessens	2020-09-26	1	-2/+4
\| \| \| \| \| \| \| \|	Flows should be locked when moving the timerwheel. For frcti_snd, a rdlock is enough. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Improve locking np1_flow_set in eth and udp	Dimitri Staessens	2020-09-26	2	-13/+10
\| \| \| \| \| \| \|	A flow_set is thread-safe and doesn't need to be protected by a lock. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Fix assert in dht	Dimitri Staessens	2020-09-26	1	-1/+1
\| \| \| \| \| \| \|	Fix assignment instead of comparison operator. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	tools: Improve locking in oping server	Dimitri Staessens	2020-09-25	1	-9/+15
\| \| \| \| \| \| \| \| \|	There was a dealloc() call in oping server under mutex, which could leave that mutex locked when the thread was cancelled, causing oping to hang on exit. This avoids calling dealloc under lock. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Complete retransmission logic	Dimitri Staessens	2020-09-25	9	-399/+693
	This completes the retransmission (automated repeat-request, ARQ) logic, sending (delayed) ACK messages when needed. On deallocation, flows will try to retransmit any remaining unacknowledged messages (unless the FRCTFLINGER flag is turned off; this is on by default). Applications can safely shut down as soon as everything is ACK'd (i.e. the current Delta-t run is done). The activity timeout is now passed to the IPCP for it to sleep before completing deallocation (and releasing the flow_id). That should be moved to the IRMd in due time.

	The timerwheel is revised to be multi-level to reduce memory consumption. The resolution bumps by a factor of 1 << RXMQ_BUMP (16) and each level has RXMQ_SLOTS (1 << 8) slots. The lowest level has a resolution of (1 << RXMQ_RES) (20) ns, which is roughly a millisecond. Currently, 3 levels are defined, so the largest delay we can schedule at each level is:

	Level 0: 256ms
	Level 1: 4s
	Level 2: about a minute

	Signed-off-by: Dimitri Staessens <[email protected]>
	Signed-off-by: Sander Vrijders <[email protected]>
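The per-level figures in this commit can be reproduced with a little arithmetic. The sketch below assumes RXMQ_RES = 20 and RXMQ_BUMP = 4 (so that 1 << RXMQ_BUMP is the factor of 16 mentioned above); the function names are hypothetical.

```c
#include <stdint.h>

#define RXMQ_RES   20       /* log2 of the level-0 slot resolution in ns */
#define RXMQ_BUMP  4        /* assumed: 1 << 4 = 16x coarser per level   */
#define RXMQ_SLOTS (1 << 8) /* slots per level                            */
#define RXMQ_LVLS  3        /* number of levels                           */

/* Slot resolution of a level, in nanoseconds. */
static uint64_t lvl_res_ns(unsigned int lvl)
{
        return (uint64_t) 1 << (RXMQ_RES + lvl * RXMQ_BUMP);
}

/* Longest delay that fits in one level, in nanoseconds. */
static uint64_t lvl_span_ns(unsigned int lvl)
{
        return lvl_res_ns(lvl) * RXMQ_SLOTS;
}
```

Under these assumptions the spans come to about 268 ms, 4.3 s and 68.7 s, which the commit message rounds to 256 ms, 4 s and about a minute.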
*	lib: Allow pure acknowledgment packets in FRCT	Dimitri Staessens	2020-06-06	4	-152/+298
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the logic to send a pure acknowledgment packet without any data to send. This needed the event filter for the fqueue, as these non-data packets should not trigger application PKT events. The default timeout is now 10ms, until we have FRCP tuning as part of fccntl. Karn's algorithm seems to be very unstable with low (sub-ms) RTT estimates. Doubling RTO (every RTO) seems still too slow to prevent rtx storms when the measured rtt suddenly spikes several orders of magnitude. Just assuming the ACK'd packet is the last one transmitted seems to be a lot more stable. It can lead to temporary underestimation, but this is not a throughput-killer in FRCP. Changes most time units to nanoseconds for faster computation. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Check rdrbuff sanitize for robust mutexes	Dimitri Staessens	2020-05-29	1	-0/+2
\| \| \| \| \| \| \| \|	The sanitize function in the rdrbuff should only be compiled if robust mutexes are present on the system. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Fix leak and uninitialized variable in DHT	Dimitri Staessens	2020-05-29	1	-1/+4
\| \| \| \| \| \| \| \|	There were some issues identified by the Clang static analyzer that are now fixed. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Fix bad index in packet scheduler	Dimitri Staessens	2020-05-29	1	-1/+1
\| \| \| \| \| \| \| \|	GCC 10 static analyzer found that the wrong index was used in the fail path of psched_create, causing double (multiple) frees. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	build: Add support for GCC 10 [0.17.5]	Dimitri Staessens	2020-05-23	13	-8/+21
\| \| \| \| \| \| \| \| \|	GCC 10 defaults to -fno-common, so some variables that were defined in the headers needed to be declared "extern". The GCC 10 static analyzer can now be invoked using the DebugAnalyzer build option. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
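The -fno-common change means a variable defined in a header now produces a duplicate definition in every translation unit that includes it, which fails at link time. The usual fix looks like the sketch below, shown here in a single file for brevity; the variable and function names are hypothetical.

```c
/* In the header: a declaration only, no storage is allocated. */
extern int log_level;

/* In exactly one .c file: the single definition. */
int log_level = 0;

/* Any other translation unit sees the same object via the header. */
int get_log_level(void)
{
        return log_level;
}
```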
*	lib: Refactor FRCT	Dimitri Staessens	2020-05-04	2	-72/+59
\| \| \| \| \| \| \| \| \| \|	This is a small refactor of FRCT because I found some things a bit hard to read. I tried to refactor frcti_rcv to always queue the packet, but that causes unnecessarily retaking the lock when calling queued_pdu and thus returning idx is a tiny bit faster. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	ipcpd: Remove some unused variables [0.17.4]	Dimitri Staessens	2020-05-02	3	-4/+3
\| \| \| \| \| \| \|	The compiler spotted some variables that weren't really used. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	build: Set implicit fallthrough warning level 4	Dimitri Staessens	2020-05-02	1	-1/+1
\| \| \| \| \| \| \| \|	GCC 9.3.0 started complaining despite the /* FALLTHRU */ comments. Apparently this changed level. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Fix retransmission DRF update	Dimitri Staessens	2020-05-02	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	The retransmission was always disabling the DRF flag. This caused problems with the loss of the first packet, which of course needs a DRF flag set. The retransmitted packet will now contain the original DRF flag and an updated ack number. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Create an rxmwheel per flow	Dimitri Staessens	2020-05-02	4	-128/+149
\| \| \| \| \| \| \| \| \|	The single retransmission wheel caused locking headaches as the calls for different flows could block on the same rxmwheel. This stabilizes the stack, but if the rdrbuff gets full there can now be big delays. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	tools: Stop oping client cleanly on bad write	Dimitri Staessens	2020-05-02	1	-0/+1
\| \| \| \| \| \| \| \|	On a bad write, the writer thread would shut down, leaving the client hanging. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	doc: Add missing ecmp option	Dimitri Staessens	2020-05-01	1	-3/+10
\| \| \| \| \| \| \| \|	The equal-cost multipath option wasn't mentioned in the Ouroboros man page. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Fix updating retransmission wheel	Dimitri Staessens	2020-05-01	4	-23/+28
\| \| \| \| \| \| \| \| \| \|	Fixes infinite rescheduling with RTO getting lower than the timerwheel resolution. For very low RTO values we'd need a big packet buffer with the current memory allocator implementation (rdrbuff). Setting a (configurable) minimum RTO (250 us) reduces this need. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
*	lib: Unmount stale RIB directories	Sander Vrijders	2020-04-30	1	-1/+10
\| \| \| \| \| \| \| \|	If Ouroboros crashed, the RIB directory might still be mounted. This checks if this is the case, then unmounts it. Signed-off-by: Sander Vrijders <[email protected]> Signed-off-by: Dimitri Staessens <[email protected]>
*	lib: Stabilize FRCP under packet loss conditions [0.17.3]	Dimitri Staessens	2020-04-30	3	-60/+69
\| \| \| \| \| \| \| \| \| \| \|	There were a bunch of bugs in FRCP that urgently needed fixing. Now data QoS is usable even with heavy packet loss (within some parameters). The current RTT estimator is the IETF one. It should be updated to the improved one used in the Linux kernel once the A-timer (ACKs without data) and graceful shutdown are implemented. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
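Assuming "the IETF one" refers to the estimator of RFC 6298, its update rule can be sketched as below, in nanoseconds and with the gains 1/8 and 1/4 from that RFC. The struct and function names are hypothetical, the clock-granularity term of the RFC is omitted, and the minimum RTO is a parameter since the value FRCP uses is configurable.

```c
#include <stdbool.h>
#include <stdint.h>

struct rtt_est {
        bool    init;   /* false until the first sample arrives */
        int64_t srtt;   /* smoothed RTT, ns                     */
        int64_t rttvar; /* RTT variation, ns                    */
};

/* Feed one RTT sample r (ns) into the estimator. */
static void rtt_update(struct rtt_est * e, int64_t r)
{
        int64_t err;

        if (!e->init) {
                e->srtt   = r;
                e->rttvar = r / 2;
                e->init   = true;
                return;
        }

        /* RTTVAR is updated first, using the old SRTT. */
        err = e->srtt - r;
        if (err < 0)
                err = -err;

        e->rttvar += (err - e->rttvar) / 4; /* beta  = 1/4 */
        e->srtt   += (r - e->srtt) / 8;     /* alpha = 1/8 */
}

/* RTO = SRTT + 4 * RTTVAR, clamped to a minimum. */
static int64_t rtt_rto(const struct rtt_est * e, int64_t min_rto)
{
        int64_t rto = e->srtt + 4 * e->rttvar;

        return rto < min_rto ? min_rto : rto;
}
```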
*	irmd: Don't always send pub key in alloc response [0.17.2]	Dimitri Staessens	2020-03-30	2	-1/+6
\| \| \| \| \| \| \| \|	The allocation response was always containing an ECDHE key, which is not needed if the client doesn't request an encrypted flow. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>