summaryrefslogtreecommitdiff
path: root/src/lib/frct.c
Commit message (Collapse)AuthorAgeFilesLines
* build: Update copyright to 2022Dimitri Staessens2022-04-031-1/+1
| | | | | | | Growing pains. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix delayed ACK under high loadDimitri Staessens2022-04-031-3/+9
| | | | | | | | | | The delayed ACK was wrongly measuring the delay against the receiver activity instead of the sender activity. Also fixed receiver activity not being updated for non-data packets (and duplicates and other dropped traffic). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Add support for Linux RTT estimatorDimitri Staessens2022-04-031-2/+7
| | | | | | | | | | | | | This adds the option to use the Round-Trip-Time (RTT) estimation algorithm as it is implemented in the TCP implementation in Linux. It looks like it outperforms the TCP default algorithm, so I enabled this one by default. Also adds the option to change the RTO timeout calculation to include more (or less) than 4 times the mdev (specified as a power of 2. Left the default value to 2 (so, 4 mdevs), but 3 (8 mdevs) gives better results in my tests. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix timing of delayed ACKsDimitri Staessens2022-04-011-5/+25
| | | | | | | | | | Delayed ACKs are now sent after twice the internal tick time. Fixes initial ACK record (rcv_cr.seqno) being uninitialized (0) when the first ACK was to be sent. Adds some FRCT metrics for number of received delayed (bare) ACKs and the RTT estimator. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix unidirectional FRCT traffic handlingDimitri Staessens2022-03-301-7/+9
| | | | | | | | | | | | | Unidirectional traffic has one of the peers only send bare FRCT packets. These never set a DRF, since they have no sequence number. At the receiver, all these ACKs and window updates were always dropped as the receiver connection record was timed out. Also fixes a SEGV if flow control kicks in (passing NULL timeout to pthread_cond_timedwait). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix filtering encrypted packetsDimitri Staessens2022-03-301-52/+0
| | | | | | | | | | | | The frcti_filter was reading raw data from the buffers, causing the frcti_rcv to operate directly on encrypted packets. It decrypt and filter for invalid packets. I moved the function from frct to the fqueue implementation and renamed it fqueue_filter as it filters fqueues. Should be extended to filter out keepalives on non-FRCT flows, as these will now still cause spurious wakeups. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Move timerwheel processing to its own threadDimitri Staessens2022-03-301-8/+0
| | | | | | | | | | | | | | | This is the first step moving away from scheduling the FRCT and flow monitoring functions as part of the IPC calls (flow_read / flow_write / fevent) and towards the more scalable (and far less complicated) implementation to take care of these functions in separate threads. If a process creates the first flow that requires FRCT, it will spin up a thread to process events on the timerwheel (retransmissions and delayed ACKs). This single thread lives until the last flow with FRCT is deallocated. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Refactor reading packet from rbuffDimitri Staessens2022-03-301-1/+1
| | | | | | | | | | | | Reading packets from the rbuff and checking their validity (non-zero size, pass crc check, pass decryption) is now extracted into a function. Also adds a function to get the length of an sdu_du_buff instead of subtracting the tail and head pointers. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Non-configurable delayed acks in FRCPDimitri Staessens2022-03-301-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It doesn't really make sense to manually and one-sidedly configure the timeout of delayed acknowledgements, as setting it too high upsets the peer's sRTT estimates. Even worse, it also causes a lot of spurious retransmissions if it exceeds the sRTT mean deviation calculated by the receiver. Compensating on bare acknowledgment for the ack delay could improve the RTT estimate deviation, but not the spurious retransmissions if it was set too high. This sets the delayed ack to wait for a single RTT mean deviation. Probably needs more tweaking to further reduce differences between the RTT estimates at the sender and receiver, e.g. compensate the RTT estimate for delayed acks, or increase the RTO to add 8 mdevs to sRTT instead of 4. However, it looks like the mdev estimate is the trickiest one to get to sync, not the RTT average. Linux reduces the sample weight for mdev from 1/4 to 1/32 in some cases, will give that a shot some day too to see if that further align sRTT estimates. In any case, this patch already improves things a lot. Also fixes a bug where the sender was sending acknowlegments on the first packets in flight for the 0 sequence number. The receiver activity was measured in seconds but compared to a timeout value in nanoseconds. There's still a lot of spurious retransmissions that start after actual packet loss occurs, I'm still investigating what causes it. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Expose flow control metrics to RIB0.19.1Dimitri Staessens2022-03-161-12/+46
| | | | | | | | | | This exposes some additional metrics relating to FRCT / Flow control: the number of duplicate packets received, number of packets received out of the flow control window and / or reordering queue, and the number of rendez-vous messages sent. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix buffer allocation when retransmitting0.19.0Dimitri Staessens2022-03-111-5/+15
| | | | | | | | | | | | | | | The timerwheel was retransmitting packets and the error check for negative values of the rbuff allocation was instead checking for non-zero values, causing a buffer allocation to succeed but the program to continue down the unhappy path leaving that packet stuck in the buffer unattended. Also fixes wrongly scheduled retransmissions that cause packet storms. FRCP is much more stable now. Still needs some work for high bandwidth-delay products (fast-retransmit). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Pass Delta-t params to frcti_create()Dimitri Staessens2022-03-081-7/+7
| | | | | | | | | The parameters were set directly from the build configs. A first step to making FRCP configurable at runtime, is to pass the parameters to the frcti_create() function. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix RTT estimator invocation in FRCTDimitri Staessens2022-03-031-1/+1
| | | | | | | The notorious off-by-one hit again. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix lock reversal in timerwheelDimitri Staessens2022-03-031-5/+0
| | | | | | | | | There was a lock reversal in the timerwheel. There still is a thorough revision needed of the locking in dev.c after the FRCP logic is completed and tuned. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Encrypt bare FRCP messages on encrypted flowsDimitri Staessens2022-03-031-28/+19
| | | | | | | | Bare FRCP messages (ACKs without data, Rendez-vous packets) were not encrypted on encrypted flows, causing the receiver to fail decryption. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Add initial flow liveness monitoringDimitri Staessens2022-02-241-0/+1
| | | | | | | | | | | | | | | | | | This adds flow liveness monitoring for flows, with a fixed timeout of 120s. I will make it configurable at flow allocation later on (timeout needs to be communicated to the peer). If one peer dies, or doesn't call any IPC calls (flow_write/flow_read/fevent) it will stop sending keepalives and the other peer's read/writes will error on an -EFLOWDOWN after the timeout expires. Packets without a payload (0 length packets) are interpreted as keepalive packets for the flow. They can be sent from any application, but they will not trigger a message read at the receiver side (0 as a return value on flow_read indicates a previous partial read has completed at exactly the buffer size). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* ipcpd: Fix free in fail path of readdirDimitri Staessens2022-02-171-0/+9
| | | | | | | | | The free of the buffer in the failure path of the readdir RIB functions was taking the wrong pointer in a couple of places. The FRCT RIB readdir was missing error handling for malloc and strdup. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Add missing rwlock unlock in FRCTDimitri Staessens2021-12-221-2/+4
| | | | | | | There was a missing unlock in FRCT. Also fixes some indentation. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix flow dealloc after expired FRCT timeoutDimitri Staessens2021-12-221-0/+1
| | | | | | | | | | If the timeout is already expired, the wait variable would be negative and return a negative value for the __frcti_dealloc function, thinking that the timeout was not expired causing an unnecessary wait even if all packets are acknowledged. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Set initial sender rwe to sender seqnoDimitri Staessens2021-12-221-1/+1
| | | | | | | | | | | The initial sender right window edge (indicating acknowledged packet sequence number) was initialized to seqno - 1. This should be the same as seqno, since we acknowledge with the next expected sequence number. It also indicates that a flow without traffic has no outstanding acknowledgements. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Application RIB with FRCT statisticsDimitri Staessens2021-06-301-12/+129
| | | | | | | | | | Application flows can now be monitored from the RIB, exposing FRCT statistics (window edges, retransmission timeout, rtt estimate, etc). Application RIB requires user permissions to be able to access /dev/fuse. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib, ipcpd, irmd: Wrap pthread unlocks for cleanupDimitri Staessens2021-06-231-2/+1
| | | | | | | | | | | | This add an ouroboros/pthread.h header that wraps the pthread_..._unlock() functions for cleanup using pthread_cleanup_push() as this casting is not safe (and there were definitely bad casts in the code). The close() function is now also wrapped for cleanup in ouroboros/sockets.h. This allows enabling more compiler checks. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* build: Update email addressesDimitri Staessens2021-01-031-2/+2
| | | | | | | | The ugent email addresses are shut down, updated to Ouroboros mail addresses. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* build: Update copyright to 2021Dimitri Staessens2021-01-031-1/+1
| | | | | | | Happy New Year, Ouroboros! Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Add Rendez-Vous mechanism for flow controlDimitri Staessens2020-10-111-28/+128
| | | | | | | | | | | | | This adds the rendez-vous mechanism to handle the case where the sending window is closed and window updates get lost. If the sending window is closed, the sender side will send an RDVS every DELT_RDV time (100ms), and give up after MAX_RDV time (1 second). Upon reception of a RDVS packet, a window update is sent immediately. We can make this much more configurable later on (build options for defaults, fccntl for runtime tuning). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Block on closed flow control windowDimitri Staessens2020-10-111-5/+104
| | | | | | | | | | | If the sending window for flow control is closed, the sending application will now block until the window opens. Beware that until the rendez-vous mechanism is implemented, shutting down a server while the client is sending (with non-timed-out blocking write) will cause the client to hang indefinitely because its window will close. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Send and receive window updatesDimitri Staessens2020-10-111-9/+35
| | | | | | | | | This adds sending and receiving window updates for flow control. I used the 8 pad bits as part of the window update field, so it's 24 bits, allowing for ~16 million packets in flight. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Add compiler configuration for FRCPDimitri Staessens2020-10-111-16/+16
| | | | | | | | | | | | | This allows configuring some parameters for FRCP at compile time, such as default values for Delta-t and configuration of the timerwheel. The timerwheel will now reschedule when it fails to create a packet, instead of setting the flow down immediately. Some new things added are options to store packets for retransmission on the heap, and using non-blocking calls for retransmission. The defaults do not change the current behaviour. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Complete retransmission logicDimitri Staessens2020-09-251-97/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | This completes the retransmission (automated repeat-request, ARQ) logic, sending (delayed) ACK messages when needed. On deallocation, flows will ACK try to retransmit any remaining unacknowledged messages (unless the FRCTFLINGER flag is turned off; this is on by default). Applications can safely shut down as soon as everything is ACK'd (i.e. the current Delta-t run is done). The activity timeout is now passed to the IPCP for it to sleep before completing deallocation (and releasing the flow_id). That should be moved to the IRMd in due time. The timerwheel is revised to be multi-level to reduce memory consumption. The resolution bumps by a factor of 1 << RXMQ_BUMP (16) and each level has RXMQ_SLOTS (1 << 8) slots. The lowest level has a resolution of (1 << RXMQ_RES) (20) ns, which is roughly a millisecond. Currently, 3 levels are defined, so the largest delay we can schedule at each level is: Level 0: 256ms Level 1: 4s Level 2: about a minute. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Allow pure acknowledgment packets in FRCTDimitri Staessens2020-06-061-96/+230
| | | | | | | | | | | | | | | | | | | | This adds the logic to send a pure acknowledgment packet without any data to send. This needed the event filter for the fqueue, as these non-data packets should not trigger application PKT events. The default timeout is now 10ms, until we have FRCP tuning as part of fccntl. Karn's algorithm seems to be very unstable with low (sub-ms) RTT estimates. Doubling RTO (every RTO) seems still too slow to prevent rtx storms when the measured rtt suddenly spikes several orders of magnitude. Just assuming the ACK'd packet is the last one transmitted seems to be a lot more stable. It can lead to temporary underestimation, but this is not a throughput-killer in FRCP. Changes most time units to nanoseconds for faster computation. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Refactor FRCTDimitri Staessens2020-05-041-39/+33
| | | | | | | | | | This is a small refactor of FRCT because I found some things a bit hard to read. I tried to refactor frcti_rcv to always queue the packet, but that causes unnecessarily retaking the lock when calling queued_pdu and thus returning idx is a tiny bit faster. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Create an rxmwheel per flowDimitri Staessens2020-05-021-37/+49
| | | | | | | | | The single retransmission wheel caused locking headaches as the calls for different flows could block on the same rxmwheel. This stabilizes the stack, but if the rdrbuff gets full there can now be big delays. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix updating retransmission wheelDimitri Staessens2020-05-011-6/+6
| | | | | | | | | | Fixes infinite rescheduling with RTO getting lower than the timerwheel resolution. For very low RTO values we'd need a big packet buffer with the current memory allocator implementation (rdrbuff). Setting a (configurable) minimum RTO (250 us) reduces this need. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Stabilize FRCP under packet loss conditions0.17.3Dimitri Staessens2020-04-301-44/+47
| | | | | | | | | | | There were a bunch of bugs in FRCP that urgently needed fixing. Now data QoS is usable even with heavy packet loss (within some parameters). The current RTT estimator is the IETF one. It should be updated to the improved one used in the Linux kernel once the A-timer (ACKs without data) and graceful shutdown are implemented. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib, ipcpd: piggyback ECDHE on flow allocationDimitri Staessens2020-02-251-1/+1
| | | | | | | | | | | The initial implementation for the ECDHE key exchange was doing the key exchange after a flow was established. The public keys are now sent allowg on the flow allocation messages, so that an encrypted tunnel can be created within 1 RTT. The flow allocation steps had to be extended to pass the opaque data ('piggybacking'). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* build: Update copyright to 20200.16.0Dimitri Staessens2020-01-021-1/+1
| | | | | Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Add initial rtt estimator to FRCTDimitri Staessens2019-02-081-12/+63
| | | | | | | | | | | | This adds a simple round-trip time estimator to FRCT. The estimate is a weighted average with deviation. The retransmission is scheduled after rtt + 2 times the deviation. A retransmit doubles the rtt estimate to avoid the no-update case when rtt suddenly increases. The rtt is estimated in microseconds and the granularity for retransmits is 256 microseconds. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* build: Update copyright to 2019Dimitri Staessens2019-02-051-1/+1
| | | | | | | Updates the copyright notice in all sources to 2019. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Add First Fragment (FFGM) bit to FRCTDimitri Staessens2018-11-211-1/+2
| | | | | | | | | This adds a First Fragment bit to FRCT. This small optimisation avoids losing two packets when there is packet loss without fragmentation, without the need to disable fragmentation at the end points. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Remove CRC flag from FRCTDimitri Staessens2018-10-151-1/+0
| | | | | | | | The integrity check mechanism was split from FRCT, this flag is not needed anymore. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Some more fixes in retransmissionDimitri Staessens2018-10-101-6/+6
| | | | | | | | | The queued packets were not correctly read. The rcv_cr->seqno now indicates the next packet the receiver application expects. A lot more stable now, but still some further issues to be fixed. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* lib: Fix initial automated repeat-requestDimitri Staessens2018-10-091-7/+13
| | | | | | | | | This fixes rudimentary automated repeat-request ARQ to correctly configure the both connection records and use the receiver seqno. The rto variable is moved out of the connection record. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* Merge branch 'testing' into beDimitri Staessens2018-10-061-102/+23
|\
| * lib: Keep track of highest delivered seqnoDimitri Staessens2018-10-061-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | The FRCT kept only a left window edge in the receiver connection window, however, it needs to keep track of the left window edge (highest ACK'd sequence number) and the highest delivered sequence number, so it can delay ACKs that cannot be piggybacked. TCP recommends at most 500 ms for delayed ACKs (probably good to keep it near half of RTO). Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
| * lib: Split error checking from FRCTDimitri Staessens2018-10-051-36/+0
| | | | | | | | | | | | | | | | This splits off the CRC from FRCT so it can be set independently. Ouroboros now allows raw flows with error checking. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
| * ipcpd, lib, irmd, tools: Change SDU to packetSander Vrijders2018-10-051-1/+1
| | | | | | | | | | | | | | | | This will change SDU (Service Data Unit) to packet everywhere. SDU is OSI terminology, whereas packet is Ouroboros terminology. Signed-off-by: Sander Vrijders <[email protected]> Signed-off-by: Dimitri Staessens <[email protected]>
| * lib: Pass qosspec at flow allocationDimitri Staessens2018-10-051-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | The flow allocator now passes the full qos specification to the endpoint, instead of just a cube. This is a more flexible architecture, as it makes QoS cubes internal to the layers. Adds endianness transforms for the flow allocator protocol in the normal IPCP. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
| * lib: Check return values init functionsSander Vrijders2018-09-281-3/+3
| | | | | | | | | | | | | | | | | | This will check the return values of init functions so that the code is more robust. It also removes a duplicate init in the timerwheel, checks for buffer overflows in the RIB and checks string lengths. Signed-off-by: Sander Vrijders <[email protected]> Signed-off-by: Dimitri Staessens <[email protected]>
| * lib: Remove configuration from FRCTDimitri Staessens2018-09-271-58/+17
| | | | | | | | | | | | | | | | This removes configuration from the FRCT protocol to send it during flow allocation. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>
* | lib: Support for rudimentary retransmissionDimitri Staessens2018-07-271-35/+38
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | This adds rudimentary support for sending and processing acknowledgments and doing retransmission. It replaces the generic timerwheel with a specific one for retransmission. This is currently a fixed wheel allowing retransmissions to be scheduled up to about 32 seconds into the future. It currently has an 8ms resolution. This could be made configurable in the future. Failures of the flow (i.e. rtx not working) are indicated by the rxmwheel_move() function returning a fd. This is currently not yet handled (maybe just setting the state of the flow to FLOWDOWN is a better solution). The shm_rdrbuff tracks the number of users of a du_buff. One user is the full stack, each retransmission will increment the refs counter (which effectively acts as a semaphore). The refs counter is decremented when a packet is acked. The du_buff is only allowed to be removed if there is only one user left (the "stack"). When a packet is retransmitted, it is copied in the rdrbuff. This is to ensure integrity of the packet when multiple layers do retransmission and it is passed down the stack again. Signed-off-by: Dimitri Staessens <[email protected]> Signed-off-by: Sander Vrijders <[email protected]>