| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Growing pains.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The delayed ACK was wrongly measuring the delay against the receiver
activity instead of the sender activity. Also fixed receiver activity
not being updated for non-data packets (and duplicates and other
dropped traffic).
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the option to use the Round-Trip-Time (RTT) estimation
algorithm as it is implemented in the TCP implementation in Linux. It
looks like it outperforms the TCP default algorithm, so I enabled this
one by default. Also adds the option to change the RTO timeout
calculation to include more (or less) than 4 times the mdev (specified
as a power of 2. Left the default value to 2 (so, 4 mdevs), but 3 (8
mdevs) gives better results in my tests.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Delayed ACKs are now sent after twice the internal tick time. Fixes
initial ACK record (rcv_cr.seqno) being uninitialized (0) when the
first ACK was to be sent. Adds some FRCT metrics for number of
received delayed (bare) ACKs and the RTT estimator.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unidirectional traffic has one of the peers only send bare FRCT
packets. These never set a DRF, since they have no sequence number.
At the receiver, all these ACKs and window updates were always dropped
as the receiver connection record was timed out.
Also fixes a SEGV if flow control kicks in (passing NULL timeout to
pthread_cond_timedwait).
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The frcti_filter was reading raw data from the buffers, causing the
frcti_rcv to operate directly on encrypted packets. It decrypt and
filter for invalid packets. I moved the function from frct to the
fqueue implementation and renamed it fqueue_filter as it filters
fqueues. Should be extended to filter out keepalives on non-FRCT
flows, as these will now still cause spurious wakeups.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step moving away from scheduling the FRCT and flow
monitoring functions as part of the IPC calls (flow_read / flow_write
/ fevent) and towards the more scalable (and far less complicated)
implementation to take care of these functions in separate threads.
If a process creates the first flow that requires FRCT, it will spin
up a thread to process events on the timerwheel (retransmissions and
delayed ACKs). This single thread lives until the last flow with FRCT
is deallocated.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reading packets from the rbuff and checking their validity (non-zero
size, pass crc check, pass decryption) is now extracted into a
function.
Also adds a function to get the length of an sdu_du_buff instead of
subtracting the tail and head pointers.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It doesn't really make sense to manually and one-sidedly configure the
timeout of delayed acknowledgements, as setting it too high upsets the
peer's sRTT estimates. Even worse, it also causes a lot of spurious
retransmissions if it exceeds the sRTT mean deviation calculated by
the receiver. Compensating on bare acknowledgment for the ack delay
could improve the RTT estimate deviation, but not the spurious
retransmissions if it was set too high. This sets the delayed ack to
wait for a single RTT mean deviation. Probably needs more tweaking to
further reduce differences between the RTT estimates at the sender and
receiver, e.g. compensate the RTT estimate for delayed acks, or
increase the RTO to add 8 mdevs to sRTT instead of 4. However, it
looks like the mdev estimate is the trickiest one to get to sync, not
the RTT average. Linux reduces the sample weight for mdev from 1/4 to
1/32 in some cases, will give that a shot some day too to see if that
further align sRTT estimates. In any case, this patch already improves
things a lot.
Also fixes a bug where the sender was sending acknowlegments on the
first packets in flight for the 0 sequence number. The receiver
activity was measured in seconds but compared to a timeout value in
nanoseconds.
There's still a lot of spurious retransmissions that start after
actual packet loss occurs, I'm still investigating what causes it.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This exposes some additional metrics relating to FRCT / Flow control:
the number of duplicate packets received, number of packets received
out of the flow control window and / or reordering queue, and the
number of rendez-vous messages sent.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The timerwheel was retransmitting packets and the error check for
negative values of the rbuff allocation was instead checking for
non-zero values, causing a buffer allocation to succeed but the
program to continue down the unhappy path leaving that packet stuck in
the buffer unattended.
Also fixes wrongly scheduled retransmissions that cause packet storms.
FRCP is much more stable now. Still needs some work for high
bandwidth-delay products (fast-retransmit).
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The parameters were set directly from the build configs. A first step
to making FRCP configurable at runtime, is to pass the parameters to
the frcti_create() function.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
| |
The notorious off-by-one hit again.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There was a lock reversal in the timerwheel. There still is a thorough
revision needed of the locking in dev.c after the FRCP logic is
completed and tuned.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
| |
Bare FRCP messages (ACKs without data, Rendez-vous packets) were not
encrypted on encrypted flows, causing the receiver to fail decryption.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds flow liveness monitoring for flows, with a fixed timeout of
120s. I will make it configurable at flow allocation later on (timeout
needs to be communicated to the peer). If one peer dies, or doesn't
call any IPC calls (flow_write/flow_read/fevent) it will stop sending
keepalives and the other peer's read/writes will error on an
-EFLOWDOWN after the timeout expires.
Packets without a payload (0 length packets) are interpreted as
keepalive packets for the flow. They can be sent from any application,
but they will not trigger a message read at the receiver side (0 as a
return value on flow_read indicates a previous partial read has
completed at exactly the buffer size).
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The free of the buffer in the failure path of the readdir RIB
functions was taking the wrong pointer in a couple of places. The FRCT
RIB readdir was missing error handling for malloc and strdup.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
| |
There was a missing unlock in FRCT. Also fixes some indentation.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If the timeout is already expired, the wait variable would be negative
and return a negative value for the __frcti_dealloc function, thinking
that the timeout was not expired causing an unnecessary wait even if
all packets are acknowledged.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The initial sender right window edge (indicating acknowledged packet
sequence number) was initialized to seqno - 1. This should be the same
as seqno, since we acknowledge with the next expected sequence number.
It also indicates that a flow without traffic has no outstanding
acknowledgements.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Application flows can now be monitored from the RIB, exposing FRCT
statistics (window edges, retransmission timeout, rtt estimate, etc).
Application RIB requires user permissions to be able to access
/dev/fuse.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This add an ouroboros/pthread.h header that wraps the
pthread_..._unlock() functions for cleanup using
pthread_cleanup_push() as this casting is not safe (and there were
definitely bad casts in the code). The close() function is now also
wrapped for cleanup in ouroboros/sockets.h.
This allows enabling more compiler checks.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
| |
The ugent email addresses are shut down, updated to Ouroboros mail
addresses.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
| |
Happy New Year, Ouroboros!
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the rendez-vous mechanism to handle the case where the
sending window is closed and window updates get lost. If the sending
window is closed, the sender side will send an RDVS every DELT_RDV
time (100ms), and give up after MAX_RDV time (1 second). Upon
reception of a RDVS packet, a window update is sent immediately. We
can make this much more configurable later on (build options for
defaults, fccntl for runtime tuning).
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If the sending window for flow control is closed, the sending
application will now block until the window opens. Beware that until
the rendez-vous mechanism is implemented, shutting down a server while
the client is sending (with non-timed-out blocking write) will cause
the client to hang indefinitely because its window will close.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This adds sending and receiving window updates for flow control. I
used the 8 pad bits as part of the window update field, so it's 24
bits, allowing for ~16 million packets in flight.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows configuring some parameters for FRCP at compile time, such
as default values for Delta-t and configuration of the timerwheel. The
timerwheel will now reschedule when it fails to create a packet,
instead of setting the flow down immediately. Some new things added
are options to store packets for retransmission on the heap, and using
non-blocking calls for retransmission. The defaults do not change the
current behaviour.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This completes the retransmission (automated repeat-request, ARQ)
logic, sending (delayed) ACK messages when needed.
On deallocation, flows will ACK try to retransmit any remaining
unacknowledged messages (unless the FRCTFLINGER flag is turned off;
this is on by default). Applications can safely shut down as soon as
everything is ACK'd (i.e. the current Delta-t run is done). The
activity timeout is now passed to the IPCP for it to sleep before
completing deallocation (and releasing the flow_id). That should be
moved to the IRMd in due time.
The timerwheel is revised to be multi-level to reduce memory
consumption. The resolution bumps by a factor of 1 << RXMQ_BUMP (16)
and each level has RXMQ_SLOTS (1 << 8) slots. The lowest level has a
resolution of (1 << RXMQ_RES) (20) ns, which is roughly a
millisecond. Currently, 3 levels are defined, so the largest delay we
can schedule at each level is:
Level 0: 256ms
Level 1: 4s
Level 2: about a minute.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the logic to send a pure acknowledgment packet without any
data to send. This needed the event filter for the fqueue, as these
non-data packets should not trigger application PKT events. The
default timeout is now 10ms, until we have FRCP tuning as part of
fccntl.
Karn's algorithm seems to be very unstable with low (sub-ms) RTT
estimates. Doubling RTO (every RTO) seems still too slow to prevent
rtx storms when the measured rtt suddenly spikes several orders of
magnitude. Just assuming the ACK'd packet is the last one transmitted
seems to be a lot more stable. It can lead to temporary
underestimation, but this is not a throughput-killer in FRCP.
Changes most time units to nanoseconds for faster computation.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a small refactor of FRCT because I found some things a bit
hard to read. I tried to refactor frcti_rcv to always queue the
packet, but that causes unnecessarily retaking the lock when calling
queued_pdu and thus returning idx is a tiny bit faster.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The single retransmission wheel caused locking headaches as the calls
for different flows could block on the same rxmwheel. This stabilizes
the stack, but if the rdrbuff gets full there can now be big delays.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fixes infinite rescheduling with RTO getting lower than the timerwheel
resolution. For very low RTO values we'd need a big packet buffer with
the current memory allocator implementation (rdrbuff). Setting a
(configurable) minimum RTO (250 us) reduces this need.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There were a bunch of bugs in FRCP that urgently needed fixing. Now
data QoS is usable even with heavy packet loss (within some
parameters). The current RTT estimator is the IETF one. It should be
updated to the improved one used in the Linux kernel once the A-timer
(ACKs without data) and graceful shutdown are implemented.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The initial implementation for the ECDHE key exchange was doing the
key exchange after a flow was established. The public keys are now
sent allowg on the flow allocation messages, so that an encrypted
tunnel can be created within 1 RTT. The flow allocation steps had to
be extended to pass the opaque data ('piggybacking').
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a simple round-trip time estimator to FRCT. The estimate is
a weighted average with deviation. The retransmission is scheduled
after rtt + 2 times the deviation. A retransmit doubles the rtt
estimate to avoid the no-update case when rtt suddenly increases. The
rtt is estimated in microseconds and the granularity for retransmits
is 256 microseconds.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
| |
Updates the copyright notice in all sources to 2019.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This adds a First Fragment bit to FRCT. This small optimisation avoids
losing two packets when there is packet loss without fragmentation,
without the need to disable fragmentation at the end points.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
| |
The integrity check mechanism was split from FRCT, this flag is not
needed anymore.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The queued packets were not correctly read. The rcv_cr->seqno now
indicates the next packet the receiver application expects. A lot more
stable now, but still some further issues to be fixed.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes rudimentary automated repeat-request ARQ to correctly
configure the both connection records and use the receiver seqno. The
rto variable is moved out of the connection record.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The FRCT kept only a left window edge in the receiver connection
window, however, it needs to keep track of the left window edge
(highest ACK'd sequence number) and the highest delivered sequence
number, so it can delay ACKs that cannot be piggybacked. TCP
recommends at most 500 ms for delayed ACKs (probably good to keep it
near half of RTO).
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This splits off the CRC from FRCT so it can be set
independently. Ouroboros now allows raw flows with error checking.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This will change SDU (Service Data Unit) to packet everywhere. SDU is
OSI terminology, whereas packet is Ouroboros terminology.
Signed-off-by: Sander Vrijders <[email protected]>
Signed-off-by: Dimitri Staessens <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The flow allocator now passes the full qos specification to the
endpoint, instead of just a cube. This is a more flexible
architecture, as it makes QoS cubes internal to the layers.
Adds endianness transforms for the flow allocator protocol in the
normal IPCP.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This will check the return values of init functions so that the code
is more robust. It also removes a duplicate init in the timerwheel,
checks for buffer overflows in the RIB and checks string lengths.
Signed-off-by: Sander Vrijders <[email protected]>
Signed-off-by: Dimitri Staessens <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This removes configuration from the FRCT protocol to send it during
flow allocation.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds rudimentary support for sending and processing
acknowledgments and doing retransmission.
It replaces the generic timerwheel with a specific one for
retransmission. This is currently a fixed wheel allowing
retransmissions to be scheduled up to about 32 seconds into the
future. It currently has an 8ms resolution. This could be made
configurable in the future. Failures of the flow (i.e. rtx not
working) are indicated by the rxmwheel_move() function returning a fd.
This is currently not yet handled (maybe just setting the state of the
flow to FLOWDOWN is a better solution).
The shm_rdrbuff tracks the number of users of a du_buff. One user is
the full stack, each retransmission will increment the refs counter
(which effectively acts as a semaphore). The refs counter is
decremented when a packet is acked. The du_buff is only allowed to be
removed if there is only one user left (the "stack").
When a packet is retransmitted, it is copied in the rdrbuff. This is
to ensure integrity of the packet when multiple layers do
retransmission and it is passed down the stack again.
Signed-off-by: Dimitri Staessens <[email protected]>
Signed-off-by: Sander Vrijders <[email protected]>
|