How to fix excessive numbers of FTP connections in in TIME_WAIT state?

When you use FTP for recursive downloads or uploads and the FTP server has a firewall installed, you might get blocked due to "too many connections". Nearly all of those connections would be in TIME_WAIT state, and are only visible on the server side (not when you check on your client with e.g. netstat -anp --tcp | grep <ip-address>). The exact threshold of "too many" depends on firewall configuration (but can be something like 300 – 800). You will probably see no error message at all, and cannot reach the FTP server any more, either temporarily or permanently. What causes this problem, and how to avoid it?

First, this is a completely normal feature of the FTP protocol in passive mode: "Some existing protocols, such as FTP, make use of implicit signalling, and cannot be retrofitted with TIME_WAIT controls." – source; additionally, FTP does not provide for TCP connection reuse as HTTP/1.1 does, so we are indeed stuck with using one TCP socket for transferring one file via FTP). Perusing lots of server ports is not nice of the FTP protocol though (they stay in TIME_WAIT for up to four minutes!), since in extreme casesit can lead to the server runnig out of ports in the ephemeral port range for incoming connections. That's why the firewall is jumping in …

To understand how to work around this, we first have to know a bit about FTP active / passive mode and TCP TIME_WAIT:

"In active mode, the client establishes the command channel (from client port X to server port 21) but the server establishes the data channel (from server port 20 to client port Y, where Y has been supplied by the client).
In passive mode, the client establishes both channels. In that case, the server tells the client which port should be used for the data channel."
[Stackoverflow answer by paxdiablo, CC-BY-SA-3.0]

Now, TIME_WAIT is a state that a socket enters for around 4 minutes after a TCP connection was closed cleanly. It exists for two reasons (preventing delayed segments from being misinterpreted as part of a new connection, and reliable full-duplex connection termination, as explained in a great article in detail).

Fix 1: Use active mode FTP

The usual recommendation (like here) for avoiding being blocked by a firewall due to TIME_WAIT connections is to switch from passive to active FTP mode. Here's my (preliminary) understanding of why this works:

In the FTP protocol, one TCP data connection is used per file transfer, in both passive and active modes. However, in active mode the port used for these TCP connections on the server side is always 20, while in passive mode it is a random port number for each new connection, which will then be in TIME_WAIT state for 2-4 minutes after the connection ends. That is why "the downside to passive mode is that it consumes more sockets on the server [and] could eventually lead to port exhaustion" [source]. Not exactly though: a socket in TIME_WAIT state does not have to block the whole port from reuse, but only the exact socket (which  is a combination of two server addresses and two port numbers), though many operating systems naively implement it to block the whole port [source]. And in line with this naive way of implementing it, firewalls naively assume that the port is still blocked (even if it's not) and hence consider TIME_WAIT connections under somehow active connections. Now if you switch to active mode FTP, the server uses port 20 for all the outgoing data connections, with different sockets for each though. So the same number of sockets are consumed, and the same number are in TIME_WAIT state for the same amount of time, but since they all belong to one port, the firewall will not see it, and not block you.

This is obviously a relatively crude way of solving the issue. A better way would be firewall whitelisting (if accessible to you), or a deep inspection firewall with rules not counting TIME_WAIT connections resulting from passive mode FTP.

Fix 2: Limit the number of transfers per minute

If a firewall configuration allows us (say) 400 connections, and sockets stay in TIME_WAIT state for 4 minutes, it allows for 100 file transfers (each with its own TCP connection) per minute without being blocked. Technically, recursive download commands could support such a limit, but I am not aware of any FTP client that does. Not even lftp has this: it has the net:connection-limit option, but that is only for active connections. A connection that is in TIME_WAIT on the server is considered closed by the client already, so will not count into this limit.

However, a relatively nice approximation is this: Use the mirror --script=FILE command to just generate a script for downloading the files, file by file. Edit that script and insert lftp sleep 4m commands after a number of files that are close to triggering the firewall blocking. Then execute that script in lftp. All that could be wrapped in a nice script as well …

As an approximation, one might perhaps use a very restrictive data rate limit option to stay below the "files per minute" threshold. In lftp, that would be net:limit-total-rate (compare its manpage). Some people also reported that limiting the max. number of parallel active connections to one also helped [source] – it works the same way as rate limiting, but is far from guaranteed to help in your case, of course.


Posted

in

,

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.