This upgrade takes care of issues UCB-02-003, UCB-02-004, and UCB-02-006
from the 2021 security audit of Turbo Tunnel by Cure53.
https://github.com/flynn/noise/security/advisories/GHSA-g9mp-8g3h-3c5c
UCB-02-004
https://github.com/flynn/noise/pull/43
UCB-02-003, UCB-02-006:
https://github.com/flynn/noise/pull/44
In my testing locally, specifying -dot with a non-responsive TCP port
would time out after about 30 seconds anyway:
$ time ./dnstt-client -dot tns.example.com:8000 -pubkey-file server.pub t.example.com 127.0.0.1:7000
dial tcp 45.79.134.119:8000: connect: connection timed out
real 0m31.398s
user 0m0.006s
sys 0m0.003s
Which is in line with the documentation for net.Dialer:
https://golang.org/pkg/net/#Dialer
With or without a timeout, the operating system may impose its
own earlier timeout. For instance, TCP timeouts are often around
3 minutes.
But may as well be explicit.
This commit has the side effect of changing the error message from
"connection timed out" to "i/o timeout".
$ time ./dnstt-client -dot tns.example.com:8000 -pubkey-file server.pub t.example.com 127.0.0.1:7000
dial tcp 45.79.134.119:8000: i/o timeout
real 0m30.007s
user 0m0.003s
sys 0m0.007s
I tried setting the dialTimeout to 40 seconds, and in that case the
system timeout took precedence after ≈31 seconds, with the "connection
timed out" error as before.
This is issue UCB-02-007 from the 2021 security audit of Turbo Tunnel by
Cure53.
The audit report additionally recommends calling SetReadDeadline before
each read operation. I have chosen not to do that. It is intended that
the TLS connection should be able to remain idle if there is nothing to
send. As DNS is a query–response protocol, one might expect a response
(and within a certain amount of time) only after sending a query;
sendLoop could refresh the ReadDeadline for recvLoop every time it sends
a query. But a malicious DoT server could keep a useless connection
alive anyway by sending Slowloris-style short responses within each
deadline, and an external adversary capable of delaying responses
could deny service indefinitely or simply block the server. In any
case, the smux KeepAliveTimeout serves as a backstop that prevents
stalled connections from remaining open indefinitely.
In my testing locally, these dials would time out after about 30 seconds
anyway:
2021/04/20 23:26:46 begin session 54cafb53
2021/04/20 23:26:47 begin stream 54cafb53:3
2021/04/20 23:27:19 stream 54cafb53:3 handleStream: stream 54cafb53:3 connect upstream: dial tcp X.X.X.X:YYYY: connect: connection timed out
2021/04/20 23:27:19 end stream 54cafb53:3
Which is in line with the documentation for net.Dialer:
https://golang.org/pkg/net/#Dialer
With or without a timeout, the operating system may impose its
own earlier timeout. For instance, TCP timeouts are often around
3 minutes.
But may as well be explicit.
This commit has the side effect of changing the error message from
"connection timed out" to "i/o timeout".
2021/04/20 23:28:08 begin session 05b0a46e
2021/04/20 23:28:09 begin stream 05b0a46e:3
2021/04/20 23:28:39 stream 05b0a46e:3 handleStream: stream 05b0a46e:3 connect upstream: dial tcp X.X.X.X:YYYY: i/o timeout
2021/04/20 23:28:39 end stream 05b0a46e:3
The usual use case for upstream is that it is a localhost IP address and
port, but it may also be a hostname and port. net.DialTCP resolves the
hostname once and for all, and only uses one of the hostname's IP
addresses if there are more than one. net.Dial will try all the IP
addresses in turn until it is able to establish a connection.
Now upstream is kept as a string variable all the way through the call
chain. For the sake of usability, we try resolving the address with
net.ResolveTCPAddr in main, to emit an error or warning right away,
rather than deferring it to the first stream.
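A sketch of that pattern, with illustrative names (resolve once in main
purely for early feedback, then dial the string form later so net.Dial
can try every resolved address):

```go
package main

import (
	"fmt"
	"net"
)

// checkUpstream resolves the address only to surface errors or
// warnings early; the result is discarded, and the string itself is
// what gets dialed per stream later.
func checkUpstream(upstream string) error {
	_, err := net.ResolveTCPAddr("tcp", upstream)
	return err
}

func main() {
	upstream := "127.0.0.1:7000" // illustrative value
	if err := checkUpstream(upstream); err != nil {
		fmt.Println("cannot resolve upstream:", err)
		return
	}
	fmt.Println("upstream ok")
	// Later, each stream does net.Dial("tcp", upstream), which tries
	// all of a hostname's IP addresses in turn, unlike net.DialTCP
	// with a single pre-resolved *net.TCPAddr.
}
```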
This was a memory leak.
Compare to the `sess.Close()` in the client:
https://repo.or.cz/dnstt.git/blob/2fe067548848f7dd1acb527a20699d7d2358d150:/dnstt-client/main.go#l174
This is issue UCB-02-002 from the 2021 security audit of Turbo Tunnel by
Cure53.
Recent versions of go (at least go1.14) automatically add a `go` line to
go.mod with every command, such as `go build`. Adding one here so that
builds don't wind up changing a versioned file. There doesn't seem to be
much importance to what number goes here
(https://utcc.utoronto.ca/~cks/space/blog/programming/GoModulesGoVersions).
I've just verified that the programs build with go1.11.
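The added line looks like this in context (the module path shown is a
placeholder, not dnstt's actual path):

```
module example.com/dnstt

go 1.11
```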
This log line would formerly be emitted for a query with 0 questions:
FORMERR: too many questions (0)
The Nmap DNSStatusRequest probe is a query with 0 questions.
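One way the corrected check might be written, as a hedged sketch
(function name and messages are illustrative; the real check lives in
dnstt's DNS handling, which requires exactly one question per query):

```go
package main

import "fmt"

// questionError reports why a query's question count is unacceptable.
// The old code lumped 0 in with "too many", producing the nonsensical
// "too many questions (0)".
func questionError(n int) string {
	switch {
	case n == 0:
		return "FORMERR: no questions"
	case n > 1:
		return fmt.Sprintf("FORMERR: too many questions (%d)", n)
	default:
		return "" // exactly one question: OK
	}
}

func main() {
	fmt.Println(questionError(0)) // prints FORMERR: no questions
	fmt.Println(questionError(2)) // prints FORMERR: too many questions (2)
}
```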
There was a logic error in the code. The nextP variable was used to
store the packet that was too big to pack into the most recent DNS
response. But nextP was not tied to any particular ClientID; instead it
would be sent to whatever client happened to be the recipient of the
next response.
The confusion didn't cause connections to fail completely; any
misdirected packets were treated as out-of-sequence garbage by KCP and
dropped. But it hurt performance a lot: I saw a download go from 300
KB/s to 50 KB/s just by connecting a second client with a different
ClientID (not even sending or receiving with the second client). The
reason is that a fraction of the packets intended for the downloading
client were instead sent to the idle client, which to the downloading
client looks like a packet drop, requiring a retransmission by the
server.
We fix it by placing the leftover packet in a per-ClientID stash, rather
than a variable shared by all ClientIDs.
I want a way to "unread" a packet from a send queue, in the case where
I'm packing packets into a fixed space and don't know when I'm done
until I've read one packet too many. There's no way to insert the
extra packet at the head of the channel representing the send queue,
so that it will be the next thing received from OutgoingQueue.
Instead, add a separate one-element queue called the stash. The caller
can stash an excess packet, then prioritize checking it in the next
round by calling Unstash before OutgoingQueue.
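A minimal sketch of the idea, with illustrative names and types (a
buffered one-element channel as the stash, checked before the main
queue):

```go
package main

import "fmt"

// queue pairs a send queue with a one-element stash for "unreading" a
// packet that turned out not to fit.
type queue struct {
	OutgoingQueue chan string
	stash         chan string
}

func newQueue() *queue {
	return &queue{
		OutgoingQueue: make(chan string, 16),
		stash:         make(chan string, 1), // holds at most one excess packet
	}
}

// Stash puts back a packet that was read but could not be packed.
func (q *queue) Stash(p string) {
	q.stash <- p
}

// Unstash returns the stash channel; callers select on it before
// OutgoingQueue so a stashed packet is re-read first.
func (q *queue) Unstash() <-chan string {
	return q.stash
}

// next reads the stashed packet if there is one, otherwise the queue.
func (q *queue) next() string {
	select {
	case p := <-q.Unstash():
		return p
	default:
	}
	return <-q.OutgoingQueue
}

func main() {
	q := newQueue()
	q.OutgoingQueue <- "p1"
	q.OutgoingQueue <- "p2"
	p := q.next() // read p1...
	q.Stash(p)    // ...but it didn't fit, so put it back
	fmt.Println(q.next(), q.next()) // prints p1 p2
}
```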
I don't know what I was thinking in
f1ee951fd67d43c3f642c1f319678da51dd05c9b. The way it was written, if
there were no additional packets immediately available to pack into
the downstream, it would stop trying to pack and would instead wait
for the maxResponseDelay timer or another response to send. What I
meant is that
the timer and the next-response channel should have priority, if either
of those is true *and* there is additional downstream available to pack.
Only when both of those are false should we try to pack downstream data.
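That priority can be sketched with a nested select (illustrative names
and types, not the actual dnstt loop): poll the priority channel
without blocking first; only when it is not ready, pack more
downstream data.

```go
package main

import "fmt"

// pack drains packets from src into a response of at most capacity
// packets. Each round, the stop channel (standing in for the timer
// and next-response channel) is polled first with a non-blocking
// select; only when it is not ready do we take more data.
func pack(src <-chan string, stop <-chan struct{}, capacity int) []string {
	var resp []string
	for len(resp) < capacity {
		select {
		case <-stop:
			// Priority case: deadline reached, send what we have.
			return resp
		default:
		}
		select {
		case p := <-src:
			resp = append(resp, p)
		case <-stop:
			return resp
		}
	}
	return resp
}

func main() {
	src := make(chan string, 2)
	src <- "p1"
	src <- "p2"
	stop := make(chan struct{})
	fmt.Println(pack(src, stop, 2)) // prints [p1 p2]
}
```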