16 March, 2015
Last week there was an interesting question
posed by Peter Smith
to the fwknop mailing list that focused on running fwknop in
UDP listener mode
vs. using libpcap. I thought it would be useful to turn this into a blog post, so here is Peter's
original question along with my response:
"...I want to deploy fwknop on my server, but I'm not sure If I should use
the UDP listener mode or libpcap. At first UDP listener mode seems to be
the choice, because I don't have to compile libpcap. However, I then
have to open a port in the firewall. Thinking about this, I get the
feeling that I'm defeating the purpose of using SPA, by allowing
Internet access to a privileged processe.
If an exploitable security issue is found, even though fwknop remains
passive and undiscoverable, an attacker could blindly send his exploit
to random ports on servers he suspects running fwknopd, and after
maximum 65535 tries he would have root access. I'm not a programmer, so
I can't review the code of fwknop or SSH daemon, but if both is equally
likely of having security issues, I might as well just allow direct
access to the SSH daemon and skip using SPA.
Is my point correct?..."
First, let me acknowledge your point as an important issue to bring up. If
someone finds a vulnerability in fwknopd, it doesn't matter whether fwknopd
is collecting packets via UDP sockets or from libpcap (assuming we're
talking about a vulnerability in fwknopd itself). The mere processing of
network traffic is the problem.
So, why run fwknopd at all?
This is something of a long-winded answer, but I want to be thorough. I'll
likely not settle the issue with this response, but it's good to start the
discussion.
Starting with your example above with SSHD open to the world, suppose there
is a vulnerability in SSHD. What does the exploit model look like? Well, an
attacker armed with an exploit can trivially find the SSH daemon with any
scanner, and from there a single TCP connection is sufficient for
exploitation. I would argue that a primary enabling factor benefiting the
attacker in this scenario is that vulnerable targets that listen on TCP
sockets are so easy to find. The combination of scannable services +
zero-day exploits is just too nice for an attacker. Several years ago I
personally had a system running SSHD compromised because of this. (At the
time, it was my fault for not patching SSHD before I put it out on the
Internet, but still - the system was compromised within about 12 hours of
being online.)
Now, in the fwknopd world, assuming a vulnerability, what would
exploitation look like? Well, targets aren't scannable as you say, so an
attacker would have to guess that a target is running fwknopd and the
corresponding port on which it watches/listens for SPA packets
[1].
Forcing an attacker to send thousands of packets to different ports (assuming a
custom non-default port instead of UDP port 62201 that fwknopd normally
watches) is likely a security benefit in contrast to an obviously
targetable service. That is, sending tons of SPA exploit packets to
different ports is a common pattern that is quite noisy and is frequently
flagged by IDS's, firewalls, SIEM's, and flow analysis engines. How many
systems could an attacker target with such heavyweight scans before the
attacker's own ISP would notice? Or before the attacker's IP is included
within one of the Emerging Threats compromised hosts lists? Or within
DShield as a known bad actor? 10 million? How many of these are actually
running fwknopd? I know there are some spectacular scanning results out
there, so it's really hard to quantify this, but either way there is a big
difference between sending thousands of > 100-byte UDP packets to each
target vs. a port sweep for TCP/22.
Further, when a system is literally blocking every incoming packet
[2],
an attacker can't even necessarily be sure there is any target connected to
the network at all. Many attackers would likely move on fairly quickly from
a system that is simply returning zero data to some other system.
In contrast, for a service that advertises itself via TCP, an attacker
immediately knows - with 100% certainty - there is a userspace daemon with
a set of functions that they can interact with. This presents a risk. In my
view, this risk is greater than the risk of running fwknopd where an
attacker gets no data.
Another fair question to ask is about the architecture of fwknopd. When an
HMAC is used (this will be the default in a future release), fwknopd reads
SPA packet data, computes an HMAC, and does nothing else if the HMAC does
not match. This is an effort to try to stay faithful to a simplistic
design, and to minimize the functions an attacker can interact with - even
for packets that are blindly sent to the correct port where fwknopd is
watching. Thus, most functions in fwknopd, including those that parse
user-supplied data and hence where bugs are more likely to crop up, are not
even accessible without first passing the HMAC which is a cryptographically
strong test. When libpcap is also eliminated through the use of UDP, not
even libpcap functions are in the mix
[3]. In other words,
the security of fwknop does not rely on not being discoverable or scannable - it is merely
a nice side effect of not using TCP to gather data from the network.
To me, another way to think of fwknopd in UDP mode is that it provides a
lightweight generic UDP authenticator for TCP-based services. Suppose that
SSHD had a design built-in where it would bind to a UDP socket, wait for a
similar authentication packet as SPA provides, and then switch over to TCP.
In this case, there would not be a need for SPA because you would get the
best of both worlds - non-scannability
[4] plus everything that TCP provides
for reliable data delivery at the same time. SPA in UDP listening mode is
an effort to bridge these worlds. Further, there is nothing that ties
fwknopd to SSH. I know people who expose RDP to the Internet for example.
Personally, I'm quite confident there are fewer vulnerabilities in fwknopd
than something like RDP. Even if there isn't a benefit in terms of the
arguments above in terms of service concealment and forcing an attacker to
be noisy, my bet is that fwknopd - even if it advertised itself via TCP -
would provide better security than RDP or other services derived from
massive code bases.
Now, beyond all of the above, there are some additional blog posts and other
material that may interest some readers:
[1] If an attacker is in the position to watch legitimate SPA packets from
an existing client then this guesswork can be largely eliminated. However,
even in this case, if fwknopd is sniffing via libpcap (so not using the UDP
only mode), there is no requirement for fwknopd to be running on the system
where access is actually granted. It only needs to be able to sniff packets
somewhere on the routing path of the destination IP chosen by the client. I
mention this to illustrate that it may not be obvious to an attacker where
a targetable fwknopd instance runs even when SPA packets can be monitored.
Also, there aren't too many attackers who are in this position. At least,
the number of attackers in this position is _far_ lower than those people
(everyone) who are in a position to discover a vulnerable service from any
system worldwide with a single TCP connection.
[2] As you point out, in UDP-only mode, the firewall must allow incoming
packets to the UDP port where fwknopd is listening. But, given that fwknopd
never sends anything back over this port, it remains indistinguishable to
every other filtered port.
[3] There are more things built into the development process that may be
worth noting too that heighten security such as the use of Coverity, AFL
fuzzing, valgrind, etc., but these probably takes us too far afield from
the topic at hand. Also, there are some roadmap items that have not been
implemented yet (such as privilege separation) that will make the
architecture even stronger.
[4] Assuming a strong firewall stance against incoming UDP packets to
other ports so one cannot infer service existence because UDP/22 doesn't
respond to a scan but other ports respond with an ICMP port unreachable,
etc.