Scaling security devices is much more difficult than scaling routers or switches. A router acts on the destination IP lookup only, a 32 or 128 bit fixed length value, whereas a switch acts on a 48 bit fixed length MAC address, looking up on the destination MAC and adding the source MAC to a lookup table. Those values are not just fixed length, but they also appear at the same place in a data frame.
Routers and switches therefore embraced silicon very early on. Custom chips were designed that are comprised from transistors that form logic gates such as NAND or OR gates. Those logic gates are hardwired on a chip. These chips are called Application Specific Integrated Circuits – or ASIC, for short.
The logic in an ASIC used for routers and switches are hardwired, very similar to electronic components on an old TV circuit board. Unlike in an old tube TV, those ASICs process digital data. They can extract extremely fast IP and MAC addresses or perform table routing and forwarding table lookups in real time. Real time means that the time to perform a function always takes the same time, regardless of the load and run time.
There are several drawbacks with ASICs, though: First, ASICs cannot be changed once they leave the foundry. Second, there is a long lead-time to developing an ASIC. ASICs are simulated in software but can only be tested when a real sample exists. Producing samples is very costly, hence a long time is spent on testing an ASIC in software emulation before the first sample is built. This means that the technology used in an ASIC might be two or three years old before an ASIC hits production. And third, the development costs of ASICs are very high which makes them expensive for low volume production and evolutional versioning. The same ASIC generation has to be amortized over many years. The span between ASIC generations can therefore be five or more years, specifically for ASICs that are made for only one vendor’s products and sees low production count.
While this works for routing and switching that has not rudimentary changed in a decade or two, and there are still routers and switches in production today, which outlived a decade in service, this approach cannot be utilized for security where new threats appear by the minute. Threats typically do not obey fixed length requirements or are found at the same place within a data frame. RFC3514 has not been widely adopted by the BlackHat community for some reason.
The solution is to use microprocessors. Microprocessors are completely flexible and can be programmed in an instance to perform various tasks. Early firewalls started on common office technology processors, mostly Intel i386, but also PowerPC. The early days of firewalls were extensions to routers or switches. Security rules matched on source and destination IP, IP protocol ID, as well as source and destination ports for UDP and TCP protocols all fixed length values appearing at the same place within a data frame. While those general-purpose processors were programmable, they were not fast, and depending on the underlying operating system, not predictable, in terms of timing. This created substantial delays and jitter between packets. Security vendors took a hint from router and switch vendors and created ASICs to perform value extraction, table lookup, and packet switching. During the stateful inspection days, ASIC based systems have been very successful.
Stateful packet inspection (SPI) works by tracking TCP connection state between a client and a server socket. A socket is the combination of an IP protocol and a port. The two most common protocols are stateless UDP and stateful TCP. Stateful inspection was controlling access between sockets – that means access between clients and server applications. The problem with stateful packet filters these days is that traffic uses few sockets and that clients need access to many more servers. Other applications such as peer-to-peer (P2P) file sharing can use any socket. For instance, an internal client does almost all connections on HTTP and HTTPS and needs access to the entire Internet. In addition, a malicious attack can come over a legitimate connection, e.g. browsing a reputable news site that has a banner ad with malicious code embedded.
Deep packet inspection (DPI) inspects the actual data stream that flows between a client and a server. DPI can identify the application independent of sockets, and can look within the data stream for malicious code, or categorize applications and content. Whereas DPI was originally an add on to SPI, these days it replaced SPI as SPI is no longer effective in stopping threats, or controlling traffic flows. The term Next-Generation Firewall in NGFW implies DPI functionality. This includes common services such as user, application, and content identification, as well as intrusion prevention, gateway antivirus, geo fencing, botnet detection, bandwidth controls, and such. Also today, SSL client decryption is more and more important to be able to look into the payload of the data stream. After the recent website disclosures, we have seen a steady trend of more encryption that according to some predictions might reach two thirds of all sites by the end of next year.
DPI inspection cannot easily be done in silicon, or in other words few sub-functions could be done in hardware. DPI systems often apply hardware coprocessors that do cryptography, pattern matches, table look-ups, and framing. Vendor specific custom ASIC’s are less common today due to the cost of development. Sometimes Field Programmable Arrays (FPGAs) are utilized instead since their development cycle is low, but performance is significantly lower than that of an ASIC system, and there is little benefit to modern multicore processors. Another strategy by vendors that are locked into ASICs, is adding a microprocessor core to their legacy silicon. Performance of those afterthoughts is poor.
To summarize: Stateful inspection is no longer effective in protecting a network. DPI only benefits for some repetitive sub-functions from ASICs, but custom ASIC development is expensive with multi-year amortization cycles. On the other hand, office computer and server processors are too slow for scaling DPI beyond a few Gbps. They are also expensive and consume a lot of power, which means they cannot be packaged very densely, limiting the maximum throughput of the system.
SonicWall solved this problem by creating a security platform that is free from legacy. It is not based on custom ASICs, but uses high volume ASIC functions, that does not use power hungry and expensive microprocessors, but uses large clusters of processors more commonly found in low power applications such as smart phones. This permits a high packaging density of massive parallel processing, both in general microprocessors as well as ASIC coprocessors, utilized for signature match, table lookup, cryptography, framing, hashing, and switching.
SonicWall utilizes Cavium’s Octeon systems-on-a-chip (SoC) with up to 32 individual MIPS64 cores. Multiple SoC systems can be combined. Systems can have up to eight processing blades with one Octeon processor each within the same small two or three RU hardware enclosure. Enclosures can be deployed individually, as A/P HA pairs, or clustered up in a security fabric with a combined 2048 cores and DPI throughput of over 300 Gbps.
A single pass security engine, Reassembly Free Deep Packet Inspection (RFDPI), for which SonicWall got a patent awarded, brings this streamlined hardware with massive processing ability to life. RFDPI processes from SonicWalls around the world share intelligence with each other, over 2,000,000 devices today, enabled by the SonicWall GRID cloud. The GRID also offers cloud services such as sandboxing an access to a signature base of over 21,000,000 signatures, growing: 40,000 new malware samples are analyzed every day.
The philosophy behind SonicWall is to offer price effective massive parallel processing power that is highly scalable, and enable it with sophisticated on-board software that is connected via the cloud.