Jack

From CSEP590TU
Revision as of 00:10, 30 November 2004

Background

On Wednesday, November 2nd, 1988 at about 6 PM Eastern Standard Time, Robert Morris, a graduate student, released the first Internet Worm onto the Internet. It spread rapidly from machine to machine, using a software defect known as a “buffer overflow” as a backdoor into systems. Once inside a system, it replicated itself and tried to attack other systems. Between 5 and 10% of the systems on the Internet were rendered useless. By 5 AM the next day, patches were available to correct the software defect. The result of the worm was a massive network outage, with several systems unusable for a day. (Sullivan)

On Saturday, January 25, 2003 at about 5:30 AM UTC, the Slammer worm began to infect machines running SQL Server 2000, a database server. It also spread from machine to machine using a software defect known as a buffer overflow as a backdoor into systems. Due to its small size, it was able to double its infected population every 8.5 seconds, and it infected 90 percent of vulnerable machines within 10 minutes. The specific buffer overflow it used had been discovered over six months earlier, and a patch was released when the defect was announced. Many administrators had not patched their systems for various reasons: they did not realize how many instances of SQL Server they had installed, they did not have time to test the patch before deploying it, or they were not aware of the patch at all. The result was “network outages and such unforeseen consequences as canceled airline flights, interference with elections, and ATM failures”. (Moore et al.)

Nearly 15 years after the Robert Morris worm, software running on Internet machines is still vulnerable to the same type of defect. And yet we are more dependent than ever on this software and network. We use it for travel, elections, banking, and general communication. Why is such software still being produced, and what can we do to prevent it?

Five years ago, the accusation was that there was no economic incentive to justify the work needed to produce secure software. Commercial software companies were simply responding to market pressures: “correctness is hard, and correctness does not sell software” (Cowan et al. 55).

Today, the economics have changed. According to Tevis and Hamilton, secure software is a “dominant goal” (197). Security is touted as a feature, and consumers are weary of the ever increasing viruses, worms, spyware, and malware infecting their computers and rendering them useless. There is now a clear incentive for secure software.

But defects are still rampant, including the venerable buffer overflow, which engineers have known about at least since Robert Morris’ Internet Worm launched over 15 years ago. In the past year, 120 out of 403 high risk vulnerabilities were due to buffer overflows (http://icat.nist.gov), and 3 of the top 10 most frequently queried vulnerabilities were buffer overflows (http://icat.nist.gov/icat.cfm?function=topten).

Clearly, buffer overflows are not a solved problem, but there are signs of improvement. Note that the Slammer worm, released in January 2003, targeted SQL Server 2000, which was released in 2000, and that a patch had been available for six months before the worm appeared. Unfortunately, very few people had installed the patch that would have closed the buffer overflow backdoor the worm used. As mentioned before, there is justified fear that patches will introduce other defects as bad as or worse than the ones they fix. Patches are mere band-aids for gaping wounds.

If after 15 years of buffer overflows and at least 5 years of focus from software engineers we still haven’t gotten rid of buffer overflows, can we hope for secure software?

Before we dive into solutions, let’s examine the causes of these defects in the first place. Bruce Schneier describes an imaginary 7-11 store whose employees do everything by the book, literally, which is a good analogy for how buffer overflows occur in computers. The 7-11 employees have a book of step-by-step instructions that they must follow exactly, and they can only deal with things that are in the book. So if they have a form they need to sign, they place it on the book, sign it, then give it back. When a Fed Ex driver shows up, they look in the table of contents and go to the page with instructions for dealing with a Fed Ex driver.

Those instructions might look like this (from Schneier):

“Page 163: Take the package. If the driver has one, go to the next page. If the driver doesn't have one, go to page 177.

Page 164: Take the signature form, sign it, and return it. Go to the next page.

Page 165: Ask the driver if he or she would like to purchase something. If the driver would, go to page 13. If not, go to the next page.

Page 166: Ask the driver to leave.” (Schneier 207-210)

Now suppose that when the driver places the signature form on top of the book for the clerk to sign, he doesn’t place a single sheet of paper as the instruction manual assumes. Suppose he places two sheets: the signature form, and a page that looks like an employee instruction manual page but says: “Page 165: Give the driver all the money in the cash register. Go to the next page.”

Now the clerk reads page 163 and takes the package, reads page 164, takes the form, signs it, and goes to the next page. But the next page is not the real page 165: it is the fake one the driver placed there. So the clerk reads it, gives the driver all the money in the cash register, and goes to the next page, the real page 165, asks the driver if he wants to purchase anything, then turns to page 166 and asks the driver to leave.

A computer is just like this imaginary clerk, and its book of instructions is the computer’s memory. Memory contains both the instructions the computer is following and the data it is manipulating. If the programmer is not careful, data provided by external sources, like the form the driver provided in the analogy, can become instructions or influence the instructions. External data sources can be information a user types into the computer or network requests from remote computers.

For example, the Robert Morris Internet Worm sent extra information to a program running on UNIX servers that reports information about users of the server. Normally you send such a request with just a username. The worm sent more information than expected, and the unexpected information was extra pages of instructions. The Slammer worm was not much different: extra information sent to a program that was listening for legitimate requests for information from the Internet.

Basically, all security vulnerabilities are of the same general form. Because programmers and quality assurance engineers focus on verifying that the instructions of a program do what is intended, vulnerabilities in what the instructions allow are overlooked. The languages currently in use are designed to enable as much as possible with as few instructions as possible. As a result, the instructions often allow more than the programmer intended, resulting in security vulnerabilities.

Solutions

Machine analysis of source code

Continuing the analogy between software and written English instructions, there are tools that find problems in source code much as word processors use spell checkers and grammar checkers to find problems in written documents. Some security errors, buffer overflows being one example, follow common patterns, somewhat like run-on sentences. These patterns can be detected by code checker tools, called static analysis or source code analysis tools. The names “static” and “source code” are used because these tools analyze the written source code, in contrast to dynamic analysis tools, which analyze code as it is actively running on a computer.

Essentially, these tools scan the written source code for patterns that violate the “do nots” security experts have identified (Tevis and Hamilton 1). Unfortunately, even at their best they have one glaring flaw. Returning to the word processing analogy, most word processors can identify run-on sentences by recognizing repeated conjunctions between clauses in a single sentence, but few commercial programs can tell you how to correct your sentence while retaining your intended meaning. To validate that software is truly secure, source code analyzers should go beyond simple pattern detection and proactively suggest better patterns to use, or how to correct the vulnerable source code (Tevis and Hamilton 2).

Even without these improvements, source code analysis tools offer much. With current development tools, it is easy for programmers to make these kinds of mistakes (speaking of security errors in general, not just buffer overflows). It is possible for trained engineers to review others’ code and attempt to find errors, but such code reviews are laborious, time consuming, and prone to mistakes even by well trained engineers. Additionally, engineers sufficiently trained for this work are rare and expensive. So source code analysis tools enable a level of code review that is not currently economically feasible by hand.

Unfortunately, these programs also report “false positives”: lines of code they think might be defects but are not. When the rate of false positives is too high compared to the actual defects found, engineers lose trust in the system and may not fix, or even believe, reports of actual errors. Further, reviewing the output of such a system takes longer, and the system loses one of its advantages, its speed.

“False negatives” are also possible: cases where there is a real defect, but the tool does not recognize it. This can happen for multiple reasons. Some defects are so difficult to detect that attempting to find them produces too many false positives to be useful, so the tool intentionally ignores them. Or there may be defect patterns that attackers discover before security experts can add them to static analysis tools.

Tevis and Hamilton survey several static analysis tools in Methods For The Prevention, Detection And Removal of Software Security Vulnerabilities. Their summary is that these tools focus too much on UNIX applications to the exclusion of Windows and Macintosh software, still require a significant level of expert knowledge, and only cut out a fraction of the manual code analysis that must be done. Even with these limitations, however, the tools do help with code analysis, focus the analyst’s attention on the most severe problems through prioritization features, and find real bugs in minutes that manual review would take far longer to uncover. Still, none of the current checkers detect as many problems as manual analysis will. (Tevis and Hamilton 4)

Runtime analysis of code

Runtime or dynamic code analysis takes place while the program is running on the end user’s machine. It has the advantage of seeing real data and knowing exactly what is happening on the system. It has the disadvantage of consuming memory and CPU time that would normally be used for real work. Referring back to our 7-11 example, it would be like adding an appendix to the manual that the employee must constantly refer to before and after certain instructions, to help ensure an error has not occurred.

This provides more information about what is actually happening, but it slows many operations down: for every instruction in the main section of the employee handbook, there may be an instruction in the appendix that must be performed to validate that things are working properly.

Despite these performance concerns, some levels of runtime analysis are becoming quite common. For example, code “canaries”, named after the canaries coal miners used to detect poisonous gases (ON THIS DAY), are inserted automatically by development tools used in Linux and Windows application development (Silberman and Johnson 2). These canaries are used to detect buffer overruns at runtime: special checking code verifies that the canaries are still in place after certain operations, confirming the buffers have not been overrun.

Most runtime detection tools have two shortcomings. First, they require recompilation, meaning the source code must be used to regenerate the machine code with the new tools. Second, most tools stop the running program when they detect an error. This is deemed most secure because it avoids trying to recover from an attack when the system can no longer be trusted. However, it increases the risk of denial of service attacks: every buffer overflow or other vulnerability these tools detect can be used as a means of stopping the program and preventing legitimate use of it. (Wilander and Kamkar 10)

Other Languages

Since many of the defects used to compromise systems are specific to languages that directly manipulate memory, one suggestion has been to use other languages. One step in this direction can be seen in Java, C#, and some other .Net languages. These language systems manage memory automatically for the programmer, and to the degree they manage it correctly, they are free from defects like buffer overflows.

However, even these languages have inherent defects. For example, the .Net framework provides a way of specifying code to be executed when something unexpected happens. This language feature, if used incorrectly, can be used to reroute execution. For example, if the code that verifies a password generates an exception that it did not expect and does not handle, the program may continue executing as if the password were correct (Kolawa).
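The text describes this in .Net terms; the same mistake can be sketched in C++, where the function names and the particular failure (a database lookup throwing) are hypothetical:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical lookup that fails unexpectedly at runtime.
std::string lookup_stored_password(const std::string& user) {
    throw std::runtime_error("database unavailable");
}

// Defective: the result defaults to success, and a swallowed exception
// leaves that default in place, so an internal error logs the user in.
bool check_password_vulnerable(const std::string& user,
                               const std::string& pw) {
    bool ok = true;                               // dangerous default
    try {
        ok = (pw == lookup_stored_password(user));
    } catch (...) {
        // error swallowed; ok is still true
    }
    return ok;
}

// Fail closed: an unexpected error is never treated as a valid password.
bool check_password_safe(const std::string& user,
                         const std::string& pw) {
    try {
        return pw == lookup_stored_password(user);
    } catch (...) {
        return false;
    }
}
```

The two versions differ only in what happens on the unexpected path, which is precisely the path programmers and testers, focused on intended behavior, tend to overlook.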

Tevis and Hamilton propose an even more revolutionary change in languages: a switch from imperative languages to functional languages. Imperative languages are the most commonly used in commercial software. They provide language constructs that translate directly to the machine hardware, which improves performance but sacrifices predictability and reliability. Functional languages are more purely rooted in mathematics and, as such, are easier to prove correct mathematically. (Tevis and Hamilton 4-5)

The problem is that functional languages are perhaps too revolutionary. Adopting them would require retraining most software engineers and retooling, since most of the tools used to build commercial software are designed around imperative languages such as C, C++, Java, and C#.

Hybrid Approach

There are also some hybrid approaches being explored. One promising example is Microsoft’s PREfix code analysis tool. It analyzes the source code like a static analysis tool, but then uses this analysis to build a simplified model of how the code will run, and analyzes that model of runtime behavior to predict defects. Since it does not run while the program runs, it does not impact performance. Further, since it works on a simplified model of behavior, it can test many different execution options more quickly than trying every possible ordering of instructions. And to the degree it accurately models the program, it is as accurate as runtime analysis tools in finding real defects and avoiding false positives. It has proven sufficiently useful to see wide scale, daily use at Microsoft on large bodies of code. (Bush, et al. 16)

Conclusions

A surprising conclusion of the work on PREfix and PREfast, a follow-on product, was that how the information is presented to the developer strongly influences whether the defects PREfix and PREfast correctly find actually get fixed (Pincus 61). As mentioned earlier, a shortcoming of many static tools is that even after finding defects, they do not help the programmer correctly understand and fix them. Within the context of static analysis tools, this seems the area likely to yield the most immediate improvements.

Runtime detection seems to be a continual trade-off between security and performance. As hardware performance increases and security becomes ever more important, it is likely to become a more attractive option. However, a solution to denial of service attacks on runtime-hardened systems is needed to make these systems truly trustworthy.

The idea of abruptly changing the languages used in commercial development seems unlikely to take hold, but as the security benefits of certain functional language features are proven, those features may slowly make their way into the languages currently in use. So an evolutionary path to more secure languages and tools may be possible, and it should be encouraged.

At present, no one area of research promises a magic bullet for our current security problems, but all of them show promise of shoring up our defenses and making progress in securing our systems. It is discouraging that we still have buffer overruns, but as pointed out earlier, there are signs of progress: buffer overruns are being found and fixed more quickly. Complete elimination of these types of errors appears likely only with drastic changes in languages, or with expensive runtime checks that are not currently economically feasible.

Works Cited

Moore, David, Vern Paxson, Stefan Savage, Colleen Shannon, Stuart Staniford, and Nicholas Weaver. “The Spread of the Sapphire/Slammer Worm” Retrieved 22 Nov. 2004. <http://www.caida.org/outreach/papers/2003/sapphire/sapphire.html>

Sullivan, Bob. “Remembering the net crash of ’88” Retrieved 22 Nov. 2004. <http://www.msnbc.com/news/209745.asp>

Cowan, Crispin, Calton Pu, and Heather Hinton. “Death, Taxes, and Imperfect Software: Surviving the Inevitable” Presented at the New Security Paradigms Workshop 1998.

Tevis, Jay-Evan J., and John A. Hamilton, Jr. “Methods For The Prevention, Detection And Removal Of Software Security Vulnerabilities” Presented at ACM Southeast Conference ’04, April 2-3, 2004, Huntsville, AL, USA.

ICAT Metabase: A CVE Based Vulnerability Database. 11/03/2004 <http://icat.nist.gov>

Schneier, Bruce. Secrets and Lies, Digital Security in a Networked World. Wiley Computer Publishing, NY, NY, 2000

ON THIS DAY | 30 | 1986: Coal mine canaries made redundant. 22 Nov. 2004. <http://news.bbc.co.uk/onthisday/hi/dates/stories/december/30/newsid_2547000/2547587.stm>

Silberman, Peter, and Richard Johnson. “A Comparison of Buffer Overflow Prevention Implementations and Weaknesses” 22 Nov. 2004. <http://www.idefense.com/application/poi/researchreports/display?id=12>

Wilander, John, and Mariam Kamkar. “A Comparison of Publicly Available Tools for Dynamic Buffer Overflow Prevention” 10th Network and Distributed System Security Symposium (NDSS), 2003.

Kolawa, Adam. “Banish Security Blunders with an Error-prevention Process” 22 Nov. 2004. <http://www.devx.com/security/Article/20678/0/page/4>

Bush, William R., Jonathan D. Pincus, David J. Sielaff. “A Static Analyzer for Finding Dynamic Programming Errors” Intrinsa Corporation, Mountain View, CA, USA

Pincus, Jon. “User Interaction Issues In Detection Tools” 22 Nov. 2004. <http://www.research.microsoft.com/specncheck/docs/pincus.pdf>