Caroline

Software companies do not make secure software because consumers do not demand it. This shibboleth of the IT industry is beginning to be challenged. It is dawning on consumers that security matters as computers at work and home are sickened with viruses, as the press gives more play to cybersecurity stories, and as software companies work to market security to consumers. Now, consumers are beginning to want to translate their developing, yet vague, understanding that security is important into actionable steps, such as buying more secure software products, which in turn will encourage companies to produce more secure software. Unfortunately, consumers have no good way to tell how secure different pieces of software are.

Security is an intangible beast. When a consumer buys the latest release of Office, he can see that the software now offers him a handy little paper clip to guide him through that letter he is writing. He understands that Microsoft has made tangible improvements—or not, depending on your perspective—to the software. Security does not lend itself to such neat packaging: Microsoft can say it puts more resources into security, but should the consumer trust this means the software is more secure? Is Microsoft’s software more secure than that of another company? To help the consumer answer these questions, we need an independent lab to test and certify software security. If consumers trusted this lab, they could begin to vote for security with their dollars and companies would have the incentive to work harder at providing it.

This seems to be a good solution: Consumers do not know what security looks like so they’ll trust the computer science experts running the lab to figure it out and tell them. The trouble is, those computer science experts don’t know whether a piece of software is secure either.

Ensuring that a piece of software is secure is very hard because software is extremely complex, with many lines of code and many possible interactions and configurations for each environment in which it will be used. Because it is so complex, it is virtually impossible to make it mistake-free. Consider how often you see the message asking if you want to report a problem to Microsoft: a mistake in the program caused that message to appear. Security vulnerabilities, whether a flawed design decision or a small bug in the code, are often tiny mistakes in a huge sea of millions of lines of code. Attackers are constantly probing for these mistakes and, at least with today’s technology, can always stay one step ahead of the defenders.
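
To make this concrete, consider a minimal, hypothetical C fragment (invented names, not drawn from any real product); a single unchecked library call is all it takes to turn ordinary-looking code into an exploitable buffer overrun:

    #include <string.h>

    /* Hypothetical helper: copy a user-supplied name into a fixed-size buffer.
     * The missing length check is the entire vulnerability: any name longer
     * than 31 characters writes past the end of buf. */
    void greet(const char *user_supplied_name) {
        char buf[32];
        strcpy(buf, user_supplied_name);       /* no bounds check: the tiny mistake */
        /* ... use buf ... */
    }

    /* A safe version differs by only a couple of lines. */
    void greet_safely(const char *user_supplied_name) {
        char buf[32];
        strncpy(buf, user_supplied_name, sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';           /* strncpy may not null-terminate */
    }

The flaw is trivial to spot in isolation; finding it among millions of lines of code, and in every configuration the software runs in, is the hard part.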

A lab to test security

Software’s complexity makes it difficult not only to avoid making mistakes but also to devise foolproof ways to detect every single one of them. This does not mean we should throw up our hands and lament that a lab to test security could never work because we don’t know how to tell whether a piece of software is secure. Computer scientists have devised proxies they can measure to estimate how secure a piece of software is.

First, the lab’s experts could study the plan, or specification, for the software to see whether security features, such as encryption and access control, have been included. They can also look at the software’s threat model, an evaluation of what threats the software might face and how it will deal with them.

Lab experts can also look at the process by which the developers created the software. Many computer scientists believe that the more closely a software development process mirrors the formal, step-by-step exactitude of a civil engineering process, the more secure the code should be. So the lab’s experts can evaluate whether the development team documented its source code, taking careful notes so others can understand what the code does. They can see whether a developer’s code has been reviewed by the developer’s peers. They can evaluate whether the software was developed according to a reasonable schedule and not rushed to market. They can look at how well the software was tested. The hope, then, is that if the developers take measures like these, the code is likely to have fewer mistakes and thus to be more secure.

The lab’s experts might also run tests on the software themselves. They might run programs to check the source code for vulnerabilities that we know occur frequently. Or they might employ human testers to probe the code for vulnerabilities.
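
As a rough illustration of the automated route (a deliberately naive sketch, not any lab’s actual tool), such a checker might do nothing more than scan source files for library calls that are frequent sources of buffer overruns; real tools parse the code and track data flow, but the basic idea is the same:

    #include <stdio.h>
    #include <string.h>

    /* Toy source checker: flag lines that call functions commonly implicated
     * in buffer overruns. Purely pattern-based, so it will both miss real
     * bugs and flag safe code. */
    int main(int argc, char **argv) {
        const char *risky[] = { "gets(", "strcpy(", "strcat(", "sprintf(" };
        char line[1024];
        int lineno = 0, findings = 0;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <file.c>\n", argv[0]);
            return 2;
        }
        FILE *f = fopen(argv[1], "r");
        if (f == NULL) {
            perror("fopen");
            return 2;
        }
        while (fgets(line, sizeof(line), f) != NULL) {
            lineno++;
            for (size_t i = 0; i < sizeof(risky) / sizeof(risky[0]); i++) {
                if (strstr(line, risky[i]) != NULL) {
                    printf("%s:%d: matches risky pattern \"%s\"\n",
                           argv[1], lineno, risky[i]);
                    findings++;
                }
            }
        }
        fclose(f);
        return findings > 0 ? 1 : 0;
    }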

Finally, the lab’s experts could follow the software after it has been shipped, tracking how often patches to fix vulnerabilities are released and how long it takes the company to release patches.

Reasonable tests for estimating security

So the first decision an independent lab would have to make is which of these methods to use to evaluate security. The test must produce a result that enough people believe has merit. Some computer scientists argue that we know so little about how to produce secure software that it is premature for a lab to attempt to test our ability to do so. We can make sure there are security features in the specs, that the process by which the software was built produced something reasonably close to the spec with as few bugs as possible, and that it was rigorously tested, they say, and even after all this we still cannot be confident that our software is more secure than if we had not taken these measures; it takes only one attacker finding one new vulnerability to cause devastating problems. Other computer scientists agree that we will never be 100 percent sure our software is secure, but argue that it is better to test what we can than to do nothing at all.

Which of these methods, then, should the lab use? Ideally, if the only consideration were to do the best possible job of estimating whether the software is secure, the lab would use all of them. In the real world, though, the lab will face constraints: the time and money it takes to evaluate software, and limited access to proprietary information such as source code and development processes. In the face of such constraints, computer scientists disagree about which proxy is best to measure.

Once the lab has decided on how it will measure security, it must consider several other elements that go into making a successful lab.

Meaningful ratings: The lab needs a way to convey its findings to consumers. Any rating system must say how secure the lab thinks the software is, but at the same time not give users a false sense of security, so to speak. The lab should make clear that the ratings are simply an attempt to measure proxies for security.

In addition, the metrics the rating system uses to describe how secure a piece of software is must make sense to an average consumer while mapping to criteria intelligible to computer scientists. At the very least, the lab might use a pass/fail system, which consumers would find easy to understand. The passing grade would indicate that the software had performed satisfactorily on a series of requirements that computer scientists and IT professionals would understand. Slightly more elaborate schemes might include rating the software “approved for business use” or “approved for consumer use,” or rating it on a scale.


Critical mass of participating companies: Companies will only care about getting their software rated if many of their competitors are participating, so a critical mass of companies would need to take part in the certification process for it to be useful. There are various ways to convince companies to participate: industry leaders might take the initiative and participate for the benefit of the industry as a whole, or government policy might mandate that certain government agencies use only software that has been certified.

Educated consumers: Consumers would need to learn about and value the ratings. Press attention, endorsement of the ratings by leading industry figures and computer scientists, and more consumer education on why security is important would all help.

Expense: Submitting software for review must not be prohibitively expensive, so that smaller players in the industry can afford to be included.

Time: Because new software releases come with some regularity, the review process must happen quickly enough for the software to have time on the market before its next version comes out.

Accountability and independence: Finally, the lab must be accountable and independent. It needs to be held responsible for its scoring process, and it must be able to evaluate software regardless of whose software it is evaluating and who is funding the evaluation.

Existing labs

The Common Criteria labs

The Common Criteria is the rating system for software security most commonly in use today. The Common Criteria labs evaluate the software’s high-level security features, such as encryption, access control and authentication. Organizations submit their products to labs charged with evaluating the software against security requirements specified by the Common Criteria; software is evaluated at one of seven levels of assurance, from level 1 to level 7. The level at which a company has its product evaluated depends on how extensive the security features it has chosen for the product are. Commercial software products such as Windows are typically not evaluated above level 4; the upper levels are reserved for critical systems such as those used in defense.

Critics of the Common Criteria argue that its certification process needs improvement. Some argue that the evaluation of the Common Criteria’s chosen security proxy could be better: they say the way the labs evaluate the security features of the software relies too much on documentation to be effective. Others note that the process takes too long (evaluations need to be finished before the next release of the software under evaluation) and is too expensive, which bars all but the biggest companies from having their software evaluated. Consumers also know nothing about the Common Criteria, and whether educating them about it would be effective is uncertain, as the particularities of each ranking can baffle even computer scientists. Finally, Jonathan Shapiro, professor of Computer Science at Johns Hopkins University, points out that the labs are not as independent and accountable as they could be: his evidence is that companies are playing the labs off each other for favorable treatment.

Cylab

The Cylab is an initiative of Carnegie Mellon University in partnership with leading IT companies including Microsoft, Oracle, Cisco and Hewlett-Packard. The lab is developing a plan to run automated checking tools (small programs that run against a piece of software’s source code) to catch three common classes of security vulnerability: buffer overruns, integer overflows and input validation errors.
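
The buffer overrun was sketched earlier in this section; the other two classes can be just as small. The hypothetical C fragments below (invented names, not Cylab code) show the shape of each:

    #include <stdlib.h>
    #include <string.h>

    struct record { char name[64]; int id; };      /* made-up data type */

    /* Integer overflow: if an attacker controls count, the multiplication can
     * wrap around to a small value, so a tiny buffer is allocated and the
     * memcpy that follows writes far past its end. */
    struct record *copy_records(const struct record *src, size_t count) {
        struct record *dst = malloc(count * sizeof *dst);   /* may overflow */
        if (dst != NULL)
            memcpy(dst, src, count * sizeof *dst);
        return dst;
    }

    /* Missing input validation: the index comes straight from the user and is
     * used without a range check, letting a caller read outside the table. */
    static const char *status_names[] = { "ok", "warn", "error", "fatal" };

    const char *status_name(int user_supplied_index) {
        return status_names[user_supplied_index];           /* no bounds check */
    }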

According to Larry Maccherone, director of the Cylab, 85 percent of the vulnerabilities cataloged in the CERT database, which is run by CMU’s Software Engineering Institute and tracks most known security vulnerabilities, are of the kind that the Cylab’s tools will test for.

As the Cylab settles on a plan for testing its chosen security proxy, it must also focus on the other elements that go into a successful software security test lab; the details of many of these are still being worked out. For the software ratings to be meaningful, Maccherone argues, they must be very simple: a pass/fail system. Consumers will respond to a pass/fail rating, he believes, and at this early stage anything more complex will be useless. The first iteration of the lab should set the bar very low for receiving a “certified” rating; as our ability to test software improves and as more vendors participate, the bar can be raised. Consumers will likely value these ratings if major players in the software industry subject their software to the lab’s verification process. Engaging a critical mass of companies to do so should not be a problem, the Cylab believes: the industry giants who are partners in the lab would set the example, and smaller vendors would follow. Microsoft has every interest in promoting industry-wide independent evaluations, Maccherone says; after all, two-thirds of the “blue screen of death” incidents in Windows NT, in which the operating system crashes, were caused by third-party drivers.

The Cylab also needs to consider the cost in time and money of running these tests. With automated testing, the financial cost might be lower than with other ways of evaluating software: the system runs by itself with little human intervention, and humans are needed only to analyze the results. A problem with these tests, however, is false positives. How much does handling them add to the cost in time and money?
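
To see why false positives arise, consider another hypothetical fragment: a pattern-based checker like the sketch earlier in this section flags every call to strcpy, even though this particular call is guarded by a length check and cannot overrun the buffer. Someone still has to read the report and decide it is harmless.

    #include <string.h>

    /* A naive checker flags the strcpy below, but the copy only happens after
     * the source length has been checked against the destination size, so the
     * report is a false positive that a human must triage. */
    void set_label(char dst[16], const char *src) {
        if (strlen(src) < 16) {
            strcpy(dst, src);   /* bounded by the check above */
        }
    }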

Using automated tools to test the software might also alleviate some of the problems with accountability and independence that critics contend the Common Criteria labs have: a tool applies the same analysis to every vendor’s code, so the evaluation depends less on human judgment than a documentation review does.

Conclusion: leading into next section?

Bibliography:

  • Interviews with Larry Maccherone, director of the Cylab, Carnegie Mellon University (dates will come)
  • Interview with Jonathan Shapiro, professor of Computer Science, Johns Hopkins University
  • Interview with Stuart W. Katzke, Ph.D., Senior Research Scientist, National Institute of Standards and Technology
  • Interview with Gary McGraw, chief technology officer, Cigital
  • Interview with Eugene Spafford, professor of Computer Science, Purdue University
  • Email exchange with Hal Varian, professor of Economics, UC Berkeley
  • Interviews with Steve Maurer, professor of Economics, UC Berkeley