Trusting the system: Innovations for an insecure world

Stephen Chong, associate professor of computer science, believes that minimizing catastrophic security breaches requires looking at systems from a different, and more philosophical, point of view.

"Ideally, you would like to trust as few things as possible, because the fewer things you trust, the fewer things can hurt you."

That sounds, perhaps, like the philosophy of a person who’s been hurt one too many times. Yet Stephen Chong, associate professor of computer science, is articulating the problem at the heart of information security: one breach is all it takes. One weakness in a trusted technology is enough to damage a company’s reputation, devastate personal privacy, or even destabilize a nation’s infrastructure.

In 2014, a data breach at Home Depot exposed information relating to an incredible 56 million credit and debit cards. A mysterious hack at Sony exposed tens of thousands of Social Security numbers, leaked five previously unreleased movies, made employees’ private emails public, and raised hackles among international leaders. The Heartbleed bug—a vulnerability in OpenSSL, an open-source implementation of the Secure Sockets Layer (SSL) encryption used on much of the Web—allowed hackers to eavesdrop and gain access to passwords, cookies, and other private information on half a million websites. In other contexts, where lives are on the line—military and medical settings, for example—security breaches can be catastrophic.

Even when the risks are limited to embarrassment or financial loss, it’s easy to understand Chong’s preoccupation with keeping to a minimum the elements in which we are required to place trust.

“Maybe we can actually come up with techniques that are so strong that it’s no longer about believing or hoping that a piece of software is going to do the right thing,” he suggests. “Maybe we can prove in a formal way that it is going to do the right thing. Then, you could say, we don’t need to trust it at all.”

Through mathematical and computational approaches, researchers at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) are going far beyond plugging holes in aging systems. They are fundamentally rethinking what it means to trust the system at all.

Their work raises broad questions: Can we guarantee that a program does what it’s told? How do we know what happens out of sight? How much of “doing the right thing” depends on technology, and how much on human behavior, ethics, law, or policy? How much risk can we tolerate? And what does it really mean to make a system secure, to handle data correctly, or to keep personal information private?

These questions transcend traditional academic fields, so computer scientists at SEAS collaborate with their colleagues across the University—at Harvard Law School (HLS), in the social sciences, in statistics, and beyond.

Their thinking is contributing to the technical, legal, and policy underpinnings of the evolving and ever more powerful systems that sustain health care, business, government, and modern life’s everyday activities. But if these new systems are to be acceptably secure, the philosophical questions need answers too.

New ideas for a changing world

Twenty-five years after the Internet first became a household utility, the United Nations and McKinsey estimate that it connects approximately 3 billion people and that almost $8 trillion a year changes hands through e-commerce.

Greg Morrisett, Allen B. Cutting Professor of Computer Science, recalls a simpler time. When pioneering coders built many of the systems in place today, he says, their expectations and the technological context were very different.

“A lot of this old code was developed in the days when the Internet connected a few universities and maybe the military, but it wasn’t worldwide, and people in the universities were pretty trustworthy,” says Morrisett, who will become Dean of the Faculty of Computing and Information Science at Cornell University in the fall. “You’d have some crazy kids hacking into stuff, but it wasn’t bad. When money became a factor, all of a sudden things changed.”

Today, spam and phishing attempts are an everyday nuisance, and the Internet is the medium of choice for anonymous retaliation, ransom schemes, and stock-market manipulation.

“The Net enables this in terms of geography—it could be a kid in Bulgaria that’s hacking into your system—and there’s also a degree of scale in the attacks,” Morrisett says. “There are 2 billion smartphones out there now. If I could break into 1 percent of that, my God—that’s 20 million smartphones. You could do a lot of damage.”

And the hackers are no longer just private individuals, but nation-states, too. In 2009, the Stuxnet virus, reportedly embedded covertly by the United States and Israel into industrial computer systems, disrupted the uranium enrichment process in Iran. In August 2012, another virus wiped three-quarters of the computers at Saudi Aramco, the world’s biggest oil company. After the Sony attack in 2014, the United States blamed North Korea.

“Anywhere there’s conflict, there’s now cyber-conflict as well,” Morrisett says. Independent hackers with financial interests are one thing—“Often you can find a chokehold for them that doesn’t have anything to do with the technical stuff. You can go after the banks where they’re clearing their money, for instance. But if China wants to break into my laptop, China’s going to break into my laptop.

“That’s scary,” he says, “because the next iteration of warfare is going to involve a lot of drones and robots, missiles, ships, aircraft, and devices that we can’t even imagine, all controlled by software. It’s just terrifying to think that all of that could be vulnerable.”

In that world, building higher walls and stronger systems isn’t necessarily enough. “Security isn’t a goal, you know? It’s like saying, ‘Okay, we’re going to solve warfare.’ If you put up some defensive Maginot Line, well, somebody’s going to blow past it with a Blitzkrieg. It’s a constant evolution.”

What we need, Morrisett says, is “game-changing” research.

Inherited insecurity

Security expert Bruce Schneier spoke at SEAS in January, in conversation with Edward Snowden, the former National Security Agency (NSA) systems administrator who leaked classified records of NSA surveillance efforts. (Snowden called in from Moscow via Google Hangouts.)

“The NSA has to balance two different focuses: defend our networks and attack their networks,” said Schneier, a fellow at Harvard’s Berkman Center for Internet and Society. “Those missions, I think, made a lot more sense during the Cold War when you could defend the U.S. radios and attack the Soviet radios, because the radios were different. It was ‘us’ and ‘them.’”

The trouble now, Schneier said, is that “we’re all using the same stuff. Everyone uses TCP/IP, Microsoft Word, Firefox, Windows computers, Cisco routers … Whenever you have a technique to attack their stuff you are necessarily leaving our stuff vulnerable. And conversely, whenever you fix our stuff you are fixing their stuff.”

Those vulnerabilities—the ones we know about—are myriad. Moreover, weaknesses can lurk in software for a very long time before being found.

Bash, for example, is a shell program that was first created in 1989 to execute commands on the Unix operating system. Respected and freely distributed, it became a key, trusted part of Linux, OS X, and other widely used systems. Yet it was only in September 2014 that a severe vulnerability was discovered and announced, sparking hundreds of thousands of cyberattacks per day by the end of that month. Shellshock, as the bug is now known, had been there for 25 years, unnoticed.

“Many security problems arise in legacy code and legacy systems, which were developed in an age when we trusted everybody and they weren’t all connected to the Internet,” Morrisett laments. “How do we secure old code that’s lying around and just has these ticking time bombs like Shellshock did?”

He recently wrapped up a project addressing that question with computer security researchers from BAE Systems, the University of Pennsylvania, Northeastern University, and Harvard, funded by the Defense Advanced Research Projects Agency (DARPA). The premise of the project, called CRASH/SAFE, was that a dramatically different, clean-slate approach to machine architecture and software design could be the most secure and resilient solution.

“What if we threw away the existing hardware and software systems and started from scratch,” he asks. “And you think, ‘Well, that’s ridiculous.’ It’s very hard to throw away the 50 million or 100 million lines of code in Windows and just start from scratch. But when the iPads and iPhones came out, they were using a new operating system in a completely different environment, where the apps had to be curated and purchased through the store. So actually, there is much more potential to apply these techniques in new environments than there is on the old legacy code that’s been around forever.”

The CRASH/SAFE project was intended as speculative, experimental work, but Morrisett, Chong, and others at SEAS are also developing new techniques that can be applied to legacy systems, to restore confidence in the short term.

“Given what we learned from Snowden, the math and computer science communities started to wonder if the NSA had deliberately weakened U.S. cryptosystems with ‘back-doors’ that would allow them to more easily decrypt messages,” Morrisett says. “Unfortunately, that means that (a) bad players within NSA might be reading U.S. secrets that they should not, and (b) a bad guy might discover that back-door and use it to break our crypto systems.”  

There is an urgent need, therefore, for new programs that can either look for holes in critical systems (either during or after development) or formally prove that an algorithm is secure.

“If you look at most of the flaws that have led to security breaches,” Morrisett says, “they haven’t actually been so much at the algorithmic level as at the level of mistakes in the code. There’s something about translating the high-level ideas into the details to be executed on the computer that can lead to numerous flaws. So how do we stop these bugs from creeping into software, allowing attackers to gain a foothold?”

One solution is to assume the program isn’t trustworthy—that is, to wall it off from the rest of the system and give it only limited privileges. Chong demonstrated a version of this approach in a project called Shill in October 2014, with graduate students Scott Moore and Dan King, and postdoctoral fellow Christos Dimoulas. Shill, a secure shell scripting language, provides a way to run scripts in a kind of sandbox, forcing the software to act only within the bounds of an explicit set of granular permissions. This solution can be applied to legacy applications—no software has to be checked or rewritten—but requires a user knowledgeable enough to judge which permissions are safe to grant.

Another approach to uncertainty is to prove that the program is trustworthy. RockSalt, a system created in Morrisett’s Harvard research group in 2012, relies on multiple layers of oversight. Developed by Edward Gan ’13 and Joseph Tassarotti ’13, former postdoc Jean-Baptiste Tristan (now at Oracle), and Gang Tan at Lehigh University, RockSalt automatically checks and mathematically proves that a piece of code respects a given security policy.

“In practice, these proofs can be huge, so how do you check them? Well, you write a little program to check the proof,” Morrisett explains. “Hopefully that little program is smaller and simpler than the thing you’re trying to prove, because of course, if you have a bug in that, well—okay, you can see it’s turtles all the way down. But the great thing about these proofs, as a concept, is that there’s no need to trust anybody.”

“The big research problem here is that building these proofs is very hard—really, really hard,” he adds. “What we need is a discipline of ‘proof engineering’ the way we’ve had software engineering.”
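RockSalt itself operates on x86 machine code and is verified in the Coq proof assistant, so any faithful excerpt would run far longer than this article. Purely as a toy illustration of what “checking that code respects a security policy” means (the instruction format and sandbox bounds here are invented), a checker can statically scan a program and reject it if any memory access could leave its sandbox:

```python
# Invented sandbox bounds for illustration: the program may only touch
# addresses in [SANDBOX_LO, SANDBOX_HI).
SANDBOX_LO, SANDBOX_HI = 0x1000, 0x2000

def check_policy(instructions):
    """Statically verify that every memory access stays inside the sandbox.

    Each instruction is a (mnemonic, address) pair; a single access that
    could fall outside the bounds rejects the whole program.
    """
    for mnemonic, address in instructions:
        if mnemonic in ("load", "store"):
            if not (SANDBOX_LO <= address < SANDBOX_HI):
                return False
    return True
```

The point of Morrisett’s “turtles” remark is that a checker like this is far smaller than the programs it judges, so it is the only piece left to trust, and a piece that small can itself be proved correct.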

Whose responsibility?

When Schneier and Snowden spoke at SEAS in January, their conversation was part of a daylong symposium on “Privacy in a Networked World.” Wrapping up the event with a panel discussion, Daniel Weitzner, former Deputy Chief Technology Officer for Internet Policy in the Obama White House, again challenged the notion that technology can “solve” security.

“What’s the right balance between relying on provably secure cryptographic techniques on the one hand, and law and institutions on the other?” Weitzner asked. “Underneath that [reliance on technology] is, in some part, a view that says we can’t actually quite trust our government or our legal system or our institutions in the private sector anymore; we need this kind of cryptographic guarantee.”

Unavoidably, though, humans do have to be involved in technology design and implementation. So at what level, at what stage in product development or law enforcement should security be engineered into the design? And who should we trust to make key decisions? Why would anyone do it right, when not doing it right is usually easier, faster, and cheaper?

As Harvard Chief Technology Officer and Gordon McKay Professor of the Practice of Computer Science James Waldo would say, that’s a primate problem, not a technology problem.

“You always need to pay something for security,” says Chong. When you have to type in a password before you’re allowed to access something, it’s reassuring but inefficient. “The same is true when we build secure systems,” he says. “It costs us. Writing programs that are secure doesn’t make them do anything fancier. It’s not like a secure version of Microsoft Word is any better at spell checking or formatting your document. It’s about the stuff that doesn’t happen.”

More often than not, it takes a breach to spur action. In 2001, for example, Microsoft’s servers and software were attacked by a series of successful worms—Code Red, Nimda, and Klez, among others. In response, Chief Executive Officer Bill Gates “sent out a memo that said, ‘Okay, we’re stopping production. We’re going to figure out a better way to write code,’” recalls Morrisett, who has served on the company’s Trustworthy Computing Academic Advisory Board since 2002.

“They’ve actually done a very good job,” Morrisett says, referencing the Security Development Lifecycle, a process made mandatory at Microsoft in 2004. “They were able to do that because somebody like Gates could dictate that every developer needs to be trained in secure coding practice and every piece of code that gets checked in needs to be run through these tools. You also see that in banks. When there’s a clear hierarchy, somebody can say, ‘Coders, you shall do X.’”

On the other hand, as Chong notes, “Startups are not necessarily incentivized to prioritize security. If you think about the earlier versions of Facebook and how it’s evolved—being able to control the confidentiality of your information and understand who’s going to be able to see it—that was definitely not the highest priority when Facebook was initially there.”

So, Chong asks, “Can we make the cost of security reasonable by developing techniques that require only a little extra effort above what the developer needs to do already, and making that extra effort for security be well aligned with the developer’s goals for functionality?”

Can we be “smarter” while protecting privacy?

As Schneier pointed out, vulnerabilities in consumer electronics and web services can easily become national security issues these days. This is an area where the Federal Trade Commission is increasingly taking an interest. In a speech at the International Consumer Electronics Show in Las Vegas in January 2015—one month before the White House launched a new cybersecurity initiative—FTC Chairwoman Edith Ramirez (AB ’89, J.D. ’92) urged product designers to “prioritize security and build security into their devices from the outset.”

But Ramirez’s concern goes beyond information security. In a world with 25 billion networked devices—the burgeoning Internet of Things—the risks to personal privacy are also unprecedented.

“Your smart TV and tablet may track whether you watch the History Channel or reality television,” she said, “but will your TV-viewing habits be shared with prospective employers or universities? Will they be shared with data brokers, who will put those nuggets together with information collected by your parking lot security gate, your heart monitor, and your smart phone? And will this information be used to paint a picture of you that you will not see but that others will—people who might make decisions about whether you are shown ads for organic food or junk food, where your call to customer service is routed, and what offers of credit and other products you receive?” 

As businesses inexorably collect more and more data from connected devices and use that data to segment consumers, she cautioned that the trend has the potential to “exacerbate existing socioeconomic disparities.”

Faced with this cascade of complex, unpredictable, and potentially devastating outcomes, a consumer’s decision to buy a fitness tracker or a smart watch suddenly takes on more significance.

“We don’t want to be manipulated as consumers, but that’s in tension with getting personalized services,” says Salil Vadhan, Vicky Joseph Professor of Computer Science and Applied Mathematics. “In the physical world, we’re used to the idea that we can have independent interactions with different merchants and service providers. When you go into one store and then into another, you count on the idea that whatever interactions you might have had in the first store are forgotten, and you’re having a fresh interaction.”

Now, personalized products and services are demolishing the concept of the store altogether. The “next big thing” is a chimera of your credit card, your diary, a nutritionist, a GPS tracker, a sleep monitor, a continuous feed of communications with your boss, your best friend, and your mother—and another live feed of rich data going back to the companies enabling such innovative efficiency.

“If users don’t understand how these services handle private data, then they will become very reluctant to use them,” says David Parkes, George F. Colony Professor of Computer Science, Harvard College Professor, and Area Dean for Computer Science.

Parkes and Yiling Chen, Gordon McKay Professor of Computer Science, both conduct research that applies insights from economics and game theory to the development of artificial intelligence and multi-agent systems. Among other topics, Parkes and Chen study the future of personalized services and the incentives that may encourage consumers to use them.

Parkes gives as an example an intelligent system that helps care for people in their old age.

“Imagine a technology that could surround you during your life, understand you, understand what’s important to you, understand how you might make decisions in regard to your own care, or who you want to see on a particular day,” he suggests. “How you want your room laid out. How you’d like your meals prepared. How your possessions get distributed after death. Maybe you get to a point where you can’t clearly communicate this anymore. Think of that as a challenge for artificial intelligence, where the AI gets to observe your interactions all the time and steps in as your advocate when you’re no longer able to take care of yourself.”

It’s a powerful idea, but as Parkes acknowledges, there are massive obstacles to overcome. Even if AI becomes that advanced, he says, “Unless we can solve the privacy concerns, people are not going to want technology to do that.”

Storing personal data locally, without sharing it, isn’t a practical option for consumer devices, because the value of the data to third parties is often what makes the technology affordable (and profitable). Even if the data were stored locally, there is a deeper problem.

“If the purpose of the AI is to do something for me, then as soon as it starts acting on my behalf based on who it thinks I am, it’s revealing something about me,” Parkes explains.

If your AI advocate sees something that you might like to buy—based on your history—and communicates that to someone else, it’s revealing information about your overall preferences. A computer that can negotiate on your behalf will, in the process, reveal something about your values, your priorities, and your weaknesses.

“It’s a very difficult problem to know how to tell the computer what’s okay to share and what’s not, and what the tradeoffs are between value and privacy,” Parkes says. “We don’t know how to do this right now.”

If the goal of AI is to imbue technology with human values and the ability to make decisions just as well as a human would, there’s yet another snag. Humans don’t always make great decisions.

“People are terrible at estimating the costs of privacy or the value of privacy,” says Chong. “Same with security.”

If an email provider serves up advertisements based on topics mentioned in personal messages, but offers a clean and intuitive interface, do users really consider the privacy policy before creating an account? When a company starts to offer two-factor authentication rather than a single password, what does it take for customers to sign up?

“Often, we tend to drift over time towards the insecure,” Chong says. “Maybe you choose an easy-to-remember password because, hey, nobody’s ever broken into your bank account before. This is why end users are perhaps not the right people to be making some security decisions.” On the other hand, he adds, “Adobe shouldn’t decide whether it’s safe for you to run the installer for an Adobe product on your machine.”

We want security, but we also want convenience. We want personalized services, but we also want privacy. We want surprising technologies that encourage us to change our lives—without risking what’s already safe.

“As I see it,” Vadhan says, “one of the roles computer science has to play in these kinds of questions is to change the curve of tradeoffs, to help achieve two objectives that seem in tension, to make them more compatible by doing things that, without an understanding of technology, you might have thought impossible.”

Protecting privacy vs. advancing research

The two competing interests in Vadhan’s own research are formidable. 

Vadhan, who served as director of the Center for Research on Computation and Society (CRCS) at SEAS until August 2015, leads a project called “Privacy Tools for Sharing Research Data.” Funded primarily by the National Science Foundation and by the Sloan Foundation and Google, the project operates on the premise that when a huge amount of time and money have been invested in collecting data—such as the family histories and genomes of people with cancer—that data should then be made available to researchers whose studies have the potential to derive societal benefit.

On one hand, certain types of personal information enjoy strict legal protections and cannot be shared without the explicit consent of the individual in question. For example, medical records are confidential in many contexts under the Health Insurance Portability and Accountability Act (HIPAA), and educational records are covered by the Family Educational Rights and Privacy Act (FERPA). Thousands of other laws at the state and local levels add layers of complexity. Institutional review boards (IRBs) at universities and hospitals also enforce ethical standards in research involving human subjects.

All with good reason.

On the other hand, the insights that could be gleaned by sharing data, rather than keeping it locked away, are compelling. The discoveries that could help diagnose and precisely treat cancer, predict Alzheimer’s disease before symptoms arise, identify risk factors for obesity, or prevent students from dropping out of school may not occur without large-scale, cross-referenced studies.

Even though research subjects may be promised anonymity, often sufficient steps are not taken to protect their identities—something computer scientists have demonstrated many times in the past.

For example, in 2000, Latanya Sweeney (ALB ’95) famously analyzed data from the 1990 census and revealed that 87 percent of the U.S. population could be uniquely identified by just a zip code, date of birth, and gender. In January 2015, the Institute for Applied Computational Science at SEAS challenged students to re-identify individuals in three “anonymous” data sets on contraception use, income levels, and the health records of patients with diabetes. The students did so handily, using statistical inference and by cross-referencing the data with other publicly available records.

These examples illustrate that taking all of the names, addresses, and Social Security numbers out of a sensitive data set may not be enough to protect the subjects’ identities. “A coworker, an employer, a nosy neighbor, or an insurance company that already knows something about you—to them, you may not be anonymous,” Vadhan warns. “They might very well be able to figure out who you are.”
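The linkage attack behind these demonstrations is mechanically simple. The toy Python sketch below (with invented data and field names; real attacks join against sources such as voter rolls) matches an “anonymized” table to a named public record set on exactly the quasi-identifiers Sweeney highlighted:

```python
def reidentify(anonymous_records, public_records):
    """Link 'anonymized' rows back to names via shared quasi-identifiers."""
    matches = {}
    for anon in anonymous_records:
        key = (anon["zip"], anon["dob"], anon["gender"])
        # Find every named record sharing the same quasi-identifier triple.
        candidates = [p["name"] for p in public_records
                      if (p["zip"], p["dob"], p["gender"]) == key]
        if len(candidates) == 1:
            # The triple is unique in the public data: identity recovered.
            matches[candidates[0]] = anon["diagnosis"]
    return matches
```

Records whose quasi-identifier triple is shared by several people survive the join unmatched; that observation underlies defenses such as k-anonymity, which Sweeney also pioneered.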

So Vadhan and his collaborators from CRCS, Harvard’s Institute for Quantitative Social Science, the Berkman Center at HLS, and the MIT Libraries’ Program on Information Science are building a new type of data repository, which will ensure that privacy is protected while enabling the benefits of data sharing. One of their approaches is based on a technique called “differential privacy” that has been developed over just the last 10 years.

“What you typically want to do as a researcher is to learn the properties of a population, and your data set is just a sample of that population,” says Vadhan. “We want to allow aggregate information to be released and individual information to be protected, and we want this to hold regardless of how much a potential adversary knows about the individuals already in a data set.”

Differential privacy uses statistical and computational methods to mediate access to the data through an interface where the researcher can perform queries. The system introduces a precise amount of “noise” into the aggregate data it releases. This allows it to mathematically guarantee that the individual subjects are protected by a certain standard of privacy, and that the results meet a high standard of accuracy. Data can be safely released through a series of queries—up to a point, depending on the size of the database, when access must then be restricted.
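The article does not name the project’s specific mechanism, but the classic way to calibrate such noise is the Laplace mechanism; the sketch below is a generic illustration, not the repository’s implementation. A counting query changes by at most 1 when any one person is added to or removed from the data, so adding Laplace noise with scale 1/ε to the count guarantees ε-differential privacy:

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Epsilon-differentially-private count of records matching a predicate.

    A count has sensitivity 1 (one person's presence changes it by at most
    1), so Laplace noise with scale 1/epsilon suffices for epsilon-DP.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace(0, 1/epsilon) by inverse transform from u ~ U(-0.5, 0.5).
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    noise = -(1.0 / epsilon) * sign * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

A smaller ε means stronger privacy but noisier answers, and each query spends some of a finite “privacy budget,” which is why access must eventually be restricted after enough queries.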

The process of building this system continually raises new legal and policy questions around researchers’ obligations and ethics, as well as new theoretical questions in computer science.

“Much of what motivates me is understanding the limits of computation,” says Vadhan. “In addition to examining what’s currently practical, I want to delineate the border between what’s possible and impossible in the long term, as both computation and data increase in scale.”

Counterintuitive consequences of design choices

Improving privacy and security will mean developing stronger systems, providing the right incentives to implement them, understanding their limitations, and anticipating the unpredictable.

But as Sweeney noted at SEAS in January, “Privacy and security are just the beginning. Every fundamental value that Americans have is being redefined by technology design. It’s the choice of the design, the choice of how the technology works, that is going to challenge every law we have.”

“It’s not going to be a technique or a tool that’s going to save the day, and it’s not going to be a law,” she said. “I’ve grown to dislike this idea that I have to balance these things, to choose one or the other.”

The best solutions arise, she said, when you blend them.