Q&A with Steven Salzberg ’89 (Ph.D.)

Expert in bioinformatics has helped sequence the genomes of humans, anthrax, and woolly mammoths

August 11, 2011


Steven Salzberg ’89 performed comparative genomic analysis on anthrax to determine the origin of the strain used in the 2001 postal attacks, assisting the FBI’s investigation.

Steven Salzberg ’89 developed an interest in genomics while working toward his Ph.D. in computer science at Harvard. He held three degrees from Yale, in English and computer science, but it was here that he heard about, and became fascinated by, the Human Genome Project.

“I started learning about it in my spare time, by reading whatever I could find on genetics and genomics,” he recalls, “and I spent a semester sitting in the back of Stephen Jay Gould’s lectures on evolution.”

Shortly after receiving his Ph.D., Salzberg found ways to get involved in the Human Genome Project and eventually helped write the 2001 Science paper that unveiled and analyzed the first draft of the human genetic sequence.

Today, Salzberg is a professor of medicine and of biostatistics at Johns Hopkins University. He has previously held faculty positions at Johns Hopkins, where he began his career, the Institute for Genomic Research (now part of the J. Craig Venter Institute), and the University of Maryland.

Over the years, he has sequenced hundreds of species ranging from woodland strawberries to woolly mammoths.

When the FBI was investigating the 2001 postal anthrax attacks, Salzberg used whole-genome analysis to compare the strains found in the letters with other variants. His research helped to identify the source of the Bacillus anthracis Ames strain, leading the FBI to the culprit.

From 2003 to 2006, he also directed a long-term, large-scale study of the evolution of the influenza virus.

Today, Salzberg’s research group develops algorithms and software for sequencing, assembling, and analyzing genomic data. He is a vocal advocate of open-source collaboration.

What makes genomics an exciting field today?

I think the most exciting scientific frontier is mapping out how all the mutations in our genes affect us: how they make each of us more or less healthy, and how we might use this knowledge to change our genetic fate.

What was your role in sequencing the human genome?

I worked with the scientists at Celera Genomics to analyze the first draft of the genome and write the landmark 2001 paper describing it. My work focused on the genes themselves: how many there were, where they were, and what their structure was. I also discovered many large “chunks” of the genome that had been duplicated long ago, in a distant ancestor of humans, and we described that in the paper as well.

Shortly after the publication of the two competing genome papers, I led an effort to disprove one of the main findings in the other paper: that there had been over 200 bacterial genes transferred directly into the human lineage at some time in the past. Their dramatic claim turned out to be wrong.

What does your current research involve?

My lab today is purely computational: we design and implement the algorithms that are used to study genomes. I collaborate very, very closely with sequencing labs and with biomedical scientists. I’m often involved from the very beginning of a project, and I participate in experimental design even though I don’t go into the lab.

Why are open-source software and data important for genomics?

Making genomics software open source lets people focus on the science without worrying about licensing fees or other restrictions.

My group is currently developing a suite of open-source software programs for use on the newest “next-generation” DNA sequence data. Our programs have been downloaded by thousands of scientists around the world who are eager to use them, because next-gen data can be overwhelming without the right tools to analyze it. Sharing software and data helps to accelerate progress in the field.

Additionally, several projects today are aiming to collect and sequence DNA from a large swath of the human population, and these have tremendous potential to advance our understanding of health and disease.

How do you feel about direct-to-consumer genetic testing?

Although it is still very early, and most of the information you’ll get today from a genome scan isn’t very useful, I think people should be given the tools to look at their own genes if they want them. It might be a waste of money for now, but over time this information will become very valuable.

You’ve never attempted to patent a gene that you sequenced. Why not?

I think gene patents should never have been allowed, and the U.S. Patent Office made a huge mistake when it granted the first one. Genes aren’t inventions; they’re a product of nature. My colleagues and I have discovered literally tens of thousands of genes, in bacteria, plants, and animals, but we didn’t invent any of them. Should we be able to patent them just because we were the first to sequence them? I don’t think so. The U.S. Justice Department recently came around to this same position in a brief filed in federal court, so I think the days of gene patents are numbered.

Last year, to challenge the use of gene patents, Mihaela Pertea and I developed our own (free) software that anyone can use to detect mutations in the BRCA1 and BRCA2 genes. Those genes, whose sequences have been patented by Myriad Genetics, are associated with breast and ovarian cancer. We think it’s wrong to restrict access to potentially life-saving information, particularly when that information comes from your own DNA.

How did coming to Harvard for your Ph.D. help prepare you for your career in bioinformatics?

Having a strong foundation in computer science allowed me to tackle problems that biological scientists couldn’t solve on their own. I tell all my students that if they want to work on the kinds of problems my group studies, they need a strong computational background.

What accomplishments are you most proud of since receiving your Ph.D.?

Starting the influenza genome project with David Lipman (the director of NCBI, the home of GenBank) is one of them. The genome of the flu virus had not been studied on a large scale until we took this on. Within a year we had sequenced over 100 genomes, a number that grew to over 5,000 a few years later. It’s still going on today.

Another technical accomplishment is my group’s development of a computer program called Glimmer, which finds genes in bacteria. Glimmer has been adopted by many scientists around the globe and has been used to identify millions of bacterial and viral genes since we first released it in 1998.

I’m also proud of the work we did helping to track down the source of the anthrax used in the 2001 attacks—work that we had to keep secret for years, until we finally published it earlier this year.

Topics: Computer Science, Bioengineering