Nearly 2,000 years ago, Chinese astronomers recorded what historians believe to be the first documented sighting of a supernova, a star that “dies” in a catastrophic explosion. Technological advances, from the simple telescopes of the early 1600s to the first computer-controlled supernova search in the 1960s, offered new insights into these dramatic astronomical events.

Now, students in the computational science and engineering master's program offered by the Institute for Applied Computational Science (IACS) at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) are applying the latest data science techniques to help scientists shed new light on supernovae.

Led by Pavlos Protopapas, Scientific Program Director of IACS, the students took part in the fifth year of a Harvard research and education collaboration with Chilean researchers and scientists.

Three students traveled to Chile with Protopapas and David Sondak, lecturer in Computational Science, in January to conduct collaborative astronomical research projects with students and faculty from the University of Chile and Pontificia Universidad Católica de Chile. Working in teams with astronomers, mathematicians, and fellow computer scientists, they used machine learning to classify and then visualize supernovae.

One of the biggest problems in supernova classification stems from the nature of astronomical data, explained Paul Blankley, S.M. '19. As a telescope gathers data, that information is organized as a time series, with irregular gaps between data points and measurement errors that can make it difficult to classify a supernova until all of the data have been collected.

But the Large Synoptic Survey Telescope (LSST), under construction on a remote Chilean mountaintop, will gather terabytes of data each day using the world's largest digital camera. Storing all of that data until enough has accumulated for an accurate classification would be extremely costly. Blankley and his teammates developed a machine-learning algorithm that could be trained to classify a supernova as data is received, refining that classification as additional data becomes available.

(From left) Javiera Astudillo (U. Católica), Paul Blankley, Nick Ruta, Tracy Hu, Rohan Thavarajah, and Cristóbal Donoso (U. Concepción) work on projects at the intersection of data science and astronomy. (Photo provided by Nick Ruta.)

Interpreting huge amounts of information is another challenge for scientists, and one that Nicholas Ruta, M.E. '20, and Rohan Thavarajah, M.E. '19, set out to address using data visualizations. Their team used Blankley's data to create a bar graph that dynamically shows the probability that a supernova belongs to each class. The visualization enables researchers to interpret multiple observations at once and to see how the model's classification changes with each one.

“We usually spend a lot of time studying the technical aspects of a project, but there is this whole other element of how to portray that to people and make it useful to scientists in a more meaningful and immediate way,” Ruta said. “It was really fun to delve into that aspect of visualization, the more artistic, creative side of it.”

With only four days to complete their projects, the students found that their biggest challenges came down to time management. Learning to collaborate with students who had different skill sets and interests, and then working together to pore through massive data sets in such a limited time, tested both their soft skills and their computational expertise, Blankley said.

The interdisciplinary nature of this project showcases how data science can be applied to a wide range of problems, Ruta said. Astronomy, with its massive data sets, lends itself to computational solutions, but the students encountered some surprises as they applied their skills to a different discipline.

“What was surprising to me is that, a lot of times, you don’t want the best solution, you want the fastest solution,” Blankley said. “There is no question, when you are dealing with the quantity of data that the LSST will generate, if you come up with a reasonably good, faster solution, you’ll take that every time.”

In addition, the principles of machine learning are very different from many of the foundational concepts in astronomy, where physics rules the day, Thavarajah said. Developing effective models requires a data scientist to think like an astronomer and to understand how real-world constraints affect a computational approach.

“You can train a model, but then you have to adhere it to the physical laws of the universe,” said Thavarajah. “Those two things can sometimes clash a little bit. It was a challenging experience, marrying the two disciplines.”

Working with leading astronomers, and hearing presentations by some of the foremost experts on supernovae, opened the students’ eyes to the depth of astronomical research opportunities available. The opportunity to contribute to research that could reshape scientists’ understanding of the universe was humbling and rewarding, Ruta said.

“This research group is interested in finding unidentified objects in the sky and trying to learn more about them,” he said. “It is very exciting to be working with this group in preparation and helping them have the best tools in place when the whole system goes online.”

For Protopapas, an astrophysicist who developed the collaboration in 2013, it is rewarding to apply domain expertise to teach the students about real-world constraints in data science.

“Integrating the techniques and methods of data science and applying them to astronomical data is a central part of the learning experience for the students,” he said.

Students from Harvard, the University of Chile, and Pontificia Universidad Católica de Chile celebrate their success on the last day of the collaborative program. (Photo provided by Nick Ruta.)