Scaling computation across parallel and distributed systems is a rapidly advancing research area, drawing strong interest from both academia and industry. The goal may be high-performance computing or energy-efficient ("green") computing, spanning data center servers down to small embedded devices. In this course, students will learn principled methods for mapping prototypical computations from machine learning, the Internet of Things, and scientific computing onto parallel and distributed compute nodes of various forms. These techniques lay the foundation for future computational libraries and packages for both high-performance computing and energy-efficient devices. Mastering the subject requires appreciating the close interactions among computational algorithms, software abstractions, and computer organization; students who successfully complete the course will acquire an integrated understanding of these issues.

The class will be organized into the following modules:

- Big picture: using parallel and distributed computing to achieve high performance and energy efficiency
- End-to-end example 1: mapping nearest-neighbor computation onto parallel computing units in the form of CPUs, GPUs, ASICs, and FPGAs
- Communication and I/O: latency hiding with prediction, computational intensity, lower bounds
- Computer architectures and their implications for computing: multi-cores, CPUs, GPUs, clusters, accelerators, and virtualization
- End-to-end example 2: mapping convolutional neural networks onto parallel computing units in the form of CPUs, GPUs, ASICs, FPGAs, and clusters
- Great inner loops and parallelization for feature extraction, data clustering, and dimensionality reduction: PCA, random projection, clustering (K-means, GMM-EM), sparse coding (K-SVD), compressive sensing, FFT, etc.
- Software abstractions and programming models: MapReduce (PageRank, etc.), GraphX/Apache Spark, OpenCL, and TensorFlow
- Advanced topics:
  autotuning and neuromorphic spike-based computing

Students will learn the subject through lectures and quizzes, programming assignments, labs, research paper presentations, and a final project. Students will have latitude to choose a final project they are passionate about; they will formulate their projects early in the course, leaving sufficient time for discussion and iteration with the teaching staff, as well as for system design and implementation. Industry partners will support the course by giving guest lectures and providing resources. The course will use server clusters at Harvard as well as external resources in the cloud. In addition, labs will have access to state-of-the-art IoT devices and 3D cameras for data acquisition. Students will use open-source tools and libraries and apply them to data analysis, modeling, and visualization problems.
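To give a flavor of end-to-end example 1, brute-force nearest-neighbor search can be structured so that each chunk of queries is computed independently, which is exactly the structure that maps onto parallel units such as CPU cores or GPU thread blocks. The sketch below is a minimal, illustrative NumPy version (the function name, chunk size, and shapes are assumptions for this sketch, not course-provided code):

```python
import numpy as np

def nearest_neighbors(queries, points, chunk=256):
    """Brute-force 1-NN: for each query, return the index of the
    closest point (squared Euclidean distance).

    Queries are processed in independent chunks so the (chunk x N)
    distance matrix stays small; because chunks share no state, the
    loop parallelizes naturally across cores or devices.
    """
    out = np.empty(len(queries), dtype=np.int64)
    for start in range(0, len(queries), chunk):
        q = queries[start:start + chunk]                   # (c, d)
        # Pairwise squared distances via broadcasting: (c, N)
        d2 = ((q[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        out[start:start + chunk] = d2.argmin(axis=1)
    return out
```

The course's point is that the same independent-chunk decomposition reappears whether the target is a multicore CPU, a GPU kernel grid, or a fixed-function ASIC/FPGA pipeline; only the mapping of chunks to hardware changes.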
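Likewise, the "great inner loop" of K-means is a single Lloyd iteration: assign each point to its nearest center, then recompute each center as the mean of its assigned points. The assignment step is embarrassingly parallel across points, which is why it features in the parallelization module. A minimal NumPy sketch (function name and array shapes are my own illustration):

```python
import numpy as np

def kmeans_step(X, centers):
    """One Lloyd iteration of K-means.

    Assignment: (n, k) squared distances, argmin over centers.
    Update: each center becomes the mean of its assigned points
    (an empty cluster keeps its old center).
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members):
            new_centers[j] = members.mean(axis=0)
    return labels, new_centers
```

In practice the step would be iterated until the assignments stop changing; the per-point assignment work is what gets distributed across compute units.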