U.S. Department of Energy
Deep Science Agile Initiative



From Asteroids to Black Holes: Data Science in Astronomy

Daniela Huppenkothen
Associate Director for the Center for Data-Intensive Research in Astronomy and Cosmology (DIRAC)
Data Science Fellow at the eScience Institute
University of Washington

November 8, 2018

Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This has been particularly true for astronomy, where current and future instruments produce data sets of a size and complexity that are impossible to make sense of with traditional methods. In this talk, Daniela will focus on recent research in astronomy and present examples of how we can use modern statistical and machine learning methods to help explore and understand the physical processes underlying a diverse range of phenomena, from the composition and shape of asteroids to the process of black holes eating their stellar companions.

The universal applicability of data science tools to a broad range of problems has also generated new opportunities to foster exchange of ideas and computational workflows across disciplines. Daniela will discuss ways to enable interdisciplinary collaboration in order to solve fundamental problems across multiple domains.

Learning from the Where, When, and Who for Using Human Generated Data

Dr. Rumi Chunara
Assistant Professor of Computer Science, New York University

October 4, 2018

Internet and mobile human-generated data sources can provide high-resolution spatial and temporal views into many societal phenomena, based on precise geo-location and time-linked information. However, the impact of this data has yet to be fully realized, in part due to statistical and computational challenges: data is shared whenever a person likes ("at-will"), in a spatially irregular manner, and by ad-hoc, non-representative groups. In this talk I will discuss how we address these challenges by using data mining and machine learning to learn from the data in combination with when, where, and by whom it was generated, creating high-resolution, practical representations and improving predictive modeling. Examples will illustrate applications in public health, but the methods apply to other domains that use this immense amount of observational data to parameterize aspects of our daily lives.

DARPA's Explainable Artificial Intelligence (XAI) Program

Dave Gunning
DARPA I2O Program Manager

July 26, 2018

The goal of XAI is to create a suite of new or modified machine learning techniques that produce explainable models which, when combined with effective explanation techniques, enable end users to understand, appropriately trust, and effectively manage the emerging generation of AI systems. Dramatic success in machine learning has led to an explosion of new AI capabilities. These systems will offer tremendous benefits, but their inability to explain their actions to human users will limit their effectiveness. There is an inherent tension between machine learning performance and explainability: often the highest-performing methods are the least explainable, and the most explainable are the least accurate. The program is funding a variety of machine learning techniques to give future developers a range of design options covering the performance-versus-explainability trade space. XAI focuses these developments on challenge problems in two areas: (1) machine learning to classify events of interest in heterogeneous, multimedia data, and (2) machine learning to construct decision policies for a simulated autonomous system.
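One of the simplest explanation techniques in the space the program explores is input-gradient saliency: ask how sensitive a model's prediction is to each input feature. The sketch below illustrates the idea on a toy logistic classifier (the model, weights, and data are all hypothetical, chosen only to make the attribution readable):

```python
import numpy as np

# Toy linear classifier: p(y=1|x) = sigmoid(w . x + b).
# The input gradient dp/dx_i indicates how strongly each feature
# pushes the prediction -- a minimal "saliency" explanation.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def saliency(w, b, x):
    p = sigmoid(np.dot(w, x) + b)
    # Gradient of sigmoid(w.x + b) with respect to x is p * (1 - p) * w.
    return p * (1.0 - p) * w

w = np.array([2.0, -1.0, 0.0])
x = np.array([1.0, 1.0, 1.0])
grads = saliency(w, b=0.0, x=x)
# A feature the model ignores (w_i = 0) receives zero attribution.
print(grads)
```

For deep networks the same quantity is computed by backpropagation rather than a closed form, but the interpretation is identical: large-magnitude gradients mark the features the prediction relied on.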

Continuous Representation of Language and its Implications

Dr. Kyunghyun Cho
Assistant Professor of Computer Science and Data Science, New York University

March 9, 2018

In this talk, Dr. Cho discussed some of his research on neural machine translation from the past 2.5 years. Starting from the now-standard attention-based neural machine translation, he walked through multilingual translation, search-engine-guided non-parametric neural machine translation, and unsupervised machine translation. He then delved deeper into some of his recent work on decoding algorithms for neural machine translation. Finally, he briefly touched on some of the ongoing work at his lab, including non-autoregressive neural machine translation and trainable greedy decoding.
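Decoding is the step where a trained translation model is turned into an output sentence, and beam search is the standard algorithm the greedy and trainable-greedy variants are compared against. The sketch below runs beam search over a hypothetical three-token model (the `step` function is a stand-in for a neural model's softmax output, not anything from the talk):

```python
import numpy as np

# Minimal beam-search decoder. Vocabulary: {0: <eos>, 1, 2}.
# `step` stands in for a translation model's next-token distribution.

def step(prefix):
    # Toy distribution: prefer token 1 early, then the end-of-sequence token.
    if len(prefix) < 2:
        return np.array([0.1, 0.6, 0.3])
    return np.array([0.8, 0.1, 0.1])

def beam_search(beam_size=2, max_len=5, eos=0):
    beams = [((), 0.0)]          # (token sequence, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            probs = step(seq)
            for tok, p in enumerate(probs):
                candidates.append((seq + (tok,), logp + np.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_size]:
            # Hypotheses that emit <eos> are complete; others stay on the beam.
            (finished if seq[-1] == eos else beams).append((seq, logp))
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])

best_seq, best_logp = beam_search()
print(best_seq, best_logp)
```

Greedy decoding is the `beam_size=1` special case; the decoding-algorithms work discussed in the talk asks when and how one can do better than this fixed search procedure.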


Learning a Local-Variable Model of Non-Local Quantum Chemistry

Dr. Josh Swamidass
Professor, Washington University, St. Louis

December 15, 2017

A collection of new approaches to building and training neural networks, collectively referred to as deep learning, is attracting attention in theoretical chemistry. Several groups aim to replace computationally expensive ab initio quantum mechanics calculations with learned estimators. This raises questions concerning the representability of complex quantum mechanical systems with neural networks. Can local-variable models efficiently approximate non-local quantum chemical features?

We demonstrated that convolutional networks cannot efficiently represent aromaticity and conjugation in large systems. This suggests that convolutional networks may not be the best architectures for representing emergent, non-local chemical properties. We introduced a new architecture that propagates information back and forth in waves of non-linear computation. This local-variable model is parsimonious, both computationally and representationally: it processes molecules in sub-linear time and models aromatic and conjugated systems more accurately and with far fewer parameters than convolutional networks.
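The underlying mechanism can be sketched as message passing on the molecular graph: each atom repeatedly updates its features from its bonded neighbors, so information travels along bonds. The example below is a generic illustration of that idea (toy chain molecule, identity weights), not the authors' wave architecture, which orders these updates into directed sweeps rather than uniform rounds:

```python
import numpy as np

# One message-passing step: aggregate neighbor features over the bond
# graph, mix with a weight matrix, apply a non-linearity.

adjacency = np.array([            # 3-atom chain: 0 - 1 - 2
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)

features = np.eye(3)              # one-hot atom features

def message_pass(feats, adj, weight):
    return np.maximum(adj @ feats @ weight, 0.0)

weight = np.eye(3)                # identity weight, for a readable example
h1 = message_pass(features, adjacency, weight)
h2 = message_pass(h1, adjacency, weight)
# After one pass, atom 0 knows nothing of atom 2; after two passes it does.
print(h2)
```

The representability question in the talk is exactly about this propagation radius: with uniform rounds, capturing a conjugated system of diameter d needs d passes, whereas ordered sweeps can move information across the whole molecule in a single traversal.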

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

Dr. Yanjun Qi
Assistant Professor, Department of Computer Science, University of Virginia

November 27, 2017

The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Two fundamental challenges remain: genome-wide chromatin signals are spatially structured, high-dimensional, and highly modular; and the core aim is to understand which factors are relevant and how they work together.

We presented an attention-based deep learning approach, AttentiveChrome, which uses a unified architecture to model and to interpret dependencies among chromatin factors controlling gene regulation. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency maps.
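The interpretability claim rests on soft attention: the model forms a weighted summary of per-position features, and the weights themselves say which positions the prediction relied on. A minimal sketch of that mechanism (generic dot-product attention on made-up features, not the paper's exact architecture):

```python
import numpy as np

# Soft-attention pooling: score each position against a query, normalize
# with softmax, and return both the summary vector and the weights.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(features, query):
    scores = features @ query            # relevance of each position
    weights = softmax(scores)            # attention distribution, sums to 1
    context = weights @ features         # weighted summary vector
    return context, weights

features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])        # 3 positions, 2-dim features
query = np.array([1.0, 0.0])
context, weights = attend(features, query)
print(weights)                           # inspectable interpretation signal
```

Because the weights are a probability distribution computed inside the forward pass, they come for free at prediction time, unlike post-hoc gradient visualizations.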

Coughing, sneezing, and aching online: Twitter and the volume of influenza-like illness in a pediatric hospital

Dr. David M. Hartley
Department of Pediatrics, Cincinnati Children's Hospital, University of Cincinnati College of Medicine

October 12, 2017

Hospitals need early warning of epidemics that may stress or overwhelm emergency departments and urgent care clinics. This talk described a recent study investigating the relation of the incidence of geo-referenced tweets related to respiratory illness to the incidence of influenza-like illness in the emergency department and urgent care clinics of a large pediatric hospital.
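The kind of relation the study examines can be sketched as a lagged correlation: shift the tweet time series against the clinical series and ask which lead time correlates best. The example below uses synthetic data in which tweets lead illness counts by two weeks; it is illustrative only, not the study's actual data or method:

```python
import numpy as np

# Synthetic weekly series: a winter epidemic in ED/urgent-care ILI counts,
# and a noisy tweet series that leads it by two weeks.
rng = np.random.default_rng(1)
weeks = np.arange(52)
ili = 50 + 40 * np.exp(-((weeks - 30) ** 2) / 20)
tweets = np.roll(ili, -2) + rng.normal(scale=2, size=52)

def lagged_corr(x, y, lag):
    # Correlate x against y shifted `lag` weeks later.
    return np.corrcoef(x[:len(x) - lag], y[lag:])[0, 1]

# The lag with the highest correlation estimates the early-warning lead time.
best_lag = max(range(5), key=lambda k: lagged_corr(tweets, ili, k))
print(best_lag)
```

A nonzero best lag is what makes such signals operationally useful: it is the advance notice a hospital would get before demand arrives.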

Scalable and Efficient Bayesian Inference through Stochastic Gradient MCMC for Independent and Correlated Data

Dr. Yian Ma
Postdoctoral Scholar, University of California, Berkeley

September 19, 2017

Yian Ma presented a complete framework for constructing stochastic gradient MCMC sampling algorithms. These samplers accept a small bias in exchange for a large decrease in variance. For cases where that bias is not tolerable, he introduced an irreversible version of the Metropolis-Hastings algorithm to correct for it. He then extended stochastic gradient MCMC to correlated data, focusing first on time series modeled by hidden Markov models.
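The simplest member of this algorithm family is stochastic gradient Langevin dynamics: each update takes a minibatch gradient step on the log-posterior plus properly scaled Gaussian noise, so the iterates become (approximate, slightly biased) posterior samples. A self-contained sketch on a toy 1-D Gaussian posterior, illustrating the family rather than the talk's specific samplers:

```python
import numpy as np

# SGLD: theta <- theta + (step/2) * grad log p(theta | data) + N(0, step),
# with the gradient estimated from a minibatch and rescaled to the full data.

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)

def grad_log_post(theta, batch, n_total):
    # Gaussian likelihood N(x | theta, 1) with a flat prior:
    # grad log p(theta | data) ~ (N / |batch|) * sum(x - theta).
    return (n_total / len(batch)) * np.sum(batch - theta)

theta, step = 0.0, 1e-4
samples = []
for t in range(2000):
    batch = rng.choice(data, size=50)
    noise = rng.normal(scale=np.sqrt(step))
    theta += 0.5 * step * grad_log_post(theta, batch, len(data)) + noise
    samples.append(theta)

# Discard burn-in; the remaining samples hover around the data mean (~2).
posterior_mean = np.mean(samples[500:])
print(posterior_mean)
```

The bias-variance tradeoff in the abstract is visible here: using a 50-point minibatch instead of all 1000 points makes each step cheap but injects gradient noise that a Metropolis-Hastings correction would otherwise remove.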

Deep Learning

Contemporary deep learning has enabled a next generation of artificial intelligence (AI) applications, opening the door to potential breakthroughs in many aspects of our lives. Understanding the capabilities and limitations of contemporary artificial intelligence, and improving it for application to scientific problems, will enable PNNL to advance the frontiers of scientific research and national security.