During the Fall 2020, the CMSA will be hosting a seminar on Computers and Math, organized by Michael Douglas. This seminar will take place on Wednesday from 3:00 – 4:00pm. There will also be a group meeting on Monday from 9:30 – 10:30am. Both of these meetings will take place virtually. To learn how to attend, please fill out this form, or contact Michael Douglas (email@example.com),
The schedule below will be updated as talks are confirmed.
|9/16/2020||William Hamilton, McGill University and MILA|
|Title: Graph Representation Learning: Recent Advances and Open Challenges|
Abstract: Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial if we want systems that can learn, reason, and generalize from this kind of data. Recent years have seen a surge in research on graph representation learning, most prominently in the development of graph neural networks (GNNs). Advances in GNNs have led to state-of-the-art results in numerous domains, including chemical synthesis, 3D-vision, recommender systems, question answering, and social network analysis. In the first part of this talk I will provide an overview and summary of recent progress in this fast-growing area, highlighting foundational methods and theoretical motivations. In the second part of this talk I will discuss fundamental limitations of the current GNN paradigm and propose open challenges for the theoretical advancement of the field.
|9/23/2020||Andrea Montanari, Departments of Electrical Engineering and Statistics, Stanford |
|Title: Self-induced regularization from linear regression to neural networks|
Abstract: Modern machine learning methods –most noticeably multi-layer neural networks– require to fit highly non-linear models comprising tens of thousands to millions of parameters. Despite this, little attention is paid to the regularization mechanism to control model’s complexity. Indeed, the resulting models are often so complex as to achieve vanishing training error: they interpolate the data. Despite this, these models generalize well to unseen data : they have small test error. I will discuss several examples of this phenomenon, beginning with a simple linear regression model, and ending with two-layers neural networks in the so-called lazy regime. For these examples precise asymptotics could be determined mathematically, using tools from random matrix theory. I will try to extract a unifying picture.
A common feature is the fact that a complex unregularized nonlinear model becomes essentially
equivalent to a simpler model, which is however regularized in a non-trivial way.
[Based on joint papers with: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]
|10/7/2020||Marinka Zitnik, Department of Biomedical Informatics, Harvard|
|Title: Subgraph Representation Learning|
Abstract: Graph representation learning has emerged as a dominant paradigm for networked data. Still, prevailing methods require abundant label information and focus on representations of nodes, edges, or entire graphs. While graph-level representations provide overarching views of graphs, they do so at the loss of finer local structure. In contrast, node-level representations preserve local topological structures, potentially to the detriment of the big picture. In this talk, I will discuss how subgraph representations are critical to advance today’s methods. First, I will outline Sub-GNN, the first subgraph neural network to learn disentangled subgraph representations. Second, I will describe G-Meta, a novel meta-learning approach for graphs. G-Meta uses subgraphs to adapt to a new task using only a handful of nodes or edges. G-Meta is theoretically justified, and remarkably, can learn in most challenging, few-shot settings that require generalization to completely new graphs and never-before-seen labels. Finally, I will discuss applications in biology and medicine. The new methods have enabled the repurposing of drugs for new diseases, including COVID-19, where predictions were experimentally verified in the wet laboratory. Further, the methods identified drug combinations safer for patients than previous treatments and provided accurate predictions that can be interpreted meaningfully.
|10/14/2020||Jeffrey Pennington, Google Brain|
|Title: Triple Descent and a Fine-Grained Bias-Variance Decomposition|
Abstract: Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, striking a balance between simpler models that exhibit high bias and more complex models that exhibit high variance of the predictive function. However, such a simple trade-off does not adequately describe the behavior of many modern deep learning models, which simultaneously attain low bias and low variance in the heavily overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which are too coarse to reveal important nuances of actual neural networks. In this talk, I will describe a precise high-dimensional asymptotic analysis of Neural Tangent Kernel regression that reveals some of these nuances, including non-monotonic behavior deep in the overparameterized regime. I will also present a novel bias-variance decomposition that unambiguously attributes these surprising observations to particular sources of randomness in the training procedure.
|10/21/2020||Marinka Zitnik, Department of Biomedical Informatics, Harvard||Title: Subgraph Representation Learning, part 2|
Continuation from Oct 7.
|10/28/2020||Boaz Barak and Yamini Bansal, Harvard University Dept. of Computer Science |
|Title: Generalization bounds for rational self-supervised learning algorithms, or “Understanding generalizations requires rethinking deep learning”|
Abstract: The generalization gap of a learning algorithm is the expected difference between its performance on the training data and its performance on fresh unseen test samples. Modern deep learning algorithms typically have large generalization gaps, as they use more parameters than the size of their training set. Moreover the best known rigorous bounds on their generalization gap are often vacuous.
In this talk we will see a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a complex representation of the (label free) training data, and then fitting a simple (e.g., linear) classifier to the labels. Such classifiers have become increasingly popular in recent years, as they offer several practical advantages and have been shown to approach state-of-art results.
We show that (under the assumptions described below) the generalization gap of such classifiers tends to zero as long as the complexity of the simple classifier is asymptotically smaller than the number of training samples. We stress that our bound is independent of the complexity of the representation that can use an arbitrarily large number of parameters.
Our bound assuming that the learning algorithm satisfies certain noise-robustness (adding small amount of label noise causes small degradation in performance) and rationality (getting the wrong label is not better than getting no label at all) conditions that widely (and sometimes provably) hold across many standard architectures.
We complement this result with an empirical study, demonstrating that our bound is non-vacuous for many popular representation-learning based classifiers on CIFAR-10 and ImageNet, including SimCLR, AMDIM and BigBiGAN.
The talk will not assume any specific background in machine learning, and should be accessible to a general mathematical audience. Joint work with Gal Kaplun.
|11/4/2020||Florent Krzakala, EPFL|
|Title: Some exactly solvable models for machine learning via Statistical physics|
Abstract: The increasing dimensionality of data in the modern machine learning age presents new challenges and opportunities. The high dimensional settings allow one to use powerful asymptotic methods from probability theory and statistical physics to obtain precise characterizations and develop new algorithmic approaches. Statistical mechanics approaches, in particular, are very well suited for such problems. Will give examples of recent works in our group that build on powerful methods of statistical physics of disordered systems to analyze some relevant questions in machine learning and neural networks, including overparameterization, kernel methods, and the behavior gradient descent algorithm in a high dimensional non-convex landscape.
|11/11/2020||Eric Mjolsness, Departments of Computer Science and Mathematics, UC Irvine|
|Title: Towards AI for mathematical modeling of complex biological systems: Machine-learned model reduction, spatial graph dynamics, and symbolic mathematics|
Abstract: The complexity of biological systems (among others) makes demands on the complexity of the mathematical modeling enterprise that could be satisfied with mathematical artificially intelligence of both symbolic and numerical flavors. Technologies that I think will be fruitful in this regard include (1) the use of machine learning to bridge spatiotemporal scales, which I will illustrate with the “Dynamic Boltzmann Distribution” method for learning model reduction of stochastic spatial biochemical networks and the “Graph Prolongation Convolutional Network” approach to course-graining the biophysics of microtubules; (2) a meta-language for stochastic spatial graph dynamics, “Dynamical Graph Grammars”, that can represent structure-changing processes including microtubule dynamics and that has an underlying combinatorial theory related to operator algebras; and (3) an integrative conceptual architecture of typed symbolic modeling languages and structure-preserving maps between them, including model reduction and implementation maps.
|11/18/2020||Yang-Hui He, Oxford University, City University of London and Nankai University|
|Title: Universes as Big data, or Machine-Learning Mathematical Structures|
Abstract: We review how historically the problem of string phenomenology lead theoretical physics first to algebraic/differetial geometry, and then to computational geometry, and now to data science and AI.
With the concrete playground of the Calabi-Yau landscape, accumulated by the collaboration of physicists, mathematicians and computer scientists over the last 4 decades, we show how the latest techniques in machine-learning can help explore problems of physical and mathematical interest, from geometry, to group theory, to combinatorics and number theory.
|12/9/2020||James Gray, Virginia Tech, Dept. of Physics||Topic: Machine learning and SU(3) structures on six manifolds|
Abstract: In this talk we will discuss the application of Machine Learning techniques to obtain numerical approximations to various metrics of SU(3) structure on six manifolds. More precisely, we will be interested in SU(3) structures whose torsion classes make them suitable backgrounds for various string compactifications. A variety of aspects of this topic will be covered. These will include learning moduli dependent Ricci-Flat metrics on Calabi-Yau threefolds and obtaining numerical approximations to torsional SU(3) structures.
|1/13/2021||Josef Urban, Czech Technical University||TBA|
|1/20/2021||Christian Szegedy, Google Research||TBA|
|2/3/2021||Sanjeev Arora, Princeton Dept. of CS||TBA|
|4/7/2021||David McAllester, Toyota Technologcal Institute of Chicago||TBA|