DS-GA 3001 (MATH-GA 2830): Information Theory for Statistics and Learning
Description
The interplay between information theory, statistics, and machine learning is a constant theme in the development of all three fields. This course will discuss how techniques rooted in information theory play a key role in understanding the fundamental limits of statistical and machine learning problems, in terms of minimax risk and sample complexity, and how to develop procedures that attain statistical optimality. The primary focus is on information-theoretic applications to statistics and machine learning, rather than classical “IEEE-style” information theory.
This course will cover the following topics, with detailed examples from many areas provided for each (a representative example of topic 2 is sketched below):
1) Entropy, mutual information, KL divergence, f-divergences, and their many applications (source coding, channel coding, adaptive data analysis, PAC-Bayes, binary hypothesis testing, large deviations, strong converses, the I-MMSE formula, the area theorem, functional inequalities, etc.).
2) Techniques for minimax lower bounds (the mutual information method; Le Cam, Assouad, and Fano; second moment and orthogonal polynomial methods; metric entropy and global Fano; etc.).
3) Constructive procedures (entropic upper bounds for statistical estimation; redundancy, aggregation, and prediction via universal compression; sampling via strong data-processing inequalities; etc.).
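As one illustrative example of the flavor of topic 2 (stated here in a generic textbook form, not necessarily the exact version used in lecture), the Fano / mutual information method lower-bounds the minimax risk over a finite family {P_{θ_1}, …, P_{θ_M}} whose parameters are pairwise 2δ-separated under the loss ℓ:
\[
\inf_{\hat\theta} \max_{1 \le j \le M} \mathbb{E}_{\theta_j}\!\left[\ell(\hat\theta, \theta_j)\right]
\;\ge\; \delta \left(1 - \frac{I(J; X) + \log 2}{\log M}\right),
\]
where J is uniform on {1, …, M}, X ∼ P_{θ_J}, and I(J; X) is their mutual information; bounding I(J; X), e.g. via pairwise KL divergences, then yields explicit minimax rates.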
Time and location
Lecture: Mon 4:55 - 8:00 PM at GCASL (238 Thompson St), Room 261
Office hour: Fri 4:00 - 5:00 PM at CDS 705, or by appointment