SCIENCE AT THE EDGE SEMINAR SERIES

Quantitative Biology / Gene Expression in Development & Disease Seminar

Friday, 01 March 2013 at 11:30am

Room 1400 Biomedical and Physical Sciences Bldg.

Refreshments at 11:30

Speaker:  Mauro Maggioni, Departments of Mathematics, Computer Science, and Electrical and Computer Engineering, Duke University

Title:  Multiscale Geometric Methods for Data in High Dimensions

Abstract:
We discuss recent work on multiscale geometric analysis applied to high-dimensional data sets. The first application is the estimation of the intrinsic dimension of noisy data. The second is the construction of data-driven dictionaries for efficient sparse representations of data sets, together with a novel geometric multiresolution analysis framework for encoding data. Finally, we discuss the problem of estimating a probability measure in high dimensions whose support is (nearly) low-dimensional and has some geometric structure, for example that of a manifold or a union of hyperplanes. We construct a multiscale geometric tree decomposition of the data. We use this decomposition to construct an increasing family of approximation “spaces” in the space of probability measures, parametrized by certain subtrees of the multiscale tree. We then perform a multiscale bias-variance tradeoff using this family of approximation spaces. We obtain finite-sample results guaranteeing that, with high probability, the Wasserstein distance between the (random) measure estimated by our algorithm and the true measure is small. The bound depends on the number of samples, on a measure of complexity of the models we use (typically this depends only on the intrinsic dimension and not on the ambient dimension!), and on a notion of “regularity” of the true measure.