NSF Research Training Group: Randomized Numerical Analysis
Randomized algorithms are methods that make random choices during their execution –akin to playing dice. Randomized sampling, that is the selective picking and choosing of data pieces, tends to be extremely effective for Big Data because of the high degree of redundancy and repetition in the data.
Science has embraced randomness as an indispensable means for understanding nature: Quantum physics posits statistical laws to describe the random behavior of particles. Economists use statistics to capture the price fluctuations in a free-market economy. Ethernet protocols can use random numbers to decide when next to access a shared communication channel. Radiation shielding was advanced by John von Neumann and Stanislav Ulam’s invention of Monte Carlo methods at Los Alamos in 1946.
Randomness helps to break symmetry, cut down on repetition, avoid outliers, and render intractable computations feasible. Randomized algorithms are often easier and simpler to program than their deterministic counterparts.
We believe that Numerical Analysis, the study of algorithms for the problems of continuous mathematics, can only survive if it looks beyond the smooth manifolds of continuous mathematics towards the graph and network structures of discrete mathematics. Classical NA methods are designed to answer exact questions as accurately as possible, but become powerless when confronted with vague questions and problems that may not have a unique answer: Is this tumor malignant: yes/no? Predict the 3 safest routes from A to B based on real-time traffic data.
The underlying mathematical problems tend to be neither well-posed nor well-conditioned. Asking for 16 accurate digits is absurd in the presence of noisy and missing data, and new generations of processors that trade off arithmetic accuracy for faster processing speed.
The emerging computational paradigms require entirely novel approaches that have the power to tame the many sources of error and their interplay: severe corruption of data, and the attendant need for regularization (converting a hopelessly ill-posed problem into a well-posed one); statistical uncertainties from inference and modeling; high numerical sensitivity even to the tiniest disturbances (ill-conditioning); errors due to randomization; and finally errors caused by finite precision arithmetic.
This is precisely the purpose of our program.