Student Research

Novel Randomized Linear Algebra in Apache Spark

In this poster, I present my work on Apache Spark implementations of two randomized linear algebra algorithms: randomized SVD and randomized QB factorization. The field of randomized linear algebra has expanded rapidly over the last decade in response to the challenges of Big Data and the need for computationally efficient solutions. Probabilistic algorithms that derive a smaller matrix approximation from a higher-dimensional matrix have shown both theoretical and empirical success. I take advantage of the Apache Spark Big Data processing engine, in particular its GraphX library, to compute these probabilistic algorithms efficiently in a distributed manner. GraphX provides the graph-parallel and data-parallel processing abstraction on which I attempted to build an effective and scalable solution. Finally, I verify the algorithms locally and benchmark them on large matrices to demonstrate the correctness and effectiveness of the implementations.
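
To make the idea of deriving a smaller matrix approximation concrete, here is a minimal single-machine sketch of a randomized QB factorization in plain Python. It is not the Spark or GraphX implementation presented in the poster, only an illustration of the general technique: sample the range of A with a Gaussian test matrix, orthonormalize the sample to get Q, and project to get B = QᵀA, so that A ≈ QB. The function names, the `oversample` and `seed` parameters, and the use of modified Gram-Schmidt in place of a library QR routine are all assumptions made for this example.

```python
import random


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def orthonormalize(Y, rel_tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y.

    Returns Q whose columns are orthonormal and span the column space of Y.
    Columns that are numerically dependent on earlier ones are dropped.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        orig_norm = sum(a * a for a in v) ** 0.5
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > rel_tol * orig_norm:
            basis.append([a / norm for a in v])
    return transpose(basis)


def randomized_qb(A, rank, oversample=5, seed=0):
    """Approximate A (m x n) as Q @ B with Q orthonormal and B = Q^T A."""
    rng = random.Random(seed)
    n = len(A[0])
    ell = min(rank + oversample, n)  # number of random test vectors
    # Gaussian test matrix Omega of shape n x ell.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(ell)] for _ in range(n)]
    Y = matmul(A, omega)             # sample the range of A
    Q = orthonormalize(Y)            # orthonormal basis for the sample
    B = matmul(transpose(Q), A)      # small projected matrix
    return Q, B
```

Calling `randomized_qb(A, rank=2)` on a matrix of exact rank 2 should return factors whose product reproduces A up to floating-point error. A randomized SVD can be obtained from this factorization by taking the SVD of the small matrix B and multiplying its left singular vectors by Q. In a distributed setting, the products with A are the steps that would be spread across the cluster.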