Suriya Gunasekar

Principal Research Manager

Microsoft Research, Redmond

suriyag@microsoft.com

I am a Principal Research Manager at Microsoft Research, where I am part of the Physics of AGI group that pioneered the Phi family of language models. I currently work on improving language model capabilities through scale, data curation, and creative uses of synthetic data generation, while maintaining broader interests in the evaluation and alignment of AI systems for targeted use cases. Prior to MSR, I was a Research Assistant Professor at the Toyota Technological Institute at Chicago. I received my Ph.D. from The University of Texas at Austin.

Highlight: The Phi family

(2023) Phi-2: The surprising power of small language models. Blog Post Model

(2023) Textbooks Are All You Need II: phi-1.5 technical report. PDF Model

arXiv preprint.

(2023) Textbooks Are All You Need. PDF Model

arXiv preprint.

Mentorship

I have had the privilege of working with some awesome interns in our group.

All Publications

(2023) KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval. PDF Dataset

International Conference on Learning Representations (ICLR, 2024).

(2023) Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models. PDF

International Conference on Learning Representations (ICLR, 2024).

(2023) Textbooks Are All You Need II: phi-1.5 technical report. PDF Model

arXiv preprint.

(2023) Textbooks Are All You Need. PDF Model

arXiv preprint.

(2023) (S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability. PDF

Neural Information Processing Systems (NeurIPS).

(2022) How to Fine-Tune Vision Models with SGD. PDF

International Conference on Learning Representations (ICLR, 2024).

(2022) Unveiling Transformers with LEGO: a synthetic reasoning task. PDF Code

arXiv preprint.

(2022) Neural-Sim: Learning to Generate Training Data with NeRF. PDF Code

European Conference on Computer Vision (ECCV).

(2022) Data Augmentation as Feature Manipulation. PDF

International Conference on Machine Learning (ICML).

(2022) Inductive bias of multi-channel linear convolutional networks with bounded weight norm. PDF

Conference on Learning Theory (COLT).

(2021) Methods and Analysis of The First Competition in Predicting Generalization of Deep Learning. PDF Dataset Competition Page

NeurIPS 2020 Competition and Demonstration Track.

(2021) Mirrorless mirror descent: A natural derivation of mirror descent. PDF

International Conference on Artificial Intelligence and Statistics (AISTATS).

(2020) Implicit bias in deep linear classification: Initialization scale vs training accuracy. PDF

Neural Information Processing Systems (NeurIPS).

(2020) Implicit regularization and convergence for weight normalization. PDF

Neural Information Processing Systems (NeurIPS).

(2020) Kernel and Rich Regimes in Overparametrized Models. PDF

Conference on Learning Theory (COLT).

(2019) Theory of deep learning. PDF

Princeton University, Princeton, NJ.

(2019) Convergence of gradient descent on separable data. PDF

International Conference on Artificial Intelligence and Statistics (AISTATS).

(2019) Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models. PDF

International Conference on Machine Learning (ICML).

(2018) Implicit bias of gradient descent on linear convolutional networks. PDF

Neural Information Processing Systems (NeurIPS).

(2018) On preserving non-discrimination when combining expert advice. PDF

Neural Information Processing Systems (NeurIPS).

(2018) Characterizing Implicit Bias in Terms of Optimization Geometry. PDF

International Conference on Machine Learning (ICML).

(2018) The Implicit Bias of Gradient Descent on Separable Data. PDF

Journal of Machine Learning Research (JMLR).

(2017) Implicit regularization in matrix factorization. PDF

Neural Information Processing Systems (NeurIPS).

(2017) Learning Non-Discriminatory Predictors. PDF

Conference on Learning Theory (COLT).

(2016) Preference Completion from Partial Rankings. PDF Code

Neural Information Processing Systems (NeurIPS).

(2016) Identifiable phenotyping using constrained non-negative matrix factorization. PDF

Machine Learning for Healthcare Conference (MLHC).

(2016) Phenotyping using Structured Collective Matrix Factorization of Multi-source EHR Data. PDF

arXiv preprint.

(2015) Unified view of matrix completion under general structural constraints. PDF

Neural Information Processing Systems (NeurIPS).

(2015) Consistent collective matrix completion under joint low rank structure. PDF

International Conference on Artificial Intelligence and Statistics (AISTATS).

(2014) Face detection on distorted images augmented by perceptual quality-aware features. PDF Dataset

IEEE Transactions on Information Forensics and Security.

(2014) Exponential family matrix completion under structural constraints. PDF Errata for conference version

International Conference on Machine Learning (ICML).

(2013) Noisy matrix completion using alternating minimization. PDF

Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD).

(2012) Review quality aware collaborative filtering. PDF

ACM Conference on Recommender Systems (RecSys).