Manish Nagaraj

Data Efficiency · Efficient Fine-tuning · Large Language Models

I am currently working as a Machine Learning Engineer at Uber Technologies. Broadly, I’m interested in techniques that make foundation models (LLMs, vision models, and multimodal systems) more compact, accurate, and deployable in real applications.

I received my Ph.D. in Electrical and Computer Engineering from Purdue University, working with Professor Kaushik Roy. I also received my M.S. in Electrical and Computer Engineering from Purdue University and my B.E. in Electronics and Communications from PES Institute of Technology, Bangalore, India.

My doctoral dissertation, “Exploring Data Efficiency for Deep Learning Systems,” looked at how to make modern deep learning, especially large language and vision models, more practical and scalable. I worked on methods that identify which data actually matters for training, so that we can fine-tune and deploy large models with less compute and without sacrificing performance. This work included:

‘TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning’ - a forward-only, attention-based approach for selecting instruction-tuning data for LLMs, accepted for publication at ICML 2026.
‘Coresets from Trajectories: Selecting Data via Correlation of Loss Differences’ - a gradient-free coreset method for large-scale vision training, accepted at TMLR.
‘TOFU: Federated Learning with Data and Communication Efficiency’ - improving data and communication efficiency in federated learning, published in IEEE Access.
‘DOTIE: Energy-Efficient Object Detection Using Event Cameras’ - event-based object detection with spiking neural networks, demonstrated at the 2023 IEEE International Conference on Robotics and Automation (ICRA) and CVPR workshops.

Across these projects, the common thread was data efficiency for large models: selecting informative subsets, scaling training under real-world constraints, and making models usable in settings like federated learning, robotics, and resource-limited hardware.

News

May 18, 2026	I’m happy to share that I’m starting a new position as Machine Learning Engineer at Uber Technologies!!
May 04, 2026	I completed my PhD! I especially want to thank my committee members and friends for all the support!
Apr 30, 2026	Thrilled to share that our paper TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning has been accepted to ICML 2026! 🎉
Nov 18, 2025	Coresets from Trajectories: Selecting Data via Correlation of Loss Differences got accepted for publication at TMLR!
Oct 25, 2024	I passed my preliminary examination!

Latest Posts

Jul 10, 2023	Blog post on Federated Learning

Selected Publications

TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning

Manish Nagaraj, Sakshi Choudhary, Utkarsh Saxena, Deepak Ravikumar, and Kaushik Roy

In Forty-third International Conference on Machine Learning, 2026

Instruction tuning is essential for aligning large language models (LLMs) to downstream tasks and commonly relies on large, diverse corpora. However, small, high-quality subsets, known as coresets, can deliver comparable or superior results, though curating them remains challenging. Existing methods often rely on coarse, sample-level signals like gradients, an approach that is computationally expensive and overlooks fine-grained features. To address this, we introduce TRIM (Token Relevance via Interpretable Multi-layer Attention), a forward-only, token-centric framework. Instead of using gradients, TRIM operates by matching underlying representational patterns identified via attention-based "fingerprints" from a handful of target samples. Such an approach makes TRIM highly efficient and uniquely sensitive to the structural features that define a task. Coresets selected by our method consistently outperform state-of-the-art baselines by up to 9% on downstream tasks and even surpass the performance of full-data fine-tuning in some settings. By avoiding expensive backward passes, TRIM achieves this at a fraction of the computational cost. These findings establish TRIM as a scalable and efficient alternative for building high-quality instruction-tuning datasets.
@inproceedings{nagaraj2025trim,
  title = {TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning},
  author = {Nagaraj, Manish and Choudhary, Sakshi and Saxena, Utkarsh and Ravikumar, Deepak and Roy, Kaushik},
  booktitle = {Forty-third International Conference on Machine Learning},
  year = {2026},
  url = {https://openreview.net/forum?id=25AaIQjAg9},
  doi = {10.48550/arXiv.2510.07118},
}

Coresets from Trajectories: Selecting Data via Correlation of Loss Differences

Manish Nagaraj, Deepak Ravikumar, and Kaushik Roy

Transactions on Machine Learning Research, 2025

Deep learning models achieve state-of-the-art performance across domains but face scalability challenges in real-time or resource-constrained scenarios. To address this, we propose Correlation of Loss Differences (CLD), a simple and scalable metric for coreset selection that identifies the most impactful training samples by measuring their alignment with the loss trajectories of a held-out validation set. CLD is highly efficient, requiring only per-sample loss values computed at training checkpoints, and avoiding the costly gradient and curvature computations used in many existing subset selection methods. We develop a general theoretical framework that establishes convergence guarantees for CLD-based coresets, demonstrating that the convergence error is upper-bounded by the alignment of the selected samples and the representativeness of the validation set. On CIFAR-100 and ImageNet-1k, CLD-based coresets typically outperform or closely match state-of-the-art methods across subset sizes, and remain within 1% of more computationally expensive baselines even when not leading. CLD transfers effectively across architectures (ResNet, VGG, DenseNet), enabling proxy-to-target selection with < 1% degradation. Moreover, CLD is stable when using only early checkpoints, incurring negligible accuracy loss. Finally, CLD exhibits inherent bias reduction via per-class validation alignment, obviating the need for additional stratified sampling. Together, these properties make CLD a principled, efficient, stable, and transferable tool for scalable dataset optimization.
@article{nagaraj2025coresets,
  title = {Coresets from Trajectories: Selecting Data via Correlation of Loss Differences},
  author = {Nagaraj, Manish and Ravikumar, Deepak and Roy, Kaushik},
  journal = {Transactions on Machine Learning Research},
  issn = {2835-8856},
  year = {2025},
  url = {https://openreview.net/forum?id=QY0pbZTWJ9},
  doi = {10.48550/arXiv.2508.20230},
}

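As a rough illustration of the CLD idea in the abstract above, here is a minimal sketch that scores each training sample by how well its checkpoint-to-checkpoint loss changes correlate with the average loss changes of a held-out validation set. The function names (`cld_scores`, `select_coreset`), the use of Pearson correlation, and the plain top-k selection are assumptions made for this example; the paper's exact formulation, including its per-class validation alignment, may differ.

```python
import numpy as np


def cld_scores(train_losses: np.ndarray, val_losses: np.ndarray) -> np.ndarray:
    """Correlate each training sample's loss differences with the validation trend.

    train_losses: shape (T, N), loss of N training samples at T checkpoints.
    val_losses:   shape (T, M), loss of M held-out samples at the same checkpoints.
    Returns one score in [-1, 1] per training sample (assumed Pearson correlation).
    """
    # Loss differences between consecutive checkpoints.
    d_train = np.diff(train_losses, axis=0)            # (T-1, N)
    d_val = np.diff(val_losses, axis=0).mean(axis=1)   # (T-1,), averaged over validation set

    # Center both trajectories so the dot product below is a covariance.
    d_train_c = d_train - d_train.mean(axis=0, keepdims=True)
    d_val_c = d_val - d_val.mean()

    num = d_val_c @ d_train_c                          # (N,)
    denom = np.linalg.norm(d_train_c, axis=0) * np.linalg.norm(d_val_c)
    # Samples with a flat difference trajectory get a score of 0.
    return num / np.maximum(denom, 1e-12)


def select_coreset(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring samples (hypothetical selection rule)."""
    return np.argsort(-scores)[:k]
```

Only forward-pass loss values are needed here, which is what keeps this kind of scoring cheap compared with gradient- or curvature-based selection.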
TOFU: Towards Obfuscated Federated Updates by Encoding Weight Updates into Gradients from Proxy Data

Manish Nagaraj, Isha Garg, and Kaushik Roy

IEEE Access, 2024

Advances in Federated Learning and an abundance of user data have enabled rich collaborative learning between multiple clients, without sharing user data. This is done via a central server that aggregates learning in the form of weight updates. However, this comes at the cost of repeated expensive communication between the clients and the server, and concerns about compromised user privacy. The inversion of gradients into the data that generated them is termed data leakage. Encryption techniques can be used to counter this leakage but at added expense. To address these challenges of communication efficiency and privacy, we propose TOFU, a novel algorithm that generates proxy data that encodes the weight updates for each client in its gradients. Instead of weight updates, this proxy data is now shared. Since input data is far lower in dimensional complexity than weights, this encoding allows us to send much less data per communication round. Additionally, the proxy data resembles noise and even perfect reconstruction from data leakage attacks would invert the decoded gradients into unrecognizable noise, enhancing privacy. We show that TOFU enables learning with less than 1% and 7% accuracy drops on MNIST and CIFAR-10 datasets, respectively. This drop can be recovered via a few rounds of expensive encrypted gradient exchange. This enables us to learn to near-full accuracy in a federated setup, while being 4x and 6.6x more communication efficient than the standard Federated Averaging algorithm on MNIST and CIFAR-10, respectively.
@article{nagaraj2024tofu,
  title = {TOFU: Towards Obfuscated Federated Updates by Encoding Weight Updates into Gradients from Proxy Data},
  author = {Nagaraj, Manish and Garg, Isha and Roy, Kaushik},
  journal = {IEEE Access},
  year = {2024},
  publisher = {IEEE},
  doi = {10.1109/ACCESS.2024.3390716},
  url = {https://ieeexplore.ieee.org/abstract/document/10504799},
}

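To make the "proxy data whose gradients encode an update" idea from the TOFU abstract above concrete, here is a toy sketch for a single linear layer. It is not the TOFU algorithm. It assumes a squared-error loss and uses a closed-form SVD construction, whereas the paper targets deep networks. The function names `encode_gradient_as_proxy` and `decode_gradient` are hypothetical.

```python
import numpy as np


def encode_gradient_as_proxy(W: np.ndarray, G: np.ndarray, k: int):
    """Build k proxy (x, y) pairs whose summed gradients give the rank-k part of G.

    Toy assumption: a linear layer with loss 0.5 * ||W x - y||^2, so the gradient
    with respect to W for one sample is the rank-1 matrix (W x - y) x^T.
    W and G both have shape (d_out, d_in).
    """
    U, S, Vt = np.linalg.svd(G, full_matrices=False)
    X = Vt[:k]                                # (k, d_in): proxy inputs are right singular vectors
    Y = X @ W.T - S[:k, None] * U[:, :k].T    # (k, d_out): targets chosen so W x_i - y_i = s_i u_i
    return X, Y


def decode_gradient(W: np.ndarray, X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Recover the encoded gradient by summing per-sample gradients on the proxy data."""
    residual = X @ W.T - Y                    # (k, d_out), rows are W x_i - y_i
    return residual.T @ X                     # (d_out, d_in) = sum_i (W x_i - y_i) x_i^T
```

In this toy setting the sender transmits `k * (d_in + d_out)` numbers instead of `d_in * d_out`, which only pays off when the update is close to low rank. The communication savings reported in the abstract come from the paper's own method, not from this construction.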
DOTIE - Detecting Objects through Temporal Isolation of Events using a Spiking Architecture

Manish Nagaraj, Chamika Mihiranga Liyanagedera, and Kaushik Roy

In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023

Vision-based autonomous navigation systems rely on fast and accurate object detection algorithms to avoid obstacles. Algorithms and sensors designed for such systems need to be computationally efficient, due to the limited energy of the hardware used for deployment. Biologically inspired event cameras are a good candidate as a vision sensor for such systems due to their speed, energy efficiency, and robustness to varying lighting conditions. However, traditional computer vision algorithms fail to work on event-based outputs, as they lack photometric features such as light intensity and texture. In this work, we propose a novel technique that utilizes the temporal information inherently present in the events to efficiently detect moving objects. Our technique consists of a lightweight spiking neural architecture that is able to separate events based on the speed of the corresponding objects. These separated events are then further grouped spatially to determine object boundaries. This method of object detection is both asynchronous and robust to camera noise. In addition, it shows good performance in scenarios with events generated by static objects in the background, where existing event-based algorithms fail. We show that by utilizing our architecture, autonomous navigation systems can have minimal latency and energy overheads for performing object detection.
@inproceedings{nagaraj2023dotie,
  title = {DOTIE - Detecting Objects through Temporal Isolation of Events using a Spiking Architecture},
  author = {Nagaraj, Manish and Liyanagedera, Chamika Mihiranga and Roy, Kaushik},
  booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages = {4858--4864},
  year = {2023},
  organization = {IEEE},
  url = {https://arxiv.org/abs/2210.00975},
  doi = {10.1109/ICRA48891.2023.10161164},
}
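The abstract above describes separating events by the speed of the objects that generate them. One common way to get that behavior is a leaky integrate-and-fire (LIF) neuron, which spikes only when input events arrive densely enough in time to outpace the leak. The sketch below is a generic single-neuron illustration, not the architecture from the paper. The function name `lif_spike_times` and the parameter values (`weight`, `leak_tau`, `threshold`) are assumptions chosen for the example.

```python
import math


def lif_spike_times(event_times, weight=0.5, leak_tau=5.0, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron on a sorted list of event timestamps.

    Each input event adds `weight` to the membrane potential, which decays
    exponentially with time constant `leak_tau` between events. The neuron
    emits a spike and resets to zero when the potential reaches `threshold`.
    """
    potential = 0.0
    last_t = None
    spikes = []
    for t in event_times:
        if last_t is not None:
            # Leak: decay the potential over the gap since the previous event.
            potential *= math.exp(-(t - last_t) / leak_tau)
        potential += weight
        last_t = t
        if potential >= threshold:
            spikes.append(t)
            potential = 0.0
    return spikes


# Densely spaced events (e.g. from a fast-moving object) versus sparse ones.
fast_events = [float(t) for t in range(8)]       # one event per time unit
slow_events = [float(t) for t in range(0, 100, 20)]  # one event every 20 time units
fast_spikes = lif_spike_times(fast_events)
slow_spikes = lif_spike_times(slow_events)
```

With these assumed parameters, the dense stream pushes the neuron over threshold, while the sparse stream leaks away between events and never triggers a spike.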