hello!
I’m Prabhav Singh (pronounced Pruh-bhav).
I’m currently pursuing my Master’s in Computer Science (Thesis) with a specialization in Human Language Technologies at Johns Hopkins University, where I conduct research at the Center for Language and Speech Processing (CLSP). I’m fortunate to be advised by Prof. Jason Eisner and Prof. Jesus Villalba. Before this, I earned my Bachelor’s in Electrical Engineering from Delhi University, where I worked with Prof. K.P.S. Rana and Prof. Vineet Kumar at the APC Lab, NSIT.
You can find more details in my CV. Feel free to reach out at: psingh54 at jhu dot edu
💡 I am actively seeking PhD positions for Fall 2026. If you are aware of openings or opportunities, I’d deeply appreciate hearing from you!
my (ever-changing) research interests
My broad interests lie in language modeling, speech representation learning, and any combination of the two that helps solve a task. See my publications for more, or read about my major research areas (as of now) below (click on them to expand).
Cheaper LLM + Human Workflows: Combining humans and LLMs in a pipeline that enables principled, cost-effective annotation and evaluation.
LLMs are increasingly employed as surrogate annotators and evaluators in NLP workflows. However, current practice relies on many heuristic decisions when designing these workflows. For example, choosing which subset of examples to annotate, and whether each should go to an LLM or a human, remains a costly decision.
Recently, I've been working on AnnotationArena — an end-to-end framework to streamline LLM-based evaluation and annotation. This includes:
- Using Value of Information [1], [2] for inference-time decision making.
- Leveraging gradient-based heuristics [3], [4] for active learning.
- Exploring reinforcement learning and alignment techniques to enable adaptive, continuous annotation pipelines with principled decisions.
I'm also interested in alternative labeling strategies such as ratings, rankings, and ordinal classifications.
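To make the value-of-information idea above concrete, here is a toy sketch of uncertainty-driven annotation routing. This is not AnnotationArena itself: the entropy-as-information-gain approximation, the flat cost model, and all function names are illustrative assumptions.

```python
# Hypothetical sketch: route uncertain items to a human annotator when the
# (approximate) value of information exceeds the annotation cost.
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy of a categorical distribution (in nats)."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def expected_info_gain(probs: np.ndarray) -> float:
    """Value of observing the true label, approximated here by the
    current predictive entropy (a crude stand-in for a proper VoI estimate)."""
    return entropy(probs)

def select_items_for_human(probs_per_item, cost_per_label=0.1, budget=1.0):
    """Greedily pick items whose approximate value of information exceeds
    the per-label cost, until the annotation budget is spent."""
    scored = sorted(enumerate(probs_per_item),
                    key=lambda kv: expected_info_gain(kv[1]),
                    reverse=True)
    chosen, spent = [], 0.0
    for idx, probs in scored:
        gain = expected_info_gain(probs)
        if gain > cost_per_label and spent + cost_per_label <= budget:
            chosen.append(idx)
            spent += cost_per_label
    return chosen

# Example: three items with model predictive distributions over 3 labels.
items = [np.array([0.95, 0.03, 0.02]),   # confident -> keep the LLM label
         np.array([0.40, 0.35, 0.25]),   # uncertain -> worth a human label
         np.array([0.55, 0.30, 0.15])]
print(select_items_for_human(items, cost_per_label=0.3, budget=0.6))  # [1, 2]
```

In the real setting, the gain term would come from a task-aware value-of-information estimate rather than raw predictive entropy, and the cost model would distinguish LLM from human annotation.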
Multimodal Learning for Language and Speech: Fusing audio, text, and vision to solve tasks that are natural for humans — but hard for machines.
I build models that integrate speech, text, and vision, learning from heterogeneous modalities with minimal supervision. I began my research journey with emotion recognition, and while I've developed a fair amount of expertise in that space, I'm increasingly drawn to speaker recognition and diarization. I find diarization particularly interesting: it is a fundamental speech task with open challenges in temporal structure, multimodal fusion, and low-resource adaptation. My recent work focuses on improving diarization quality through multitask learning and adaptive fusion.
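As a rough illustration of what "adaptive fusion" can look like, here is a toy gated-fusion module. The architecture, dimensions, and names are illustrative assumptions, not a description of my published models.

```python
# Toy sketch of adaptive gated fusion of audio and text embeddings.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Learns a per-dimension gate deciding how much to trust each modality."""
    def __init__(self, audio_dim: int, text_dim: int, hidden_dim: int):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Sigmoid(),  # gate values in [0, 1] per hidden dimension
        )

    def forward(self, audio_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        a = self.audio_proj(audio_emb)            # (batch, hidden_dim)
        t = self.text_proj(text_emb)              # (batch, hidden_dim)
        g = self.gate(torch.cat([a, t], dim=-1))  # adaptive per-dimension weights
        return g * a + (1.0 - g) * t              # fused representation

# Example with random per-turn embeddings (e.g. speaker turns in diarization).
fusion = GatedFusion(audio_dim=512, text_dim=768, hidden_dim=256)
fused = fusion(torch.randn(4, 512), torch.randn(4, 768))
print(fused.shape)  # torch.Size([4, 256])
```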
ML Learning Theory / Replicability: Understanding how adaptive decisions affect replicability in transfer learning.
Thanks to rigorous theory coursework at JHU (this), I've developed a strong interest in replicability theory, which is distinct from reproducibility (see [this] and [this]).
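For readers new to the term, one common formalization from the recent replicability literature is sketched below; the notation is mine and is only a sketch, not the exact definition used in my work.

```latex
% An algorithm A is \rho-replicable if running it on two independent samples
% from the same distribution, with shared internal randomness r, yields the
% same output with high probability:
\[
\Pr_{S_1, S_2 \sim \mathcal{D}^n,\; r}\big[\, A(S_1; r) = A(S_2; r) \,\big] \;\ge\; 1 - \rho .
\]
```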
My initial research in this area (guided by Prof. Jess Sorrell) focuses on:
- Deriving replicability bounds for transfer learning.
- Investigating how adaptive data selection affects transferability and stability of learned models.
Read our ongoing manuscript: Sensitivity of Selectivity in Transfer Learning (work-in-progress).
📢 recent updates
- June 2025: Starting my summer internship at a stealth startup in California, working on agentic workflows for document understanding in finance and retail.
- May 2025: Two papers accepted at INTERSPEECH 2025! (See this post). Excited to present in Rotterdam 🇳🇱!
- April 2025: Our poster on LLM + Human collaboration (read here) won Best Poster at MASC-SLL 2025.
- September 2024: Our ICMI'24 paper on multimodal emotion recognition for Mild Cognitive Impairment (MCI) is now available.