Kanishk Jain

New · 2026

Discovering Failure Modes in Vision-Language Models using RL

Kanishk Jain, Qian Yang, Shravan Nayak, Parisa Kordjamshidi, Nishanth Anand, Aishwarya Agrawal

arXiv, 2026

An RL framework that trains a questioner agent to generate increasingly complex queries, exposing weaknesses in vision-language models without human intervention.

Prioritized Concept Learning via Relative Error-driven Sample Selection

Shivam Chandhok, Qian Yang, Oscar Mañas, Kanishk Jain, Leonid Sigal, Aishwarya Agrawal

arXiv, 2025

A data-efficient training framework for VLMs that dynamically selects samples based on the model's learning propensity.

Benchmarking Vision-Language Models for Cultural Understanding

Shravan Nayak, Kanishk Jain, Rabiul Awal, Siva Reddy, et al.

EMNLP, 2024 (Oral)

CulturalVQA: a benchmark for evaluating VLMs' understanding of geo-diverse cultural elements, revealing performance disparities across regions and facets.

Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification

Kanishk Jain, Shyamgopal Karthik, Vineet Gandhi

NeurIPS, 2023

Hierarchical Ensembles (HiE): a post-hoc strategy that leverages label hierarchy to reduce mistake severity in fine-grained classification at test time.

Instance-Level Semantic Maps for Vision-Language Navigation

Laksh Nanwani, Anmol Agarwal, Kanishk Jain, Raghav Prabhakar, Aaron Monis, Aditya Mathur, Krishna Murthy, Abdul Hafez, Vineet Gandhi, K. Madhava Krishna

RO-MAN, 2023

An instance-focused scene representation enabling language-based navigation that grounds references to specific instances within an environment.

Ground then Navigate: Language-Guided Navigation in Dynamic Scenes

Kanishk Jain*, Varun Chhangani*, Amogh Tiwari, K. Madhava Krishna, Vineet Gandhi

ICRA, 2023

A Vision-and-Language Navigation approach for outdoor autonomous driving that explicitly grounds navigable regions from text and uses them as guidance for the navigation stack.

Bringing Generalization to Deep Multi-View Detection

Jeet Vora, Swetanjal Dutta, Kanishk Jain, Shyamgopal Karthik, Vineet Gandhi

WACV Workshop, 2023

Existing multi-view detectors overfit to single scenes; we formalize three critical forms of generalization and propose evaluations for them.

Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Kanishk Jain, Vineet Gandhi

ACL Findings, 2022

A Transformer-based architecture capturing all forms of multi-modal interaction synchronously for referring image segmentation.

Grounding Linguistic Commands to Navigable Regions

Kanishk Jain*, Nivedita Rufus*, Unni Krishnan R. Nair*, Vineet Gandhi, K. Madhava Krishna

IROS, 2021

A visual-grounding-based approach to language-guided navigation that brings interpretability to Vision-Language Navigation.