The challenge is turning those quick offline moments into lasting online connections. Place scannable codes on your key ...
[2024/4/23] We have added an audio-grounding feature that tracks the sound-making object within the video's soundtrack. [2023/5/12] We have authored a technical report for SAM-Track. [2023/5/7] We ...
Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision ...
Abstract: Tactile sensing is one of the modalities humans rely on heavily to perceive the world. Working with vision, this modality refines local geometry structure, measures defor-mation at the ...
Abstract: Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks. However, effective utilization of LLMs for zero-shot visual question-answering (VQA) ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results