QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-Free Visual Document Understanding
Abstract: In Visual Document Understanding (VDU) tasks, finetuning a pre-trained Vision-Language Model (VLM) with new datasets often falls short in optimizing the vision encoder to identify ...
Abstract: In this paper, a table lookup-based computing technique is proposed to perform convolutional neural network (CNN) inference without multiplication, and its FPGA implementation is ...