GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?



Image Descriptions for ImageNet-1k Generated by MiniGPT-4

The description set can be downloaded via Google Drive or BaiduYun.
The filename is minigpt4_caption_imagenet_train_0_1281166.pth.
We generated descriptions for all 1,281,167 ImageNet-1k training images using the MiniGPT-4 7B model.


Usage

import numpy
import torch
import torchvision

# Load the descriptions (one entry per training image)
description_list = torch.load('./minigpt4_caption_imagenet_train_0_1281166.pth')
dataset = torchvision.datasets.ImageFolder('/path/to/ImageNet-1k/train')

# Pick a random training index in [0, 1281166]
index = numpy.random.randint(1281167)
img, label = dataset[index]
description = description_list[index]
# The "img" and "description" are exactly paired by the "index"
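Because the pairing is purely by index, it can be convenient to wrap the image dataset and the description list in a single object. The following is a minimal sketch, not part of the released code: the class name `DescribedDataset` is hypothetical, and it assumes the loaded descriptions form an indexable sequence with exactly one entry per image, in the same order as the dataset. Toy stand-in data is used so the snippet runs without ImageNet.

```python
class DescribedDataset:
    """Hypothetical wrapper that pairs each (img, label) sample with its description by index."""

    def __init__(self, dataset, descriptions):
        # Index-based pairing only makes sense if both have the same length.
        if len(dataset) != len(descriptions):
            raise ValueError(
                f"size mismatch: {len(dataset)} samples vs {len(descriptions)} descriptions"
            )
        self.dataset = dataset
        self.descriptions = descriptions

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        img, label = self.dataset[index]
        return img, label, self.descriptions[index]


# Stand-in data for illustration only (assumption: real use would pass the
# ImageFolder dataset and the loaded description list instead).
toy_dataset = [("img0", 0), ("img1", 1), ("img2", 1)]
toy_descriptions = ["a cat", "a dog", "another dog"]
paired = DescribedDataset(toy_dataset, toy_descriptions)
img, label, description = paired[1]
```

With the real data, the same wrapper would be built as `DescribedDataset(dataset, description_list)`. Since it defines `__len__` and `__getitem__`, it should behave as a map-style dataset; the length check guards against silently mispairing images and descriptions if a different ImageNet subset or ordering is used.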


Examples



Citation

@article{ding2023lmat,
  title={GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?},
  author={Ding, Ning and Tang, Yehui and Fu, Zhongqian and Xu, Chao and Han, Kai and Wang, Yunhe},
  journal={arXiv preprint arXiv:2306.00693},
  year={2023}
}