GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?
Paper link: arxiv.org/pdf/2306.00693.pdf

Image Descriptions for ImageNet-1k Generated by MiniGPT-4
The description set can be downloaded via Google Drive or BaiduYun.
The filename is minigpt4_caption_imagenet_train_0_1281166.pth.
We generated descriptions for all 1,281,167 training images using the MiniGPT-4 7B model.
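The filename's trailing numbers (0 and 1281166) line up with the 1,281,167 training images if they are read as the first and last image index, inclusive. The sketch below is a hypothetical helper built on that assumption. It derives the expected count from such a filename and checks that a loaded description set has that many entries. Neither `expected_count` nor `check_descriptions` is part of the released files.

```python
import re


def expected_count(filename):
    """Number of descriptions implied by a name like
    'minigpt4_caption_imagenet_train_0_1281166.pth'.

    ASSUMPTION: the two trailing numbers are the first and last
    image index, both inclusive.
    """
    match = re.search(r'_(\d+)_(\d+)\.pth$', filename)
    if match is None:
        raise ValueError(f'unexpected filename: {filename}')
    first, last = int(match.group(1)), int(match.group(2))
    return last - first + 1


def check_descriptions(description_list, filename):
    """Raise ValueError if the loaded list does not cover the full index range."""
    expected = expected_count(filename)
    if len(description_list) != expected:
        raise ValueError(
            f'expected {expected} descriptions, got {len(description_list)}'
        )


print(expected_count('minigpt4_caption_imagenet_train_0_1281166.pth'))  # 1281167
```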
Usage
```python
import numpy
import torch
import torchvision

# One MiniGPT-4 description per ImageNet-1k training image
description_list = torch.load('./minigpt4_caption_imagenet_train_0_1281166.pth')

# Standard ImageFolder view of the ImageNet-1k training set
dataset = torchvision.datasets.ImageFolder('/path/to/ImageNet-1k/train')

# Pick a random training sample (1,281,167 images in total)
index = numpy.random.randint(len(dataset))
img, label = dataset[index]
description = description_list[index]
# "img" and "description" are exactly paired by "index"
```
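The usage example pairs an image with its description through a shared index. For training, that pairing can be wrapped in a dataset object. The sketch below is a hypothetical wrapper, `ImageDescriptionDataset`, which is not part of this release. It accepts any object that supports `len()` and indexing and returns `(img, label)` pairs, such as a torchvision `ImageFolder`. It assumes, as the example does, that `description_list[i]` belongs to `dataset[i]`.

```python
class ImageDescriptionDataset:
    """Hypothetical wrapper yielding (img, label, description) triples.

    ASSUMPTION: description_list[i] describes dataset[i], i.e. both use
    the same index order as the usage example above.
    """

    def __init__(self, dataset, description_list):
        # Fail early if the two sources cannot be aligned one-to-one
        if len(dataset) != len(description_list):
            raise ValueError('dataset and description list lengths differ')
        self.dataset = dataset
        self.description_list = description_list

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        img, label = self.dataset[index]
        return img, label, self.description_list[index]


# Toy stand-in for ImageFolder: a list of (img, label) pairs
toy = ImageDescriptionDataset([('img0', 0), ('img1', 1)], ['a cat', 'a dog'])
print(toy[1])  # ('img1', 1, 'a dog')
```

The class defines `__len__` and `__getitem__`, so it can be passed to `torch.utils.data.DataLoader`. If the descriptions are plain strings, PyTorch's default collate function returns each batch of them as a list of strings.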

Citation
```bibtex
@article{ding2023lmat,
  title={GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?},
  author={Ding, Ning and Tang, Yehui and Fu, Zhongqian and Xu, Chao and Han, Kai and Wang, Yunhe},
  journal={arXiv preprint arXiv:2306.00693},
  year={2023}
}
```