Matching Images to Book Quotes with AI

Moving beyond keyword matching

Catherine Breslin
4 min read · Feb 4, 2025
Thumbnails of images from the DIV2K dataset

Have you ever tried to find an image to illustrate a story or a blog?

Searching for images to match text is tedious. It takes time to think of suitable keywords that match the theme of the text, search for images, look through the results to see which keywords work, and iterate. That’s all time I would rather spend doing something else.

What if we could use AI to find the perfect image based on text alone? Using CLIP and Pinecone, I tried exactly that.

Embedding text and images

The first step is to represent both text and images in a useful way. For this, I used embeddings, which encode text or images as vectors. With embeddings, you can easily measure how similar two items are.

Most AI models embed just one medium, like text, images, or audio. That means you can only compare text with text, image with image, or audio with audio. That wasn’t going to work here, as using separate models to embed text and images would mean the two couldn’t be compared.

Instead, I used the CLIP model, which embeds both images and text into the same latent space, so they’re directly comparable.
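As a rough sketch of what this looks like in code, here’s how you might embed an image and a piece of text with the Hugging Face transformers implementation of CLIP and compare them. The checkpoint is the standard ViT-B/32 release; the file name and caption are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.png")          # placeholder image file
text = "a dark forest under a stormy sky"  # placeholder caption

with torch.no_grad():
    image_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
    text_emb = model.get_text_features(**processor(text=[text], return_tensors="pt", padding=True))

# Unit-normalise both vectors so cosine similarity is a simple dot product
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print((image_emb @ text_emb.T).item())  # higher means a closer match
```

Because both vectors live in the same 512-dimensional space, that dot product is meaningful across the two media.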

Image dataset

The next task is to find a dataset of images to be the basis of my image search. I experimented here with the DIV2K dataset of images. You can see thumbnails of some of this set at the top of this post. There were 800 images in the training set, and I used CLIP to create an embedding of each image.

800 isn’t very many images, but it did mean I could run the code to create all 800 embeddings in just a few seconds, speeding up my experimentation.
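In sketch form, the embedding step is just a loop over the image files, reusing the model and processor from the snippet above. The local directory path is hypothetical and assumes the DIV2K training set has been downloaded:

```python
from pathlib import Path

image_dir = Path("div2k/train")  # hypothetical path to the downloaded DIV2K training images
embeddings = {}

for path in sorted(image_dir.glob("*.png")):
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
    emb = emb / emb.norm(dim=-1, keepdim=True)       # unit-normalise for cosine similarity
    embeddings[path.stem] = emb.squeeze(0).tolist()  # keyed by filename stem, e.g. "0001"
```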

Database and matching

I used a vector database — Pinecone — to make storage, comparison and retrieval of the embeddings easy. Pinecone has a free tier so I used it to store CLIP embeddings for the 800 images. Then, I queried the database with text embeddings, also created using CLIP, to find images most similar to the text.
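In outline, the Pinecone side looks something like this. The index name is my own placeholder, the dimension is 512 because that’s the size of CLIP ViT-B/32 embeddings, and the embeddings dictionary comes from the loop above:

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder API key
index_name = "clip-image-search"       # hypothetical index name

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=512,  # CLIP ViT-B/32 embedding size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(index_name)

# Upsert the 800 (id, vector) pairs in batches
vectors = list(embeddings.items())
for i in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[i:i + 100])
```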

Pinecone allows you to find the top N matches, but I chose only to look at the top matching image.
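Querying is then a matter of embedding the quote with CLIP’s text encoder and asking Pinecone for the nearest image vector, with top_k=1 to return only the best match:

```python
quote = "Double, double toil and trouble; fire burn and cauldron bubble."

with torch.no_grad():
    query_emb = model.get_text_features(**processor(text=[quote], return_tensors="pt", padding=True))
query_emb = query_emb / query_emb.norm(dim=-1, keepdim=True)

result = index.query(vector=query_emb.squeeze(0).tolist(), top_k=1)
match = result.matches[0]
print(match.id, match.score)  # the filename of the closest DIV2K image and its similarity
```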

Experimenting

To test out whether this works, I experimented with matching famous book quotes with images from the DIV2K set. I’m limited by the scope of the 800 images in the set — a bigger set of images would bring more creative possibilities — but here are some of my favourite matches to quotes from Macbeth, Where the Wild Things Are, and Slaughterhouse-Five:

“Double, double toil and trouble; fire burn and cauldron bubble.”

“’And now,’ cried Max, ‘let the wild rumpus start!’”

“Everything was beautiful, and nothing hurt”

So far, so good! Despite the relatively small number of images to choose between, the algorithm does a reasonable job of selecting images that match the mood and symbolism of these book quotes.

There are, however, some quotes that didn’t work as I hoped! This one, from the Witches’ speech in Macbeth, matched to a somewhat unrelated photo of cats:

“When shall we three meet again? In thunder, lightning, or in rain? When the hurlyburly’s done, when the battle’s lost and won.”

My main problem so far was the small number of images in the DIV2K set. Also, none of the images in this set is the kind that naturally lends itself to illustrating stories and literary quotes. Drastically expanding the set of images would likely be a good first step towards more interesting results.

Let me know in the comments if you have suggestions for good image datasets to work with.

I work with companies building AI technology. Get in touch to explore how we could work together.

Written by Catherine Breslin

Machine Learning scientist & consultant :: voice and language tech :: powered by coffee :: www.catherinebreslin.co.uk
