More Than Pixels - Unlock your image data with Vision-Language Models
10-18, 13:50–14:20 (Europe/Zurich), Aula 4.101

Join us on two Vision-Language Adventures!

We'll uncover the information hidden inside large image collections, with Vision-Language Models (VLMs) showing us the way. Who knows which forgotten gems await us?

In the first part, we'll use CLIP and FAISS to go on a treasure hunt through your photo collection. You'll learn how to search millions of images with ease, using nothing but natural language. Bye-bye endless scrolling, hour-long tagging sessions, and frustrating folder searches 👋

In the second part, we'll harness the power of VLMs to caption images - translating pixels into words. Then we'll use the BERTopic library to reveal even deeper insights into your photo collection.
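As a toy illustration of the captions-to-themes idea: BERTopic does this properly by embedding the captions, clustering them, and extracting topic words, but even a naive count of recurring content words shows the shape of the pipeline. The captions below are made up for the sketch; in practice they would come from a captioning VLM.

```python
# Toy sketch: mining themes from generated captions.
# BERTopic replaces this word count with embeddings + clustering,
# but the overall flow is the same: captions in, themes out.
from collections import Counter

# Stand-ins for captions produced by a VLM
captions = [
    "a dog running on the beach",
    "two dogs playing in the sand",
    "a birthday cake with candles",
    "children around a birthday cake",
    "a dog catching a frisbee on the beach",
]

stopwords = {"a", "an", "the", "on", "in", "with", "two", "around"}
words = Counter(
    w for caption in captions for w in caption.lower().split()
    if w not in stopwords
)

# Recurring content words hint at the collection's themes,
# e.g. 'dog', 'beach', 'birthday', 'cake' in this toy corpus
print(words.most_common(4))
```

With BERTopic the equivalent call is roughly `BERTopic().fit_transform(captions)`, which also groups near-synonyms ("dog"/"dogs") that this word count misses.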

By the end of this talk, you'll be equipped with the knowledge and tools to unlock new insights, identify patterns, and make your image data work harder for you.

This talk is aimed at an intermediate audience - some background in Computer Vision, NLP, or general Deep Learning will help you get the most out of it.

The talk will be structured as follows:
- 5 min - What are VLMs?
- 10 min - Image Search with CLIP and FAISS
- 10 min - Analysis of Captioned Images
- 5 min - Possible Applications, Closing Thoughts

Hey all πŸ‘‹

I'm a Data Scientist at celebrate company by day and an AI storyteller by night.

After research stints at the Fraunhofer FOKUS Institute and tinkering with sensor setups for autonomous vehicles, I decided to get more hands-on and joined celebrate company, where I now help our customers design amazing products with the help of Machine Learning.

I hold a Master's degree in Computer Science with a focus on cognitive systems from TU Berlin.