Making Safety Sheets Conversational: A RAG Approach with Docling

I am happy to share my presentation:
“Making Safety Sheets Conversational: A RAG Approach with Docling”
at the Docling Community Meetup.

In this talk, I present a Retrieval-Augmented Generation (RAG) pipeline for transforming complex multilingual safety sheets at RINGANA – SoFresh IT-Solutions into an interactive conversational system.

The presentation covers:
[1.] Hybrid chunking with Docling
[2.] OCR and image understanding for challenging PDF documents
[3.] Hybrid retrieval with dense embeddings and TF-IDF sparse vectors
[4.] Metadata-aware VectorDB schema design with Pinecone
[5.] LangGraph (LangChain)-based query routing for fast and context-aware answers

I am also very excited about the upcoming Docling-Agent project and its potential for building even more advanced document-aware AI systems:
https://lnkd.in/durG-5aS

A big thank you to the Docling community, to Mingxuan Zhao from IBM New York and Carol Chen from Red Hat for organising this meetup, and especially to everyone who asked thoughtful and challenging questions during the discussion.
I highly encourage you to connect with me on LinkedIn to continue the conversation.

I will also soon publish a detailed blog post on my website https://annasaranti.ai discussing several of the interesting issues and ideas raised during the Community Office Hours.

#AI #RAG #Docling #LLM #VectorDatabase #Pinecone #LangGraph #OCR #DocumentAI #GenerativeAI

—————————————————————————————————————————————————–

This presentation is dedicated to three professors, who will remain unnamed.

To the first, who claimed my code would not run, and who preferred
a private review over a formal evaluation process because
there were “many courses to prepare”: this is your answer.

To the second, who questioned the relevance of my AI models to his field:
innovation rarely waits for permission.

To the third, who wondered how I could exist within the private sector:
some paths are easier to criticize than to walk.

The answer is not impossible.

Video: https://www.youtube.com/watch?v=v0mZ-n_BgZI