I am happy to share my presentation:
“Making Safety Sheets Conversational: A RAG Approach with Docling”
at the Docling Community Meetup.
In this talk, I present a Retrieval-Augmented Generation (RAG) pipeline for transforming complex multilingual safety sheets at RINGANA – Go fresh AI-Solutions into an interactive conversational system.
The presentation covers:
[1.] Hybrid chunking with Docling
[2.] OCR and image understanding for challenging PDF documents
[3.] Hybrid retrieval with dense embeddings and TF-IDF sparse vectors
[4.] Metadata-aware VectorDB schema design with Pinecone
[5.] LangGraph (LangChain)-based query routing for fast and context-aware answers
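For readers curious about item [3.], here is a minimal Python sketch of hybrid scoring. It is not the pipeline from the talk: the toy corpus, the hand-written 3-dimensional "dense embeddings" (standing in for a real embedding model), the whitespace tokenizer and the weight alpha are all illustrative assumptions. Each document gets a TF-IDF sparse score and a dense cosine score, and the two are fused with a convex combination.

```python
import math
from collections import Counter

# Hypothetical toy corpus of safety-sheet snippets (not real data).
DOCS = [
    "flash point above 100 C not flammable",
    "store in a cool dry place away from heat",
    "avoid contact with eyes rinse with water",
]
# Hypothetical dense vectors standing in for a real embedding model.
DOC_DENSE = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simple)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_sparse(text, idf):
    """Sparse vector as {term: weight}; terms unseen in the corpus are dropped."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine_sparse(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query, query_dense, doc_dense, docs, alpha=0.5):
    """Rank docs by alpha * dense score + (1 - alpha) * sparse score."""
    idf = build_idf(docs)
    q_sparse = tfidf_sparse(query, idf)
    scored = []
    for i, doc in enumerate(docs):
        sparse = cosine_sparse(q_sparse, tfidf_sparse(doc, idf))
        dense = cosine_dense(query_dense, doc_dense[i])
        scored.append((alpha * dense + (1 - alpha) * sparse, i))
    return sorted(scored, reverse=True)  # highest fused score first


if __name__ == "__main__":
    ranking = hybrid_rank("is it flammable", [0.8, 0.2, 0.1], DOC_DENSE, DOCS)
    for score, i in ranking:
        print(f"{score:.3f}  {DOCS[i]}")
```

The sparse side rewards exact terms such as product codes or hazard keywords, while the dense side captures paraphrases; alpha shifts the balance between them (alpha=1.0 is purely dense, alpha=0.0 purely sparse).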
I am also very excited about the upcoming Docling-Agent project and its potential for building even more advanced document-aware AI systems:
https://lnkd.in/durG-5aS
A big thank you to the Docling community, to Mingxuan Zhao from IBM New York and Carol Chen from Red Hat for organising this meetup, and especially to everyone who asked thoughtful and challenging questions during the discussion.
I highly encourage you to connect with me on LinkedIn to continue the conversation.
I will also soon publish a detailed blog post on my website https://annasaranti.ai discussing several of the interesting issues and ideas raised during the Community Office Hours.
#AI #RAG #Docling #LLM #VectorDatabase #Pinecone #LangGraph #OCR #DocumentAI #GenerativeAI
—————————————————————————————————————————————————–
This presentation is dedicated to three professors, who will remain unnamed.
To the first, who claimed my code would not run, and who preferred a private review over a formal evaluation process because there were “many courses to prepare”: this is your answer.
To the second, who questioned the relevance of my AI models to his field:
innovation rarely waits for permission.
To the third, who wondered how I could exist within the private sector:
some paths are easier to criticize than to walk.
The answer is not impossible.