【Kai Wähner】GenAI Demo with Apache Kafka, Flink, LangChain, OpenAI




Generative AI (GenAI) enables automation and innovation across industries. This live demo explores a simple but powerful architecture that combines LangChain and an OpenAI LLM with Apache Kafka for event streaming and data integration and Apache Flink for stream processing. The use case demonstrates how data streaming and GenAI help correlate data from Salesforce CRM, search for lead information in public datasets like LinkedIn, and recommend ice-breaker conversations for sales reps.
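To make the use case concrete, here is a minimal sketch of the last step: turning a CRM lead plus public profile data into a prompt that asks an LLM for ice-breakers. The `Lead` and `PublicProfile` classes, their fields, and the prompt wording are assumptions for illustration only. They are not taken from the demo project, and the sketch only builds the prompt text without calling LangChain or OpenAI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lead:
    """CRM lead record (hypothetical fields, not the Salesforce schema)."""
    name: str
    company: str
    title: str


@dataclass
class PublicProfile:
    """Public information found for the lead (hypothetical fields)."""
    headline: str
    interests: List[str] = field(default_factory=list)


# Hypothetical prompt wording; the demo's actual prompt may differ.
PROMPT_TEMPLATE = (
    "You are assisting a sales rep.\n"
    "Lead: {name}, {title} at {company}.\n"
    "Public profile headline: {headline}\n"
    "Interests: {interests}\n"
    "Suggest two short ice-breakers for a first conversation."
)


def build_icebreaker_prompt(lead: Lead, profile: PublicProfile) -> str:
    """Correlate CRM data with public profile data in a single prompt string."""
    interests = ", ".join(profile.interests) if profile.interests else "unknown"
    return PROMPT_TEMPLATE.format(
        name=lead.name,
        title=lead.title,
        company=lead.company,
        headline=profile.headline,
        interests=interests,
    )


if __name__ == "__main__":
    lead = Lead(name="Jane Doe", company="Acme", title="CTO")
    profile = PublicProfile(headline="Building data platforms", interests=["streaming"])
    print(build_icebreaker_prompt(lead, profile))
```

In a real application, the resulting string would be sent to the language model, for example through a LangChain chain.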

Table of Contents:

00:44 – Demo Use Case
01:59 – Technical Architecture
04:32 – GitHub Project
05:57 – LinkedIn Scraping
10:45 – GenAI with LangChain and OpenAI
14:40 – Summary

The following technologies and infrastructure are used to implement and deploy the GenAI demo:

– Python: The programming language almost every data engineer and data scientist uses.
– LangChain: The Python framework implements the application to support sales conversations.
– OpenAI: The language model and API help to build simple but powerful GenAI applications.
– Salesforce: The Cloud CRM tool stores the lead information and other sales and marketing data.
– Apache Kafka: Scalable real-time data hub decoupling the data sources (CRM) and data sinks (GenAI application and other services).
– Kafka Connect: Data integration via Change Data Capture (CDC) from Salesforce CRM.
– Apache Flink: Stream processing for enrichment and data quality improvements of the CRM data.
– Confluent Cloud: Fully managed Kafka (Stream and Store), Flink (Process), and Salesforce connector (Integrate).
– SerpAPI: Scrape Google and other search engines with the lead information.
– Proxycurl: Pull rich data about the lead from LinkedIn without worrying about scaling a web scraping and data-science team.
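The decoupling described in the list above can be sketched without any infrastructure. In this simplified stand-in, two in-memory queues play the role of Kafka topics and a plain function plays the role of the Flink data-quality step. The queue names, event fields, and cleaning rules are assumptions for illustration, not the demo's actual topics, schema, or Flink logic.

```python
import json
from collections import deque
from typing import Optional

# In-memory stand-ins for two Kafka topics (hypothetical names).
raw_leads = deque()    # would be written by the Salesforce CDC connector
clean_leads = deque()  # would be read by the GenAI application


def clean_lead(event: dict) -> Optional[dict]:
    """Stand-in for the stream processing step: drop incomplete events
    and normalize the remaining fields."""
    name = (event.get("name") or "").strip()
    company = (event.get("company") or "").strip()
    if not name or not company:
        return None  # incomplete lead, filtered out
    email = (event.get("email") or "").strip().lower()
    return {"name": name, "company": company, "email": email}


def process_stream() -> int:
    """Consume all raw events, forward the cleaned ones, return how many passed."""
    forwarded = 0
    while raw_leads:
        event = json.loads(raw_leads.popleft())
        cleaned = clean_lead(event)
        if cleaned is not None:
            clean_leads.append(json.dumps(cleaned))
            forwarded += 1
    return forwarded


if __name__ == "__main__":
    raw_leads.append(json.dumps({"name": " Jane Doe ", "company": "Acme", "email": "Jane@Acme.COM"}))
    raw_leads.append(json.dumps({"name": "", "company": "NoName Inc"}))
    print(process_stream(), "lead(s) forwarded")
    print(clean_leads[0])
```

The producer (CRM) and the consumer (GenAI application) never call each other directly. Each side only reads from or writes to a queue, which is the property Kafka provides at scale with persistence and replay.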

The GitHub project is here: https://github.com/ora0600/genai-with-confluent

For more information about the combination of data streaming (Kafka and Flink) and Generative AI (Python, LangChain, OpenAI), check out these articles:

Apache Kafka as Mission Critical Data Fabric for GenAI: https://www.kai-waehner.de/blog/2023/07/22/apache-kafka-as-mission-critical-data-fabric-for-genai/

Apache Kafka + Vector Database + LLM = Real-Time GenAI: https://www.kai-waehner.de/blog/2023/11/08/apache-kafka-flink-vector-database-llm-real-time-genai/