Consume Kafka topics using Azure Databricks (Spark), Confluent Cloud (Kafka) running on Azure, Schema Registry and AVRO format

Caio Moreno
1 min read · Oct 8, 2020


This post provides sample code (Python) for consuming Kafka topics using Azure Databricks (Spark) with Confluent Cloud (Kafka) running on Azure, Schema Registry, and the Avro format.

Reading the topic:

[Screenshot: Kafka topic data]
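The read from Confluent Cloud can be sketched as follows. This is a minimal outline, not the exact notebook from the linked repo: the bootstrap server, API key/secret, and topic name are placeholders, and the option keys follow the standard Spark Kafka source with Confluent Cloud's SASL_SSL/PLAIN authentication (note that on Databricks the Kafka security classes are shaded, hence the `kafkashaded.` prefix).

```python
def confluent_kafka_options(bootstrap_servers, api_key, api_secret, topic):
    """Spark Kafka source options for Confluent Cloud (SASL_SSL + PLAIN auth)."""
    # On Databricks the Kafka client classes are shaded, so the JAAS login
    # module must be referenced under the "kafkashaded." package prefix.
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="{api_key}" password="{api_secret}";'
    )
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def read_topic(spark, options):
    """Return a streaming DataFrame of raw Kafka records (key/value are binary)."""
    reader = spark.readStream.format("kafka")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

Called as `read_topic(spark, confluent_kafka_options(...))` on a Databricks cluster, this yields the raw stream shown above, with the Avro-encoded message in the binary `value` column.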

Streamed data, formatted and stored in a Spark SQL table (view):

[Screenshot: topic curated data]
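Getting from raw bytes to the curated view hinges on Confluent's wire format: each serialized record starts with a magic byte (0x00) and a big-endian 4-byte schema ID, with the Avro body following from byte 6. The sketch below strips that header and decodes the value with Spark's `from_avro`; the helper names and the `avro_schema_json` argument (the writer schema fetched from Schema Registry as a JSON string) are illustrative assumptions, not the repo's exact code.

```python
import struct


def split_confluent_payload(raw: bytes):
    """Split a Confluent wire-format message into (schema_id, avro_bytes).

    Confluent-serialized records begin with a magic byte (0x00) followed by
    a big-endian 4-byte schema ID; the Avro body starts at byte offset 5.
    """
    if not raw or raw[0] != 0:
        raise ValueError("not a Confluent wire-format message")
    schema_id = struct.unpack(">I", raw[1:5])[0]
    return schema_id, raw[5:]


def curate_topic(kafka_df, avro_schema_json, view_name="topic_curated"):
    """Decode the Avro value column and expose it as a Spark SQL temp view.

    Assumes a Databricks/Spark cluster where `from_avro` is available;
    `avro_schema_json` is the writer schema retrieved from Schema Registry.
    """
    from pyspark.sql.avro.functions import from_avro
    from pyspark.sql.functions import col, expr

    decoded = (
        kafka_df
        # Drop the 5-byte Confluent header (SQL substring is 1-based).
        .withColumn("avro_value", expr("substring(value, 6, length(value) - 5)"))
        .select(from_avro(col("avro_value"), avro_schema_json).alias("record"))
        .select("record.*")
    )
    decoded.createOrReplaceTempView(view_name)
    return decoded
```

With the view registered, the curated stream can be queried directly with `spark.sql("SELECT * FROM topic_curated")`, which is what the result above shows.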

Source code:
https://github.com/caiomsouza/microsoft-big-data-scientist-and-ai/blob/master/samples/azure-ccloud-databricks/SampleCodeConfluentCloudKafkaAvroPython_CM_08102020_WithoutCred.py

Docs:
https://azure.microsoft.com/en-us/services/databricks/
https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html#example-with-schema-registry
https://docs.confluent.io/current/cloud/cloud-start.html#cloud-start

Special thanks and credits to Angela Chu, Gianluca Natali, Henning Kropp, Yatharth Gupta, Bhanu Prakash, Awez Syed, Nick Hill, Robin Davidson, Liping Huang, Chris Munyasya, Sid Rabindran, and many more people from the Databricks, Confluent, and Microsoft teams who helped make this integration work.


Written by Caio Moreno

Solutions Architect @databricks | Professor | PhD | Ex-Microsoft | Ex-Avanade/Accenture | Ex-Pentaho/Hitachi | Ex-AOL | Ex-IT4biz CEO. (Opinions are my own)
