15 - Stream data from Apache Kafka into Adobe Experience Platform

15.1 Introduction to Apache Kafka


15.1.1 Introduction

Every organization starts off in a very simple way from a data perspective. An organization's data ecosystem starts with one source system and one target system. The source system sends data to the target system, and that's it. Easy, right? If only it stayed that simple. Organizations quickly outgrow that setup, and the number of source systems and target systems grows fast. All of these sources and destinations need to exchange data with each other, and things quickly become very complex. For instance, if an organization has 4 sources and 6 destinations, and all of these applications need to talk to each other, you'll need to build 24 point-to-point integrations. Each of those integrations comes with its own difficulties:

  • protocol: how is data transported (TCP, HTTP, REST, FTP, ...)

  • data format: how is the data parsed (binary, CSV, JSON, Parquet, ...)

  • data schema and evolution: what is the data model and how will it evolve?

Additionally, every time you connect a source system to a destination system, that connection puts extra load on both systems.

So, how do you solve this? That's where Apache Kafka comes into the picture. Apache Kafka allows an organization to decouple data streams and systems by providing a common, high-throughput, distributed messaging system. Source systems send their data into Apache Kafka, and destination systems consume data from Apache Kafka.

Think of all the types of data sources experience businesses have to manage:

  • website events

  • mobile app events

  • POS events

  • CRM data

  • call center data

  • transaction history

  • ...

And think of all the types of destinations experience businesses use in their ecosystem, all of which might need data from those source systems:

  • CRM

  • data lake

  • email system

  • audit

  • analytics

  • ...

Apache Kafka was originally created at LinkedIn and is now an open source project under the Apache Software Foundation, with Confluent as one of its main contributors. Apache Kafka provides a distributed, resilient, fault-tolerant architecture. It scales horizontally to hundreds of brokers and to millions of messages per second, and it delivers high performance with latencies under 10 ms, which makes it ideal for real-time use cases.

A couple of use case examples:

  • Messaging system

  • Activity tracking

  • Gathering metrics from many different locations

  • Gathering application logs

  • Stream processing (with the Kafka Streams API or Spark)

  • Decoupling of system dependencies

  • Integration with Spark, Flink, Storm, Hadoop and many other Big Data technologies

For instance:

  • Netflix uses Kafka to apply recommendations in real time while you're watching TV shows

  • Uber uses Kafka to gather user, taxi and trip data in real time in order to forecast demand and compute surge pricing

  • LinkedIn uses Kafka to prevent spam and to collect user interactions in order to make better connection recommendations in real time

For all these use cases, Kafka is only used as a transportation mechanism. Kafka is really good at moving data between applications.

15.1.2 Kafka Terminology

Message

A message is a communication sent by a system into Kafka. A message contains a payload, and the payload contains data elements. For instance, an experience event sent by a website into Adobe Experience Platform is considered a message.
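
As a rough illustration of what a message payload can look like, the sketch below builds a small JSON payload for a website experience event. The field names and values are simplified examples, not the full XDM schema that Adobe Experience Platform uses.

    import json

    # Hypothetical, simplified website event; a real XDM experience event carries many more fields.
    experience_event = {
        "eventType": "web.webpagedetails.pageViews",
        "timestamp": "2023-05-01T10:00:00Z",
        "identityMap": {"ECID": [{"id": "12345678901234567890"}]},
        "web": {"webPageDetails": {"name": "home"}},
    }

    # Kafka stores message payloads as raw bytes, so the payload is serialized before it is sent.
    payload_bytes = json.dumps(experience_event).encode("utf-8")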

Topic, partitions, offsets

A topic is a particular stream of data, similar to a table in a database. You can have as many topics as you want, and a topic is identified by its name. Topics are split into partitions. Each partition is ordered, and each message within a partition gets an incremental id called the offset. A message is stored in a topic, on a partition, and is referenced by its offset. Messages are only kept for a limited time (the default is 1 week). Once a message is written to a partition, it can't be changed anymore.
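
As a minimal sketch of how a topic with multiple partitions could be created, assuming a local broker on localhost:9092 and the kafka-python client library (the topic name and settings are illustrative only):

    from kafka.admin import KafkaAdminClient, NewTopic

    # Connect to the cluster through any broker; the address is an assumption for this sketch.
    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

    # A topic named "website-events" with 3 partitions; replication_factor=1 because this
    # sketch assumes a single-broker development cluster.
    admin.create_topics([NewTopic(name="website-events", num_partitions=3, replication_factor=1)])
    admin.close()

Every message written to "website-events" then lands on one of its 3 partitions and receives the next offset within that partition.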

Brokers

A broker is similar to a server. A Kafka cluster is composed of multiple brokers (servers). Each broker is identified by an ID and holds certain topic partitions.

Replication

Kafka is a distributed system. One of the important properties of a distributed system is that data is stored safely, and for that, replication is needed. After all, when one broker (server) goes down, another broker (server) should still be able to serve the messages that were originally stored on the broker that went down. Replication creates copies of messages across multiple brokers to guarantee that no data is lost.

Producers

How is data sent to Kafka? That's the role of a producer. A producer connects to a source system, takes data from it, and writes that data to topics, onto partitions. Based on the configuration of your Kafka cluster, producers automatically know which broker and partition to write to. In a distributed system with multiple brokers and a replication strategy, a producer spreads data across multiple brokers, which means that load balancing happens automatically.
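
A minimal producer sketch, again assuming a local broker on localhost:9092, the kafka-python client library and the example topic created above:

    import json
    from kafka import KafkaProducer

    # Serialize Python dicts to JSON bytes before they are written to the topic.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Without a key, messages are spread across the topic's partitions automatically.
    producer.send("website-events", value={"eventType": "web.webpagedetails.pageViews", "page": "home"})
    producer.flush()  # block until the message has actually been delivered to the broker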

Message keys

Producers can choose to send a key with each message. A key can be any string, number, etc. If no key is provided, messages are distributed across partitions (and therefore brokers) automatically. If a key is provided, all messages with that key always go to the same partition. A message key is therefore used when messages need to be ordered based on a specific field.
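
Building on the producer sketch above, a key can be passed with every send; the visitor id used as the key here is purely illustrative:

    # All messages with the same key end up on the same partition, so all events for this
    # visitor are stored (and read) in the order in which they were produced.
    producer.send(
        "website-events",
        key=b"ecid-12345678901234567890",
        value={"eventType": "commerce.purchases", "orderTotal": 59.99},
    )
    producer.flush()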

Consumers

Consumers read data from an Apache Kafka topic and then pass that data on to destination systems. Consumers know which broker to read from. Within each partition, a consumer reads data in order. Consumers read data as part of consumer groups.
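
A minimal consumer sketch under the same assumptions (local broker, kafka-python, the example topic from above); the group id is an arbitrary name chosen for this example:

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "website-events",
        bootstrap_servers="localhost:9092",
        group_id="aep-demo-consumers",       # consumers sharing this group id split the partitions
        auto_offset_reset="earliest",        # on first run, start from the oldest retained message
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    # Each record exposes the partition and offset it was read from, plus the deserialized payload.
    for record in consumer:
        print(record.topic, record.partition, record.offset, record.value)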

Zookeeper

ZooKeeper is essentially a service for distributed systems that offers a hierarchical key-value store, and it is used to provide a distributed configuration service, a synchronization service, and a naming registry for large distributed systems. ZooKeeper needs to be running before you can use Apache Kafka: ZooKeeper acts as a sort of master of ceremonies for Kafka, managing the distributed services in the background while Kafka produces and consumes events.

You have finished this exercise.

