Real-time data processing is a key requirement in modern distributed systems. Apache Kafka is an open-source distributed event streaming platform that is widely used to build real-time data pipelines and streaming applications. This article shows how to build a simple real-time data pipeline with Java and Apache Kafka.
Apache Kafka is a high-throughput, distributed publish-subscribe messaging system, originally developed at LinkedIn and later open-sourced. Kafka's core concepts include:

- Topic: a named stream of records to which messages are published.
- Partition: each topic is split into one or more ordered, append-only partitions; every record in a partition is assigned a sequential offset.
- Producer: a client that publishes records to topics.
- Consumer: a client that subscribes to topics and reads records; consumers sharing the same group.id form a consumer group and divide the partitions among themselves.
- Broker: a Kafka server that stores partitions and serves clients; a cluster consists of one or more brokers.
A typical real-time data pipeline usually consists of the following parts:

- A data source that emits events (application logs, database changes, sensor readings, and so on).
- A producer that publishes those events to one or more Kafka topics.
- The Kafka cluster, which durably stores the events and makes them available to subscribers.
- One or more consumers that read and process the events.
- A downstream sink, such as a database or analytics system, that stores the results.

In this article, the producer generates test messages itself and the consumer simply prints them, standing in for the source and sink.
First, make sure you have the following tools installed:

- JDK 8 or later
- Apache Kafka (the commands later in this article use the scripts shipped in the Kafka distribution's bin/ directory)
- Maven, for building the project and managing dependencies
Add the following dependency to your pom.xml file:
<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>3.0.0</version>
    </dependency>
</dependencies>
Below is a simple Kafka producer example that sends messages to a given topic. It sends ten string records synchronously and prints the partition and offset the broker assigned to each.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class KafkaProducerExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // Kafka broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        String topic = "test-topic";

        for (int i = 0; i < 10; i++) {
            ProducerRecord<String, String> record = new ProducerRecord<>(topic, "key" + i, "value" + i);
            // send() is asynchronous; calling get() blocks until the broker acknowledges the record
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("Sent record(key=%s value=%s) meta(partition=%d, offset=%d)%n",
                    record.key(), record.value(), metadata.partition(), metadata.offset());
        }

        producer.close(); // flushes any pending records and releases resources
    }
}
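The example above calls get() on every send, so each message waits for a broker acknowledgment before the next one goes out. For higher throughput, send() also accepts a callback that is invoked once the record is acknowledged. The sketch below is a minimal asynchronous variant; the class name KafkaAsyncProducerExample is illustrative, and the configuration simply mirrors the synchronous example.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class KafkaAsyncProducerExample {
    public static void main(String[] args) {
        // Same broker address and serializers as the synchronous example
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("test-topic", "key" + i, "value" + i);
                // The callback runs on the producer's I/O thread once the broker responds
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("Acked: partition=%d, offset=%d%n",
                                metadata.partition(), metadata.offset());
                    }
                });
            }
        } // close() flushes records still sitting in the send buffer
    }
}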
Next, write a Kafka consumer to consume these messages.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");          // consumers sharing a group.id split the partitions
        props.put("enable.auto.commit", "true");      // commit offsets automatically...
        props.put("auto.commit.interval.ms", "1000"); // ...once per second
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test-topic"));

        // Poll in a loop; each poll returns the records that arrived since the last call
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("Received message: (key=%s value=%s) from partition=%d, offset=%d%n",
                        record.key(), record.value(), record.partition(), record.offset());
            }
        }
    }
}
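The while (true) loop above never closes the consumer, so when the process is killed the group has to wait for a session timeout before rebalancing. A common pattern is to call wakeup() from a JVM shutdown hook, which makes the blocked poll() throw WakeupException so the loop can exit and close the consumer cleanly. The sketch below applies that pattern to the same configuration; the class name GracefulConsumerExample is illustrative.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GracefulConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        final Thread mainThread = Thread.currentThread();

        // On Ctrl+C, interrupt the blocking poll() and wait for the main thread to finish
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumer.wakeup();
            try {
                mainThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        try {
            consumer.subscribe(Collections.singletonList("test-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        } catch (WakeupException e) {
            // Expected on shutdown; nothing to do
        } finally {
            consumer.close(); // commits offsets (if enabled) and leaves the group promptly
        }
    }
}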
Finally, from the Kafka installation directory, start ZooKeeper and the Kafka broker, then create the topic used by the examples:

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

With the broker running, start KafkaConsumerExample in one terminal and KafkaProducerExample in another; the consumer should print the ten records the producer sends.
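Topics can also be created from code with the admin client included in kafka-clients. The sketch below is a minimal equivalent of the kafka-topics.sh command above; the class name CreateTopicExample is illustrative, and the single-partition, replication-factor-1 settings simply mirror that command for a local single-broker setup.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class CreateTopicExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        // Admin implements AutoCloseable, so try-with-resources closes the client cleanly
        try (Admin admin = Admin.create(props)) {
            // Same settings as the CLI command: 1 partition, replication factor 1
            NewTopic topic = new NewTopic("test-topic", 1, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get(); // block until the broker confirms
        }
    }
}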