← Projects

NEXMARK Benchmark

2023ProjectView on GitHub

Implementation of the NEXMARK streaming-systems benchmark suite against Apache Flink, used as the workload generator for my BU master's thesis on dynamic checkpointing. NEXMARK simulates an online auction: streams of person, auction, and bid events at configurable rates. Each query stresses a different streaming primitive — joins, aggregations, windowing, sessions — making it the standard rig for measuring tail latency, throughput, and checkpoint cost on bursty workloads. Wired into Flink with the RocksDB state backend so checkpoint behavior could be measured under realistic state sizes.

Highlights

  • Implements the standard NEXMARK queries (joins, aggregations, windows, sessions)
  • Configurable event-rate driver for steady-state and burst workloads
  • RocksDB state backend wired in for realistic state-size measurements
  • Used as the workload generator for the dynamic-checkpointing thesis
  • Captures throughput, tail latency, and checkpoint cost per query

Tech

JavaApache FlinkRocksDBStreaming

The canonical source for this project is on GitHub.

View on GitHub

README · github.com/ArkashJ/NEXMARK-Benchmark

Sample Queries

from Nexmark

Usage (run query5 as an example)

compile

mvn clean package

submit job

0. start kafka

  • Suppose Kafka Server is 192.168.1.180. Download zookeeper and kafka on the server.
  • set dataDir in apache-zookeeper-3.8.0-bin/conf/zoo.cfg
  • set listeners and log.dirs in kafka_2.12-3.3.1/config/server.properties
  • start server
./apache-zookeeper-3.8.0-bin/bin/zkServer.sh start

./kafka_2.12-3.3.1/bin/kafka-server-start.sh ./kafka_2.12-3.3.1/config/server.properties

1. delete previous Kafka topic

  • show all kafka topics

./kafka_2.12-3.3.1/bin/kafka-topics.sh --bootstrap-server 192.168.1.180:9092 --list

  • delete existing one

./kafka_2.12-3.3.1/bin/kafka-topics.sh --delete --bootstrap-server 192.168.1.180:9092 --topic query5_sink

./kafka_2.12-3.3.1/bin/kafka-topics.sh --delete --bootstrap-server 192.168.1.180:9092 --topic query5_src

2. create Kafka topic

./kafka_2.12-3.3.1/bin/kafka-topics.sh --create --bootstrap-server 192.168.1.180:9092 --topic query5_src

./kafka_2.12-3.3.1/bin/kafka-topics.sh --create --bootstrap-server 192.168.1.180:9092 --topic query5_sink

3. Run Flink Cluster.

4. Submit job KafkaSourceBid-jar-with-dependencies.jar with the following Program Arguments:

--broker 192.168.1.180:9092 --kafka-topic query5_src --ratelist 25000_900000

so it will write source events to kafka.

5. Then Submit job Query5-jar-with-dependencies.jar* with the following Program Arguments:

--broker 192.168.1.180:9092 --src-topic query5_src --sink-topic query5_sink --kafka-group 0