Introducing a Streaming Json Data Generator

Have you ever needed to generate a realtime stream of json data in order to test an application or build a prototype? When thinking about a good source of streaming data, we often look to the Twitter stream as a solution, but that only gets us so far in prototyping scenarios, because Twitter data only fits a limited number of use cases. There are plenty of json data generators online (like json-generator or mockaroo), but we couldn’t find an offline data generator to use in our testing and prototyping, so we decided to build one. We found it so useful that we decided to open source it as well so others can make use of it in their own projects. You can find the json-data-generator over on github.


We had a couple of needs when it came to generating data for testing purposes. They were as follows:

  • Generate json documents that are defined in json themselves. This would allow us to take existing schemas, drop them into the generator, modify them a bit and start generating data that looks like what we expect in our application.
  • Generate json with random data as values. This includes different types of random data, not just random characters, but things like random names, counters, dates, primitive types, etc.
  • Generate a constant stream of json events that are sent somewhere. We might need to send the data to a log file or to a Kafka Queue or something else.
  • Generate events in a defined order, at defined or random time periods in order to act like a real system.

We now have a data generator that supports all of these things and can be run on our own networks to produce streams of json data for applications to consume.

Using the json-data-generator

The json-data-generator has too many configuration options and features to go over in a blog post like this.  For the full (and updated) documentation, please view the README on the github project page.  With that said, we’d like to show the basics here in this post to give you an idea of how the json-data-generator might help you on your projects.

Downloading the generator

You can always find the most recent release over on github where you can download the bundle file that contains the runnable application and example configurations.  Head there now and download a release to get started!


The generator runs a Simulation which you get to define.  The Simulation can specify one or many Workflows that will be run as part of your Simulation.  The Workflows then generate Events, and these Events are then sent somewhere.  You will also need to define Producers that are used to send the Events generated by your Workflows to some destination.  These destinations could be a log file, or something more complicated like a Kafka Queue.
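
To make that flow concrete, here is a minimal Python sketch of the idea. It is not the generator's actual code: the names `ListProducer` and `run_simulation` are made up for illustration, and the sketch assumes that every event is handed to every configured producer.

```python
import json


class ListProducer:
    """Stand-in producer that collects serialized events in memory.

    A real producer would write to a log file or a Kafka topic instead.
    (Hypothetical class, for illustration only.)
    """

    def __init__(self):
        self.sent = []

    def send(self, event):
        self.sent.append(json.dumps(event))


def run_simulation(workflows, producers):
    """Send every event from every workflow to every producer."""
    for workflow in workflows:
        for event in workflow:
            for producer in producers:
                producer.send(event)


# A "workflow" here is just a list of already-generated events.
workflow = [{"action": "KICK"}, {"action": "PUNCH"}]
first, second = ListProducer(), ListProducer()
run_simulation([workflow], [first, second])
print(first.sent)
```

Both producers end up with the same two serialized events, which is the fan-out behavior the sketch assumes.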

You define the configuration for the json-data-generator using two configuration files.  The first is a Simulation Config.  The Simulation Config defines the Workflows that should be run and different Producers that events should be sent to.  The second is a Workflow configuration (of which you can have multiple).  The Workflow defines the frequency of Events and Steps that the Workflow uses to generate the Events.  It is the Workflow that defines the format and content of your Events as well.

For our example, we are going to pretend that we have a programmable Jackie Chan robot.  We can command Jackie Chan through a programmable interface that takes json as input via a Kafka queue, and we can command him to perform different fighting moves in different martial arts styles.  A Jackie Chan command might look like this:
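
As a rough illustration, the Python sketch below builds one such command and serializes it to json. The field names follow the properties discussed in this post (style, action, weapon, target, strength). The `timestamp` field name, the fixed date, and every value other than KICK are assumptions made up for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical command for the robot. Field names follow the properties
# mentioned in this post; the values are invented for illustration.
command = {
    "timestamp": datetime(2015, 1, 1, 12, 0, 0, tzinfo=timezone.utc).isoformat(),
    "style": "KUNG_FU",
    "action": "KICK",
    "weapon": "STAFF",
    "target": "HEAD",
    "strength": 7.5,
}


def to_message(cmd):
    """Serialize the command to the json string that would be sent to Kafka."""
    return json.dumps(cmd)


print(to_message(command))
```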

Now, we want to have some fun with our awesome Jackie Chan robot, so we are going to make him do random moves using our json-data-generator! First we need to define a Simulation Config and then a Workflow that Jackie will use.

Simulation Config

Let’s take a look at our example Simulation Config:

As you can see, there are two main parts to the Simulation Config.  The Workflows section names and lists the workflow configurations you want to use.  The Producers section defines where the generator will send the events.  At the time of writing this, we have three supported Producers:

  • A Logger that sends events to log files
  • A Kafka Producer that will send events to your specified Kafka Broker
  • A Tranquility Producer that will send events to a Druid cluster.

You can find the full configuration options for each on the github page. We used a Kafka producer because that is how you command our Jackie Chan robot.

Workflow Config

The Simulation Config above specifies that it will use a Workflow called jackieChanWorkflow.json.  This is where the meat of your configuration would live.  Let’s take a look at the example Workflow config and see how we are going to control Jackie Chan:

The Workflow has many properties, all of which are documented on the github page, but here is a summary:

  • At the top are the properties that define how often events should be generated and if / when this workflow should be repeated. So this is like saying we want Jackie Chan to do a martial arts move every 400 milliseconds (he’s FAST!), then take a break for 1.5 seconds, and do another one.
  • Next are the Steps that this Workflow defines.  Each Step has a config and a duration.  The duration specifies how long to run this step.  The config is where it gets interesting!
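
The timing described above can be sketched in a few lines of Python. This is only a model of the behavior, under the assumption that events inside one run are spaced by a fixed frequency and that the pause is applied between runs. The function and parameter names are hypothetical and are not the generator's configuration keys.

```python
from itertools import islice


def event_offsets(event_frequency_ms, events_per_run, time_between_repeat_ms):
    """Yield the time offset (in ms) of each event for a repeating workflow.

    Events within one run are spaced event_frequency_ms apart. After the
    last event of a run, the workflow waits time_between_repeat_ms before
    starting over. All names here are illustrative assumptions.
    """
    clock = 0
    while True:
        for i in range(events_per_run):
            yield clock
            # Advance the clock only between events of the same run.
            if i < events_per_run - 1:
                clock += event_frequency_ms
        # Pause before the workflow repeats.
        clock += time_between_repeat_ms


# First six offsets for three moves per run, 400 ms apart, 1.5 s break.
print(list(islice(event_offsets(400, 3, 1500), 6)))
```

With three events per run, the first six offsets are 0, 400, 800, 2300, 2700 and 3100 milliseconds.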

Workflow Step Config

The Step Config is your specific definition of a json event.  This can be any kind of json object you want.  In our example, we want to generate a Jackie Chan command message that will be sent to his control unit via Kafka.  So we define the command message in our config, and since we want this to be fun, we are going to randomly generate what kind of style, move, weapon, and target he will use.

You’ll notice that the values for each of the object properties look a bit funny.  These are special Functions that we have created that allow us to generate values for each of the properties.  For instance, the random('KICK','PUNCH','BLOCK','JUMP') function will randomly choose one of the values and output it as the value of the “action” property in the command message.  The now() function will output the current date as an ISO 8601 formatted string.  The double(1.0,10.0) function will generate a random double between 1 and 10 to determine the strength of the action that Jackie Chan will perform.  If we wanted to, we could make Jackie Chan perform combo moves by defining a number of Steps that will be executed in order.
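
To show the idea behind these Functions, here is a toy Python resolver for the three mentioned above. It is a sketch under simplifying assumptions, not the generator's parser: `resolve` and `generate_event` are hypothetical names, arguments are split naively on commas, and the `timestamp` property name is assumed.

```python
import random
import re
from datetime import datetime, timezone

# Matches a call such as random('A','B') or double(1.0,10.0).
CALL = re.compile(r"^(\w+)\((.*)\)$")


def resolve(value, rng=random):
    """Replace a function-style string with a generated value.

    Only random(), now() and double() are handled; anything else is
    returned unchanged.
    """
    if not isinstance(value, str):
        return value
    match = CALL.match(value.strip())
    if not match:
        return value
    name, raw_args = match.groups()
    # Naive split: values that themselves contain commas are not supported.
    args = [a.strip().strip("'\"") for a in raw_args.split(",")] if raw_args.strip() else []
    if name == "random":
        return rng.choice(args)
    if name == "now":
        return datetime.now(timezone.utc).isoformat()
    if name == "double":
        return rng.uniform(float(args[0]), float(args[1]))
    return value


def generate_event(template, rng=random):
    """Resolve every property of a flat event template."""
    return {key: resolve(val, rng) for key, val in template.items()}


template = {
    "timestamp": "now()",
    "action": "random('KICK','PUNCH','BLOCK','JUMP')",
    "strength": "double(1.0,10.0)",
}
print(generate_event(template))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when a test needs the same events on every run.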

There are many more Functions available in the generator with everything from random string generation, counters, random number generation, dates, and even support for randomly generating arrays of data.  We also support the ability to reference other randomly generated values.  For more info, please check out the full documentation on the github page.

Once we have defined the Workflow, we can run it using the json-data-generator. To do this, do the following:

  1. If you have not already, go ahead and download the most recent release of the json-data-generator.
  2. Unpack the file you downloaded to a directory.
  3. Copy your custom configs into the conf directory.
  4. Then run the generator like so:
     java -jar json-data-generator-1.1.0.jar jackieChanSimConfig.json

You will see logging in your console showing the events as they are being generated.  The jackieChanSimConfig.json generates events like these:

If you configured your Workflow to repeat, then the generator will continue to output events and send them to your Producer, simulating a real-world client, or in our case, continuing to make Jackie Chan show off his awesome skills.  If you also had a Chuck Norris robot, you could add another Workflow config to your Simulation and have the two robots fight it out!  Just another example of how you can use the generator to simulate real-world situations.


Now we understand that you might not have access to a super awesome Jackie Chan robot, but you might need to generate json events for your own application testing, where the json-data-generator might come in handy.  You may need to simulate your audit logging software and the events that it generates, or simulate user check-ins at local venues.  The possibilities are endless in terms of the types of events you can generate.  More examples are included in the distribution, including a kitchen sink example showing all the different Functions and possibilities of the generator.

We hope the community finds this useful! If you use it, let us know! If you happen to find any issues, please file them or, better yet, submit pull requests over on github!  If you need help using it, or just feel like letting us know that you’re using it, feel free to contact us or reach out to Andrew Serff on Twitter or over on github!
