Join multiple data streams for real-time analytics

This article walks through joining multiple data streams for real-time analytics.

Before performing a join on multiple data streams, you must have already deployed Upsolver and created data sources.

Performing a join on multiple data streams is easy with Upsolver.

This guide explains how to join two data streams:

  • impressions: primary data source with ad campaign information

  • clicks: secondary data stream tracking the number of clicks on an ad

Create an Athena data output and define a data source

1. Click Outputs on the left, then click New in the upper-right corner.

2. Select Amazon Athena as the data output.

3. Click Add to add as many data sources as you need. Click Next to continue.

Join two data streams together and perform data transformation

1. Select the SQL window from the upper-right corner.

2. The sample SQL below performs a LEFT OUTER JOIN between the impressions and clicks data streams.

Behind the scenes, the LEFT OUTER JOIN creates a lookup table, which indexes data by a set of keys so that results can be retrieved in milliseconds.

Read more about Upsolver lookup tables here.
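To make the lookup-table idea concrete, here is a minimal Python sketch of a keyed index feeding a left outer join. It is an illustration of the general technique only, not Upsolver's implementation or its SQL syntax. The field names (`impression_id`, `campaign`, `click_time`), the sample events, and the helper functions `build_lookup` and `left_outer_join` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events; field names are assumptions, not Upsolver's schema.
impressions = [
    {"impression_id": "i1", "campaign": "spring_sale"},
    {"impression_id": "i2", "campaign": "spring_sale"},
    {"impression_id": "i3", "campaign": "brand_video"},
]
clicks = [
    {"impression_id": "i1", "click_time": "2024-01-01T00:00:05Z"},
    {"impression_id": "i1", "click_time": "2024-01-01T00:00:09Z"},
    {"impression_id": "i3", "click_time": "2024-01-01T00:01:00Z"},
]


def build_lookup(rows, key):
    """Index rows by a key so each lookup is a single dictionary access."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def left_outer_join(left_rows, lookup, key):
    """Keep every left row; attach each match from the lookup, or None if absent."""
    joined = []
    for left in left_rows:
        matches = lookup.get(left[key])
        if not matches:
            # No click for this impression: the row is kept with a null value.
            joined.append({**left, "click_time": None})
            continue
        for match in matches:
            joined.append({**left, "click_time": match["click_time"]})
    return joined


click_lookup = build_lookup(clicks, "impression_id")
joined_rows = left_outer_join(impressions, click_lookup, "impression_id")
for row in joined_rows:
    print(row)
```

With this sample data the join yields four rows: two for `i1` (one per click), one for `i2` with `click_time` set to `None`, and one for `i3`. The point to notice is that impressions with no clicks are still kept, which is what distinguishes a left outer join from an inner join.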

Define Athena output parameters

1. Define storage, database, and table information for your Athena environment and click Next.

2. Define the compute cluster that you would like to use and the time range of the data you would like to output.

Keep in mind that setting Ending At to Never means the output will be a continuous stream.

3. Click Deploy.

Check output data and run analytics

1. Click the Progress tab to confirm that the output data is up to date.

2. Run a query in Athena to confirm that the results are correct.
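One possible sanity check is to count impressions and clicks per campaign in the joined output and confirm that impressions without clicks are still present. The sketch below uses Python's built-in `sqlite3` module as a local stand-in for Athena so the query can be tried without an AWS account. The table name `joined_output`, its columns, and the sample rows are hypothetical; substitute the database and table you defined for the output.

```python
import sqlite3

# In-memory database standing in for the Athena table (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE joined_output (
        impression_id TEXT,
        campaign      TEXT,
        click_time    TEXT
    );
    INSERT INTO joined_output VALUES
        ('i1', 'spring_sale', '2024-01-01T00:00:05Z'),
        ('i1', 'spring_sale', '2024-01-01T00:00:09Z'),
        ('i2', 'spring_sale', NULL),
        ('i3', 'brand_video', '2024-01-01T00:01:00Z');
    """
)

# COUNT(column) skips NULLs, so impressions without clicks add to the
# impression count but not to the click count.
CHECK_QUERY = """
    SELECT campaign,
           COUNT(DISTINCT impression_id) AS impressions,
           COUNT(click_time)             AS clicks
    FROM joined_output
    GROUP BY campaign
    ORDER BY campaign
"""

results = conn.execute(CHECK_QUERY).fetchall()
for campaign, impression_count, click_count in results:
    print(campaign, impression_count, click_count)
```

The query uses only standard SQL aggregates, so the same statement should be usable in Athena once the table name is adjusted. If a campaign's impression count is lower than expected, that may indicate the join dropped unmatched rows.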

What’s next?

Use Upsolver to index less data into Splunk
