ElasticSearch – Real time streaming and Indexing


Feeding real-time streams of data into Elastic Search is an interesting subject with a lot to learn. Elastic Search is a special-purpose NoSQL data store that can index JSON documents in near real time. How real time? It's configurable, but a document can become searchable almost as soon as you create or update it. This post assumes basic knowledge of Elastic Search.
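To make the "how real time" knob concrete: as far as I know it is the index-level `index.refresh_interval` setting, changed through the index `_settings` endpoint. Here is a minimal sketch that only builds the request and does not send it. The host `http://localhost:9200` and the index name `tweets` are assumptions for illustration, and `build_refresh_request` is a hypothetical helper, not part of any client library.

```python
import json
from urllib.request import Request


def build_refresh_request(base_url, index, interval):
    """Build (but do not send) a PUT that changes an index's refresh interval."""
    body = {"index": {"refresh_interval": interval}}
    return Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name, used only for illustration.
req = build_refresh_request("http://localhost:9200", "tweets", "1s")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply it against a running node: urllib.request.urlopen(req)
```

A shorter interval makes new documents searchable sooner at the cost of more refresh work, so a longer value is often the better trade for heavy bulk loads.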

Here are some of the options:

# Elastic Search “Rivers”

Simply put, a river opens a real-time stream of data into Elastic Search by creating a special-purpose ES index, so that the stream flows continuously into ES like a river. How is this done? You install an ES plugin for a particular data source that can provide a real-time stream, such as Twitter, and then create a special index/type holding your Twitter authorisation credentials and the streams you are interested in. Elastic Search has pluggable river support for Twitter, RabbitMQ, Wikipedia and some other interesting sources. Here is an interesting write-up on testing a Twitter river.
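To give a feel for what that "special index/type" looks like, here is a rough sketch of the `_meta` document a Twitter river is registered with. The field names follow the Twitter river plugin's README as I remember it, so treat them as assumptions and check them against the plugin version you install. The river name `my_twitter_river`, the tracked keywords and the placeholder credentials are made up, and `twitter_river_meta` is a hypothetical helper. Rivers only exist in older Elastic Search releases, so this applies only where river support is still available.

```python
import json


def twitter_river_meta(oauth, tracks, index_name, type_name="status", bulk_size=100):
    """Build the JSON document that describes a Twitter river (assumed layout)."""
    return {
        "type": "twitter",
        "twitter": {
            "oauth": oauth,                 # your Twitter application credentials
            "filter": {"tracks": tracks},   # comma-separated keywords to follow
        },
        "index": {
            "index": index_name,            # where the tweets get indexed
            "type": type_name,
            "bulk_size": bulk_size,         # tweets per bulk indexing request
        },
    }


# Placeholder credentials: replace with your own before use.
oauth = {
    "consumer_key": "YOUR_CONSUMER_KEY",
    "consumer_secret": "YOUR_CONSUMER_SECRET",
    "access_token": "YOUR_ACCESS_TOKEN",
    "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
}
meta = twitter_river_meta(oauth, "elasticsearch,logstash", "my_twitter_river")

# The river would be created by PUTting this body to /_river/my_twitter_river/_meta
print(json.dumps(meta, indent=2))
```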

# Elastic Search as LogStash “output”

If you are familiar with UNIX pipes and the classic producer/consumer paradigm behind them, that is exactly what LogStash is.

The target use case is directing many kinds of log producers to the desired consumers. LogStash takes input sources, applies optional filters, and dumps the result into output sinks. LogStash recommends Elastic Search for consuming and indexing log data to enable analytics and complex searches, although you can use outputs other than ES. Here is an interesting write-up on extending the same Twitter river example using LogStash to load ES.

So if you have real-time streaming data “inputs”, all you have to do is run LogStash to load them into ES “outputs”.
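The input, filter, output flow is easy to picture as a chain of small stages. The sketch below is plain Python, not LogStash's own configuration language, and only mimics the idea: an input stage parses raw lines, a filter stage drops and tags events, and an output stage formats what is left as an Elastic Search bulk request body. The function names, the `lang` field and the `tweets` index are all made up for illustration.

```python
import json


def read_input(lines):
    """Input stage: turn raw JSON lines into event dicts, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def keep_language(events, lang):
    """Filter stage: keep only events in the given language and tag them."""
    for event in events:
        if event.get("lang") == lang:
            yield {**event, "tags": ["filtered"]}


def to_bulk_body(events, index):
    """Output stage: format events as a newline-delimited bulk request body."""
    parts = []
    for event in events:
        # Action line first, then the document itself. Depending on the
        # ES version, a type may also be required here or in the URL.
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps(event))
    return ("\n".join(parts) + "\n") if parts else ""


raw = [
    '{"user": "a", "lang": "en", "text": "hello"}',
    "",
    '{"user": "b", "lang": "fr", "text": "salut"}',
]
body = to_bulk_body(keep_language(read_input(raw), "en"), "tweets")
print(body)
```

Because each stage is a generator, events stream through one at a time, which is the same pipe-like behaviour the UNIX analogy describes.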

# Elastic Search as Flume “sink”

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of real-time streaming data into different storage destinations such as the Hadoop Distributed File System.