Event Stream Processor Matrix

We published our first ever UI-focused post on Top JavaScript Dynamic Table Libraries the other day and got some valuable feedback – thanks!

Now we are back to talking about the backend. Our Search Analytics and Scalable Performance Monitoring services/products accept, process, and store huge amounts of data. One thing both of these services do is process a stream of events in real time (and in batch, of course). So what solutions are there that help one process data in real time and perform operations on a rolling window of data, such as the last 5 or 30 minutes of the incoming event stream? We know of several solutions that fit the bill, so we decided to put together a matrix with the essential attributes of those tools in order to compare them and make our pick. Below is the matrix we came up with. If you are viewing this on our site, the table is likely to be too wide, but it should look fine in a proper feed reader.
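To make the "rolling window" idea concrete, here is a minimal, hypothetical Java sketch. It is not taken from any of the tools compared in this post, and the class name SlidingWindowCounter is made up for illustration. It keeps the timestamps of events from the last N milliseconds (5 minutes in the example) in a deque and evicts old ones as new events arrive. It assumes events arrive in timestamp order. Real stream processors do much more than count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a rolling time window over an event stream.
 * Keeps only events in the half-open window (now - windowMillis, now].
 * Assumes events arrive in non-decreasing timestamp order.
 */
public class SlidingWindowCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records an event and returns how many events fall in the window ending at eventTime. */
    public int add(long eventTime) {
        timestamps.addLast(eventTime);
        evictUpTo(eventTime - windowMillis);
        return timestamps.size();
    }

    // Remove events at or before the cutoff; they have left the window.
    private void evictUpTo(long cutoff) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= cutoff) {
            timestamps.removeFirst();
        }
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        SlidingWindowCounter counter = new SlidingWindowCounter(fiveMinutes);
        System.out.println(counter.add(0L));        // 1
        System.out.println(counter.add(60_000L));   // 2
        System.out.println(counter.add(240_000L));  // 3
        System.out.println(counter.add(360_000L));  // 2: events at 0s and 60s have expired
    }
}
```

A deque fits this case because, with in-order arrival, the oldest event is always at the front, so each eviction is a constant-time removal.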

If you like working on systems that handle large volumes of data, like our Search Analytics and Scalable Performance Monitoring services, we are hiring world-wide.

Matrix part 1:

Tool | License | Language | Scaling | Add or change rules on the fly | Other infra needed | Rule types
Esper | GPL2, commercial | Java | Scale up | yes | none | Declarative, query-based
Drools Fusion | ASL 2.0 | Java | Scale up | yes | none | Declarative, mostly rule-based, but supports queries too
FlumeBase | ASL 2.0 | Java | Horizontal: natural sharding on top of Flume | yes | Flume | Declarative, query-based
Storm | EPL 1.0 | Clojure | Horizontal | Can be implemented on top of Zookeeper | ZeroMQ, Zookeeper | Provides only low-level primitives (like grouping). A rule engine has to be implemented manually on top.
S4 | ASL 2.0 | Java | Horizontal | Can be implemented on top of Zookeeper | Zookeeper | Provides a set of low-level primitives. Some correlation support via joins. The documentation has a “windowing” section, but it is empty.
Activeinsight | CPAL 1.0, commercial | Java | Horizontal | yes | | Declarative, query-like
Kafka | ASL 2.0 | Java | Horizontal | | Zookeeper | Set of low-level primitives
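Storm, S4 and Kafka are marked above as providing only low-level primitives, which means any rule logic has to be written by the user. To give a feel for what that involves, here is a hypothetical Java sketch that does not use any of these projects' APIs. The names KeyedThresholdRule, Worker and route are made up for illustration. Events are routed to workers by hashing their key, similar in spirit to Storm's fields grouping. Each worker fires an alert the first time a key reaches a threshold within a fixed, non-overlapping (tumbling) time window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a hand-rolled rule on top of low-level primitives.
 * Key-hash routing keeps all state for a key on a single worker.
 */
public class KeyedThresholdRule {

    /** One worker owns every key that hashes to it, so per-key state stays local. */
    static final class Worker {
        private final long windowMillis;
        private final int threshold;
        private final Map<String, Long> windowStart = new HashMap<>();
        private final Map<String, Integer> counts = new HashMap<>();
        final List<String> alerts = new ArrayList<>();

        Worker(long windowMillis, int threshold) {
            this.windowMillis = windowMillis;
            this.threshold = threshold;
        }

        void onEvent(String key, long eventTime) {
            // Start of the tumbling window this event belongs to.
            long bucket = eventTime - Math.floorMod(eventTime, windowMillis);
            Long current = windowStart.get(key);
            if (current == null || current.longValue() != bucket) {
                // New window for this key: reset its count.
                windowStart.put(key, bucket);
                counts.put(key, 0);
            }
            int n = counts.merge(key, 1, Integer::sum);
            // Fire once per key per window, when the threshold is first reached.
            if (n == threshold) {
                alerts.add(key + "@" + bucket);
            }
        }
    }

    private final List<Worker> workers = new ArrayList<>();

    KeyedThresholdRule(int numWorkers, long windowMillis, int threshold) {
        for (int i = 0; i < numWorkers; i++) {
            workers.add(new Worker(windowMillis, threshold));
        }
    }

    /** Routes an event to the worker responsible for its key. */
    void route(String key, long eventTime) {
        int idx = Math.floorMod(key.hashCode(), workers.size());
        workers.get(idx).onEvent(key, eventTime);
    }

    /** Collects alerts from all workers. */
    List<String> allAlerts() {
        List<String> all = new ArrayList<>();
        for (Worker w : workers) {
            all.addAll(w.alerts);
        }
        return all;
    }

    public static void main(String[] args) {
        KeyedThresholdRule rule = new KeyedThresholdRule(4, 60_000L, 3);
        rule.route("user-a", 1_000L);
        rule.route("user-a", 2_000L);
        rule.route("user-a", 3_000L);  // third event in window [0, 60000): fires
        rule.route("user-b", 1_000L);
        rule.route("user-b", 70_000L); // new window for user-b, count resets
        rule.route("user-b", 71_000L);
        rule.route("user-a", 4_000L);  // already fired for this window
        System.out.println(rule.allAlerts()); // [user-a@0]
    }
}
```

Hashing the key means workers never need to coordinate about a given key. The trade-off is that a very hot key can overload the single worker that owns it.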

Matrix part 2:

Tool | Docs / examples | Maturity | Community | URL | Notes
Esper | very good | mature, stable | medium | esper.codehaus.org |
Drools Fusion | good | 3 years, stable | small | jboss.org/drools/drools-fusion.html |
FlumeBase | good | alpha | small | flumebase.org |
Storm | exists | used in production | growing very fast | tech.backtype.com | good deployment features
S4 | average | alpha, but used in production | medium (will grow under ASF) | s4.io |
Activeinsight | poor | unknown | unknown | activeinsight.org |
Kafka | good | used in production | small (will grow under ASF) | incubator.apache.org/kafka |

So there you have it – we hope you find this useful. If you have any comments or questions, tweet us (@sematext) or leave a comment here.