Announcement: Apache Storm Monitoring in SPM

There has been a “storm” brewing here at Sematext recently.  Fortunately this has nothing to do with the fierce winter weather many of us are experiencing in different parts of the globe — it’s actually a good kind of storm!  We’ve gotten a lot of requests to add Apache Storm support to SPM and we’re please to say that is now a reality.  SPM can already monitor Kafka, ZooKeeper, Hadoop, Elasticsearch, and more. As a matter of fact, we’ve just announced Redis monitoring, too!

Here’s why you should care:

  1. SPM users can see different Storm metrics in dynamic , real-time graphs, a big improvement from the standard Storm UI which only allows some time-specific snapshots.  Isn’t it better to see trends as opposed to static snapshots?  We certainly think so.
  2. SPM users can create an external link and share their charts with others (like a Mailing List or in a blog post) to get insight into problems without having to provide login credentials.  Here’s an example (you will see the chart even though you don’t know UN/PW):  https://apps.sematext.com/spm-reports/s/aQjuv5GdC1
  3. SPM also provides its users with common System and JVM-related metrics like CPU usage, memory usage, JVM heap size and pool utilization, among others.  This lets you troubleshoot performance issues better by allowing you to correlate  Storm-specific metrics with common System and JVM metrics.

Here are the Storm metrics SPM can now monitor:

  • Supervisors count
  • Topologies count
  • Supervisor total/free/used slots count
  • Topology workers/executors/tasks count
  • Topology spouts/bolts/state spouts count
  • Bolt emitted/transferred events
  • Bolt acked/executed/failed events
  • Bolt executed/processed latencies
  • Spout emitted/transferred events
  • Spout acked/failed events
  • Spout complete latency

Also important to note — users can add alerting rules for all metrics, including Algolerts and heartbeat alerts, as well as receive daily, weekly, and monthly performance reports via email.

Here are some of the graphs — click on them to see larger versions:

Overview

For observing the general state of the system

For observing the general state of the system

Acked-Failed Decrease

Check out how "acked" (blue line) decreased. It may be related to some problems with resources (e.g., CPU load)

Do you see how “acked” (blue line) decreased? It may be related to some problems with resources (e.g., CPU load)

Timing-Increased

Timing-Tncreased

Check out this “Timing” chart: see the spike at ~13:21? It seems that something is up with the CPU (again); it might be the “pressure” from Java GC (Garbage Collector)

Start-Topology-Workers

Start-Topology-Workers

On the first chart you can see how the counts of tasks and workers grew.  It is because a new topology (“job” in Storm terminology) started at 12:25.

Start-Topology

Start-Topology

The same as above: you can see that between 12:00 and 12:30 Storm Supervisor was restarted (something that works on each machine inside the cluster) and topology was added after restarting.

Give SPM a spin – it’s free to get going and you’ll have it up and running, graphing all your Storm metrics in 5 minutes!

If you enjoy performance monitoring, log analytics, or search analytics, working with projects like Elasticsearch, Solr, HBase, Hadoop, Kafka, Storm, we’re hiring planet-wide!