November 2013

Poll: Using SolrCloud or Not?

It’s been 9 months since we conducted a poll on SolrCloud usage. A lot of things can change in 9 months. SolrCloud itself went through a ton of development and bug fixing since our last poll. It’s time to see how many of us are using SolrCloud now, at the end of 2013.

Please tweet this poll and help us spread the word, so we can get a good, statistically significant results.

ZooKeeper Poll Results

We’ve collected 50 votes in our ZooKeeper Usage Poll over the last few days. Here are the results so far:

66% of people use ZooKeeper directly
Another 16% use ZooKeeper indirectly
18% do not use ZooKeeper at all

This puts total ZooKeeper usage at over 80%. BUT:

Direct ZooKeeper usage being so high at 66% seems a little high and indirect usage being so low at 16% doesn’t feel quite right. ZooKeeper is used by Hadoop, HBase, SolrCloud, Kafka, Storm, and a number of other popular distributed systems that one would think indirect usage would be much higher than direct usage.

What’s your take on these numbers?

Announcement: ZooKeeper Performance Monitoring in SPM

You don’t see him, but he is present. He is all around us. He keeps things running. No, we are not talking about Him, nor about The Force. We are talking about Apache ZooKeeper, the under-appreciated, often not talked-about, yet super-critical component of almost all distributed systems we’ve come to rely on – Hadoop, HBase, Solr, Kafka, Storm, and so on. Our SPM, Search Analytics, and Logsene, all use ZooKeeper, and we are not alone – check our ZooKeeper poll.

We’re happy to announce that SPM can now monitor Apache ZooKeeper! This means everyone using SPM to monitor Hadoop HBase, Solr, Kafka, Sensei, and other applications that rely on ZooKeeper can now use the same monitoring and alerting tool – SPM – to monitor their ZooKeeper instances.

Please tweet about Performance Monitoring for ZooKeeper

Here’s a glimpse into what SPM for ZooKeeper provides – click on the image to see the full view or look at the actual SPM live demo:

Please tell us what you think – @sematext is always listening! Is there something SPM doesn’t monitor that you would really like to monitor? Please vote for tech to monitor!

Want to build highly distributed big data apps with us? We’re hiring good engineers (not just for positions listed on our jobs page), and we’re sitting on a heap of some pretty juicy big data!

Poll: Are You Using ZooKeeper?

In the last decade the world of distributed computing has exploded and Apache ZooKeeper is often at the center of it….which is why we just added ZooKeeper monitoring in SPM. Let’s see what percentage of us use ZooKeeper.

Please tweet so we can collect a large number of votes and get a statistically representative sample.

Please tweet about Poll: Are you using ZooKeeper?

Presentation: Scaling Solr with SolrCloud

Squeezing the maximal possible performance out of Solr / SolrCloud, and Elasticsearch and making them scale well is what we do on a daily basis for our clients. We make sure their servers are optimally configured and maximally utilized. Rafal Kuć gave a long, 75-minute talk on the topic of Scaling Solr with SolrCloud at Lucene Revolution 2013 conference in Dublin. Enjoy!

If you are interesting in working with Solr and/or Elasticsearch, we are looking for good people to join our team.

Meet Sematext at AWS re:Invent in Las Vegas

Are you at the AWS re:Invent conference in Las Vegas this week? Well, we are! Want to talk about Performance Monitoring? Alerting? Log Analytics? Search Analytics? Solr? Elasticsearch? Logstash? Kibana? Flume? Kafka? Morphline? We’re game! We’re also ready to chat, explain, and demo any of our products and services you’re interested in! We’re booth #1108 – stop by!

Presentation: Administering and Monitoring SolrCloud Clusters

Rafal Kuć gave two talks about Solr at Lucene Revolution. One of them, Administering and Monitoring SolrCloud Clusters is below.

If you are interesting in working with Solr and/or Elasticsearch, we are looking for good people to join our team.

Presentation: Solr for Indexing and Searching Logs

Since we’ve added Solr output for Logstash, indexing logs via Logstash has become a possibility. But what if you are not using (only) Logstash? Are there other ways you can index logs in Solr? Oh yeah, there are! The following slides are from Lucene Revolution conference that just took place in Dublin where we talked about indexing and searching logs with Solr.

If you are interesting in Log Analytics and would like to work on things like Logsene, we are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.

Presentation: Solr for Analytics

Last week, a bunch of Sematextans were at Lucene Revolution conference in Dublin, where we were both sponsors and presenters. There were a number of interesting talks and we saw great interest in SPM from people who want to use it to monitor Solr (and more) and want to send their logs to Logsene, which confirmed Sematext is going in the right direction and is creating products and services that are in demand and solve real-world problems.

Below are the slides from one of our four talks from the conference. This talk was about our experience using Solr as an alternative data store used for SPM, in which we share our findings and observations about using Solr for large scale aggregations, analytical queries, applications with high write throughput, performance improvements in Solr 4.5, the lower memory footprint of DocValues, and more.

If you are interesting in this sort of stuff, we are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.

Logstash Performance Monitoring (1.2.2 vs 1.2.1)

We’re using Logstash to collect some of our own logs here at Sematext, because it can easily forward events to Logsene via the elasticsearch_http output. And we recently upgraded to the latest version 1.2.2.

Since we’re obsessed with performance monitoring, the first question that came up was: did the upgrade make any difference in terms of load? So we did a test to find out.

Context

Logstash runs on a JVM, so we’re already monitoring it with SPM for Java Apps. So we just put it through a steady, moderate logs of about 70 events per second for a while, running both 1.2.1 and 1.2.2, on the same machine.

The configuration remained the same on both machines:

a file input that was tailing a single file using the multiline codec
a couple of grok filters and a geoIP filter, that we’ll talk about in later posts
resulting events are fed to Logsene through the elasticsearch_http output, because Logsene exposes the Elasticsearch API

Results

The biggest difference we’ve seen was in memory usage. The new version uses about 30% less memory:

Next, there was a significant difference in the amount of garbage collection going on. Again, some 30% difference in favor of 1.2.2:

We also had slightly less CPU usage. The difference wasn’t significant in our case because we didn’t have a lot of traffic: Logstash is installed on every host and there’s no “central” Logstash to process lots of data. I’m sure that if we had a more complex configuration and/or more traffic, we would have seen much less CPU.

Conclusions

Logstash is getting lighter and lighter, which seems to address its only criticism. You definitely get less memory usage and less garbage collection, so we definitely recommend upgrading to 1.2.2. And if you want to monitor its usage, you can always use our SPM. Happy stashing!