Elasticsearch Server by Rafal Kuc & Marek Rogozinski – now updated!

Use Elasticsearch now?  Thinking about using Elasticsearch?  Wish there was a comprehensive resource that pulled everything you ever wanted to know about Elasticsearch together in one place?  Fret not — you are in luck!

All Elasticsearch, all the time

Sematext engineer Rafał Kuć has co-authored (with Marek Rogozinski) not one, but two(!) different Elasticsearch books: Elasticsearch Server and Mastering Elasticsearch.  Considering that Elasticsearch has only been around a few years — not to mention how much is going on under the hood — it’s a pretty impressive accomplishment.  Even more impressive?  Rafal and Marek have just published a second edition of Elasticsearch Server that encompasses all the changes between Elasticsearch 0.20 and 1.0.  So if you wish you knew more about Elasticsearch, look no further.

Here’s a brief Q&A with Rafal to add some insight:

Q:  What has changed since the first edition of Elasticsearch Server?

A:  After releasing the first edition of the book, which happened to be the first book about Elasticsearch, we got a nice amount of comments and suggestions which we took into consideration when writing the second edition.  The first edition was based on Elasticsearch 0.20, so we already had a lot of material to work with when we were asked to write the second edition and take readers up to version 1.0.  Some of the features we decided to write about were aggregations, new function queries allowing extensive score control, snapshotting, and others.  Some features that are still used by Elasticsearch users, like faceting, did not need much updating.  But others, like percolator, had to be completely rewritten.

Q:  How much work was it?

A:  We tried to make the book as good as we could so the readers could enjoy it and learn from it.  And believe me, we both learned a lot during the writing of the first edition of the book and while writing Mastering Elasticsearch. We had a lot of comments both from the readers and from people working on the book’s Japanese translation.  Thanks Jun!

We incorporated all the comments and suggestions, but it took time, of course. We also wanted to fully restructure the book so that it flowed better.  Hopefully we achieved that. Of course, in addition to all that we had to rewrite major parts of the book to bring it up to date, review all the parts that we decided to leave in the book and make updates as needed, and then write the new sections.

Q:  Where can someone buy it?

A:  You can buy it from Amazon or direct from Packt Publishing.

JOB: Summer Marketing Internship

We are looking for a high-energy intern with diverse marketing skills to help generate demand for our products.  The internship will be demanding as we move at a fast pace and are extremely agile.  This person will work closely with a globally distributed team — US, Canada, Eastern Europe and Asia.  Our headquarters is located in Brooklyn, but we are open to applicants from anywhere.  Depending on the person, this role could be full- or part-time.

Experience and skill set we are looking for:

  • Communicates well in person, in writing and over the phone
  • Uses social media platforms like Twitter, Facebook and LinkedIn
  • Familiar with email campaign tools (e.g., MailChimp, Campaign Monitor, Constant Contact)
  • Willingness to do a wide range of tasks and see them through to completion
  • Experience with graphic design tools
  • Willingness to learn in a highly technical environment
  • Background with software and/or IT consulting organizations is especially desirable
  • CRM experience a plus

Approximate internship dates are June 2 – August 29, though we are flexible.  Sound like you?  Then send your resume to mick.emmett@sematext.com.

Going to Be in Austin on April 2nd? Then Check Out BV:IO

Live or work in Austin?  Like small conferences filled with smart, interesting technical people, a roster of great speakers, and innovation everywhere you look?  Great, you’ll fit right in at BV:IO, Bazaarvoice’s first-ever public technical conference and hackathon, which aims to drive innovation in the social commerce space.  Get all the BV:IO event details here.

And since you are reading this blog, there’s a good chance you know about our founder and CEO, Otis Gospodnetic, and his expertise with all things Search and Big Data.  Otis has been invited to speak, and he goes on at 1:30 pm on Wednesday, April 2nd.  Otis will speak about “Open Source Search Evolution” and he’ll be available before and after the talk at the Sematext sponsor table to say hello and talk about SPM, Logsene, Site Search Analytics, Solr, Elasticsearch, Hadoop, NYC vs. Austin tech scenes, Brooklyn Lager vs. Lone Star…and whatever else you bring to our table.

If you’re thinking of attending BV:IO, drop us a line at mick.emmett@sematext.com.  Hope to see you there!

Video and Presentation: Indexing and Searching Logs with Elasticsearch or Solr

Interested in log indexing using Elasticsearch or Solr?  Also interested in searching and analyzing logs in real time?

This topic really hits home for us since we released our log analytics tool, Logsene, and we also offer consulting services for logging infrastructure.  If you are reading this and looking for a new opportunity, then you might be interested to hear that we are hiring worldwide.

If you are into logging like we are, then you will want to check out this presentation delivered by Sematext’s own Radu Gheorghe to the NYC Search, Discovery and Analytics Meetup held recently at Pivotal Labs.  For the purposes of this presentation the term “logs” ranges from server logs and application events to metrics and even social media information.

The presentation has three parts:

  1. Overview of logging tools that play nicely with Elasticsearch and Solr (like Logstash, Apache Flume or rsyslog)
  2. Performance tuning and scaling Elasticsearch and Solr
  3. Demo of an end-to-end solution

Here you go – enjoy!

Announcement: Coming Up in Site Search Analytics

Have you checked out Site Search Analytics yet?  If not, and if you think that gaining insight into user search behavior and experience is valuable information, then we’ve got something for you that’s battle-tested and ready to go.

This year we are adding some killer new features that will make SSA even more useful.  So, if you want to be enjoying benefits like:

  • Viewing real-time graphs showing search and click-through rates
  • Awareness of your top queries, top zero-hit queries, most seen and clicked on hits, etc.
  • Having a mechanism to perform search relevance A/B tests and a relevance feedback mechanism
  • Not having to develop, set up, manage or scale all the infrastructure needed for query and click log analysis
  • And many others — here is a full list of features and benefits

…then you will love the new functionality we have on the way.  After all, how can you improve search quality if you don’t measure it first and keep track of it?

Site Search Analytics

Sound interesting?  Then check out a live demo.  SSA is 100% focused on helping you to improve the search experience of your customers and prospects.  And a better search experience translates into more traffic to your web site and greater awareness of your business.

Announcement: Percentiles added to SPM

In the spirit of continuous improvement, we are happy to announce that percentiles have recently been added to SPM’s arsenal of measurement tools.  Percentiles describe the distribution of a metric more faithfully than averages, which can hide outliers.  Users can now see the 50th, 95th and 99th percentiles for specific metrics and set both regular threshold-based and anomaly detection alerts.  We will go into more detail about how the percentiles are computed in another post, but for now we want to put the word out and show some of the related graphs (click on them to enlarge them).  Enjoy!

Elasticsearch – Request Rate and Latency

Garbage Collectors Time

Kafka – Flush Time

Kafka – Fetch/Produce Latency 1

Kafka – Fetch/Produce Latency 2

Solr – Req. Rate and Latency 1

Solr – Req. Rate and Latency 2
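
To see why percentiles can tell you more than an average, here is a minimal Python sketch. It is an illustration only: the latency values are made up, the helper name `nearest_rank_percentile` is hypothetical, and the nearest-rank method used here is just one common way to compute percentiles, not necessarily the one SPM uses.

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile (integer 1..100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers pct percent of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up request latencies in milliseconds: 98 fast requests and 2 very slow ones.
latencies_ms = [10] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p50 = nearest_rank_percentile(latencies_ms, 50)
p95 = nearest_rank_percentile(latencies_ms, 95)
p99 = nearest_rank_percentile(latencies_ms, 99)

print("mean:", mean_ms)
print("p50:", p50, "p95:", p95, "p99:", p99)
```

With this sample the mean is 49.8 ms, a value no request actually experienced. The 50th and 95th percentiles are both 10 ms, and the 99th percentile is 2000 ms, which exposes the slow tail directly.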

If you enjoy performance monitoring, log analytics, or search analytics, working with projects like Elasticsearch, Solr, HBase, Hadoop, Kafka, Storm, we’re hiring planet-wide!

Announcement: Redis Monitoring in SPM

Don’t worry, we didn’t just stop at Storm monitoring and metrics while improving SPM.  We’re also happy to announce support for Redis.

Specifically, here are some of the key Redis metrics SPM monitors:

  • Used Memory
  • Used Memory Peak
  • Used Memory RSS
  • Connected Clients
  • Connected Slaves
  • Master Last IO Seconds Ago
  • Keyspace Hits
  • Keyspace Misses
  • Evicted Keys
  • Expired Keys
  • Commands Processed
  • Keys count per db
  • To be expired keys count per db
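
Most of these metric names resemble fields reported by the Redis `INFO` command. As a rough illustration, the Python sketch below parses an `INFO`-style text block and derives a keyspace hit ratio. How SPM actually collects these metrics is not described here, so the mapping to `INFO` fields is an assumption, the sample values are made up, and `parse_info` is a hypothetical helper.

```python
# Made-up sample in the text format returned by the Redis INFO command.
SAMPLE_INFO = """\
# Memory
used_memory:1048576
used_memory_peak:2097152
used_memory_rss:1572864
# Clients
connected_clients:12
# Stats
total_commands_processed:50000
keyspace_hits:900
keyspace_misses:100
evicted_keys:3
expired_keys:42
# Keyspace
db0:keys=150,expires=20,avg_ttl=0
"""


def parse_info(text):
    """Split INFO-style text into (stats, keyspace) dictionaries."""
    stats, keyspace = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        if key.startswith("db") and "=" in value:
            # Per-database line, e.g. db0:keys=150,expires=20,avg_ttl=0
            pairs = (item.split("=", 1) for item in value.split(","))
            keyspace[key] = {name: int(number) for name, number in pairs}
        else:
            stats[key] = int(value) if value.isdigit() else value
    return stats, keyspace


stats, keyspace = parse_info(SAMPLE_INFO)

lookups = stats["keyspace_hits"] + stats["keyspace_misses"]
hit_ratio = stats["keyspace_hits"] / lookups if lookups else 0.0

print("used memory (bytes):", stats["used_memory"])
print("hit ratio:", hit_ratio)
print("keys per db:", {db: info["keys"] for db, info in keyspace.items()})
print("keys with an expiry per db:", {db: info["expires"] for db, info in keyspace.items()})
```

Against a live server you would feed the real `INFO` output into the same parser. Derived values such as the hit ratio are often more useful to graph over time than the raw counters.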

Also, for all application types users can add alerting rules, heartbeat alerts, and Algolerts, as well as receive emails with performance reports for a given time period.

Enough with the words.  Here is what the graphs look like (click them to enlarge them):

Redis-Overview

Redis-Memory

Used memory/Used memory peak/Used memory RSS chart

Redis-Keyspace-Hits

Keyspace Hits chart

Redis-Expiring-Keys

Expiring Keys chart

Redis-Evicted-Keys

Evicted Keys chart

And we’re not done.  Watch this space for more SPM updates coming soon…

Give SPM a spin – it’s free to get going and you’ll have it up and running, graphing all your Redis metrics in 5 minutes!

Announcement: Apache Storm Monitoring in SPM

There has been a “storm” brewing here at Sematext recently.  Fortunately this has nothing to do with the fierce winter weather many of us are experiencing in different parts of the globe; it’s actually a good kind of storm!  We’ve gotten a lot of requests to add Apache Storm support to SPM and we’re pleased to say that it is now a reality.  SPM can already monitor Kafka, ZooKeeper, Hadoop, Elasticsearch, and more. As a matter of fact, we’ve just announced Redis monitoring, too!

Here’s why you should care:

  1. SPM users can see different Storm metrics in dynamic, real-time graphs, a big improvement over the standard Storm UI, which only offers snapshots for specific time windows.  Isn’t it better to see trends as opposed to static snapshots?  We certainly think so.
  2. SPM users can create an external link and share their charts with others (on a mailing list or in a blog post, for example) to get insight into problems without having to provide login credentials.  Here’s an example (you will see the chart even though you don’t have a username and password):  https://apps.sematext.com/spm-reports/s/aQjuv5GdC1
  3. SPM also provides its users with common System and JVM-related metrics like CPU usage, memory usage, JVM heap size and pool utilization, among others.  This lets you troubleshoot performance issues better by allowing you to correlate Storm-specific metrics with common System and JVM metrics.

Here are the Storm metrics SPM can now monitor:

  • Supervisors count
  • Topologies count
  • Supervisor total/free/used slots count
  • Topology workers/executors/tasks count
  • Topology spouts/bolts/state spouts count
  • Bolt emitted/transferred events
  • Bolt acked/executed/failed events
  • Bolt executed/processed latencies
  • Spout emitted/transferred events
  • Spout acked/failed events
  • Spout complete latency

Also important to note — users can add alerting rules for all metrics, including Algolerts and heartbeat alerts, as well as receive daily, weekly, and monthly performance reports via email.

Here are some of the graphs — click on them to see larger versions:

Overview

For observing the general state of the system

Acked-Failed Decrease

Do you see how “acked” (blue line) decreased? It may be related to some problems with resources (e.g., CPU load)

Timing-Increased

Check out this “Timing” chart: see the spike at ~13:21? It seems that something is up with the CPU (again); it might be the “pressure” from Java GC (Garbage Collector)

Start-Topology-Workers

On the first chart you can see how the counts of tasks and workers grew.  It is because a new topology (“job” in Storm terminology) started at 12:25.

Start-Topology

Same as above: you can see that between 12:00 and 12:30 the Storm Supervisor (the daemon that runs on each machine in the cluster) was restarted, and that a topology was added after the restart.

Give SPM a spin – it’s free to get going and you’ll have it up and running, graphing all your Storm metrics in 5 minutes!