February 2015

Kafka Poll: Version You Use?

With Kafka 0.8.2 and 0.8.2.1 released, and with the updated SPM for Kafka now monitoring over 100 Kafka metrics, we thought it would be good to see which Kafka versions are being used in the wild. Kafka 0.7.x was a strong and stable release used by many. The 0.8.1.x release has been out since March 2014. Kafka 0.8.2.x has been out for just a little while, but are there people who are either already using it (we are!) or are about to upgrade to it? Please tweet this poll and help us spread the word, so we can get good, statistically significant results. We’ll publish the results here and via @sematext (follow us!) in a week.

Solr Cookbook, 3rd Edition — Now Available and includes Solr 5.0

Hot off the press: a brand new Solr Cookbook! One of Sematext’s Solr and Elasticsearch experts — and authors — Rafał Kuć, has just published the third and latest edition of Solr Cookbook. This edition covers both Solr 4.x (based on the newest 4.10.3 version of Solr) and the just-released Solr 5.0.

As with previous editions of Solr Cookbook, Rafał updated the book significantly: half of the previous content has changed, and he rewrote all of the recipes.

Chapter List

Here’s a list of the chapters:

Apache Solr Configuration
Indexing Your Data
Analyzing Your Text Data
Querying Solr
Faceting
Improving Solr Performance
In the Cloud
Using Additional Solr Functionalities
Dealing with Problems
Real-life Situations

For more information about Solr Cookbook, Third Edition — including info on getting a free chapter — check out the Packt Publishing web page dedicated to it. The book is available in both electronic and paperback versions. Even better, here is a discount code you can use for 20% off (valid until March 22, 2015; see details for applying code below*): scte20

Need Some Solr Expertise?

Rafal isn’t the only Solr expert at Sematext; we’ve got several more who have helped 100+ clients to architect, scale, tune, and successfully deploy their Solr-based products. We also offer 24/7 production support for Solr and Elasticsearch. Here’s more info about our professional services, which also include Elasticsearch and Logging consulting. You can also monitor Solr performance (and many other platforms) with SPM Performance Monitoring.

Have some feedback or questions for Rafal?

He’d love to hear from you — get him @kucrafal

——-

* Using discount code:

Set up a free Packt account or log into your existing account
Add the title “Solr Cookbook – Third Edition” to the cart
Click on ‘View Cart’
Then in the “Do you have a promo code?” field enter scte20
Click on the “Apply” button for the discount to get applied

Account Sharing in SPM, Logsene and Site Search Analytics

Using Cloud (aka SaaS) applications is natural for most of us: simply sign up with your email, log in, and start using the service within minutes. The Cloud works particularly well with consumer-oriented services. Businesses, however, have slightly different needs.

Up until now, Sematext Apps offered only App sharing. For instance, if you had an SPM App for monitoring your Elasticsearch cluster, another SPM App for monitoring Spark, and a Logsene App where you shipped all your logs, you could invite one or more of your teammates to each one of those apps separately, and give them either Admin or User role. This works fine when you have just a few teammates or just a few Apps.

Larger organizations (aka Enterprises), however, have more complex needs due to their size and structure: they often have multiple departments and teams, many of whom have different roles and permissions.

With more and more large enterprises using Sematext Apps we recently introduced a new feature called “Account Sharing” to our SPM, Logsene and Site Search Analytics solutions.

Example of Account Sharing with different Roles assigned to team members

As the diagram above shows, Account Sharing lets the Account Owner add more people to their Account and give them different roles: Admin, User, or Billing Admin. The benefit of Account Sharing over App Sharing is that by sharing your Account you can easily share all your Apps, Alerts, etc. with your teammates, without everyone having to use the same username/password for the shared account. Admins can invite additional people, Billing Admins have access to payment details, and Users can’t modify other people’s Alerts and other settings, though they can create, modify, and delete their own.

Once you have access to multiple Accounts you can easily switch between them, as shown here:

One can have access to an unlimited number of Accounts. Switching between them is seamless and does not require one to log out and log in. While App sharing is still possible (and we have no plans to remove it), Account Sharing makes teamwork much simpler!

Kafka 0.8.2 Monitoring Support

SPM Performance Monitoring is the first Apache Kafka monitoring tool to support Kafka 0.8.2. Here are all the details:

Shiny, New Kafka Metrics

Kafka 0.8.2 has a pile of new metrics for all three main Kafka components: Producers, Brokers, and Consumers. Not only does it have a lot of new metrics, the whole metrics part of Kafka has been redone — we worked closely with Kafka developers for several weeks to bring order and structure to all Kafka metrics and make them easy to collect, parse and interpret.

We could list all the Kafka metrics you can get via SPM, but in short — SPM monitors all Kafka metrics and, as with all things SPM monitors, all these metrics are nicely graphed and are filterable by server name, topic, partition, and everything else that makes sense in Kafka deployments.

103 Kafka metrics:

Broker: 43 metrics
Producer: 9 metrics
New Producer: 38 metrics
Consumer: 13 metrics

You will be hard-pressed to find another solution that can monitor that many Kafka metrics out of the box! And if you want to do something with your Kafka logs, Logsene will gladly make them searchable for you!

Needless to say, SPM shows the most sought-after Kafka metric, Consumer Lag (see the screenshot below).

Screenshot – Kafka Metrics Overview

Screenshot – Consumer Lag
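Consumer Lag is commonly understood as how far a consumer is behind the newest message in each partition. The sketch below shows only that arithmetic; the offsets are made-up numbers and the function name is hypothetical. It is not how SPM collects the metric.

```python
# Illustrative only: the offsets below are made-up numbers, not real Kafka output.

def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: newest offset in the log minus the consumer's position."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)  # assume 0 if nothing was committed yet
        lag[partition] = max(end_offset - committed, 0)
    return lag


log_end_offsets = {0: 1500, 1: 1420, 2: 1610}    # hypothetical broker-side offsets
committed_offsets = {0: 1500, 1: 1300, 2: 1590}  # hypothetical consumer-side offsets

lag_by_partition = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag_by_partition.values())
print(lag_by_partition, total_lag)
```

A lag that keeps growing over time usually means consumers are not keeping up with producers, which is why this metric is a natural candidate for alerting.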

Monitoring Kafka in Context

Running Kafka alone is pointless. On one side you process or collect data and push it into Kafka. On the other side you consume that data (maybe processing it some more) and in the end this data typically ends up landing in some data store. Kafka is often used with data processing frameworks like Spark, Storm and Hadoop, or data stores like Cassandra and HBase, search engines like Elasticsearch and Solr, and so on. Wouldn’t it be nice to have a single place to monitor all of these systems? With alerts and anomaly detection? And letting you collect and search all their logs? Guess what? SPM and Logsene do exactly that — they can monitor all of these technologies and make all their logs searchable!

Take a Test Drive — It’s Easy and Free to Get Started

Like what you see here? Sound like something that could benefit your organization? Then try SPM for Free for 30 days by registering here. There’s no commitment and no credit card required.

HAProxy Monitoring Support

New functionality is rolling out in SPM Performance Monitoring! Watch this space for future posts on Transaction Tracing, Global and App-specific Server Views, Kafka 0.8.2 monitoring and other cool stuff. For this post, those of you who use HAProxy are in luck as we just added monitoring support for this popular TCP/HTTP load balancer.

Screenshot – HAProxy Session Rate

HAProxy Metrics

SPM collects key metrics for the HAProxy load balancer and its underlying proxies/servers, as listed in the table below.

Metric Name	Description
status	1 (UP/OPEN) 0 (DOWN)
downtime	total downtime (in seconds)
rate	number of sessions per second over last elapsed second
rate_max	max number of new sessions per second
rate_lim	limit on new sessions per second
scur	current sessions
smax	max sessions
slimit	sessions limit
stot	total sessions
lbtot	total number of times a server was selected
bin	bytes in
bout	bytes out
dreq	denied requests
dresp	denied responses
ereq	request errors
eresp	response errors
econ	connection errors
wretr	retries (warning)
wredis	redispatches (warning)
weight	server weight (server), total weight (backend)
act	server is active (server), number of active servers (backend)
bck	server is backup (server), number of backup servers (backend)
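Most of the names in the table above match columns in HAProxy's CSV statistics output. For readers curious what the raw data looks like, here is a minimal parsing sketch. The sample text is a hypothetical, heavily trimmed excerpt (real output has many more columns), and the function name is an assumption for illustration, not part of SPM or HAProxy.

```python
import csv
import io

# Hypothetical, heavily trimmed sample of HAProxy's CSV stats output.
# Real output has many more columns; only a few from the table above are shown.
SAMPLE_STATS = """# pxname,svname,scur,smax,stot,bin,bout,status
web,FRONTEND,12,40,1000,52000,910000,OPEN
app,server1,5,20,600,30000,500000,UP
app,server2,0,18,400,22000,410000,DOWN
"""


def parse_haproxy_stats(text):
    """Parse CSV stats text into a list of dicts, converting numeric fields to int."""
    # The header line starts with "# ", which is not part of the first column name.
    cleaned = text.lstrip("# ")
    rows = []
    for row in csv.DictReader(io.StringIO(cleaned)):
        parsed = {}
        for key, value in row.items():
            parsed[key] = int(value) if value.isdigit() else value
        # Map the textual status to the 1/0 convention used in the table above.
        parsed["status"] = 1 if row["status"] in ("UP", "OPEN") else 0
        rows.append(parsed)
    return rows


stats = parse_haproxy_stats(SAMPLE_STATS)
for entry in stats:
    print(entry["pxname"], entry["svname"], entry["scur"], entry["status"])
```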

You can create threshold-based alerts or machine learning-based anomaly detection alerts on any of these metrics, of course, and you can also rely on heartbeat alerts to detect any HAProxy daemon going down. Alerts can be emailed, or you can use any of the SPM Alerts Integrations such as PagerDuty, HipChat, Slack, Nagios, or any other WebHook.

See for Yourself

You can check out SPM’s live demo and see some more of SPM’s monitoring, alerting and anomaly detection functionality. In addition to native monitoring for apps like Solr, Elasticsearch, Hadoop, HBase, Spark, Cassandra, Kafka, Storm, and many more, SPM also integrates with Logsene Log Management and Analytics to add centralized logging functionality and correlation of metrics, logs, alerts, anomalies, and events.

Using Elasticsearch Mapping Types to Handle Different JSON Logs

By default, Elasticsearch does a good job of figuring out the type of data in each field of your logs. But if you like your logs structured like we do, you probably want more control over how they’re indexed: is time_elapsed an integer or a float? Do you want your tags analyzed so you can search for big in big data? Or do you need them not_analyzed, so you can show top tags via the terms aggregation? Or maybe both?

In this post, we’ll look at how to use index templates to manage multiple types of logs across multiple indices. Also, we’ll explain how to use logging tools (such as Logstash and rsyslog) to handle JSON logging and specify types.

Elasticsearch Mapping and Logs

As you may already know, to control these things in Elasticsearch you’ll need to define a mapping. This works similarly in Logsene, our log analytics SaaS, because it uses Elasticsearch and exposes its API.
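The "maybe both" case from the introduction is typically handled with a multi-field: one field indexed in two ways. The sketch below builds such a mapping as a Python dictionary, assuming Elasticsearch 1.x-style syntax (the "string" type with "index": "not_analyzed"). The sub-field name raw is a common convention, not a requirement.

```python
import json

# Sketch of an Elasticsearch 1.x-style mapping for the "tags" and "time_elapsed" fields.
# The main "tags" field is analyzed (searchable by individual words); the "raw"
# sub-field is not_analyzed (whole values, suitable for a terms aggregation).
tags_mapping = {
    "properties": {
        "time_elapsed": {"type": "float"},
        "tags": {
            "type": "string",
            "fields": {
                "raw": {"type": "string", "index": "not_analyzed"}
            },
        },
    }
}

print(json.dumps(tags_mapping, indent=2))
```

With a mapping like this, you would search on tags and aggregate on tags.raw.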

With logs you’ll probably use time-based indices, because they scale better (in Logsene, for instance, you get daily indices). To make sure the mapping you define today also applies to the index you create tomorrow, you need to define it in an index template.
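To see why a template helps with time-based indices, consider how one wildcard pattern covers every daily index name. In this small sketch the logs- prefix, the date format, and the pattern are hypothetical, and Python's fnmatch only approximates how a template pattern selects index names.

```python
from datetime import date, timedelta
from fnmatch import fnmatch

TEMPLATE_PATTERN = "logs-*"  # hypothetical index pattern set in an index template


def daily_index_name(day, prefix="logs-"):
    """Build a time-based index name such as logs-2015.02.20."""
    return prefix + day.strftime("%Y.%m.%d")


today = date(2015, 2, 20)
tomorrow = today + timedelta(days=1)

for day in (today, tomorrow):
    name = daily_index_name(day)
    # fnmatch only approximates how a wildcard pattern selects index names.
    print(name, fnmatch(name, TEMPLATE_PATTERN))
```

Because tomorrow's index name matches the same pattern, the template's mappings are applied when that index is created, with no extra work on your side.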

Managing Multiple Types

Mappings provide a nice abstraction when you have to deal with multiple types of structured data. Let’s say you have two apps generating logs of different structures: both have a timestamp field, but one recording logins has a user field, and another one recording purchases has an amount field.

To deal with this, you can define the timestamp field in the _default_ mapping, which applies to all types. Then, in each type’s own mapping, you define the fields unique to that type. The following snippet is an example that works with Logsene, provided that aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee is your Logsene app token. If you roll your own Elasticsearch, you can use whichever template name you want; just make sure the template applies to your index pattern.

curl -XPUT 'logsene-receiver.sematext.com/_template/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee_MyTemplate' -d '{
 "template" : "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee*",
 "order" : 21,
 "mappings" : {
  "_default_" : {
   "properties" : {
    "timestamp" : { "type" : "date" }
   }
  },
  "firstapp" : {
   "properties" : {
    "user" : { "type" : "string" }
   }
  },
  "secondapp" : {
   "properties" : {
    "amount" : { "type" : "long" }
   }
  }
 }
}'
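The effect of the template above can be pictured as follows: each type ends up with the fields from _default_ plus its own. The sketch mimics that outcome with a plain dictionary merge. It is a simplification for illustration, not how Elasticsearch merges mappings internally, and the function name is hypothetical.

```python
# Simplified illustration of the template above: fields from "_default_"
# are combined with each type's own fields. This mimics the outcome only;
# it is not how Elasticsearch merges mappings internally.
template_mappings = {
    "_default_": {"properties": {"timestamp": {"type": "date"}}},
    "firstapp": {"properties": {"user": {"type": "string"}}},
    "secondapp": {"properties": {"amount": {"type": "long"}}},
}


def effective_fields(mappings, type_name):
    """Return the field definitions a document type ends up with."""
    fields = dict(mappings["_default_"]["properties"])  # shared fields first
    fields.update(mappings.get(type_name, {}).get("properties", {}))  # then type-specific ones
    return fields


for type_name in ("firstapp", "secondapp"):
    print(type_name, sorted(effective_fields(template_mappings, type_name)))
```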

Sending JSON Logs to Specific Types

When you send a document to Elasticsearch by using the API, you have to provide an index and a type. You can use an Elasticsearch client for your preferred language to log directly to Elasticsearch or Logsene this way. But I wouldn’t recommend this, because then you’d have to manage things like buffering if the destination is unreachable.

Instead, I’d keep my logging simple and use a specialized logging tool, such as Logstash or rsyslog to do the hard work for me. Logging to a file is usually the easiest option. It’s local, and you can have your logging tool tail the file and send contents over the network. I usually prefer sockets (like syslog) because they let me configure Logstash/rsyslog to:
– write events in a human format to a local file I can tail if I need to (usually in development)
– forward logs without hitting disk if I need to (usually in production)
Whatever you prefer, I think writing to local files or sockets is better than sending logs over the network from your application, unless you’re willing to make a reliability trade-off and use UDP, which gets rid of most complexities.
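On the application side, logging to a local file as one JSON document per line can be quite small. Here is a minimal sketch using Python's standard logging module. The formatter class, the field names (severity, user, amount), and the temporary file path are assumptions for illustration; in practice you would write to a path that your logging tool tails.

```python
import json
import logging
import os
import tempfile
import time


class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single-line JSON document."""

    def format(self, record):
        doc = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        # Extra structured fields passed via logging's `extra` argument (hypothetical names).
        for key in ("user", "amount"):
            if hasattr(record, key):
                doc[key] = getattr(record, key)
        return json.dumps(doc)


# A temporary file stands in for a real path such as /var/log/test.
log_path = os.path.join(tempfile.mkdtemp(), "test.log")
handler = logging.FileHandler(log_path)
handler.setFormatter(JsonLineFormatter())

logger = logging.getLogger("purchases")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("purchase completed", extra={"amount": 50})
handler.flush()

with open(log_path) as f:
    print(f.read().strip())
```

The application only writes lines to disk; buffering and retries when the destination is unreachable are left to the logging tool.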

Opinions aside, here’s a Logstash configuration that tails a file containing newline-separated JSON logs and sends those documents to Logsene via the Elasticsearch API:

input {
  file {
    path => "/var/log/test"
    codec => "json"
  }
}

output {
  elasticsearch {
    host => "logsene-receiver.sematext.com"
    port => 80
    index => "LOGSENE-APP-TOKEN-GOES-HERE"
    index_type => "fileapp"
    protocol => "http"
    manage_template => false
  }
}

Note how the JSON codec does the parsing here, instead of the more expensive and maintenance-heavy approach with grok that we’ve shown in an earlier post on getting started with Logstash. Some applications let you configure the log format, so you can make them write JSON (Apache httpd, for example).

If you want to send JSON over syslog, there’s the JSON-over-syslog (CEE) format that we detailed in a previous post. You can use rsyslog’s JSON parser module to take your structured logs and forward them to Logsene:

module(load="imuxsock")        # can listen to local syslog socket
module(load="omelasticsearch") # can forward to Elasticsearch
module(load="mmjsonparse")     # can parse JSON

action(type="mmjsonparse")  # parse CEE-formatted messages

template(name="syslog-cee" type="list") {  # Elasticsearch documents will contain
  property(name="$!all-json")              # all JSON fields that were parsed
}

action(
  type="omelasticsearch"
  template="syslog-cee"                     # use the template defined earlier
  server="logsene-receiver.sematext.com"
  serverport="80"
  searchType="syslogapp"
  searchIndex="LOGSENE-APP-TOKEN-GOES-HERE"
  bulkmode="on"                                # send logs in batches
  queue.dequeuebatchsize="1000"                # of up to 1000
  action.resumeretrycount="-1"    # retry indefinitely (buffer) if destination is unreachable
)

To send a CEE-formatted syslog, you can run logger '@cee: {"amount": 50}' for example. Rsyslog would forward this JSON to Elasticsearch or Logsene via HTTP. Note that Logsene also supports CEE-formatted JSON over syslog out of the box if you want to use a syslog protocol instead of the Elasticsearch API.
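The CEE convention itself is simple: the message starts with the @cee: cookie, followed by a JSON document. The following sketch builds and parses such a message in Python. The helper names are hypothetical, and the code only illustrates the format rather than reproducing rsyslog's mmjsonparse behavior.

```python
import json

CEE_COOKIE = "@cee:"  # marker that precedes the JSON payload in a CEE-formatted message


def to_cee(fields):
    """Build the message part of a CEE-formatted syslog line from a dict."""
    return CEE_COOKIE + " " + json.dumps(fields)


def from_cee(message):
    """Return the parsed JSON fields, or None if the message is not CEE-formatted."""
    text = message.lstrip()
    if not text.startswith(CEE_COOKIE):
        return None
    return json.loads(text[len(CEE_COOKIE):])


message = to_cee({"amount": 50})
print(message)
print(from_cee(message))
print(from_cee("plain syslog text"))
```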

Filtering by Type

Once your logs are in, you can filter them by type (via the _type field) in Kibana:

However, if you want more refined filtering by source, we suggest using a separate field for storing the application name. This can be useful when you have different applications using the same logging format. For example, both crond and postfix use plain syslog.
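If you query the Elasticsearch API directly rather than through Kibana, filtering on both the type and an application field could look like the sketch below. It assumes Elasticsearch 1.x-style syntax (a filtered query with term filters) and a hypothetical field called app that holds the application name. For exact matching, such a field would normally be mapped as not_analyzed.

```python
import json


def build_log_filter(doc_type, app_name):
    """Build a query body that keeps only logs of one type from one application."""
    return {
        "query": {
            "filtered": {  # Elasticsearch 1.x-style filtered query
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"_type": doc_type}},
                            {"term": {"app": app_name}},  # "app" is a hypothetical field name
                        ]
                    }
                }
            }
        }
    }


query = build_log_filter("syslogapp", "postfix")
print(json.dumps(query, indent=2))
```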