Recipe: rsyslog + Kafka + Logstash

This recipe is similar to the previous rsyslog + Redis + Logstash one, except that we'll use Kafka as a central buffer and connecting point instead of Redis. You'll get many of the same advantages:

  • rsyslog is light and crazy-fast, including when you want it to tail files and parse unstructured data (see the Apache logs + rsyslog + Elasticsearch recipe)
  • Kafka is awesome at buffering things
  • Logstash can transform your logs and connect them to N destinations with unmatched ease

There are a couple of differences to the Redis recipe, though:

  • rsyslog already has Kafka output packages, so it’s easier to set up
  • Kafka has a different set of features than Redis (trying to avoid flame wars here) when it comes to queues and scaling

As with the other recipes, I’ll show you how to install and configure the needed components. The end result would be that local syslog (and tailed files, if you want to tail them) will end up in Elasticsearch, or a logging SaaS like Logsene (which exposes the Elasticsearch API for both indexing and searching). Of course you can choose to change your rsyslog configuration to parse logs as well (as we’ve shown before), and change Logstash to do other things (like adding GeoIP info).

Getting the ingredients

First of all, you'll probably need to update rsyslog. Most distros come with ancient versions and don't have the plugins you need. From the official packages you can install:

  • rsyslog (the base package, which also covers the local syslog and file-tailing inputs)
  • rsyslog-kafka (which provides the omkafka output module we'll use below)
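
For example, on Ubuntu you could add the Adiscon PPA (which packages recent rsyslog versions) and install both. The PPA and package names below are an assumption and may differ on other distros:

sudo add-apt-repository ppa:adiscon/v8-stable   # repository with up-to-date rsyslog packages
sudo apt-get update
sudo apt-get install rsyslog rsyslog-kafka      # base package plus the omkafka output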

If you don’t have Kafka already, you can set it up by downloading the binary tar. And then you can follow the quickstart guide. Basically you’ll have to start Zookeeper first (assuming you don’t have one already that you’d want to re-use):

bin/zookeeper-server-start.sh config/zookeeper.properties

And then start Kafka itself and create a simple 1-partition topic that we’ll use for pushing logs from rsyslog to Logstash. Let’s call it rsyslog_logstash:

bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic rsyslog_logstash
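
If you want to double-check that the topic works before wiring up rsyslog, you can push a test message with the console producer and read it back with the console consumer that ship with Kafka (these commands match the 0.8/0.9-era scripts used above):

echo "test message" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic rsyslog_logstash
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic rsyslog_logstash --from-beginning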

Finally, you'll have Logstash. At the time of writing this, we have a beta of 2.0, which comes with lots of improvements (including huge performance gains for the GeoIP filter I touched on earlier). After downloading and unpacking, you can start it via:

bin/logstash -f logstash.conf

There are also packages, in which case you'd put the configuration file in /etc/logstash/conf.d/ and start Logstash with its init script.

Configuring rsyslog

With rsyslog, you’d need to load the needed modules first:

module(load="imuxsock")  # will listen to your local syslog
module(load="imfile")    # if you want to tail files
module(load="omkafka")   # lets you send to Kafka

If you want to tail files, you’d have to add definitions for each group of files like this:

input(type="imfile"
  File="/opt/logs/example*.log"
  Tag="examplelogs"
)

Then you'd need a template that will build JSON documents out of your logs. You'll publish these JSON documents to Kafka and consume them with Logstash. Here's one that works well for plain syslog and tailed files that aren't parsed via mmnormalize:

template(name="json_lines" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"message\":\"")
  property(name="msg")
  constant(value="\",\"host\":\"")
  property(name="hostname")
  constant(value="\",\"severity\":\"")
  property(name="syslogseverity-text")
  constant(value="\",\"facility\":\"")
  property(name="syslogfacility-text")
  constant(value="\",\"syslog-tag\":\"")
  property(name="syslogtag")
  constant(value="\"}")
}
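
With this template, each log line becomes a small JSON document. For a local syslog message, the record published to Kafka would look roughly like this (values are illustrative):

{"timestamp":"2015-11-30T16:30:49+02:00","message":"Accepted publickey for radu from 10.0.0.7 port 22 ssh2","host":"web-01","severity":"info","facility":"auth","syslog-tag":"sshd[1234]:"}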

By default, rsyslog has an in-memory queue of 10K messages and a single thread that works with batches of up to 16 messages (you can find all the queue parameters here). You may want to change:
– the batch size, which also controls the maximum number of messages to be sent to Kafka at once
– the number of threads, which would parallelize sending to Kafka as well
– the size of the queue and its nature: in-memory (default), disk or disk-assisted

In an rsyslog -> Kafka -> Logstash setup I assume you want to keep rsyslog light, so these numbers would be small, like:

main_queue(
  queue.workerthreads="1"      # threads to work on the queue
  queue.dequeueBatchSize="100" # max number of messages to process at once
  queue.size="10000"           # max queue size
)

Finally, to publish to Kafka you’d mainly specify the brokers to connect to (in this example we have one listening to localhost:9092) and the name of the topic we just created:

action(
  broker=["localhost:9092"]
  type="omkafka"
  topic="rsyslog_logstash"
  template="json"
)

Assuming Kafka is started, rsyslog will keep pushing to it.

Configuring Logstash

This is the part where we pick up the JSON logs (as defined in the earlier template) and forward them to the preferred destinations. First, we have the input, which will consume from the Kafka topic we created. To connect, we'll point Logstash to Zookeeper, and it will fetch all the information about Kafka from there:

input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id => "rsyslog_logstash"
  }
}

At this point, you may want to use various filters to change your logs before pushing to Logsene or Elasticsearch. For this last step, you’d use the Elasticsearch output:

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:443" # it used to be "host" and "port" pre-2.0
    ssl => "true"
    index => "your Logsene app token goes here"
    manage_template => false
    #protocol => "http" # removed in 2.0
    #port => "443" # removed in 2.0
  }
}
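
If you're running your own Elasticsearch cluster instead of Logsene, a minimal output sketch (assuming Elasticsearch listens on localhost:9200 and you want daily, Logstash-style indices) could be:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"  # daily indices, like Logstash's default
  }
}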

And that’s it! Now you can use Kibana (or, in the case of Logsene, either Kibana or Logsene’s own UI) to search your logs!

Recipe: Apache Logs + rsyslog (parsing) + Elasticsearch

More than two years ago we posted a recipe on how to centralize syslog in Elasticsearch in order to search and analyze it with Kibana, all by using only rsyslog. This works well, because rsyslog is fast and light, as we've shown in later posts and recent presentations.

This recipe is about tailing Apache HTTPD logs with rsyslog, parsing them into structured JSON documents, and forwarding them to Elasticsearch (or our log analytics SaaS, Logsene, which exposes the Elasticsearch API). Having them indexed in a structured way will allow you to do better analytics with tools like Kibana:

Kibana_screenshot

We’ll also cover pushing system logs and how to buffer them properly, so it’s an updated, more complete recipe compared to the old one.

Getting the ingredients

Even though most distros already have rsyslog installed, it's highly recommended to get the latest stable version from the rsyslog repositories. This way you'll benefit from the last two to five years of development (depending on how conservative your distro is). The packages you'll need are:

  • rsyslog (the base package, which includes the imuxsock, imklog and imfile inputs)
  • rsyslog-mmnormalize (which provides the mmnormalize parsing module)
  • rsyslog-elasticsearch (which provides the omelasticsearch output)

With the ingredients in place, let’s start cooking a configuration. The configuration needs to do the following:

  • load the required modules
  • configure inputs: tailing Apache logs and system logs
  • configure the main queue to buffer your messages. This is also the place to define the number of worker threads and batch sizes (which will also be Elasticsearch bulk sizes)
  • parse common Apache logs into JSON
  • define a template specifying what the JSON messages will look like. You'd use this template to send logs to Logsene/Elasticsearch via the Elasticsearch output

Loading modules

Here, we’ll need imfile to tail files, mmnormalize to parse them, and omelasticsearch to send them. If you want to tail the system logs, you’d also need to include imuxsock and imklog (for kernel logs).

# system logs
module(load="imuxsock")
module(load="imklog")
# file
module(load="imfile")
# parser
module(load="mmnormalize")
# sender
module(load="omelasticsearch")

Configure inputs

For system logs, you typically don’t need any special configuration (unless you want to listen to a non-default Unix Socket). For Apache logs, you’d point to the file(s) you want to monitor. You can use wildcards for file names as well. You also need to specify a syslog tag for each input. You can use this tag later for filtering.

input(type="imfile"
      File="/var/log/apache*.log"
      Tag="apache:"
)

NOTE: By default, rsyslog will not poll for file changes every N seconds. Instead, it will rely on the kernel (via inotify) to poke it when files get changed. This makes the process quite realtime and scales well, especially if you have many files changing rarely. Inotify is also less prone to bugs when it comes to file rotation and other events that would otherwise happen between two "polls". You can still use the legacy mode="polling" by specifying it in imfile's module parameters.
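
If you do want the legacy polling behavior (for example, on a file system where inotify doesn't work), a sketch of the module load could look like this instead of the plain module(load="imfile") shown earlier; the 10-second interval is just an example:

module(load="imfile"
  mode="polling"          # poll for changes instead of relying on inotify
  PollingInterval="10"    # check the files every 10 seconds
)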

Queue and workers

By default, all incoming messages go into a main queue. You can also separate flows (e.g. files and system logs) by using different rulesets but let’s keep it simple for now.

For tailing files, this kind of queue would work well:

main_queue(
  queue.workerThreads="4"
  queue.dequeueBatchSize="1000"
  queue.size="10000"
)

This would be a small in-memory queue of 10K messages, which works well if Elasticsearch goes down, because the data is still in the file and rsyslog can stop tailing when the queue becomes full, and then resume tailing. 4 worker threads will pick batches of up to 1000 messages from the queue, parse them (see below) and send the resulting JSONs to Elasticsearch.

If you need a larger queue (e.g. if you have lots of system logs and want to make sure they're not lost), I would recommend using a disk-assisted memory queue, which will spill to disk whenever it uses too much memory:

main_queue(
  queue.workerThreads="4"
  queue.dequeueBatchSize="1000"
  queue.highWatermark="500000"    # max no. of events to hold in memory
  queue.lowWatermark="200000"     # use memory queue again, when it's back to this level
  queue.spoolDirectory="/var/run/rsyslog/queues"  # where to write on disk
  queue.fileName="stats_ruleset"
  queue.maxDiskSpace="5g"        # it will stop at this much disk space
  queue.size="5000000"           # or this many messages
  queue.saveOnShutdown="on"      # save memory queue contents to disk when rsyslog is exiting
)

Parsing with mmnormalize

The message normalization module uses liblognorm to do the parsing. So in the configuration you’d simply point rsyslog to the liblognorm rulebase:

action(type="mmnormalize"
  rulebase="/opt/rsyslog/apache.rb"
)

where apache.rb contains rules for parsing Apache logs, which can look like this:

version=2

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%

Where version=2 indicates that rsyslog should use liblognorm's v2 engine (which was introduced in rsyslog 8.13) and then you have the actual rule for parsing logs. You can find more details about configuring those rules in the liblognorm documentation.

Beyond this Apache example, creating new rules typically requires a lot of trial and error. To check your rules without messing with rsyslog, you can use the lognormalizer binary like:

head -1 /path/to/log.file | /usr/lib/lognorm/lognormalizer -r /path/to/rulebase.rb -e json
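
For example, feeding a typical access-log line through the rulebase above would produce a JSON document along these lines (the values are illustrative, and the exact formatting depends on your liblognorm version):

{ "clientip": "66.249.66.1", "ident": "-", "auth": "-", "timestamp": "20/May/2015:16:30:49 +0000", "verb": "GET", "request": "/index.html", "httpversion": "1.1", "response": "200", "bytes": "5213", "referrer": "-", "agent": "Mozilla/5.0 (compatible; Googlebot/2.1)" }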

NOTE: If you're used to Logstash's grok, these parsing rules will look very familiar. However, things are quite different under the hood. Grok is a nice abstraction over regular expressions, while liblognorm builds parse trees out of specialized parsers. This makes liblognorm much faster, especially as you add more rules. In fact, it scales so well that, for all practical purposes, performance depends on the length of the log lines and not on the number of rules. This post explains the theory behind this assumption, and there are already some preliminary tests to prove it as well (some of which we'll present at Lucene Revolution). The downside is that you'll lose some of the flexibility offered by regular expressions. You can still use regular expressions with liblognorm (you'd need to set allow_regex to on when loading mmnormalize), but then you'd lose a lot of the benefits that come with the parse tree approach.

Templates and actions

If you're using rsyslog only for parsing Apache logs (and not system logs) and you send your logs to Logsene, this bit is rather simple: by the time parsing has finished, you already have all the relevant fields in the $!all-json variable, which you'll use as a template:

template(name="all-json" type="list"){
  property(name="$!all-json")
}

Then you can use this template to send logs to Logsene via the Elasticsearch API, using your Logsene application token as the index name:

action(type="omelasticsearch"
  template="all-json"  # use the template defined earlier
  searchIndex="LOGSENE-APP-TOKEN-GOES-HERE"
  searchType="apache"
  server="logsene-receiver.sematext.com"
  serverport="80"
  bulkmode="on"  # use the bulk API
  action.resumeretrycount="-1"  # retry indefinitely if Logsene/Elasticsearch is unreachable
)

Putting both Apache and system logs together

Finally, if you use the same rsyslog to parse system logs, mmnormalize won’t parse them (because they don’t match Apache’s common log format). In this case, you’ll need to pick the rsyslog properties you want and build an additional JSON template:

template(name="plain-syslog"
  type="list") {
    constant(value="{")
      constant(value="\"timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
      constant(value="\",\"host\":\"")        property(name="hostname")
      constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
      constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
      constant(value="\",\"tag\":\"")   property(name="syslogtag" format="json")
      constant(value="\",\"message\":\"")    property(name="msg" format="json")
    constant(value="\"}")
}

Then you can make rsyslog decide: if a log was parsed successfully, use the all-json template. If not, use the plain-syslog one:

if $parsesuccess == "OK" then {
 action(type="omelasticsearch"
  template="all-json"
  ...
 )
} else {
 action(type="omelasticsearch"
  template="plain-syslog"
  ...
 )
}

And that’s it! Now you can restart rsyslog and get both your system and Apache logs parsed, buffered and indexed into Logsene. If you rolled your own Elasticsearch cluster, there’s one more step on the rsyslog side.

Time-based indices in your own Elasticsearch cluster

Logsene uses time-based indices out of the box, but in a local setup you’ll need to do this yourself. Such a design will give your cluster a lot more capacity due to the way Elasticsearch merges data in the background (we covered this in detail in our presentations at GeeCON and Berlin Buzzwords).

To make rsyslog use daily or other time-based indices, you need to define a template that builds an index name off the timestamp of each log. This is one that names them logstash-YYYY.MM.DD, like Logstash does by default:

template(name="logstash-index"
  type="list") {
    constant(value="logstash-")
    property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
}

And then you’d use this template in the Elasticsearch output instead of a static index name (this also requires setting dynSearchIndex to on):

action(type="omelasticsearch"
  template="all-json"
  dynSearchIndex="on"
  searchIndex="logstash-index"
  searchType="apache"
  server="MY-ELASTICSEARCH-SERVER"
  bulkmode="on"
  action.resumeretrycount="-1"
)

And now you’re really done, at least as far as rsyslog is concerned. For tuning Elasticsearch, have a look at our GeeCON and Berlin Buzzwords presentations. If you have additional questions, please let us know in the comments. And if you find this topic exciting, we’re happy to let you know that we’re hiring worldwide.

Recipe: rsyslog + Redis + Logstash

OK, so you want to hook up rsyslog with Logstash. If you don’t remember why you want that, let me give you a few hints:

  • Logstash can do lots of things and it's easy to set up, but it tends to be too heavy to put on every server
  • you have Redis already installed so you can use it as a centralized queue. If you don’t have it yet, it’s worth a try because it’s very light for this kind of workload.
  • you have rsyslog on pretty much all your Linux boxes. It’s light and surprisingly capable, so why not make it push to Redis in order to hook it up with Logstash?

In this post, you'll see how to install and configure the needed components so you can send your local syslog (or tail files with rsyslog) to be buffered in Redis, then use Logstash to ship the logs to Elasticsearch or a logging SaaS like Logsene (which exposes the Elasticsearch API for both indexing and searching), where you can search and analyze them with Kibana:

Kibana_search


Presentation: Tuning Elasticsearch Indexing Pipeline for Logs

Fresh from GeeCON in Krakow…we have another Elasticsearch and Logging manifesto from Sematext engineers — and book authors — Rafal Kuc and Radu Gheorghe.  As with many of their previous presentations, Radu and Rafal go into detail on Elasticsearch, Logstash and Rsyslog topics like:

  • How Elasticsearch, Logstash and Rsyslog work
  • Tuning Elasticsearch
  • Using, scaling, and tuning Logstash
  • Using and tuning Rsyslog
  • Rsyslog with JSON parsing
  • Hardware and data tests
  • …and lots more along these lines

[Note: Video of the talk coming soon to this post!]

If you find this stuff interesting and have similar challenges, then drop us a line to chat about our Elasticsearch and Logging consulting services and Elasticsearch (and Solr, too) production support.  Oh yeah, and we’re hiring worldwide if you are into Logging, Monitoring, Search, or Big Data Analytics as much as Radu and Rafal!

Monitoring rsyslog’s Performance with impstats and Elasticsearch

If you're using rsyslog for processing lots of logs (and, as we've shown before, rsyslog is good at processing lots of logs), you're probably interested in monitoring it. To do that, you can use impstats (the name comes from "input module for process stats"). impstats produces information like:
– input stats, like how many events went through each input
– queue stats, like the maximum size of a queue
– action (output or message modification) stats, like how many events were forwarded by each action
– general stats, like CPU time or memory usage

In this post, we'll show you how to send those stats to Elasticsearch (or Logsene, our log analytics service, which is essentially hosted ELK and exposes the Elasticsearch API), where you can explore them with a nice UI, like Kibana. For example, you can get the number of logs going through each input/output per hour:
kibana_graph
More precisely, we’ll look at:
– the useful options around impstats
– how to use those stats and what they’re about
– how to ship stats to Elasticsearch/Logsene by using rsyslog’s Elasticsearch output
– how to do this shipping in a fast and reliable way. This will apply to most rsyslog use-cases, not only impstats
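
As a preview, loading impstats is a one-liner in the rsyslog configuration; a minimal sketch (the parameter values below are just examples) looks like this:

module(load="impstats"
  interval="60"       # emit statistics every 60 seconds
  severity="7"        # tag them as debug-level messages
  log.syslog="on"     # push them through the regular syslog processing
  format="json"       # in JSON, so they're easy to parse and index
)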


Using Elasticsearch Mapping Types to Handle Different JSON Logs

By default, Elasticsearch does a good job of figuring out the type of data in each field of your logs. But if you like your logs structured like we do, you probably want more control over how they're indexed: is time_elapsed an integer or a float? Do you want your tags analyzed so you can search for big in big data? Or do you need them not_analyzed, so you can show top tags via the terms aggregation? Or maybe both?

In this post, we’ll look at how to use index templates to manage multiple types of logs across multiple indices. Also, we’ll explain how to use logging tools (such as Logstash and rsyslog) to handle JSON logging and specify types.

Elasticsearch Mapping and Logs

As you may already know, to control these things in Elasticsearch you’ll need to define a mapping. This works similarly in Logsene, our log analytics SaaS, because it uses Elasticsearch and exposes its API.

With logs you’ll probably use time-based indices, because they scale better (in Logsene, for instance, you get daily indices). That said, to make sure the mapping you define today applies to the index you create tomorrow, you need to define it in an index template.

Managing Multiple Types

Mappings provide a nice abstraction when you have to deal with multiple types of structured data. Let’s say you have two apps generating logs of different structures: both have a timestamp field, but one recording logins has a user field, and another one recording purchases has an amount field.

To deal with this, you can define the timestamp field in the _default_ mapping which applies to all types. Then, in each type’s own mapping we’ll define fields unique to that mapping. The following snippet is an example that works with Logsene, provided that aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee is your Logsene app token. If you roll your own Elasticsearch, you can use whichever name you want, and make sure the template applies to your index pattern.

curl -XPUT 'logsene-receiver.sematext.com/_template/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee_MyTemplate' -d '{
 "template" : "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee*",
 "order" : 21,
 "mappings" : {
  "_default_" : {
   "properties" : {
    "timestamp" : { "type" : "date" }
   }
  },
  "firstapp" : {
   "properties" : {
    "user" : { "type" : "string" }
   }
  },
  "secondapp" : {
   "properties" : {
    "amount" : { "type" : "long" }
   }
  }
 }
}'

Sending JSON Logs to Specific Types

When you send a document to Elasticsearch by using the API, you have to provide an index and a type. You can use an Elasticsearch client for your preferred language to log directly to Elasticsearch or Logsene this way. But I wouldn’t recommend this, because then you’d have to manage things like buffering if the destination is unreachable.
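
For example, with curl and the template above, indexing a login event into the firstapp type would look roughly like this (with aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee standing in for your Logsene app token or index name):

curl -XPOST 'logsene-receiver.sematext.com/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee/firstapp' -d '{
 "timestamp" : "2015-11-09T15:04:05Z",
 "user" : "alice"
}'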

Instead, I'd keep my logging simple and use a specialized logging tool, such as Logstash or rsyslog, to do the hard work for me. Logging to a file is usually the easiest option. It's local, and you can have your logging tool tail the file and send contents over the network. I usually prefer sockets (like syslog) because they let me configure Logstash/rsyslog to:
– write events in a human-readable format to a local file I can tail if I need to (usually in development)
– forward logs without hitting disk if I need to (usually in production)
Whatever you prefer, I think writing to local files or sockets is better than sending logs over the network from your application, unless you're willing to make a reliability trade-off and use UDP, which gets rid of most complexities.

Opinions aside, here's a Logstash configuration for tailing a file with JSON logs separated by newlines and sending those documents to Logsene via the Elasticsearch API:

input {
  file {
    path => "/var/log/test"
    codec => "json"
  }
}

output {
  elasticsearch {
    host => "logsene-receiver.sematext.com"
    port => 80
    index => "LOGSENE-APP-TOKEN-GOES-HERE"
    index_type => "fileapp"
    protocol => "http"
    manage_template => false
  }
}

Note how the JSON codec does the parsing here, instead of the more expensive and maintenance-heavy approach with grok that we’ve shown in an earlier post on getting started with Logstash. Some applications let you configure the log format, so you can make them write JSON (Apache httpd, for example).
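
For example, with Apache httpd you could define a LogFormat that emits one JSON object per line and let the json codec above pick it up directly. A rough sketch (the field selection and escaping are up to you, and quoting gets tricky for fields that may contain double quotes):

LogFormat "{ \"timestamp\":\"%{%Y-%m-%dT%H:%M:%S%z}t\", \"clientip\":\"%a\", \"verb\":\"%m\", \"request\":\"%U\", \"response\":%>s, \"bytes\":%B }" json_access
CustomLog /var/log/apache2/access_json.log json_access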

If you want to send JSON over syslog, there’s the JSON-over-syslog (CEE) format that we detailed in a previous post. You can use rsyslog’s JSON parser module to take your structured logs and forward them to Logsene:

module(load="imuxsock")        # can listen to local syslog socket
module(load="omelasticsearch") # can forward to Elasticsearch
module(load="mmjsonparse")     # can parse JSON

action(type="mmjsonparse")  # parse CEE-formatted messages

template(name="syslog-cee" type="list") {  # Elasticsearch documents will contain
  property(name="$!all-json")              # all JSON fields that were parsed
}

action(
  type="omelasticsearch"
  template="syslog-cee"                     # use the template defined earlier
  server="logsene-receiver.sematext.com"
  serverport="80"
  searchType="syslogapp"
  searchIndex="LOGSENE-APP-TOKEN-GOES-HERE"
  bulkmode="on"                                # send logs in batches
  queue.dequeuebatchsize="1000"                # of up to 1000
  action.resumeretrycount="-1"    # retry indefinitely (buffer) if destination is unreachable
)

To send a CEE-formatted syslog, you can run logger '@cee: {"amount": 50}' for example. Rsyslog would forward this JSON to Elasticsearch or Logsene via HTTP. Note that Logsene also supports CEE-formatted JSON over syslog out of the box if you want to use a syslog protocol instead of the Elasticsearch API.

Filtering by Type

Once your logs are in, you can filter them by type (via the _type field) in Kibana:
Type Filtering with Kibana
However, if you want more refined filtering by source, we suggest using a separate field for storing the application name. This can be useful when you have different applications using the same logging format. For example, both crond and postfix use plain syslog.
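
If you're shipping with rsyslog, one way to do this is to add the program name as its own field in the JSON template shown earlier; a sketch of the two extra template lines (using rsyslog's programname property and an "application" field name of your choosing) would be:

constant(value="\",\"application\":\"")
property(name="programname")   # e.g. "crond" or "postfix", i.e. the syslog tag without the PID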

If you’re looking for a place to send your logs to, check out Logsene!

Top 5 Most Popular Log Shippers

The Log Shipper Poll results are in!  We run Logsene here at Sematext, so we wanted to know what the best log shippers are and what people like to use to ship their logs.  Before we share the results, a few words about the poll:

  • We published it here on our blog on September 22, 2014
  • We automatically tweeted it and posted it to several Devops and similar LinkedIn groups
  • We did not post it to groups or mailing lists for various log shippers we included in the poll to avoid bias
  • We have collected 115 votes so far

That said, let’s see how log shipper popularity breaks down.


Log Shipper Popularity

Don’t forget to check out Logsene – our Log Management Cloud/On Premises service that will happily take logs from Logstash, Flume, rsyslog, Fluentd, Syslog-ng, syslogd, etc.  Check How to Send Logs to Logsene to see how easy it is.

Encrypting Logs on Their Way to Elasticsearch Part 2: TLS Syslog

In part 1 of the “encrypted logs” series we discussed sending logs to Elasticsearch over HTTPS. This second part is about TLS syslog.

If you wonder what this has to do with Elasticsearch, the point is that TLS syslog is a standard (RFC-5425): any decent version of rsyslog, syslog-ng or nxlog works with it. So you can forward logs over TLS to a recent, "intermediary" rsyslog. Then, you can either use omelasticsearch with HTTPS to ship your logs to Elasticsearch, or you can install rsyslog on an Elasticsearch node (and index logs over HTTP to localhost).

Such a setup will give you the following benefits:

  • it will work with most syslog daemons, because TLS syslog is so widely supported
  • the "intermediate" rsyslog can act as a buffer, taking that pressure off your application servers
  • the "intermediate" rsyslog can be used for processing, like parsing CEE-formatted JSON over syslog. Again, taking load off your application servers

Our log analytics SaaS, Logsene, gives you all the benefits listed above through the syslog endpoint:

TLS syslog flow in Logsene

Client Setup

Before you start, you’ll need a Certificate Authority’s public key, which will be used to validate the encryption certificate from the syslog destination (more about the server side later).

If you’re using Logsene, you can download the CA certificates directly. If you’re on a local setup, or you just want to consolidate your logs before shipping them to Logsene, you can use your own certificates or generate self-signed ones. Here’s a guide to generating certificates that will work with TLS syslog.
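
If you go the self-signed route, one common way is to create your own CA with openssl and then sign a certificate for the syslog receiver with it; a rough sketch (file names and subjects are placeholders) would be:

# create a CA key and certificate
openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -key ca-key.pem -days 3650 -out ca.pem -subj "/CN=MyLoggingCA"
# create a key and a signing request for the syslog receiver, then sign it with the CA
openssl genrsa -out syslog-key.pem 2048
openssl req -new -key syslog-key.pem -out syslog.csr -subj "/CN=syslog.example.com"
openssl x509 -req -in syslog.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -days 365 -out syslog-cert.pem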

With the CA certificate(s) in hand, you can start configuring your syslog daemon. For example, the rsyslog configuration can look like this:

module(load="imuxsock")  # listens for local logs on /dev/log

global(  # global settings
 defaultNetstreamDriver="gtls"  # use TLS driver when it comes to transporting over TCP
 defaultNetstreamDriverCAFile="/opt/rsyslog/ca_bundle.pem"  # CA certificate. Concatenate if you have more
)

action(  # how to send logs
  type="omfwd"                                    # Forward them
  target="logsene-receiver-syslog.sematext.com"   # to Logsene's syslog endpoint
  port="10514"                                    # on port X
  protocol="tcp"                                  # over TCP
  template="RSYSLOG_SyslogProtocol23Format"       # using the RFC-5424 syslog format
  StreamDriverMode="1"                            # via the TLS mode of the driver defined above.
  StreamDriverAuthMode="x509/name"                # Request the machine certificate of the server
  StreamDriverPermittedPeers="*.sematext.com"     # and based on it, just allow Sematext hosts
)

This is the new-style configuration format for rsyslog, that works with version 6 or above. For the pre-v6 format (BSD-style), check out the Logsene documentation. You can also find the syslog-ng equivalent there.

Server Setup

If you’re using Logsene, you might as well stop here, because it handles everything from buffering and indexing to parsing JSON-formatted syslog.

If you’re consolidating logs before sending them to Logsene, or you’re running your local setup, here’s an excellent end-to-end guide to setting up TLS with rsyslog. The basic steps for the server are:

  • use the same CA certificates as the client, so they have the same basis
  • generate the machine public-private key pair. You’ll have to provide both in the rsyslog configuration
  • set up the TLS rsyslog configuration (see the sketch below)
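
On the rsyslog side of the receiver, a minimal sketch of such a TLS listener (assuming certificate file names like the ones generated above and the default gtls driver) could look like this:

global(
  defaultNetstreamDriver="gtls"                                  # use the TLS driver for TCP
  defaultNetstreamDriverCAFile="/opt/rsyslog/ca.pem"             # the CA certificate
  defaultNetstreamDriverCertFile="/opt/rsyslog/syslog-cert.pem"  # the receiver's certificate
  defaultNetstreamDriverKeyFile="/opt/rsyslog/syslog-key.pem"    # and its private key
)

module(load="imtcp"
  StreamDriver.Name="gtls"        # TLS-enabled TCP listener
  StreamDriver.Mode="1"           # TLS-only mode
  StreamDriver.AuthMode="anon"    # or "x509/name" to also authenticate clients by certificate
)

input(type="imtcp" port="10514")  # listen for TLS syslog on this port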

Explore

Once you start logging, the end result should be just like in part 1. You can use Logsene’s hosted Kibana, your own Kibana or the Logsene UI to explore your logs:

Logsene Screenshot

As always, feel free to contact us if you need any help.

Encrypting Logs on Their Way to Elasticsearch

Let’s assume you want to send your logs to Elasticsearch, so you can search or analyze them in realtime. If your Elasticsearch cluster is in a remote location (EC2?) or is our log analytics service, Logsene (which exposes the Elasticsearch API), you might need to forward your data over an encrypted channel.

There’s more than one way to forward over SSL, and this post is part 1 of a series explaining how.

update: part 2 is now available!

Today’s method is about sending data over HTTPS to Elasticsearch (or Logsene), instead of plain HTTP. You’ll need two pieces to achieve this:

  1. a tool that can send logs over HTTPS
  2. the Elasticsearch REST API exposed over HTTPS

You can build your own tool or use existing ones. In this post we'll show you how to use rsyslog's Elasticsearch output to do that. For the API, you can use Nginx or Apache as a reverse proxy for HTTPS in front of your Elasticsearch, or you can use Logsene's HTTPS endpoint.

Rsyslog Configuration

To get rsyslog’s omelasticsearch plugin, you need at least version 6.6. HTTPS support was just added to master, and it’s expected to land in version 8.2.0. Once that is up, you’ll be able to use the Ubuntu, Debian or RHEL/CentOS packages to install both the base rsyslog and the rsyslog-elasticsearch packages you need. Otherwise, you can always install from sources:
– clone from the rsyslog github repository
– run `autogen.sh --enable-elasticsearch && make && make install` (depending on your system, it might ask for some dependencies)

With omelasticsearch in place (the om part comes from output module, if you’re wondering about the weird name), you can try the configuration below to take all your logs from your local /dev/log and forward them to Elasticsearch/Logsene:

# load needed input and output modules
module(load="imuxsock.so") # listen to /dev/log
module(load="omelasticsearch.so") # provides Elasticsearch output capability

# template that will build a JSON out of syslog
# properties. Resulting JSON will be in Logstash format
# so it plays nicely with Logsene and Kibana
template(name="plain-syslog"
         type="list") {
           constant(value="{")
             constant(value="\"@timestamp\":\"")
                 property(name="timereported" dateFormat="rfc3339")
             constant(value="\",\"host\":\"")
                 property(name="hostname")
             constant(value="\",\"severity\":\"")
                 property(name="syslogseverity-text")
             constant(value="\",\"facility\":\"")
                 property(name="syslogfacility-text")
             constant(value="\",\"syslogtag\":\"")
                 property(name="syslogtag" format="json")
             constant(value="\",\"message\":\"")
                 property(name="msg" format="json")
             constant(value="\"}")
         }

# send resulting JSON documents to Elasticsearch
action(type="omelasticsearch"
       template="plain-syslog"
 # Elasticsearch index (or Logsene token)
       searchIndex="YOUR-LOGSENE-TOKEN-GOES-HERE"
 # bulk requests
       bulkmode="on"  
       queue.dequeuebatchsize="100"
 # buffer and retry indefinitely if Elasticsearch is unreachable
       action.resumeretrycount="-1"
 # Elasticsearch/Logsene endpoint
       server="logsene-receiver.sematext.com"
       serverport="443"
       usehttps="on"
)

Exploring Your Data

After restarting rsyslog, you should be able to see your logs flowing in the Logsene UI, where you can search and graph them:

Logsene Screenshot

If you prefer Logsene's Kibana UI, or you run your own Elasticsearch cluster, you can make your own Kibana connect to the HTTPS endpoint just like rsyslog or Logsene's native UI do.

Wrapping Up

If you’re using Logsene, all you need to do is to make sure you add your Logsene application token as the Elasticsearch index name in rsyslog’s configuration.

If you’re running your own Elasticsearch cluster, there are some nice tutorials about setting up reverse HTTPS proxies with Nginx and Apache respectively. You can also try Elasticsearch plugins that support HTTPS, such as the jetty and security plugins.
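
To give a rough idea of the Nginx variant, the core of such a reverse proxy is an SSL-enabled server block that forwards requests to the local Elasticsearch HTTP port; a sketch (certificate paths and server name are placeholders) would be:

server {
    listen 443 ssl;
    server_name elasticsearch.example.com;

    ssl_certificate     /etc/nginx/ssl/elasticsearch.crt;
    ssl_certificate_key /etc/nginx/ssl/elasticsearch.key;

    location / {
        proxy_pass http://localhost:9200;    # plain HTTP to the local Elasticsearch node
    }
}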

Feel free to contact us if you need any help. We’d be happy to answer any Logsene questions you may have, as well as help you with your local setup through professional services and production support. If you just find this stuff exciting, you may want to join us, wherever you are.

Stay tuned for part 2, which will show you how to use RFC-5425 TLS syslog to encrypt your messages from one syslog daemon to the other.

Video and Presentation: Indexing and Searching Logs with Elasticsearch or Solr

Interested in log indexing using Elasticsearch or Solr?  Also interested in searching and analyzing logs in real time?

This topic really hits home for us since we released our log analytics tool, Logsene, and we also offer consulting services for logging infrastructure.  If you are reading this and looking for a new opportunity then you might be interested to hear that we are hiring worldwide.

If you are into logging like we are, then you will want to check out this presentation delivered by Sematext’s own Radu Gheorghe to the NYC Search, Discovery and Analytics Meetup held recently at Pivotal Labs.  For the purposes of this presentation the term “logs” ranges from server logs and application events to metrics and even social media information.

The presentation has three parts:

  1. Overview of logging tools that play nicely with Elasticsearch and Solr (like Logstash, Apache Flume or rsyslog)
  2. Performance tuning and scaling Elasticsearch and Solr
  3. Demo of an end-to-end solution

Here you go – enjoy!