berlin buzzwords

Presentation and Video: Side by Side with Solr and Elasticsearch

We are fresh from Berlin Buzzwords, where Sematext’s own Radu Gheorghe and Rafal Kuc presented “Side by Side with Solr and Elasticsearch” on the same stage, at the same time… but in different colors. The talk included live demos, graphing, stats, and hints at juicy things to come. Needless to say, if you deal with Solr and Elasticsearch, there are great insights to be found here!

Here is the presentation:

Side by Side with Elasticsearch and Solr from Sematext Group, Inc.

And here is the video:

Want to Be on Stage Somewhere Like Radu and Rafal Talking About Solr and Elasticsearch?

Or maybe you don’t want the spotlight — that’s cool too. But…if you do enjoy performance monitoring, log analytics, or search analytics, working with projects like Elasticsearch, Solr, HBase, Hadoop, Kafka, and Storm, then drop us a line. We’re hiring planet-wide! Front end and JavaScript Developers, Developer Evangelists, Full-stack Engineers, Mobile App Developers…get in touch!

[Note: for those of you who don’t have the time or inclination to go through all the technical details, here’s a high-level, up-to-date (2015) Solr vs. Elasticsearch overview]

Enjoy!

Berlin Buzzwords 2014 – Side by Side with Elasticsearch and Solr

Last year at Berlin Buzzwords two Sematext engineers had the opportunity to give two talks. Radu talked about “JSON Logging with Elasticsearch” (video, slides) and Rafał did the second round of Solr vs Elasticsearch in his talk “Battle of the Giants, round 2” (video, slides). We were also happy to be sponsoring Berlin Buzzwords 2013. This year, we decided to go for a talk where two of us can talk on the same stage, at the same time. On Tuesday, 27th of May, at 11:30, in the Frannz Club, Radu and Rafał will be giving a talk called “Side by side with Solr and Elasticsearch”.

Solr is an established, mature, and widely used open-source search server. Elasticsearch is still young but quickly gaining popularity, with over 200k downloads per month. Both search servers are based on Lucene, the open-source full-text search library for Java, but each has its own extensions and its own pros and cons.

We all know that Solr and Elasticsearch are different, but what those differences are and which solution is the best fit for a particular use case is a frequent question. We will try to make those differences clear, not by showing slides and comparing them, but by showing an online demo of both Elasticsearch and Solr:

Set up and start both search servers. See what you need to prepare and launch Solr and Elasticsearch.
Index data right after the server was started using the “schemaless” mode
Create index structure and modify it using the provided API
Explore different query use cases
Scale by adding and removing nodes from the cluster, creating indices and managing shards. See how that affects data indexing and querying.
Monitor and administer clusters. See what metrics can be seen out of the box, how to get them and what tools can provide you with the graphical view of all the goodies that each search server can provide.
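
To make the "index, then query" steps in the list above a little more concrete, here is a minimal Python sketch that only builds the HTTP requests for the same operation on each server; it does not send anything. It assumes 2014-era endpoints (Solr's `/update` and `/select` handlers, Elasticsearch's `/{index}/{type}/{id}` document URL and `_search`), the default ports 8983 and 9200, and a hypothetical collection/index named by the caller. Newer versions of either server may expect different paths, so treat this as an illustration rather than a reference.

```python
import json
from urllib.parse import urlencode

# Assumed default addresses; adjust for your own setup.
SOLR = "http://localhost:8983/solr"
ES = "http://localhost:9200"


def solr_index(collection, doc):
    """Build a request that adds one document to Solr and commits right away."""
    url = f"{SOLR}/{collection}/update?{urlencode({'commit': 'true'})}"
    # Solr's JSON update format accepts a list of documents
    # (send it with Content-Type: application/json).
    return "POST", url, json.dumps([doc])


def es_index(index, doc_type, doc_id, doc):
    """Build a request that stores one document in Elasticsearch under a given ID."""
    url = f"{ES}/{index}/{doc_type}/{doc_id}"
    return "PUT", url, json.dumps(doc)


def solr_query(collection, field, value):
    """Build a simple field:value query against Solr's select handler."""
    params = urlencode({"q": f"{field}:{value}", "wt": "json"})
    return "GET", f"{SOLR}/{collection}/select?{params}", None


def es_query(index, field, value):
    """Build the equivalent URI search request for Elasticsearch."""
    params = urlencode({"q": f"{field}:{value}"})
    return "GET", f"{ES}/{index}/_search?{params}", None
```

Each function returns a `(method, url, body)` tuple, so the two servers can be compared side by side before wiring the requests into an HTTP client of your choice.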

If you want to hear about both Solr and Elasticsearch from @sematext, see how to achieve similar things with each and how they behave, and not sit through too many slides, come join us 🙂

ElasticSearch & Solr Book Giveaway at Berlin Buzzwords

We’ve given away all 3 free Berlin Buzzwords tickets, but we have more stuff to give away. Stop by our desk at Berlin Buzzwords, say hello, and one of these could be yours to take home:

or perhaps

And don’t miss our 2 talks in Berlin this year!

Berlin Buzzwords 2013 – Free Tickets

Sematext is a Berlin Buzzwords sponsor for the 3rd year in a row and we have three free tickets to give away. If you are a Sematext follower, a client, an SPM or Search Analytics user, or a Logsene beta tester, or simply want to go to Berlin to listen to our talks or any other talks about large scale search, storage, data analytics, NoSQL, BigData, and a few other buzzwords, get in touch!

Berlin Buzzwords 2013 – Two Talks from Sematext

Last year at Berlin Buzzwords we were proud to give three talks. Alex talked about “Real-time Analytics with HBase” (slides, video), Otis talked about large scale monitoring in his talk titled “Large Scale ElasticSearch, Solr & HBase Performance Monitoring” (slides, video) and Rafał gave a talk about how we scale ElasticSearch clusters in his “Scaling Massive ElasticSearch Clusters” talk (slides, video). We were also very happy to be one of the sponsors of this great conference 🙂 Because we really enjoyed the conference, we decided to submit a few proposals this year and they got accepted. In this year’s schedule we will be giving the following talks:

Radu: JSON Logging with ElasticSearch

This talk is about aggregating loooots of logs – searching of seriously big data. We’ll go through everything we can possibly go through in 20 minutes. We’ll look at how, where, when, why, and what to log. We’ll show how to use Elasticsearch as a data store for logs and what the benefits of doing so are. We’ll discuss advantages and disadvantages of logging in JSON, which is easily processed by machines, over traditional logging, which is easily processed by humans. Finally, we’ll explore how you can get your logs – JSON or not – into Elasticsearch, run searches and statistics on them, and create pretty graphs you can’t stop staring at.
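
As a rough illustration of what "logging in JSON" can look like in practice, here is a minimal sketch using Python's standard `logging` module. It is not taken from the talk: the `JsonFormatter` class, the `make_logger` helper, and the convention of passing structured data under `extra={"fields": {...}}` are all assumptions made for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via
        # extra={"fields": {...}} and are merged into the event.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def make_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger's handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger
```

Calling `make_logger("checkout", sys.stdout).info("order placed", extra={"fields": {"order_id": 42}})` would emit a single JSON object on one line. Each such line is easy for a machine to parse and could be indexed as a document, at the cost of being harder to skim by eye than a traditional log line.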

Rafał: Battle of the Giants, Round 2

Learn about how both of these great enterprise search servers are evolving and adding new features. We will be comparing the latest and greatest versions of Solr and ES, both of which are using Lucene 4.x and bringing different approaches to handling codecs, per field similarities, and more. Of course, we’ll not only look at technical aspects of both Apache Solr and ElasticSearch, but will also dig into the makeup of their contributors, compare the code and of course the user community. By the end of the talk you’ll learn the main differences when it comes to these two search servers, how they handle shard and replica distribution, automatic data replication, and different query types. In addition, you’ll learn what the admin APIs for both Solr and ElasticSearch look like and how to use them to control and alter your cluster state. Last, but not least, you’ll learn what to avoid when using ElasticSearch or Apache Solr.

We hope to see some of you in Berlin. If these topics are of interest to you, but you won’t be coming to Berlin, feel free to get in touch, leave comments, or ping @sematext. As usual we’ll be posting slides after the talks and the organizers will probably record the talk and publish it after the conference. And if you love working with things our talks are about, we are hiring world-wide!

Slides: Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB…

In this presentation from Berlin Buzzwords 2012 we show how SPM, our Performance Monitoring service, is built: how metrics are collected, how they are processed, and how they are presented. We share a few findings along the way, too.

Note: we are actively looking for strong Java engineers. If that’s you, please get in touch. Separately, if you have interest and/or experience with HBase and/or Analytics, OLAP, and related areas, or if you are looking to work with ElasticSearch, Solr, and search in general, please get in touch, too.

Slides: Real-time Analytics with HBase

Here are slides from another talk we gave at both Berlin Buzzwords and at HBaseCon in San Francisco last month. In this presentation Alex describes one approach to real-time analytics with HBase, which we use at Sematext via HBaseHUT. If you like these slides you will also like HBase Real-time Analytics Rollbacks via Append-based Updates.

Note: we are actively looking for people with strong interest and/or experience with HBase and/or Analytics, OLAP, etc. If that’s you, please get in touch.

The short version is from Buzzwords, while the version with more slides is from HBaseCon:

Slides: Scaling Massive ElasticSearch Clusters

We are done with the two-day Berlin Buzzwords conference. The conference was good, a success for both the organizers and for Sematext: we saw a ton of interest in both our Performance Monitoring and Search Analytics services, and our talks were well received and attended by 200+ people each. Between the presentations we gave, talking to people interested in our products and/or services, as well as people expressing interest in joining Sematext (ask us how much fun we had in Berlin!), even with 5 Sematextans around we had our hands full.

Note: we are actively looking for people with strong interest and/or experience with ElasticSearch, Solr, and search in general. If that’s you, please get in touch.

Below are the slides from Rafal’s talk about scaling ElasticSearch:

Berlin Buzzwords 2012 – Three Talks from Sematext

Last year was our first time at Berlin Buzzwords. We gave 1 full talk about Search Analytics (video) and 2 lightning talks (video, video). We saw a number of good talks, too. We also took part in an HBase Hackathon organized by Lars George in Groupon’s Berlin offices and even found time to go clubbing. So in hopes of paying Berlin another visit this year, a few of us at Sematext (@sematext) submitted talk proposals. Last week we all got acceptance emails, so this year there will be 3 talks from 3 Sematextans at Berlin Buzzwords! Here is what we’ll be talking about:

Rafał: Scaling Massive ElasticSearch Clusters

This talk describes how we’ve used ElasticSearch to build massive search clusters capable of indexing several thousand documents per second while at the same time serving a few hundred QPS over billions of documents in well under a second. We’ll talk about building clusters that continuously grow in terms of both indexing and search rates. You will learn about finding cluster nodes that can handle more documents, about managing shard and replica allocation and prevention of unwanted shard rebalancing, about avoiding expensive distributed queries, etc. We’ll also describe our experience doing performance testing of several ElasticSearch clusters and will share our observations about what settings affect search performance and how much. In this talk you’ll also learn how to monitor large ElasticSearch clusters, what various metrics mean, and which ones to pay extra attention to.

Alex: Real-time Analytics with HBase

HBase can store massive amounts of data and allow random access to it – great. MapReduce jobs can be used to perform data analytics on a large scale – great. MapReduce jobs are batch jobs – not so great if you are after Real-time Analytics. Meet the append-only writes approach, which allows going real-time where it wasn’t possible before.

In this talk we’ll explain how we implemented “update-less updates” (not a typo!) for HBase using an append-only approach. This approach shines in situations where high data volume and velocity make random updates (aka Get+Put) prohibitively expensive. Apart from making Real-time Analytics possible, we’ll show how the append-only approach to updates makes it possible to perform rollbacks of data changes and avoid data inconsistency problems caused by tasks in MapReduce jobs that fail after only partially updating data in HBase. The talk is based on Sematext’s success story of building a highly scalable, general purpose data aggregation framework, which was used to build our Search Analytics and Performance Monitoring services. Most of the generic code needed for the append-only approach described in this talk is implemented in our HBaseHUT open-source project.
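
The idea of replacing Get+Put with appends can be sketched in a few lines. The following is a toy, in-memory Python illustration of the general technique only; it is not HBaseHUT's API and does not talk to HBase. The `AppendOnlyCounters` class, its method names, and the notion of a `batch_id` identifying the writing job are all assumptions made for this example.

```python
from collections import defaultdict


class AppendOnlyCounters:
    """In-memory stand-in for a table that is updated by appends only."""

    def __init__(self):
        # key -> list of (batch_id, delta) records, in write order
        self._records = defaultdict(list)

    def append(self, key, delta, batch_id):
        """Record a change without reading the current value (no Get+Put)."""
        self._records[key].append((batch_id, delta))

    def read(self, key):
        """Merge all records for a key at read time."""
        return sum(delta for _, delta in self._records.get(key, []))

    def compact(self, key):
        """Fold a key's records into one; folded changes can no longer be rolled back."""
        total = self.read(key)
        self._records[key] = [(None, total)]

    def rollback(self, batch_id):
        """Drop every not-yet-compacted record written by a failed batch."""
        if batch_id is None:
            raise ValueError("batch_id is required")
        for key, records in self._records.items():
            self._records[key] = [r for r in records if r[0] != batch_id]
```

In this sketch a write never needs the current value, so writers stay cheap, while readers (or a periodic compaction) pay the cost of merging. Because each record remembers which batch wrote it, the partial output of a failed job can be discarded as long as it has not been compacted yet.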

Otis: Large Scale ElasticSearch, Solr & HBase Performance Monitoring

This talk has all the buzzwords covered: big data, search, analytics, realtime, large scale, multi-tenant, SaaS, cloud, performance… and here is why:

In this talk we’ll share the “behind the scenes” details about SPM for HBase, ElasticSearch, and Solr, a large scale, multi-tenant performance monitoring SaaS built on top of Hadoop and HBase running in the cloud. We will describe all its backend components, from the agent used for performance metrics gathering, to how metrics get sent to SPM in the cloud, how they get aggregated and stored in HBase, how alerting is implemented and how it’s triggered, how we graph performance data, etc. We’ll also point out the key metrics to watch for each system type. We’ll go over various pain-points we’ve encountered while building and running SPM, how we’ve dealt with them, and we’ll discuss our plans for SPM in the future.

We hope to see some of you in Berlin. If these topics are of interest to you, but you won’t be coming to Berlin, feel free to get in touch, leave comments, or ping @sematext. And if you love working with things our talks are about, we are hiring world-wide!

Sematext at Berlin Buzzwords 2011

As part of Sematext’s Summer 2011 Conference Tour we are going to be visiting good old Europe and giving a talk at Berlin Buzzwords. This is the second year for Berlin Buzzwords, “a conference for developers and users of open source software projects, focusing on the issues of scalable search, data-analysis in the cloud and NoSQL-databases. Berlin Buzzwords presents more than 30 talks and presentations of international speakers specific to the three tags “search”, “store” and “scale””. Last year, one of us from Sematext went there as an attendee. This year, three of us are going and one of us is giving a talk – @OtisG will be speaking about Search Analytics on June 6th. That’s the first day of the conference and we are first in line to talk at 11:00 AM, right after the morning coffee. Doug Cutting and Ted Dunning will be giving keynotes. Some of us may also be there for some of the Hackathons/Workshops before and/or after the conference. If you are going to be there and would like to meet up, please let us know! @sematext.

For more information on this topic, read about the Sematext Search Analytics service.