Videos: Tuning Solr for Logs and Solr Anti-Patterns

If you’re an avid Solr user you’ll want to check out these Lucene / Solr Revolution videos from two of Sematext’s Solr experts: Rafal Kuc and Radu Gheorghe.

Tuning Solr for Logs

Radu talked about Solr performance tuning, which is always nice for keeping your applications snappy and your costs down. This is especially true for logs, social media and other stream-like data that can easily grow into terabyte territory.

(note: there’s no audio between 3:30 and 4:30; we hope to have this fixed soon and it doesn’t materially affect the talk)

Solr Anti-Patterns

Rafal points out common mistakes and roads that should be avoided at all costs when dealing with Solr.

Slides and Summaries

You can find slides of the Solr presentations in this blog post and summaries in this blog post.

Enjoy!

Video: Scaling Solr with SolrCloud

During last year’s Lucene Revolution conference in Dublin we had the opportunity to give four talks, one of which was Scaling Solr with SolrCloud. In it we shared what we have learned about scaling Solr, both from running it internally and from our work as a team of search consultants. Enjoy the video and/or the slides!

Note: we are looking for engineers passionate about search to join our professional services team.  We’re hiring planet-wide!

Video: Administering and Monitoring SolrCloud Clusters

As you know, Sematext is not only about consulting services, but also about administration, monitoring, and data analysis. That is why, during last year’s Lucene Revolution conference in Dublin, we gave a talk about administering and monitoring SolrCloud clusters. In the talk, Rafał Kuć discusses SolrCloud administration procedures such as collection management and schema modifications with the schema API. He also talks about why monitoring is important and what to pay attention to. Finally, he shows three real-life examples of monitoring proving useful. Enjoy the video and/or the slides!


4 Lucene Revolution Talks from Sematext

Bingo! We’re 4 of 4 at Lucene Revolution 2013 – 4 talk proposals and all 4 accepted!  We are hiring just so next year we can attempt getting 5 talks in. 😉  We’ll also be exhibiting at the conference, so stop by.  We will be giving away Solr and Elasticsearch books.  Here’s what we’ll be talking about in Dublin on November 6th and 7th:

In Using Solr to Search and Analyze Logs Radu will be talking about... well, you guessed it: using Solr to analyze logs. After this talk you may want to run home (or back to the hotel), hack on LogStash or Flume, and get Solr to eat your logs... but don’t forget we have to keep Logsene well fed. Feed this beast your logs like we feed it ours and help us avoid getting eaten by our own creation.

Abstract:

Many of us tend to hate or simply ignore logs, and rightfully so: they’re typically hard to find, difficult to handle, and cryptic to the human eye. But can we make logs more valuable and more usable if we index them in Solr, so we can search and run real-time statistics on them? Indeed we can, and in this session you’ll learn how to make that happen. In the first part of the session we’ll explain why centralized logging is important, what valuable information one can extract from logs, and we’ll introduce the leading tools from the logging ecosystem everyone should be aware of – from syslog and log4j to LogStash and Flume. In the second part we’ll teach you how to use these tools in tandem with Solr. We’ll show how to use Solr in a SolrCloud setup to index large volumes of logs continuously and efficiently. Then, we’ll look at how to scale the Solr cluster as your data volume grows. Finally, we’ll see how you can parse your unstructured logs and convert them to nicely structured Solr documents suitable for analytical queries.
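To make the last point of the abstract a bit more concrete, here is a minimal sketch (not taken from the talk) of turning a syslog-style line into a structured document. The regular expression, the sample line, and the field names (timestamp, host, app, pid, message) are assumptions for illustration; real field names would have to match your own Solr schema.

```python
import json
import re

# Hypothetical pattern for lines shaped like:
#   <timestamp> <host> <app>[<pid>]: <message>
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+"
    r"(?P<app>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Convert one raw log line into a dict of fields."""
    line = line.strip()
    match = LINE_RE.match(line)
    if match is None:
        # Keep unparseable lines searchable instead of dropping them.
        return {"message": line}
    # Drop optional groups that did not match (for example a missing pid).
    doc = {key: value for key, value in match.groupdict().items() if value is not None}
    if "pid" in doc:
        doc["pid"] = int(doc["pid"])
    return doc


sample = "2013-11-06T09:15:02Z web01 sshd[4242]: Accepted publickey for deploy"
doc = parse_log_line(sample)
print(json.dumps(doc, indent=2))
```

A dict like this could then be serialized to JSON and sent to Solr for indexing by whatever shipper or client you use.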

Rafal will teach about Scaling Solr with SolrCloud in a 75-minute session.  Prepare for taking lots of notes and for scaling your brain both horizontally and vertically while at the same time avoiding split-brain.

Abstract:

Configure your Solr cluster to handle hundreds of millions of documents without even noticing, handle queries in milliseconds, and use Near Real Time indexing and searching with document versioning. Scale your cluster both horizontally and vertically by using shards and replicas. In this session you’ll learn how to make your indexing process blazing fast and make your queries efficient even with large amounts of data in your collections. You’ll also see how to optimize your queries to leverage caches as much as your deployment allows and how to observe your cluster with the Solr administration panel, JMX, and third-party tools. Finally, learn how to make changes to already deployed collections: split their shards and alter their schema by using the Solr API.
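As a rough illustration of the shard and replica operations the abstract mentions, the sketch below (not taken from the talk) only builds request URLs for Solr's Collections API, using the CREATE and SPLITSHARD actions. The base URL, the collection name "logs", and the helper function are assumptions for illustration, and nothing is actually sent to a server.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL (hypothetical helper, sends nothing)."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


# Assumed local Solr base URL; adjust for your own cluster.
BASE = "http://localhost:8983/solr"

# Create a collection spread over 2 shards, each with 2 replicas.
create_url = collections_api_url(
    BASE, "CREATE", name="logs", numShards=2, replicationFactor=2
)

# Split one shard of an existing collection in two.
split_url = collections_api_url(
    BASE, "SPLITSHARD", collection="logs", shard="shard1"
)

print(create_url)
print(split_url)
```

Which parameters make sense (and which ones are required) depends on your Solr version and configuration, so check the reference guide before running anything against a live cluster.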

Rafal doesn’t like to sleep.  He prefers to write multiple books at the same time and give multiple talks at the same conference.  His second talk is about Administering and Monitoring SolrCloud Clusters – something we and our customers do with SPM all the time.

Abstract:

Even though Solr can run without causing any trouble for long periods of time, it is very important to monitor and understand what is happening in your cluster. In this session you will learn how to use various tools to monitor how Solr is behaving at a high level, but also at the Lucene, JVM, and operating system levels. You’ll see how to react to what you see and how to make changes to configuration, index structure, and shard layout using the Solr API. We will also discuss different performance metrics to which you ought to pay extra attention. Finally, you’ll learn what to do when things go awry – we will share a few examples of troubleshooting and then dissect what was wrong and what had to be done to make things work again.

Otis has aggregation coming out of his ears and dreams about data visualization, timeseries graphs, and other romantic visuals. In Solr for Analytics: Metrics Aggregations at Sematext we’ll share our experience running SPM on top of SolrCloud (vs. HBase, which we currently use).

Abstract:

While Solr and Lucene were originally written for full-text search, they are capable of, and increasingly used for, Analytics, as Key Value Stores, NoSQL databases, and more. In this session we’ll describe our experience with Solr for Analytics. More specifically, we will describe a couple of different approaches we have taken with SolrCloud for aggregation of massive amounts of performance metrics, we’ll share our findings, and compare SolrCloud with HBase for large-scale, write-intensive aggregations. We’ll also look at several new Solr features in the works that will make Solr even more suitable for Analytics workloads.

See you in Dublin!

Sematext at Lucene Revolution 2011

Last year at Lucene Revolution in October in Boston, we shared how we built search-lucene.com and search-hadoop.com.  In May of this year, we’ll again be talking at Lucene Revolution about another topic very dear to us at Sematext – Search Analytics (abstract).  The full conference agenda is available.  Start picking sessions to attend.

This year’s Lucene Revolution is extra interesting because Sematext is also sponsoring the conference. In addition to that, it’s great to see a couple of our customers presenting this year!

If you are coming to the conference don’t be afraid to say hello.  And if San Francisco is too far this year and you are on the east coast of the US in mid-June, you can also catch us at the Open Source Search Conference.  And if you are in Europe, you’ll see us there in June of this year, too.  Until then, so long from @sematext.

For more information on this topic, read about the Sematext Search Analytics service.