If topics like log analytics and Solr are your thing, then we may have a treat for you at the upcoming Lucene / Solr Revolution conference in Austin in October. Two of Sematext’s engineers, Rafal Kuc and Radu Gheorghe, both Solr, Elasticsearch and ELK stack experts, have proposed a talk called “Large Scale Log Analytics with Solr” and could use some upvoting from the community to get onto this year’s agenda.
To show your support for “Large Scale Log Analytics with Solr” just click here to vote. Takes less than a minute! Even if you don’t attend the conference, we’ll post the slides and video here on the blog…assuming it gets on the agenda. Voting will close at 11:59pm EDT on Thursday, June 25th.
Talk Summary
This talk is about searching and analyzing time-based data at scale. Documents ranging from blog posts and social media to application logs and metrics generated by smart watches and other “smart” things share a similar pattern: they have a timestamp among their fields, they rarely change once indexed, and they are deleted when they become obsolete.
Very often this kind of data is so large that it causes scaling and performance challenges. We’ll address precisely these challenges, which include:
- Properly designing the collection architecture
- Indexing data fast and without documents waiting in queues for processing
- Being able to run queries that include time-based sorting and faceting on enormous amounts of indexed data without killing Solr
- …and many more
We’ll start with the indexing pipeline, where you do all your ETL. We’ll show you how to maximize throughput with ETL tools such as Flume, Kafka, Logstash and rsyslog, and how to make them scale and send data to Solr.
On the Solr side, we’ll show all sorts of tricks to optimize indexing and searching: from tuning merge policies to slicing collections based on timestamp. While scaling out, we’ll show how to improve the performance/cost ratio.
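To give a rough idea of what slicing collections by timestamp can look like, here is a minimal sketch in Python. It assumes one collection per UTC day and a made-up `logs_YYYY-MM-DD` naming scheme; the prefix, the function names and the retention window are all hypothetical and not taken from the talk. It only computes collection names and does not talk to Solr.

```python
from datetime import date, datetime, timedelta, timezone

PREFIX = "logs"  # hypothetical collection name prefix (an assumption)


def collection_for(ts: datetime) -> str:
    """Return the daily collection a document with this timestamp goes to."""
    return f"{PREFIX}_{ts.astimezone(timezone.utc):%Y-%m-%d}"


def collections_between(start: datetime, end: datetime) -> list[str]:
    """List the daily collections a query over [start, end] has to touch."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{PREFIX}_{day:%Y-%m-%d}")
        day += timedelta(days=1)
    return names


def expired(existing: list[str], today: date, keep_days: int) -> list[str]:
    """Pick collections older than the retention window, to be dropped whole."""
    cutoff = today - timedelta(days=keep_days)
    return [
        name for name in existing
        # Strip "<PREFIX>_" and parse the remaining ISO date.
        if date.fromisoformat(name[len(PREFIX) + 1:]) < cutoff
    ]
```

The appeal of this layout is that a query for the last few hours only touches the newest collections, and obsolete data can be removed by dropping a whole collection instead of deleting individual documents.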
Thanks for your support!