Opening: HBase and Lucene / Solr / ElasticSearch Developer

We are once again looking for smart people.  This time we are looking to hire a person who likes working with HBase and Lucene (or Solr or ElasticSearch).  This particular combination is important to us because the very first target for this person might be the integration of HBase and Lucene / Solr / ElasticSearch.  More specifically, we have our eyes on HBASE-3529, which we examined closely during a recent HBase Hackathon that took place after BerlinBuzzwords.  Of course, we are also open to alternative approaches if the one taken in HBASE-3529 turns out to be problematic.  The work on marrying HBase and full-text search is to be done “in the open”, meaning in collaboration with HBase as well as Lucene, Solr, or ElasticSearch developers, which makes this project that much more exciting.

Beyond HBase and search integration, we do other interesting stuff with HBase (and Flume and MapReduce and …), so this person would get to work on our Search Analytics and Scalable Performance Monitoring services.

Interested?  Please get in touch and see what else we like on our jobs page.

Search Analytics – Video Interview with Otis Gospodnetić

“I’m shocked companies aren’t using these tools.”

This video interview about Search Analytics is from Techilicious by Josette Rigsby: http://techielicous.com/2011/06/04/search-and-analytics/

“…we had a chance to speak with Otis Gospodnetić, co-author of Lucene in Action and Founder of Sematext regarding search analytics and searching big data.”

Enterprise search is growing in importance along with data sizes: there is simply too much content to locate without the aid of a search tool. But are users really finding what they need? Unfortunately, many companies cannot answer that question. Gospodnetić advised that organizations should be collecting at least a minimum set of metrics about search performance and user behavior. However, the majority are not.

Unlike clickstream analysis, search analytics provides insight into how users are actually using search (the actual terms they specify) instead of just what they clicked. Key metrics organizations should collect on an ongoing basis include:

  • Search failure (zero results)
  • Low click-through rate
  • Most popular searches (words and phrases)
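
As a rough illustration, the three metrics above could be computed from a query log along the following lines. This is only a sketch under stated assumptions: the `SearchEvent` record, its fields, the `search_metrics` function, and the 20% click-through threshold are all hypothetical and are not taken from the interview or from any Sematext product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One logged search (hypothetical log format)."""
    query: str        # the terms the user typed
    num_results: int  # number of hits the search returned
    clicked: bool     # whether the user clicked any result


def search_metrics(events, low_ctr_threshold=0.2, top_n=3):
    """Compute zero-result rate, low click-through queries, and top queries."""
    searches = Counter()  # all searches per normalized query
    shown = Counter()     # searches that returned at least one result
    clicks = Counter()    # searches that led to a click
    zero_hits = 0

    for event in events:
        query = event.query.strip().lower()  # treat "HBase" and "hbase" as one query
        searches[query] += 1
        if event.num_results == 0:
            zero_hits += 1
            continue  # a failed search cannot be clicked; count it separately
        shown[query] += 1
        if event.clicked:
            clicks[query] += 1

    total = len(events)
    # Queries that return results but are rarely clicked
    low_ctr = sorted(
        query for query, count in shown.items()
        if clicks[query] / count < low_ctr_threshold
    )
    return {
        "zero_result_rate": zero_hits / total if total else 0.0,
        "low_ctr_queries": low_ctr,
        "top_queries": searches.most_common(top_n),
    }
```

Zero-result searches are kept out of the click-through calculation so that the two signals stay separate: a misspelled query that returns nothing shows up in the failure rate, while a query that returns results nobody clicks shows up in the low click-through list.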

Once the metrics are collected, organizations should analyze the data to improve the search experience. For example, if a significant percentage of queries are failing, organizations can use the data from search analytics to find out why. Is it due to misspellings? Are users typing synonyms that the index does not recognize? Gospodnetić said,

“I’m shocked companies aren’t using these tools.”

For more information on this topic, read about the Sematext Search Analytics service.