|
Search With Solr in the Cloud
DataStax Enterprise includes strong enterprise search support via Lucene and Apache Solr. Coming from the Apache Lucene project, Solr is the most popular open source enterprise search platform in use today.
Solr’s primary features include robust full-text search, hit highlighting, faceted search, rich document (e.g., PDF, Microsoft Word) handling, and geospatial search.
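As a rough illustration of how full-text search, faceting, and hit highlighting are requested together, the sketch below builds a query URL from Solr's standard request parameters (`q`, `facet`, `facet.field`, `hl`, `hl.fl`, `wt`). The host, the `products` core name, and the field names are hypothetical placeholders, and the helper `build_solr_query_url` is not part of any Solr or DataStax client library.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, facet_field, highlight_field, rows=10):
    """Build a Solr /select URL combining search, faceting, and highlighting.

    All argument values are supplied by the caller; nothing here contacts a server.
    """
    params = [
        ("q", text),                  # full-text query
        ("rows", rows),               # maximum number of hits to return
        ("wt", "json"),               # response format
        ("facet", "true"),            # enable faceted search
        ("facet.field", facet_field),  # field to compute facet counts on
        ("hl", "true"),               # enable hit highlighting
        ("hl.fl", highlight_field),   # field to highlight matches in
    ]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical node address, core, and fields, for illustration only.
    print(build_solr_query_url(
        "http://localhost:8983/solr", "products",
        "title:laptop", "category", "title",
    ))
```

The resulting URL can be sent with any HTTP client to a node that serves the Solr HTTP API; only the base address would change from node to node.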
By integrating Solr into the DataStax Enterprise big data platform, DataStax extends Solr’s capabilities and overcomes a number of shortcomings of native Solr, such as:
• Lack of data durability. Community Solr has no write-ahead log, so data can be lost if a node crashes. With Solr in DataStax Enterprise, a node crash does not cause data loss.
• A write bottleneck. In native Solr, all writes go through a single master. With DataStax Enterprise, users can read from and write to any Solr node in the cluster.
• Manual replication and sharding. In native Solr, both require careful planning for scaling and failover. DataStax Enterprise supplies automatic sharding and has no single point of failure.
• Manual re-indexing of data. In DataStax Enterprise, indexes can be rebuilt automatically.
• No index writes across data centers. In community Solr, writes to indexes cannot span multiple data centers, because a single master replicates via rsync. In DataStax Enterprise, writes to search indexes in different data centers are merged together (i.e., writes can occur anywhere).
• Rigid index management. In DataStax Enterprise, Solr indexes can be dropped, recreated, or rebuilt on the fly, unlike in native Solr. |
|