Translation of "Getting Started with Zend_Search_Lucene. Lucene Index Structure"

Zend, "Getting Started with Zend_Search_Lucene. Lucene Index Structure", public translation into Russian from English.

Getting Started with Zend_Search_Lucene. Lucene Index Structure

To get the most out of Zend_Search_Lucene's capabilities with maximum performance, you need to understand its internal index structure.

An index is stored as a set of files within a single directory.

An index consists of any number of independent segments, each of which stores information about a subset of the indexed documents. Each segment has its own terms dictionary, terms dictionary index, and document storage (stored field values) [1]. All segment data is stored in _xxxxx.cfs files, where xxxxx is a segment name.

Once an index segment file is created, it can't be updated. New documents are added to new segments. Deleted documents are only marked as deleted in an optional <segmentname>.del file.

Document updating is performed as separate delete and add operations, even though it's done using an update() API call [2]. This simplifies adding new documents, and allows updating concurrently with search operations.
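The behaviour described above (segments that never change once written, deletions recorded beside the segment, and updates done as delete plus add) can be sketched with a small in-memory model. This is only an illustration in Python, not Zend_Search_Lucene code: the names Segment, Index, add, delete, update and live_docs are hypothetical, and the sketch assumes one document per new segment for simplicity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Toy segment: immutable once created (hypothetical model, not the Zend API)."""
    name: str
    docs: tuple  # document texts, fixed at creation time


@dataclass
class Index:
    segments: list = field(default_factory=list)
    # Deletion marks are kept beside the segments, in the spirit of a
    # <segmentname>.del file: segment name -> set of local document numbers.
    deleted: dict = field(default_factory=dict)

    def add(self, text):
        """New documents always go into a brand-new segment."""
        name = f"_{len(self.segments)}"
        self.segments.append(Segment(name, (text,)))
        return name, 0  # (segment name, local document number)

    def delete(self, doc_ref):
        """Only mark the document as deleted; the segment is left untouched."""
        name, local_id = doc_ref
        self.deleted.setdefault(name, set()).add(local_id)

    def update(self, doc_ref, new_text):
        """An update is a delete followed by an add."""
        self.delete(doc_ref)
        return self.add(new_text)

    def live_docs(self):
        """Return the texts of all documents not marked as deleted."""
        return [
            text
            for seg in self.segments
            for i, text in enumerate(seg.docs)
            if i not in self.deleted.get(seg.name, set())
        ]
```

In this model, updating a document leaves the old segment's contents in place and simply adds one more segment, which is why a long series of updates makes the segment list grow.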

On the other hand, using several segments (one document per segment as a borderline case) increases search time:

    • retrieving a term from a dictionary is performed for each segment;

    • the terms dictionary index is pre-loaded for each segment (this process takes the most search time for simple queries, and it also requires additional memory).

If the terms dictionary reaches a saturation point, then searching through one segment is, in most cases, N times faster than searching through N segments.
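The per-segment cost can be made concrete with a toy model that counts dictionary lookups. This is a rough Python sketch under simplifying assumptions, not how Zend_Search_Lucene stores its data: build_segment and search are hypothetical helpers, the "dictionary" is just a sorted list of terms, and whitespace tokenization stands in for a real analyzer.

```python
import bisect


def build_segment(docs):
    """Build one toy segment from {doc_id: text}.

    Returns a sorted term list (the "terms dictionary") and the postings
    mapping each term to the ids of the documents that contain it.
    """
    postings = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings.setdefault(term, []).append(doc_id)
    return sorted(postings), postings


def search(segments, term):
    """Look the term up in every segment; return (sorted hits, lookup count)."""
    hits, lookups = [], 0
    for terms, postings in segments:
        lookups += 1  # one dictionary lookup per segment, hit or miss
        pos = bisect.bisect_left(terms, term)
        if pos < len(terms) and terms[pos] == term:
            hits.extend(postings[term])
    return sorted(hits), lookups
```

With the same three documents, an index built as one segment per document needs three lookups for a single-term query, while a single merged segment needs one; the set of matching documents is the same in both cases.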

Index optimization merges two or more segments into a single new one. A new segment is added to the index segments list, and old segments are excluded.

Segment list updates are performed as an atomic operation. This makes it possible to add new documents, optimize the index, and search through it concurrently.
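One way to picture the merge and the segment list update is the following Python sketch. It is an assumption-laden illustration rather than the library's implementation: merge_segments, Index, snapshot and optimize are hypothetical names, segments are plain (name, docs) pairs, and the "atomic" update is modelled as replacing one list reference with another instead of editing the list in place.

```python
def merge_segments(segment_list, names_to_merge, new_name):
    """Merge the named segments into a single new one.

    segment_list holds (name, docs) pairs. The input list is never
    modified; a new list is built and returned.
    """
    merged_docs, kept = [], []
    for name, docs in segment_list:
        if name in names_to_merge:
            merged_docs.extend(docs)
        else:
            kept.append((name, docs))
    # The new segment is added to the list; the old ones are left out of it.
    return kept + [(new_name, tuple(merged_docs))]


class Index:
    def __init__(self, segments):
        self.segments = list(segments)

    def snapshot(self):
        """A searcher keeps the list that was current when it started."""
        return self.segments

    def optimize(self, names_to_merge, new_name):
        new_list = merge_segments(self.segments, set(names_to_merge), new_name)
        # Single reference swap: readers see either the old list or the new one.
        self.segments = new_list
```

A search that took its snapshot before optimize() keeps working on the old, unchanged list, while searches started afterwards see the merged segment.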

Index auto-optimization is performed after each new segment is generated. It merges sets of the smallest segments into larger segments, and larger segments into even larger ones, provided there are enough segments to merge.
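The cascading effect of auto-optimization can be sketched by tracking segment sizes only. This Python example is a simplified model, not the actual merge policy: add_segment and its merge_factor parameter are hypothetical, and the rule used here (merge whenever the newest merge_factor segments have equal size) is an assumption chosen to keep the sketch short.

```python
def add_segment(sizes, new_size, merge_factor=3):
    """Append a segment of new_size documents, then merge where possible.

    sizes is a list of segment sizes (document counts), oldest first.
    Returns a new list; the input list is not modified.
    """
    sizes = sizes + [new_size]
    # While the newest merge_factor segments are all the same size, merge
    # them. The merged segment may in turn complete a run of larger ones.
    while len(sizes) >= merge_factor and len(set(sizes[-merge_factor:])) == 1:
        merged = sum(sizes[-merge_factor:])
        sizes = sizes[:-merge_factor] + [merged]
    return sizes
```

Adding single-document segments one at a time with merge_factor=3 gives segment sizes [3, 3, 1] after seven documents and a single segment of size 9 after nine, because the third 3-document segment immediately triggers a further merge.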
