Translation of "Getting Started with Zend_Search_Lucene. Indexing"

Zend, “Getting Started with Zend_Search_Lucene. Indexing”, public translation into Russian from English.

Getting Started with Zend_Search_Lucene. Indexing

Indexing is performed by adding a new document to an existing or new index:

  $index->addDocument($doc);
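
The snippet above assumes that $index already exists. A minimal sketch of obtaining it follows, assuming the Zend Framework 1 classes are autoloadable; the helper name addToIndex and the $indexPath parameter are hypothetical. It relies on Zend_Search_Lucene::open() throwing Zend_Search_Lucene_Exception when no index can be opened at the given path, and falls back to Zend_Search_Lucene::create() in that case.

```php
<?php
// Hypothetical helper, not part of Zend Framework.
// Assumes the Zend_Search_Lucene classes are already loaded or autoloadable.
function addToIndex($indexPath, $doc)
{
    try {
        // Reuse the index if one already exists at $indexPath.
        $index = Zend_Search_Lucene::open($indexPath);
    } catch (Zend_Search_Lucene_Exception $e) {
        // Simplification: any open() failure is treated as "no index yet".
        $index = Zend_Search_Lucene::create($indexPath);
    }

    $index->addDocument($doc);
    // Write pending changes to the index directory.
    $index->commit();

    return $index;
}
```

A production version would probably distinguish a missing index from a damaged one before creating a new index at the same path.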

There are two ways to create a document object. The first is to construct it manually.

Example #1 Manual Document Construction

  $doc = new Zend_Search_Lucene_Document();
  // text: tokenized, indexed, and stored in the index
  $doc->addField(Zend_Search_Lucene_Field::text('url', $docUrl));
  $doc->addField(Zend_Search_Lucene_Field::text('title', $docTitle));
  // unStored: tokenized and indexed, but not stored in the index
  $doc->addField(Zend_Search_Lucene_Field::unStored('contents', $docBody));
  // binary: stored as-is, neither tokenized nor indexed
  $doc->addField(Zend_Search_Lucene_Field::binary('avatar', $avatarData));

The second way is to load the document from an HTML or Microsoft Office 2007 file:

Example #2 Document Loading

  // Each call is an alternative; use the loader that matches the source format.
  $doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
  $doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($path);
  $doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($path);
  $doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($path);

If a document is loaded from one of the supported formats, it can still be extended manually with new user-defined fields.
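
As an illustration of extending a loaded document, the sketch below loads an HTML string and then attaches one extra field. The helper name loadHtmlWithUrl and the field name 'url' are hypothetical. The field is created with Zend_Search_Lucene_Field::unIndexed(), which stores the value so it can be returned with search hits but does not make it searchable; another field type may suit your application better.

```php
<?php
// Hypothetical helper, not part of Zend Framework.
// Assumes the Zend_Search_Lucene classes are already loaded or autoloadable.
function loadHtmlWithUrl($htmlString, $docUrl)
{
    // Build the document from the HTML markup.
    $doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);

    // Add a user-defined field: stored with the document, not indexed.
    $doc->addField(Zend_Search_Lucene_Field::unIndexed('url', $docUrl));

    return $doc;
}
```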

Indexing Policy

You should define an indexing policy as part of your application's architectural design.

You may need an on-demand indexing configuration (similar to an OLTP system). In such systems, you usually add one document per user request, so the MaxBufferedDocs option does not affect the system. MaxMergeDocs, on the other hand, is really helpful because it lets you limit the maximum script execution time. MergeFactor should be set to a value that balances the average indexing time (which is also affected by the average auto-optimization time) against search performance (the index optimization level depends on the number of segments).
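
A possible shape for the on-demand case is sketched below. The helper name configureForOnDemandIndexing is hypothetical; setMaxMergeDocs() and setMergeFactor() are the index setters for the two options discussed above. The helper takes both values from the caller rather than hard-coding them, because suitable numbers depend on your script time limit and search load.

```php
<?php
// Hypothetical helper, not part of Zend Framework.
// $index is expected to be an opened Zend_Search_Lucene index.
function configureForOnDemandIndexing($index, $maxMergeDocs, $mergeFactor)
{
    // Cap the size of segments taking part in automatic merges,
    // which bounds the time a single request can spend merging.
    $index->setMaxMergeDocs($maxMergeDocs);

    // Trade indexing time against the number of segments to search.
    $index->setMergeFactor($mergeFactor);

    return $index;
}
```

MaxBufferedDocs is deliberately left untouched here, since each request adds a single document.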

If you will primarily perform batch index updates, set MaxBufferedDocs to the maximum value supported by the available memory. Set MaxMergeDocs and MergeFactor to values that reduce auto-optimization involvement as much as possible [1]. Apply full index optimization after indexing.
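
The batch policy could be sketched roughly as follows. The helper name indexBatch and its parameters are hypothetical, and the tuning values are supplied by the caller because they depend on available memory and on the size of the batch. The three setters and optimize() are methods of the index object.

```php
<?php
// Hypothetical helper, not part of Zend Framework.
// $index is expected to be an opened Zend_Search_Lucene index and
// $docs an array (or other iterable) of Zend_Search_Lucene_Document objects.
function indexBatch($index, $docs, $maxBufferedDocs, $maxMergeDocs, $mergeFactor)
{
    // Buffer as many documents in memory as the caller allows
    // before a new segment is written.
    $index->setMaxBufferedDocs($maxBufferedDocs);

    // Caller-chosen values intended to keep automatic merging rare.
    $index->setMaxMergeDocs($maxMergeDocs);
    $index->setMergeFactor($mergeFactor);

    foreach ($docs as $doc) {
        $index->addDocument($doc);
    }

    // One full optimization pass once the whole batch is in.
    $index->optimize();

    return $index;
}
```

Running optimize() once at the end, instead of relying on auto-optimization during the loop, is the point of this configuration.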
