[hibernate-commits] Hibernate SVN: r14943 - in search/trunk:
 src/java/org/hibernate/search/reader and 1 other directory.

Author: epbernard
Date: 2008-07-16 22:29:47 -0400 (Wed, 16 Jul 2008)
New Revision: 14943

Modified:
 search/trunk/doc/reference/en/modules/architecture.xml
 search/trunk/doc/reference/en/modules/configuration.xml
 search/trunk/src/java/org/hibernate/search/reader/ReaderProviderFactory.java
Log:
HSEARCH-212 add documentation for shared segments and give it a name: shared-segments

Modified: search/trunk/doc/reference/en/modules/architecture.xml
===================================================================
--- search/trunk/doc/reference/en/modules/architecture.xml  2008-07-17 01:01:51 UTC (rev 14942)
+++ search/trunk/doc/reference/en/modules/architecture.xml  2008-07-17 02:29:47 UTC (rev 14943)
@@(protected) @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="search-architecture">
- <!-- $Id$ -->  
+ <!-- $Id$ -->
+
 <title>Architecture</title>

 <section>
@@(protected) @@
  detects the presence of a transaction and adjust the scoping.</para>

  <note>
-    Hibernate Search works perfectly fine in the Hibernate / EntityManager long conversation pattern aka. atomic conversation.
+     Hibernate Search works perfectly fine in the Hibernate / EntityManager long conversation pattern aka. atomic conversation.
  </note>

  <note>
-    Depending on user demand, additional scoping will be considered, the pluggability mechanism being already in place.
+     Depending on user demand, additional scoping will be considered, the pluggability mechanism being already in place.
  </note>
 </section>

@@(protected) @@
    <title>Shared</title>

    <para>With this strategy, Hibernate Search will share the same
-    IndexReader, for a given Lucene index, across multiple queries and
-    threads provided that the IndexReader is still up-to-date. If the
-    IndexReader is not up-to-date, a new one is opened and provided.
-    Generally speaking, this strategy provides much better performances than
-    the <literal>not-shared</literal> strategy. It is especially true if the
-    number of updates is much lower than the reads. This strategy is the
-    default.</para>
+    <classname>IndexReader</classname>, for a given Lucene index, across
+    multiple queries and threads provided that the
+    <classname>IndexReader</classname> is still up-to-date. If the
+    <classname>IndexReader</classname> is not up-to-date, a new one is
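+    opened and provided. Generally speaking, this strategy provides much
+    better performance than the <literal>not-shared</literal> strategy,
+    especially when updates are much less frequent than reads.</para>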
  </section>

  <section>
+    <title>Shared Segments</title>
+
+    <para>This strategy goes a step further than the shared strategy and
+    tries to minimize reopening even when the underlying index has changed.
+    Each <classname>IndexReader</classname> is made of several
+    <classname>SegmentReader</classname>s. This strategy only reopens
+    segments that have been modified or created and shares the already
+    loaded segments. This strategy will become the default strategy in the
+    near future.</para>
+
+    <para>The name of this strategy is
+    <literal>shared-segments</literal>.</para>
+   </section>
+
+   <section>
    <title>Not-shared</title>

-    <para>Every time a query is executed, a Lucene IndexReader is opened.
-    This strategy is not the most efficient since opening and warming up an
-    IndexReader can be a relatively expensive operation.</para>
+    <para>Every time a query is executed, a Lucene
+    <classname>IndexReader</classname> is opened. This strategy is not the
+    most efficient since opening and warming up an
+    <classname>IndexReader</classname> can be a relatively expensive
+    operation.</para>
+
+    <para>The name of this strategy is <literal>not-shared</literal>.</para>
  </section>

  <section>
@@(protected) @@
    needs by implementing
    <classname>org.hibernate.search.reader.ReaderProvider</classname>. The
    implementation must be thread safe.</para>
-
-    <note>
-     <para>Some additional strategies are planned in future versions of
-     Hibernate Search</para>
-    </note>
  </section>
 </section>
</chapter>
\ No newline at end of file
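
Note on the Shared Segments section added above: the idea can be illustrated with a toy model. The following is a minimal sketch, not Hibernate Search or Lucene code. ToySegmentReader, reopen and the name-to-version map are hypothetical stand-ins chosen for illustration. It only shows the principle described in the new paragraph: on reopen, segments whose version is unchanged keep their already loaded reader, and only new or modified segments get a fresh one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SegmentSharingSketch {

    /** Toy stand-in for a per-segment reader (hypothetical, not Lucene's class). */
    static final class ToySegmentReader {
        final String name;
        final long version;

        ToySegmentReader(String name, long version) {
            this.name = name;
            this.version = version;
        }
    }

    /**
     * Builds the readers for the current index state (segment name -> version).
     * A reader from the previous state is reused when its segment still exists
     * with the same version; otherwise a new reader is created.
     */
    static Map<String, ToySegmentReader> reopen(Map<String, ToySegmentReader> previous,
                                                Map<String, Long> currentSegments) {
        Map<String, ToySegmentReader> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> segment : currentSegments.entrySet()) {
            ToySegmentReader old = previous.get(segment.getKey());
            if (old != null && old.version == segment.getValue()) {
                result.put(segment.getKey(), old); // unchanged: share the loaded reader
            } else {
                // new or modified segment: open a fresh reader
                result.put(segment.getKey(),
                        new ToySegmentReader(segment.getKey(), segment.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> before = new LinkedHashMap<>();
        before.put("_a", 1L);
        before.put("_b", 1L);
        Map<String, ToySegmentReader> empty = new LinkedHashMap<>();
        Map<String, ToySegmentReader> first = reopen(empty, before);

        Map<String, Long> after = new LinkedHashMap<>();
        after.put("_a", 1L); // untouched
        after.put("_b", 2L); // modified
        after.put("_c", 1L); // newly created
        Map<String, ToySegmentReader> second = reopen(first, after);

        System.out.println("_a reused: " + (second.get("_a") == first.get("_a")));
        System.out.println("_b reused: " + (second.get("_b") == first.get("_b")));
    }
}
```

By contrast, the plain shared strategy as documented would replace the whole reader once the index is no longer up to date, so every segment would be loaded again.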

Modified: search/trunk/doc/reference/en/modules/configuration.xml
===================================================================
--- search/trunk/doc/reference/en/modules/configuration.xml  2008-07-17 01:01:51 UTC (rev 14942)
+++ search/trunk/doc/reference/en/modules/configuration.xml  2008-07-17 02:29:47 UTC (rev 14943)
@@(protected) @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="search-configuration">
- <!-- $Id$ -->  
+ <!-- $Id$ -->
+
 <title>Configuration</title>

 <section id="search-configuration-directory" revision="1">
@@(protected) @@
        based on an incremental copy mechanism reducing the average copy
        time.</para><para>DirectoryProvider typically used on the master
        node in a JMS back end cluster.</para><para>The <literal>
-        buffer_size_on_copy</literal> optimum depends
-        on your operating system and available RAM; most people reported
-        good results using values between 16 and 64MB.</para></entry>
+        buffer_size_on_copy</literal> optimum depends on your operating
+        system and available RAM; most people reported good results using
+        values between 16 and 64MB.</para></entry>

        <entry><para><literal>indexBase</literal>: Base
        directory</para><para><literal>indexName</literal>: override
@@(protected) @@
        <filename>&lt;sourceBase&gt;/&lt;source&gt;</filename>
        </para><para><literal>refresh</literal>: refresh period in second
        (the copy will take place every refresh seconds).</para><para>
-        <literal>buffer_size_on_copy</literal>: The amount of
-        MegaBytes to move in a single low level copy instruction;
-        defaults to 16MB.</para></entry>
+        <literal>buffer_size_on_copy</literal>: The amount of MegaBytes to
+        move in a single low level copy instruction; defaults to
+        16MB.</para></entry>
      </row>

      <row>
@@(protected) @@
        information (default 3600 seconds - 60 minutes).</para><para>Note
        that the copy is based on an incremental copy mechanism reducing
        the average copy time.</para><para>DirectoryProvider typically
-        used on slave nodes using a JMS back end.</para><para>The <literal>
-        buffer_size_on_copy</literal> optimum depends
-        on your operating system and available RAM; most people reported
-        good results using values between 16 and 64MB.</para></entry>
+        used on slave nodes using a JMS back end.</para><para>The
+        <literal> buffer_size_on_copy</literal> optimum depends on your
+        operating system and available RAM; most people reported good
+        results using values between 16 and 64MB.</para></entry>

        <entry><para><literal>indexBase</literal>: Base
        directory</para><para><literal>indexName</literal>: override
@@(protected) @@
        <filename>&lt;sourceBase&gt;/&lt;source&gt;</filename>
        </para><para><literal>refresh</literal>: refresh period in second
        (the copy will take place every refresh seconds).</para><para>
-        <literal>buffer_size_on_copy</literal>: The amount of
-        MegaBytes to move in a single low level copy instruction;
-        defaults to 16MB.</para></entry>
+        <literal>buffer_size_on_copy</literal>: The amount of MegaBytes to
+        move in a single low level copy instruction; defaults to
+        16MB.</para></entry>
      </row>

      <row>
@@(protected) @@
  <title>Reader strategy configuration</title>

  <para>The different reader strategies are described in <xref
-   linkend="search-architecture-readerstrategy" />. The default reader
-   strategy is <literal>shared</literal>. This can be adjusted:</para>
+   linkend="search-architecture-readerstrategy" />. Out of the box strategies
+   are:</para>

+   <itemizedlist>
+    <listitem>
+     <para><literal>shared</literal>: share index readers across several
+     queries</para>
+    </listitem>
+
+    <listitem>
+     <para><literal>shared-segments</literal>: index readers are shared
+     across several queries and when reopening is needed, the unchanged
+     state is shared. This strategy is the most efficient.</para>
+    </listitem>
+
+    <listitem>
+     <para><literal>not-shared</literal>: create an index reader for each
+     individual query</para>
+    </listitem>
+   </itemizedlist>
+
+   <para>The default reader strategy is <literal>shared</literal>. This can
+   be adjusted:</para>
+
  <programlisting>hibernate.search.reader.strategy = not-shared</programlisting>

-   <para>Adding this property switch to the <literal>non shared</literal>
+   <para>Adding this property switches to the <literal>not-shared</literal>
  strategy.</para>

  <para>Or if you have a custom reader strategy:</para>
@@(protected) @@
  Lucene <literal>IndexWriter</literal> such as
  <literal>mergeFactor</literal>, <literal>maxMergeDocs</literal> and
  <literal>maxBufferedDocs</literal>. You can specify these parameters
-   either as default values applying for all indexes, on a per index
-   basis, or even per shard.</para>
+   either as default values applying for all indexes, on a per index basis,
+   or even per shard.</para>

  <para>There are two sets of parameters allowing for different performance
  settings depending on the use case. During indexing operations triggered
  by database modifications, the parameters are grouped by the
-   <literal>transaction</literal> keyword:
-   <programlisting>hibernate.search.[default|&lt;indexname&gt;].indexwriter.transaction.&lt;parameter_name&gt;</programlisting>
-   When indexing occurs via <literal>FullTextSession.index()</literal> (see <xref
-   linkend="search-batchindex" />), the used properties are those grouped under the <literal>batch</literal> keyword:
-   <programlisting>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.&lt;parameter_name&gt;</programlisting>
-   </para>
+   <literal>transaction</literal> keyword: <programlisting>hibernate.search.[default|&lt;indexname&gt;].indexwriter.transaction.&lt;parameter_name&gt;</programlisting>
+   When indexing occurs via <literal>FullTextSession.index()</literal> (see
+   <xref linkend="search-batchindex" />), the used properties are those
+   grouped under the <literal>batch</literal> keyword: <programlisting>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.&lt;parameter_name&gt;</programlisting></para>

  <para>Unless the corresponding <literal>.batch</literal> property is
  explicitly set, the value will default to the
-   <literal>.transaction</literal> property.
-   If no value is set for a <literal>.batch</literal> value in a specific shard configuration,
-   Hibernate Search will look at the index section, then at the default section and after that
-   it will look for a <literal>.transaction</literal> in the same order:
-   <programlisting>
+   <literal>.transaction</literal> property. If no value is set for a
+   <literal>.batch</literal> value in a specific shard configuration,
+   Hibernate Search will look at the index section, then at the default
+   section and after that it will look for a <literal>.transaction</literal>
+   in the same order: <programlisting>
  hibernate.search.Animals.2.indexwriter.transaction.max_merge_docs 10
  hibernate.search.Animals.2.indexwriter.transaction.merge_factor 20
  hibernate.search.default.indexwriter.batch.max_merge_docs 100</programlisting>
-   This configuration will result in these settings applied to the second shard of Animals index:
-   <itemizedlist>
-    <listitem><literal>transaction.max_merge_docs</literal> = 10</listitem>
-     <listitem><literal>batch.max_merge_docs</literal> = 100</listitem>
-     <listitem><literal>transaction.merge_factor</literal> = 20</listitem>
-     <listitem><literal>batch.merge_factor</literal> = 20</listitem>
-   </itemizedlist>
-   All other values will use the defaults defined in Lucene.
-   </para>
+   This configuration will result in these settings applied to the second
+   shard of Animals index: <itemizedlist>
+     <listitem>
+      

-   <para>
-   The default for all values is to leave them at Lucene&#39;s own default,
-   so the listed values in the following table actually depend on the
-   version of Lucene you are using;
-   values shown are relative to version <literal>2.3</literal>.
-   For more information about Lucene indexing performances, please
-   refer to the Lucene documentation.</para>
+       <literal>transaction.max_merge_docs</literal>

+       = 10
+     </listitem>
+
+     <listitem>
+      
+
+       <literal>batch.max_merge_docs</literal>
+
+       = 100
+     </listitem>
+
+     <listitem>
+      
+
+       <literal>transaction.merge_factor</literal>
+
+       = 20
+     </listitem>
+
+     <listitem>
+      
+
+       <literal>batch.merge_factor</literal>
+
+       = 20
+     </listitem>
+    </itemizedlist> All other values will use the defaults defined in
+   Lucene.</para>
+
+   <para>The default for all values is to leave them at Lucene's own default,
+   so the listed values in the following table actually depend on the version
+   of Lucene you are using; values shown are relative to version
+   <literal>2.3</literal>. For more information about Lucene indexing
+   performances, please refer to the Lucene documentation.</para>
+
  <table>
    <title>List of indexing performance properties</title>

@@(protected) @@
     </thead>

     <tbody>
-    
      <row>
        <entry><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[transaction|batch].max_buffered_delete_terms</literal></entry>

-        <entry><para>Determines the minimal number of delete terms required before the buffered
-      in-memory delete terms are applied and flushed. If there are documents
-      buffered in memory at the time, they are merged and a new segment is
-        created.</para></entry>
+        <entry><para>Determines the minimal number of delete terms
+        required before the buffered in-memory delete terms are applied
+        and flushed. If there are documents buffered in memory at the
+        time, they are merged and a new segment is created.</para></entry>

        <entry>Disabled (flushes by RAM usage)</entry>
      </row>
@@(protected) @@
        <entry><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[transaction|batch].max_buffered_docs</literal></entry>

        <entry><para>Controls the amount of documents buffered in memory
-        during indexing. The bigger the more RAM is consumed.</para>
-       </entry>
+        during indexing. The bigger the more RAM is
+        consumed.</para></entry>

        <entry>Disabled (flushes by RAM usage)</entry>
      </row>
@@(protected) @@
      <row>
        <entry><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[transaction|batch].max_field_length</literal></entry>

-        <entry><para>The maximum number of terms that will be indexed for a single field.
-        This limits the amount of memory required for indexing so that very large data will not crash the indexing process by
-      running out of memory. This setting refers to the number of running terms,
-      not to the number of different terms.</para>
-      <para>This silently truncates large documents, excluding from the index all terms that occur further in the document.
-      If you know your source documents are large, be sure to set this value high enough to accomodate the expected size.
-      If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.
-      </para>
-      <para>If setting this value in <literal>batch</literal> differently than in <literal>transaction</literal>
-      you may get different data (and results) in your index depending on the indexing mode.</para>
-       </entry>
+        <entry><para>The maximum number of terms that will be indexed for
+        a single field. This limits the amount of memory required for
+        indexing so that very large data will not crash the indexing
+        process by running out of memory. This setting refers to the
+        number of running terms, not to the number of different
+        terms.</para> <para>This silently truncates large documents,
+        excluding from the index all terms that occur further in the
+        document. If you know your source documents are large, be sure to
+        set this value high enough to accommodate the expected size. If you
+        set it to Integer.MAX_VALUE, then the only limit is your memory,
+        but you should anticipate an OutOfMemoryError. </para> <para>If
+        setting this value in <literal>batch</literal> differently than in
+        <literal>transaction</literal> you may get different data (and
+        results) in your index depending on the indexing
+        mode.</para></entry>

        <entry>10000</entry>
      </row>
-      
+
      <row>
        <entry><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[transaction|batch].max_merge_docs</literal></entry>

-        <entry><para>Defines the largest number of documents allowed in a segment.
-        Larger values are best for batched indexing and speedier searches.
-        Small values are best for transaction indexing.</para></entry>
+        <entry><para>Defines the largest number of documents allowed in a
+        segment. Larger values are best for batched indexing and speedier
+        searches. Small values are best for transaction
+        indexing.</para></entry>

        <entry>Unlimited (Integer.MAX_VALUE)</entry>
      </row>
@@(protected) @@
      <row>
        <entry><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[transaction|batch].ram_buffer_size</literal></entry>

-        <entry><para>Controls the amount of RAM in MB dedicated to document buffers.
-        When used together max_buffered_docs a flush occurs for whichever event happens first.</para>
-        <para>Generally for faster indexing performance it's best to flush by RAM usage instead of document
-        count and use as large a RAM buffer as you can.</para>
-        </entry>
+        <entry><para>Controls the amount of RAM in MB dedicated to
+        document buffers. When used together with max_buffered_docs, a
+        flush occurs for whichever event happens first.</para> <para>Generally
+        for faster indexing performance it's best to flush by RAM usage
+        instead of document count and use as large a RAM buffer as you
+        can.</para></entry>

        <entry>16 MB</entry>
      </row>
+
      <row>
        <entry><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[transaction|batch].term_index_interval</literal></entry>

-        <entry><para>Expert: Set the interval between indexed terms.</para>
-        <para>Large values cause less memory to be used by IndexReader, but slow random-access to terms.
-        Small values cause more memory to be used by an IndexReader, and speed
-        random-access to terms. See Lucene documentation for more details.</para>
-        </entry>
+        <entry><para>Expert: Set the interval between indexed
+        terms.</para> <para>Large values cause less memory to be used by
+        IndexReader, but slow random-access to terms. Small values cause
+        more memory to be used by an IndexReader, and speed random-access
+        to terms. See Lucene documentation for more
+        details.</para></entry>

        <entry>128</entry>
      </row>
-
     </tbody>
    </tgroup>
  </table>
 </section>
-</chapter>
+</chapter>
\ No newline at end of file
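
Note on the batch/transaction fallback described in configuration.xml above: the lookup order can be sketched as below. This is an illustration only, not the Hibernate Search implementation. The class IndexWriterSettingLookup and its methods are hypothetical, and it assumes that a transaction value is looked up shard first, then index, then default, in the same way the text describes for batch values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexWriterSettingLookup {

    /** Property keys to try, in order, for one parameter of one shard. */
    static List<String> candidateKeys(String index, int shard, String mode, String param) {
        String[] scopes = { index + "." + shard, index, "default" };
        // batch falls back to transaction; transaction has no further fallback
        String[] modes = "batch".equals(mode)
                ? new String[] { "batch", "transaction" }
                : new String[] { "transaction" };
        List<String> keys = new ArrayList<>();
        for (String m : modes) {
            for (String scope : scopes) {
                keys.add("hibernate.search." + scope + ".indexwriter." + m + "." + param);
            }
        }
        return keys;
    }

    /** Returns the first configured value, or null meaning "leave Lucene's default". */
    static String resolve(Map<String, String> props, String index, int shard,
                          String mode, String param) {
        for (String key : candidateKeys(index, shard, mode, param)) {
            String value = props.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** The three properties from the documentation example. */
    static Map<String, String> exampleConfig() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hibernate.search.Animals.2.indexwriter.transaction.max_merge_docs", "10");
        props.put("hibernate.search.Animals.2.indexwriter.transaction.merge_factor", "20");
        props.put("hibernate.search.default.indexwriter.batch.max_merge_docs", "100");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = exampleConfig();
        for (String mode : new String[] { "transaction", "batch" }) {
            for (String param : new String[] { "max_merge_docs", "merge_factor" }) {
                System.out.println(mode + "." + param + " = "
                        + resolve(props, "Animals", 2, mode, param));
            }
        }
    }
}
```

With the example configuration, batch.max_merge_docs is found in the default batch section before any transaction value is considered, while batch.merge_factor has no batch value anywhere and falls through to the shard's transaction value. That matches the four results listed in the documentation.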

Modified: search/trunk/src/java/org/hibernate/search/reader/ReaderProviderFactory.java
===================================================================
--- search/trunk/src/java/org/hibernate/search/reader/ReaderProviderFactory.java  2008-07-17 01:01:51 UTC (rev 14942)
+++ search/trunk/src/java/org/hibernate/search/reader/ReaderProviderFactory.java  2008-07-17 02:29:47 UTC (rev 14943)
@@(protected) @@
   ReaderProvider readerProvider;
   if ( StringHelper.isEmpty( impl ) ) {
     //put another one
-        readerProvider = new SharedReaderProvider();
+      readerProvider = new SharedReaderProvider();
   }
   else if ( "not-shared".equalsIgnoreCase( impl ) ) {
     readerProvider = new NotSharedReaderProvider();
@@(protected) @@
   else if ( "shared".equalsIgnoreCase( impl ) ) {
     readerProvider = new SharedReaderProvider();
   }
+    else if ( "shared-segments".equalsIgnoreCase( impl ) ) {
+      readerProvider = new SharingBufferReaderProvider();
+    }
   else {
     try {
       Class readerProviderClass = ReflectHelper.classForName( impl, ReaderProviderFactory.class );
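
Note on the ReaderProviderFactory hunk above: the name dispatch it extends can be summarized with a small standalone sketch. The enum and the class ReaderStrategySketch are hypothetical and exist only for illustration. The real factory instantiates provider classes (SharedReaderProvider, SharingBufferReaderProvider, NotSharedReaderProvider) or loads a custom class by name, which this sketch does not attempt.

```java
class ReaderStrategySketch {

    /** Hypothetical labels for the branches visible in the diff. */
    enum Strategy { SHARED, SHARED_SEGMENTS, NOT_SHARED, CUSTOM_CLASS }

    /** Maps the value of hibernate.search.reader.strategy to a branch. */
    static Strategy resolve(String impl) {
        if (impl == null || impl.isEmpty()) {
            return Strategy.SHARED; // no value configured: shared is the default
        }
        else if ("not-shared".equalsIgnoreCase(impl)) {
            return Strategy.NOT_SHARED;
        }
        else if ("shared".equalsIgnoreCase(impl)) {
            return Strategy.SHARED;
        }
        else if ("shared-segments".equalsIgnoreCase(impl)) {
            return Strategy.SHARED_SEGMENTS; // the branch added in r14943
        }
        else {
            // anything else is treated as a class name of a custom ReaderProvider
            return Strategy.CUSTOM_CLASS;
        }
    }

    public static void main(String[] args) {
        String[] samples = { "", "shared", "Shared-Segments", "not-shared", "com.example.MyReaderProvider" };
        for (String sample : samples) {
            System.out.println("'" + sample + "' -> " + resolve(sample));
        }
    }
}
```

The comparison is case-insensitive, as in the diff, so a value such as Shared-Segments selects the same branch as shared-segments. The class name com.example.MyReaderProvider is a made-up placeholder.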

_______________________________________________
hibernate-commits mailing list
hibernate-commits@(protected)
https://lists.jboss.org/mailman/listinfo/hibernate-commits