Difference between revisions of "IBM CMOD Composite Indexes"

From CMOD.wiki
Jump to navigation Jump to search
m (Added more analogies about CMOD index fields, and caveats for choosing which fields should be CMOD indexes.)
m (Moved caveats section, demoted impacts.)
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
{{TOCright}}
{{TOCright}}
= Disambiguation =
= Disambiguation =
Also known as "Content Manager OnDemand Multi Field Indexes".
You might be looking for information on getting metadata from documents to be loaded into Content Manager OnDemand Applications using ACIF - see the IBM CMOD Indexing Reference:  [https://www.ibm.com/support/knowledgecenter/SSEPCD_10.5.0/com.ibm.ondemand.ir.doc/dodnt000.htm v10.5]   [https://cmod.wiki/dox/CMODv10.1/IndexingReference.pdf v10.1]   [https://cmod.wiki/dox/CMODv9.5/IndexingReference.pdf v9.5]   [https://cmod.wiki/dox/CMODv9.0/IndexingReference.pdf v9.0]   [https://cmod.wiki/dox/CMODv8.5/Indexing_Reference.pdf v8.5]   [https://cmod.wiki/dox/CMODv8.4/Indexing_Reference.pdf v8.4]
You might be looking for information on getting metadata from documents to be loaded into Content Manager OnDemand Applications using ACIF - see the IBM CMOD Indexing Reference:  [https://www.ibm.com/support/knowledgecenter/SSEPCD_10.5.0/com.ibm.ondemand.ir.doc/dodnt000.htm v10.5]   [https://cmod.wiki/dox/CMODv10.1/IndexingReference.pdf v10.1]   [https://cmod.wiki/dox/CMODv9.5/IndexingReference.pdf v9.5]   [https://cmod.wiki/dox/CMODv9.0/IndexingReference.pdf v9.0]   [https://cmod.wiki/dox/CMODv8.5/Indexing_Reference.pdf v8.5]   [https://cmod.wiki/dox/CMODv8.4/Indexing_Reference.pdf v8.4]


Line 6: Line 8:
In IBM Content Manager OnDemand, an Application Group's field type can be set to Filter or Index.  Setting the type to 'Index' tells CMOD that it should instruct the database to build an index for that field.  Setting the value to Filter means that the field is used to filter the results after the index search is completed.
In IBM Content Manager OnDemand, an Application Group's field type can be set to Filter or Index.  Setting the type to 'Index' tells CMOD that it should instruct the database to build an index for that field.  Setting the value to Filter means that the field is used to filter the results after the index search is completed.


Database indexes make access to data faster by storing a value with the location of the data in the index -- similar to how the index at the back of a reference book works.  The advantage is that if the data being indexed is mostly unique ("high cardinality") the indexes can be used to make searches very fast.  In the OnDemand Application Group configuration window, in the Field Information tab, fields that are set to 'Index' should also have the 'Required' checkbox selected in the Folder configuration, Field Information Tab.  This forces end users to enter a value into the field in order to perform a search - reducing the potential for end users to submit queries that will perform poorly (consuming a disproportionate amount of server I/O bandwidth, RAM, and CPU), as well as reducing the possibility that one of these 'rogue queries' will impact the performance of other user's searches or retrievals.
Database indexes make access to data faster by storing a list of metadata stored inside the database table alongside the location of the data in the index -- similar to how the index at the back of a reference book works.  The advantage is that if the data being indexed is mostly unique ("high cardinality" is the term to use when talking to your Database Administrator) the indexes can be used to make searches very fast.  In the OnDemand Application Group configuration window, in the Field Information tab, fields that are set to 'Index' should also have the 'Required' checkbox selected in the Folder configuration, Field Information Tab.  This forces end users to enter a value into the field in order to perform a search - reducing the potential for end users to submit queries that will perform poorly (consuming a disproportionate amount of server I/O bandwidth, RAM, and CPU), as well as reducing the possibility that one of these 'rogue queries' will impact the performance of other user's searches or retrievals.


For those who like analogies:  Using 'Filter' Field Type is like trying to find a word by searching every page in a very long book.  Using 'Index' is like using the index at the back of a book, telling you where to find each occurrence of the word by page number.  In fact, this naming convention is used in databases -- indexes point to database pages (like pages in a book), which contain records ("words").  For this reason, you want indexes on fields that are the '''most unique''' like 'customer number' which should uniquely identify a customer account, but not 'zip code', which could encompass tens of thousands of customers in large cities.  To continue the book analogy, the index in the back of your book doesn't contain a list of the pages that the words "and", "the", "a" or "to" -- because those words are likely to appear on every page of the book, and are not just useless, but also consume a huge amount of space while being useless.
For those who like analogies:  Using 'Filter' Field Type is like trying to find a word by searching every page in a very long book.  Using 'Index' is like using the index at the back of a book, telling you where to find each occurrence of the word by page number.  In fact, this naming convention is used in databases -- indexes point to database pages (like pages in a book), which contain records ("words").  For this reason, you want indexes on fields that are the '''most unique''' like 'customer number' which should uniquely identify a customer account, but not 'zip code', which could encompass tens of thousands of customers in large cities.  To continue the book analogy, the index in the back of your book doesn't contain a list of the pages that the words "and", "the", "a" or "to" -- because those words are likely to appear on every page of the book, and are not just useless, but also consume a huge amount of space while being useless.


Composite Indexes are database indexes that are created by adding the values of two or more fields together.  If the fields selected for the composite indexes are mostly unique (high specificity or cardinality) the index may be very large, but it should be able to identify a single record -- and be extraordinarily efficient in the process.  In order to be effective at reducing the duration of queries, ALL of the fields that appear in the composite index must also be set to be 'Required' fields in the Folder configuration.
Composite Indexes are database indexes that are created by adding the values of two or more fields together.  If the fields selected for the composite indexes are mostly unique (high specificity or cardinality) the index may be very large, but it should be able to identify a single record -- and be extraordinarily efficient in the process.  In order to be effective at reducing the duration of queries, ALL of the fields that appear in the composite index must also be set to be 'Required' fields in the Folder configuration.
Finally the '''order''' that the fields appear in the OnDemand Multiple Field Index appears in is also important - the highest cardinality ("most unique") field must appear first for maximum performance.


[[File:RequiringIndexedFields.png|800px|Improving IBM CMOD Performance by Requiring Indexed Fields ]]
[[File:RequiringIndexedFields.png|800px|Improving IBM CMOD Performance by Requiring Indexed Fields ]]
==Caveats==
As mentioned above, it's important to use indexes on fields carefully, to ensure maximum benefit at minimum cost in terms of storage space and time.  Also, the more Content Manager OnDemand Application Group fields that have indexing enabled, the more time and effort it takes to create, update, and maintain them, possibly slowing down the load process, and extending maintenance windows for arsmaint.


= Usage =
= Usage =
Line 29: Line 30:
When selecting which Application Group field to add to the composite index, the order of the fields is extremely important.  Use the field that has the highest cardinality (the most unique values or, alternately the fewest repeated values) generally followed by the date.  This combination allows searches to identify a specific customer's documents nearly instantly, and also zero-in on a specific date range as well.  Optimizing the use of Content Manager OnDemand composite indexes requires research into the types of queries that users are performing in OnDemand.  Your Database Administrator responsible for IBM CMOD will have access to utilities to help determine how best to achieve better Content Manager OnDemand performance, or you can rely on the experts at [https://cmod.cloud/query-optimization/ CMOD.Cloud].
When selecting which Application Group field to add to the composite index, the order of the fields is extremely important.  Use the field that has the highest cardinality (the most unique values or, alternately the fewest repeated values) generally followed by the date.  This combination allows searches to identify a specific customer's documents nearly instantly, and also zero-in on a specific date range as well.  Optimizing the use of Content Manager OnDemand composite indexes requires research into the types of queries that users are performing in OnDemand.  Your Database Administrator responsible for IBM CMOD will have access to utilities to help determine how best to achieve better Content Manager OnDemand performance, or you can rely on the experts at [https://cmod.cloud/query-optimization/ CMOD.Cloud].


= Impact on Storage =
=Caveats=
As mentioned above, it's important to use indexes on fields carefully, to ensure maximum benefit at minimum cost in terms of storage space, bandwidth, and time.  Also, the more Content Manager OnDemand Application Group fields that have indexing enabled, the more time and effort it takes to create, update, and maintain them, possibly slowing down the load process, and extending maintenance windows for arsmaint.
 
== Impact on Storage ==
Composite indexes will require that you store the contents of both columns, along with a list of database page identifiers.  If your CMOD Application Group is especially large, this can cause the database tables for that specific App Group to consume anywhere from 10 to 50% more space.  Check with your Database Administrators and System Administrators to ensure that you have enough space to complete the index creation.
Composite indexes will require that you store the contents of both columns, along with a list of database page identifiers.  If your CMOD Application Group is especially large, this can cause the database tables for that specific App Group to consume anywhere from 10 to 50% more space.  Check with your Database Administrators and System Administrators to ensure that you have enough space to complete the index creation.
== Impact on I/O ==
You may think to yourself, "Storage is cheap, let's index everything!" but there's different cost -- storage bandwidth.  In order for indexes to be the most effective, indexes must be '''compact''' - the fewest number of rows in the index with a relatively short list of records following after it.  Building Multi Field Indexes in CMOD with a large number of fields could effectively DOUBLE the size of the database table -- making the task of searching it just as slow as not having an index at all.

Latest revision as of 21:31, 25 April 2023

Disambiguation

Also known as "Content Manager OnDemand Multi Field Indexes".

You might be looking for information on getting metadata from documents to be loaded into Content Manager OnDemand Applications using ACIF - see the IBM CMOD Indexing Reference: v10.5   v10.1   v9.5   v9.0   v8.5   v8.4

Background

In IBM Content Manager OnDemand, an Application Group's field type can be set to Filter or Index. Setting the type to 'Index' tells CMOD that it should instruct the database to build an index for that field. Setting the value to Filter means that the field is used to filter the results after the index search is completed.

Database indexes make access to data faster by storing a list of metadata stored inside the database table alongside the location of the data in the index -- similar to how the index at the back of a reference book works. The advantage is that if the data being indexed is mostly unique ("high cardinality" is the term to use when talking to your Database Administrator) the indexes can be used to make searches very fast. In the OnDemand Application Group configuration window, in the Field Information tab, fields that are set to 'Index' should also have the 'Required' checkbox selected in the Folder configuration, Field Information Tab. This forces end users to enter a value into the field in order to perform a search - reducing the potential for end users to submit queries that will perform poorly (consuming a disproportionate amount of server I/O bandwidth, RAM, and CPU), as well as reducing the possibility that one of these 'rogue queries' will impact the performance of other user's searches or retrievals.

For those who like analogies: Using 'Filter' Field Type is like trying to find a word by searching every page in a very long book. Using 'Index' is like using the index at the back of a book, telling you where to find each occurrence of the word by page number. In fact, this naming convention is used in databases -- indexes point to database pages (like pages in a book), which contain records ("words"). For this reason, you want indexes on fields that are the most unique like 'customer number' which should uniquely identify a customer account, but not 'zip code', which could encompass tens of thousands of customers in large cities. To continue the book analogy, the index in the back of your book doesn't contain a list of the pages that the words "and", "the", "a" or "to" -- because those words are likely to appear on every page of the book, and are not just useless, but also consume a huge amount of space while being useless.

Composite Indexes are database indexes that are created by adding the values of two or more fields together. If the fields selected for the composite indexes are mostly unique (high specificity or cardinality) the index may be very large, but it should be able to identify a single record -- and be extraordinarily efficient in the process. In order to be effective at reducing the duration of queries, ALL of the fields that appear in the composite index must also be set to be 'Required' fields in the Folder configuration.

Finally the order that the fields appear in the OnDemand Multiple Field Index appears in is also important - the highest cardinality ("most unique") field must appear first for maximum performance.

Improving IBM CMOD Performance by Requiring Indexed Fields

Usage

In solutions where Content Manager OnDemand must process a large number of queries in a short period of time (like tax season) creating indexes on a few single fields might not be enough to ensure that all queries are completed quickly. In these situations, IBM CMOD can be configured to have a 'composite index' - a single index which contains more than one field - like CUST_NUM+STMT_DATE to combine 'customer number' and 'statement date' data into a single index record. This means that a customer who searches for their customer number for a particular date should be able to find their statement immediately, because the index tells the database where to find the rest of their information in the database table.

Composite indexes are especially effective when building line-of-business front-ends through the ODWEK Java API, where you have strict control over the fields used to search for specific documents, like investment statements.

Image: Advanced Indexing Parameters window & configured composite index.

Selecting fields

When selecting which Application Group field to add to the composite index, the order of the fields is extremely important. Use the field that has the highest cardinality (the most unique values or, alternately the fewest repeated values) generally followed by the date. This combination allows searches to identify a specific customer's documents nearly instantly, and also zero-in on a specific date range as well. Optimizing the use of Content Manager OnDemand composite indexes requires research into the types of queries that users are performing in OnDemand. Your Database Administrator responsible for IBM CMOD will have access to utilities to help determine how best to achieve better Content Manager OnDemand performance, or you can rely on the experts at CMOD.Cloud.

Caveats

As mentioned above, it's important to use indexes on fields carefully, to ensure maximum benefit at minimum cost in terms of storage space, bandwidth, and time. Also, the more Content Manager OnDemand Application Group fields that have indexing enabled, the more time and effort it takes to create, update, and maintain them, possibly slowing down the load process, and extending maintenance windows for arsmaint.

Impact on Storage

Composite indexes will require that you store the contents of both columns, along with a list of database page identifiers. If your CMOD Application Group is especially large, this can cause the database tables for that specific App Group to consume anywhere from 10 to 50% more space. Check with your Database Administrators and System Administrators to ensure that you have enough space to complete the index creation.

Impact on I/O

You may think to yourself, "Storage is cheap, let's index everything!" but there's different cost -- storage bandwidth. In order for indexes to be the most effective, indexes must be compact - the fewest number of rows in the index with a relatively short list of records following after it. Building Multi Field Indexes in CMOD with a large number of fields could effectively DOUBLE the size of the database table -- making the task of searching it just as slow as not having an index at all.