Difference between revisions of "Case Insensitive Search"

From CMOD.wiki
m (A series of small fixes - referenced IBM CMOD and moved table of contents.)
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{TOCright}}
== Case Insensitive Search ==
== Case Insensitive Search ==


=== Introduction ===
=== Introduction ===


When loading data into CMOD with the built-in Indexers like [[Glossary#OnDemand Indexing Tools|ACIF]] or the [[Glossary#PDF Indexer]], some data like [[Customer statements]], [[Invoices]] or [[Statements]] will be generated as 'Mixed Case', that is, containing both uppercase and lowercase characters.  OnDemand Administrators might index the data as it appears in the document, but end-users could search for the same word, with different capitalization --  and this presents a problem.  Searching for 'smith' or 'SMITH' doesn't match 'Smith', as it might appear in a report, statement or invoice, frustrating the attempt to find data on the server.
When loading data into IBM CMOD with the built-in Indexers like [[Glossary#OnDemand_Indexing_Tools|ACIF]] or the [[Glossary#OnDemand_Indexing_Tools|PDF Indexer]], some data like [[Customer statements]], [[Invoices]] or [[Statements]] will be generated as 'Mixed Case', that is, containing both uppercase and lowercase characters.  OnDemand Administrators might index the data as it appears in the document, but end-users could search for the same word, with different capitalization --  and this presents a problem.  Searching for 'smith' or 'SMITH' doesn't match 'Smith', as it might appear in a report, statement or invoice, frustrating the attempt to find data on the server.


=== Explanation ===
=== Explanation ===


By default, these index values are stored in the database as they appear in the report.  In a [[Line Data]] report, all output might appear in "ALL CAPS", whereas a [[PDF]] or [[AFP]] statement might appear in "Mixed Case".  The problem occurs when end users search for a term using the wrong 'case' of letters as described above.  [[Content Manager OnDemand]] doesn't have the ability to perform "case-insensitive" searches that would give us what we'd expect to see.  There are a few ways to resolve this issue.
By default, these index values are stored in the database as they appear in the report.  In a [[Glossary#Acronyms|Line Data]] report, all output might appear in "ALL CAPS", whereas a [[Glossary#Acronyms|PDF]] or [[Glossary#Acronyms|AFP]] statement might appear in "Mixed Case".  The problem occurs when end users search for a term using the wrong 'case' of letters as described above.  [[Introduction|Content Manager OnDemand]] doesn't have the ability to perform "case-insensitive" searches that would give us what we'd expect to see.  There are a few ways to resolve this issue.


=== Solutions ===
=== Solutions ===
Line 19: Line 20:
=== Resulting Issues ===
=== Resulting Issues ===


If the data you're indexing, searching for, and storing in CMOD requires case-sensitivity, there is no good solution to this issue, and end users should be instructed to pay close attention to proper capitalization.
If the data you're indexing, searching for, and storing in IBM CMOD requires case-sensitivity, there is no good solution to this issue, and end users should be instructed to pay close attention to proper capitalization.


The capitalization of a field cannot be changed after an Application Group has been created.
The capitalization of a field cannot be changed after an Application Group has been created.  It *is* possible to update database tables manually, but this is not supported by IBM.


Moving all characters to upper or lower makes it impossible to restore proper capitalization (McDonald, de Klock, Van Oster)
Moving all characters to upper or lower makes it impossible to restore proper capitalization (McDonald, de Klock, Van Oster)


Storing characters in Mixed Case will reduce compression inside database tables, as "smith", "Smith", and "SMITH" are all different, and must be stored individually.
Storing characters in Mixed Case will reduce compression inside database tables, as "smith", "Smith", and "SMITH" are all different, and must be stored individually.
An IBM developerWorks article exists on [https://www.ibm.com/developerworks/data/library/techarticle/0203adamache/0203adamache.html implementing case-insensitivity in DB2], but it's not clear how this can be integrated into IBM Content Manager OnDemand to provide the desired functionality.

Latest revision as of 04:22, 16 June 2017

Case Insensitive Search

Introduction

When loading data into IBM CMOD with the built-in Indexers like ACIF or the PDF Indexer, some data like Customer statements, Invoices or Statements will be generated as 'Mixed Case', that is, containing both uppercase and lowercase characters. OnDemand Administrators might index the data as it appears in the document, but end-users could search for the same word, with different capitalization -- and this presents a problem. Searching for 'smith' or 'SMITH' doesn't match 'Smith', as it might appear in a report, statement or invoice, frustrating the attempt to find data on the server.

Explanation

By default, these index values are stored in the database as they appear in the report. In a Line Data report, all output might appear in "ALL CAPS", whereas a PDF or AFP statement might appear in "Mixed Case". The problem occurs when end users search for a term using capitalization that differs from the stored value, as described above. Content Manager OnDemand doesn't have the ability to perform "case-insensitive" searches that would return the results we'd expect to see. There are a few ways to resolve this issue.

Solutions

A screen capture of the Application Group Configuration window, with the Field Information Tab selected, and the 'Case' pop-up menu highlighted.

By default, OnDemand chooses to store all values for field type 'String' as uppercase. The string "Smith" becomes "SMITH" in the database. Index values loaded into the database are converted to uppercase, and searches in folders are also converted to uppercase before database queries are performed.
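The behaviour described above can be illustrated with a small sketch. This is not CMOD code or a CMOD API; the names `normalize`, `index_table`, `load`, and `search` are hypothetical, and a plain Python list stands in for the database table. It only shows the idea that folding values to uppercase both at load time and at query time makes the search effectively case-insensitive.

```python
# Hypothetical stand-in for an index table; not an actual CMOD structure.
index_table = []


def normalize(value):
    """Fold a string value to uppercase, mimicking the 'Upper' case setting."""
    return value.upper()


def load(values):
    """Store each index value in its normalized (uppercase) form."""
    for value in values:
        index_table.append(normalize(value))


def search(term):
    """Normalize the search term the same way, then compare exactly."""
    key = normalize(term)
    return [stored for stored in index_table if stored == key]


load(["Smith", "SMITH", "smith", "Jones"])
print(search("sMiTh"))
```

Because both sides pass through the same `normalize` function, the capitalization typed by the end user no longer matters. The cost is that the original capitalization is not kept in the index.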



Resulting Issues

If the data you're indexing, searching for, and storing in IBM CMOD requires case-sensitivity, there is no good solution to this issue, and end users should be instructed to pay close attention to proper capitalization.

The capitalization of a field cannot be changed after an Application Group has been created. It *is* possible to update database tables manually, but this is not supported by IBM.

Converting all characters to uppercase or lowercase makes it impossible to restore the original capitalization (for example McDonald, de Klock, Van Oster).

Storing characters in Mixed Case will reduce compression inside database tables, as "smith", "Smith", and "SMITH" are all different, and must be stored individually.
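As a rough illustration of the point about distinct values, the sketch below counts how many different strings a table would hold with and without uppercase folding. It does not model how the database actually compresses data; the sample list `loaded` is made up for the example.

```python
# Made-up sample of index values as they might appear across several loads.
loaded = ["smith", "Smith", "SMITH", "Smith"]

# Stored as-is: every capitalization variant is a separate value.
distinct_as_is = set(loaded)

# Folded to uppercase: all variants collapse into one value.
distinct_upper = {value.upper() for value in loaded}

print(len(distinct_as_is), "distinct values stored as-is")
print(len(distinct_upper), "distinct value after uppercase folding")
```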

An IBM developerWorks article exists on implementing case-insensitivity in DB2, but it's not clear how this can be integrated into IBM Content Manager OnDemand to provide the desired functionality.