Migrating From Filenet to Content Manager OnDemand

From CMOD.wiki
Revision as of 15:28, 15 October 2015 by Jderrick (talk | contribs) (Added Reconciliation.)

Scope

These are some recommendations and tips from my experiences migrating Filenet data to Content Manager OnDemand. Since I'm not a Filenet admin, Filenet itself is outside the scope of this article; it deals only with the OnDemand-specific tips and tricks that will make your migration easier.

Nomenclature

This is the worst part of migrating between Filenet and CMOD, as some terms are used in both systems, but have different contexts and meanings. The process is complex enough without the added headaches of misunderstandings brought about by ambiguous terms.

Filenet Nomenclature

Document Class (aka "DocClass")
Defines the metadata used to find individual reports.

Content Manager OnDemand Nomenclature

Application Group (aka "App Group" or simply "AG")
A way of combining many different reports into a single group of data, organized by business need: accounting reports for accounting, and operations reports for your operations teams. Reports that are bundled into an Application Group need to have the same index fields, storage hierarchy, and retention (i.e., expiration) handling.
Application (aka "App")
Defines the type of document (AFP, Line data, PDF, Image), and how to collect search criteria (aka 'indexes') for storage in the database.
Multiple Applications of any data type (e.g., AFP for customer statements, line data reports generated by a mainframe, special letters or notices in PDF format, and incoming faxes stored as TIFF images) can belong to the same Application Group.
Folder
The Folder in OnDemand abstracts the internal complexities of Application Groups and Applications, and presents users with the fields that they can search (which were populated into the database by the Application) and sets limits on their queries (maximum number of returned hits, fields required for searches, etc.)


Considerations

These are items that should be given priority. Getting it right at this stage will mean a faster, easier, cheaper transition to CMOD at the end of the day.

Converting Document Classes

Content Manager OnDemand ("CMOD") has an entirely different architecture from the Filenet products. In CMOD vernacular, an 'Application' is analogous to an individual report. But in CMOD, the top of the hierarchy is the 'Application Group' -- a grouping of Applications (aka 'reports') where the index fields, storage, and retention requirements are all the same. Properly defined Application Groups can have multiple Applications (again, 'reports') that belong to them. The most rational way to design Application Groups is to combine reports that fulfill a specific business need. Human Resources reports shouldn't intermingle with Accounts Payable reports (even if they have the same index fields); keep them logically separate by storing them in separate Application Groups.
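As a first pass at that design, the grouping rule can be sketched in Python. This is only an illustration: the dictionary keys (`business_area`, `index_fields`, `storage`, `retention_days`) and the report names are hypothetical, and the output is a list of candidates to review with the business, not a finished design.

```python
from collections import defaultdict

def propose_application_groups(reports):
    """Group reports that share business area, index fields, storage and retention."""
    groups = defaultdict(list)
    for report in reports:
        key = (
            report["business_area"],
            tuple(sorted(report["index_fields"])),  # field order should not matter
            report["storage"],
            report["retention_days"],
        )
        groups[key].append(report["name"])
    return dict(groups)

# Hypothetical inventory built from the existing Document Classes.
inventory = [
    {"name": "AP_AGING", "business_area": "Accounts Payable",
     "index_fields": ["vendor", "date"], "storage": "disk", "retention_days": 2555},
    {"name": "AP_CHECKS", "business_area": "Accounts Payable",
     "index_fields": ["date", "vendor"], "storage": "disk", "retention_days": 2555},
    {"name": "HR_PAYROLL", "business_area": "Human Resources",
     "index_fields": ["vendor", "date"], "storage": "disk", "retention_days": 2555},
]

for key, names in propose_application_groups(inventory).items():
    print(key[0], "->", names)
```

Note that the Human Resources report lands in its own candidate group even though its index fields match the Accounts Payable reports, because the business area is part of the key.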

Quantify index usage

OnDemand doesn't like to have indexes defined in the Application Group without a corresponding value appearing in the reports it processes -- it also wastes space inside the database. It seems common in the Filenet world to assign a report to a Document Class that has indexes configured that simply don't exist anywhere in the report. Yes, you can assign default values to the empty fields to get ACIF to stop complaining, but if you want to do this right, you'll want to look into your index usage. Not just which fields you're populating most often, but also which fields your end users are searching on. Eliminating unused fields from Application Group definitions will streamline indexing, reduce storage costs, and reduce complaints from end users at the end of the day.
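One way to start quantifying the "populated" half of that question is to count how often each index actually carries a value in the exported metadata. The sketch below assumes the metadata can be exported as CSV with one row per document and one column per index; the column names are made up for illustration. It says nothing about which fields users search on, which has to come from query logs or interviews.

```python
import csv
import io
from collections import Counter

def field_fill_rates(rows, fields):
    """Return the fraction of rows in which each field has a non-blank value."""
    filled = Counter()
    total = 0
    for row in rows:
        total += 1
        for name in fields:
            if (row.get(name) or "").strip():
                filled[name] += 1
    return {name: (filled[name] / total if total else 0.0) for name in fields}

# Hypothetical export: one row per document, one column per Document Class index.
SAMPLE = """doc_id,account,branch,region
1001,555-01,TOR,
1002,555-02,TOR,
1003,555-03,,
"""

rates = field_fill_rates(csv.DictReader(io.StringIO(SAMPLE)),
                         ["account", "branch", "region"])
# Least-populated fields first: these are the candidates for removal.
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name}: populated in {rate:.0%} of documents")
```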

Transfer in Original Formats

For some Filenet installations, upstream servers (or intermediate file transfer systems) convert report data (from EBCDIC to ASCII) and change the formatting of the report. Content Manager OnDemand doesn't need any data transformation, and can ingest EBCDIC reports (fixed record length, stream, or variable record length) directly and without conversion. Some conversion tools (I'm looking at you, MQ Series File Transfer Edition) can be configured to change the report so drastically that CMOD can't properly index it.

Wherever possible, remove any data conversion and deliver report data to OnDemand in its original format.

This means that you may require two different Applications ("Report Definitions") for each report -- one for the report in its original format (EBCDIC) and one for the converted version (ASCII). For this reason alone, you should always define an Application ID Field in EVERY Application Group.
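When both versions of a report are arriving, it helps to check quickly which encoding a given file is in before picking the Application to load it with. The following is a crude heuristic, not a real detector: it relies only on the fact that a space is byte 0x40 in EBCDIC and 0x20 in ASCII. The sample text and the choice of code page 037 are assumptions for the demonstration.

```python
def looks_like_ebcdic(sample: bytes) -> bool:
    """Heuristic: EBCDIC text uses 0x40 for a space, ASCII uses 0x20."""
    ebcdic_spaces = sample.count(b"\x40")
    ascii_spaces = sample.count(b"\x20")
    return ebcdic_spaces > ascii_spaces

text = "ACCOUNT SUMMARY  PAGE 1"
print(looks_like_ebcdic(text.encode("cp037")))  # encoded with EBCDIC code page 037
print(looks_like_ebcdic(text.encode("ascii")))  # plain ASCII
```

In practice you would pass in the first few kilobytes of the transferred file, read in binary mode. Binary formats such as AFP will not give meaningful answers.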

Image Overlays for Reports

If a report has a graphic "overlay" (like, an image with boxes around columns, or shaded bars, or graphic logos) this should be documented as early in the process as possible. In order for these overlays to be displayed on all platforms, line data reports will need to be converted to AFP. This will require any overlay graphics not in AFP format to be converted -- a process which can take a considerable amount of time to complete, especially if there is not someone available to do the translation 'in-house'.

Review Report Types & Audience

There's no better time to review the contents of reports, and confer with end users to determine which reports should be stored, indexed, managed, and disposed of in the same manner. Put on your Business Analyst cap and strap on your most comfortable telephone headset, because this is the most time-consuming and manual part of the whole process.

Reconciliation

In order to reconcile the documents after the migration is complete, you'll want to have a unique identifier for each document that needs to be moved out of FileNet and into CMOD. Thankfully, FileNet provides a unique 'Document Identifier' or "DocID", which can be loaded into CMOD by adding a corresponding field to the Application Group definition. The DocID field doesn't need to be added to the CMOD folders that end users see, so it can remain invisible to them, or it can be exposed in a separate folder for use with existing tools that rely on DocIDs.
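Once the DocIDs are on both sides, the reconciliation itself is a set comparison. Here is a minimal sketch, assuming you can obtain one list of DocIDs from the FileNet export and another from a query against the CMOD Application Group; how those lists are produced is outside this example, and the IDs shown are invented.

```python
def reconcile(filenet_doc_ids, cmod_doc_ids):
    """Compare the DocIDs exported from FileNet with those found in CMOD."""
    source = set(filenet_doc_ids)
    loaded = set(cmod_doc_ids)
    return {
        "missing_in_cmod": sorted(source - loaded),     # exported but never loaded
        "unexpected_in_cmod": sorted(loaded - source),  # loaded but not in the export
    }

report = reconcile(["D001", "D002", "D003"], ["D001", "D003", "D999"])
print(report)
```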

Differences in Functionality

Blank Fields

In Filenet systems, metadata fields can be blank -- including date fields. In the OnDemand world, fields are NEVER allowed to be blank -- the rationale is that you can't find a document that doesn't have all of the index values properly populated in the database. In the cases where fields are empty, there are a few options:

  • Remove the field from the Application Group definition
  • 'Retrofit' the index file with the missing data from another source
  • Discard the metadata from Filenet and run the reports through one of the CMOD Indexers like ACIF.
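The 'retrofit' option can be sketched as a merge between the exported index rows and a lookup table built from the other source. Everything named here (`doc_id`, `posting_date`, `branch`) is hypothetical. The second return value lists the fields that are still blank afterwards, which are the ones you would have to remove from the definition or re-index.

```python
def retrofit(index_rows, lookup, key="doc_id"):
    """Fill blank fields from a lookup table and report what is still blank."""
    fixed, still_blank = [], []
    for row in index_rows:
        extra = lookup.get(row[key], {})
        # Keep the existing value when present, otherwise fall back to the lookup.
        merged = {name: (value or "").strip() or extra.get(name, "")
                  for name, value in row.items()}
        fixed.append(merged)
        still_blank.extend((row[key], name)
                           for name, value in merged.items() if not value)
    return fixed, still_blank

rows = [
    {"doc_id": "1", "posting_date": "", "branch": "TOR"},
    {"doc_id": "2", "posting_date": "", "branch": ""},
]
other_source = {"1": {"posting_date": "2015-10-01"}}

fixed, still_blank = retrofit(rows, other_source)
print(fixed)
print(still_blank)
```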

PDF Indexing

PDFs are indexed differently in CMOD than they are in Filenet. Filenet breaks PDF documents into bundles of pages, but the entire PDF document remains available to the end user. In Content Manager OnDemand, when a PDF is indexed, it is broken into individual documents, and the individual PDF file becomes multiple individual PDF files. There is no clear way to reverse this process in CMOD.

Ingesting the exported data

During the export process, you'll want to consider the information you'll need to get the exported data into Content Manager OnDemand quickly and easily.

Provide report names

In order to get specific reports into OnDemand, you need to provide the name of a report (likely as an Application). Make life easier for yourself by including the name of the report in the file names you output.

Group reports in chronological order

Due to the way table segmentation works in OnDemand, you'll want to load the data in chronological order. When you name files, consider including a date field in YYYY-MM-DD format, so that a plain alphabetical sort puts them in date order at load time. Loading in date order helps ensure that end users get fast database queries when the production server goes live.
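A naming scheme that carries both the report name and the date might look like the sketch below. The `REPORT.YYYY-MM-DD.ext` pattern and the report names are my own assumptions, not anything CMOD requires.

```python
from datetime import date

def export_file_name(report_name: str, report_date: date, extension: str = "dat") -> str:
    """Build a name such as 'AP_AGING.2015-10-15.dat' (the pattern is an assumption)."""
    return f"{report_name}.{report_date.isoformat()}.{extension}"

def load_order(file_names):
    """Sort by the embedded date first, then by report name."""
    def key(name):
        report, day, _ext = name.rsplit(".", 2)
        return (day, report)
    return sorted(file_names, key=key)

names = [
    export_file_name("GL_TRIAL", date(2015, 10, 15)),
    export_file_name("AP_AGING", date(2015, 10, 15)),
    export_file_name("GL_TRIAL", date(2014, 1, 31)),
]
for name in load_order(names):
    print(name)
```

Because the date is zero-padded and ordered year, month, day, comparing the date strings gives the same order as comparing the dates themselves.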

Concatenate Reports

Concatenating reports together means fewer loads (and less overhead, as each load can represent up to 10k in metadata). It also means you'll get better compression for storage. Depending on the volume of data for a particular report, you may be able to group reports together by month -- and this also works perfectly with the point above, keeping groups of data with similar dates together inside database tables.
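Grouping by month can be sketched as follows, assuming export files are named `REPORT.YYYY-MM-DD.ext` (a hypothetical pattern). The function only decides which files belong together; whether the files can simply be appended to one another depends on the report format, so the actual joining is left out.

```python
from collections import defaultdict

def group_by_month(file_names):
    """Map (report, 'YYYY-MM') to the files to concatenate, oldest first."""
    groups = defaultdict(list)
    # Sort on the date portion so each group is built in chronological order.
    for name in sorted(file_names, key=lambda n: n.rsplit(".", 2)[1]):
        report, day, _ext = name.rsplit(".", 2)
        groups[(report, day[:7])].append(name)
    return dict(groups)

exported = [
    "AP_AGING.2015-10-15.dat",
    "AP_AGING.2015-10-01.dat",
    "AP_AGING.2015-11-02.dat",
    "GL_TRIAL.2015-10-31.dat",
]
for (report, month), files in group_by_month(exported).items():
    print(report, month, files)
```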

Produce output in manageable batches

When producing output, remember that you'll likely need to transfer this data between systems, possibly across the network, and onto different operating systems. There are limitations to different archiving and compression tools (65,535 entries for .zip archives without ZIP64 extensions, and 2GB file size limits for older versions of gzip and bzip2), and you don't want to lose too much time or effort if a file transfer is interrupted. It's best to produce manageable, similarly-sized batches that you can use to develop Applications, test loads, and promote from your Development to Quality Assurance ("QA") and Production Servers.
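A simple way to keep batches under both a size limit and a file-count limit is to pack files greedily in load order. The limits below are placeholders; choose values that suit your own archiving tools and network.

```python
def make_batches(files, max_bytes=1_000_000_000, max_files=10_000):
    """Greedily pack (name, size) pairs, in order, into batches under both limits."""
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        too_big = current_bytes + size > max_bytes
        too_many = len(current) >= max_files
        # Start a new batch when adding this file would break either limit.
        if current and (too_big or too_many):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

sizes = [("a.dat", 600), ("b.dat", 500), ("c.dat", 300)]
print(make_batches(sizes, max_bytes=1000, max_files=10))
```

A single file larger than `max_bytes` still gets a batch of its own rather than being dropped, so oversized reports need to be handled separately.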

Order of Operations

In order to find problems with reports as quickly as possible, follow these steps in order:

  • Build a test / development Content Manager OnDemand Server
    • Make sure you have some extra temporary storage space to queue up incoming report data
  • Begin delivering duplicates of the report data to CMOD, in its original format.
    • Make some test Application Groups and get some practise indexing these reports, and figuring out any strange or non-standard report types.
  • Do your research -- everything under 'Considerations'.
  • Document your new structure - create new Application Groups, select the reports that will belong to them, and identify your sample data.