Can I split profiling jobs in IBM Information Analyzer by rows AND columns?


I have some tables that I need to profile in IBM Information Analyzer. They have hundreds of thousands (in some cases, even millions) of rows and hundreds of columns (450 to 500 at most). For rows, I have simply taken a sample of 20,000. Is there any setting that I can use to split profiling jobs by columns as well as rows, so that the processing server doesn't choke?


There is 1 best solution below


What version of Information Analyzer are you using? What is your data source (which type of database or flat file)? Which of the sampling options are you using? What IA analysis are you running? I will assume Column Analysis. How many processor cores and how much physical memory are available to IA?

1) In general, IA creates a separate job/task for each set of 10 columns. Example: Assume you are analyzing a single 450-column table. This will spawn 45 jobs/tasks for Column Analysis (450 / 10). If that overwhelms your system, the system may be undersized.

2) You can analyze a subset of the columns rather than all of them. For example, you can analyze just the first 50 columns.

3) Most analyses that you invoke from the UI can also be invoked from the command line. It may be easier for you to script a job that a) determines all the columns for a table, and b) generates and runs a separate job for each set of 50 columns.
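
The batching step in point 3 could be sketched roughly as below. This is only an illustration in Python, not an IA feature: `chunk_columns`, `run_in_batches`, and the `submit` hook are hypothetical names, the column names are placeholders, and the actual IA command-line call is deliberately left out because its exact syntax depends on your version. Replace `submit` with whatever invokes Column Analysis for one batch of columns.

```python
from typing import Callable, Iterator, List, Sequence


def chunk_columns(columns: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size column names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(columns), batch_size):
        yield list(columns[start:start + batch_size])


def run_in_batches(
    columns: Sequence[str],
    submit: Callable[[List[str]], None],
    batch_size: int = 50,
) -> int:
    """Call submit(batch) once per batch, sequentially; return the batch count."""
    count = 0
    for batch in chunk_columns(columns, batch_size):
        # Hypothetical hook: put your IA command-line invocation here.
        submit(batch)
        count += 1
    return count


if __name__ == "__main__":
    # Placeholder names; in practice, read the column list from the
    # database catalog or from the table's metadata.
    columns = [f"COL_{i:03d}" for i in range(1, 451)]

    def show(batch: List[str]) -> None:
        print(f"{batch[0]} to {batch[-1]}: {len(batch)} columns")

    total = run_in_batches(columns, show, batch_size=50)
    print(f"{total} batches submitted")
```

Running the batches one after another, rather than all at once, is what keeps the load on the server bounded. With 450 columns and a batch size of 50, this produces 9 batches.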