Wednesday, December 5, 2018

IBM DataStage Array Size and Record Count tuning

Hi

I was recently extracting a table with 400 columns from a SQL Server database to load into an Oracle target database. I designed my job with the array size set to 50,000 and the record count set to 200,000. To my surprise, the job could not pull even the first 1,000 records in over an hour.






Resolution

It turns out that, given the DataStage settings in my environment, the buffer was too small to hold an array of 50,000 records (each row has 400 columns). I had to reduce the array size to 100, after which the job loaded 1 million records in 3 minutes.
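To see why 50,000 rows is a lot for a 400-column table, here is a rough back-of-the-envelope sketch in Python. The average column width of 50 bytes is purely an assumption for illustration (the real figure depends on the table's data types), and the function name is hypothetical; DataStage's actual buffer accounting may differ.

```python
def array_buffer_bytes(array_size, columns, avg_column_bytes):
    """Rough size of one array of rows held in memory at once."""
    return array_size * columns * avg_column_bytes


# Assumption: 50 bytes per column on average (not a measured value).
AVG_COLUMN_BYTES = 50
COLUMNS = 400

for array_size in (50_000, 100):
    size = array_buffer_bytes(array_size, COLUMNS, AVG_COLUMN_BYTES)
    print(f"array size {array_size:>6}: about {size / 2**20:,.1f} MiB per array")
```

Under that assumption, the 50,000-row array is 500 times larger than the 100-row array, so it only fits if the buffers are sized for it.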

What is array size? Array size is the number of rows that DataStage reads from or writes to the database in a single batch. It is also the number of records pushed from one stage to the next (pipeline parallelism).

Record count: the number of rows that DataStage writes before it commits to the database. The rows are written in batches of the array size, and a commit is issued each time the record count is reached.
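The relationship between the two settings can be sketched as simple arithmetic. The function below is a hypothetical helper, not a DataStage API. It assumes the record count is a multiple of the array size, so check your connector's documentation for the exact rule that applies to your stage.

```python
import math


def load_plan(total_rows, array_size, record_count):
    """Return (batches written, commits issued) for a load.

    Assumption: record_count is a multiple of array_size, so a commit
    always falls on a batch boundary.
    """
    if record_count % array_size != 0:
        raise ValueError("record_count should be a multiple of array_size")
    batches = math.ceil(total_rows / array_size)
    commits = math.ceil(total_rows / record_count)
    return batches, commits


# Illustrative values: 1 million rows, array size 100, record count 200,000.
batches, commits = load_plan(1_000_000, 100, 200_000)
print(f"{batches} batches written, {commits} commits issued")
```

A smaller array size means more batches, and a smaller record count means more frequent commits; neither number changes the total rows loaded.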


DataStage ETL loads causing the Oracle database to hang and crash (Oracle database server becomes inaccessible)

Lately, we faced an issue in my project: running 3 jobs that loaded tables of 3.5 GB, 1 GB, and 2 GB at the same time choked the Oracle database. The jobs ran on 2 nodes, which works out to 3 x 2 = 6 sessions in parallel. The DBA noticed I/O waits on the transaction log. The volume of inserts was too much for the database, which caused it to lock up.

We also noticed some strange Oracle sessions in which the SYS user was performing UPDATEs on obj$. The sessions kept growing after their initial appearance.

Another strange thing was that the Oracle dev database handled the same load well when the jobs were run in parallel.



SOLUTION
If you notice the Oracle database hanging when running DataStage jobs, try reducing the number of processes run in parallel. A rule of thumb is to run any table larger than 3 GB as a standalone job, then run the other two jobs (2 GB and 1 GB) in parallel.
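The rule of thumb can be sketched as a small scheduling helper. This is only an illustration: the job names are made up, the 3 GB threshold comes from the rule above, and the session count uses the jobs x nodes arithmetic from the incident (3 jobs on 2 nodes gave 6 sessions).

```python
def plan_waves(job_sizes_gb, standalone_threshold_gb=3.0):
    """Split jobs into waves: each large job runs alone, the rest run together."""
    standalone = [[name] for name, size in job_sizes_gb.items()
                  if size > standalone_threshold_gb]
    parallel = [name for name, size in job_sizes_gb.items()
                if size <= standalone_threshold_gb]
    return standalone + ([parallel] if parallel else [])


def sessions(wave, nodes):
    """Database sessions opened by one wave: one per job per node."""
    return len(wave) * nodes


# Hypothetical job names with the table sizes from the incident.
jobs = {"load_a": 3.5, "load_b": 1.0, "load_c": 2.0}
for wave in plan_waves(jobs):
    print(wave, "->", sessions(wave, nodes=2), "sessions")
```

With these inputs the database never sees more than 4 sessions at once, compared with 6 when all three jobs run together.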
