Hi
I was recently working on extracting a table with 400 columns from a SQL Server database to load into an Oracle target database. I designed my job with the array size set to 50,000 and the record count set to 200,000. To my surprise, the job could not pull even the top 1,000 records after running for over an hour.
Resolution
It turned out that, with my environment's DataStage settings, the buffer was too small to hold an array of 50,000 records (given that each row has 400 columns). After I reduced the array size to 100, the job ran in about 3 minutes for 1 million records.
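A rough back-of-the-envelope calculation shows why the original setting was a problem. The 50-byte average column width below is an assumption for illustration only; the actual figure depends on the table's data types.

```python
# Rough estimate of how much memory one "array" of rows demands.
# avg_bytes_per_column = 50 is an assumed value for illustration.
columns_per_row = 400
avg_bytes_per_column = 50
bytes_per_row = columns_per_row * avg_bytes_per_column   # ~20 KB per row

for array_size in (50_000, 100):
    buffer_mb = array_size * bytes_per_row / (1024 * 1024)
    print(f"Array size {array_size:>6}: ~{buffer_mb:,.0f} MB per array")

# Array size  50000: ~954 MB per array  -> far larger than a typical stage buffer
# Array size    100: ~2 MB per array    -> fits comfortably
```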
What is Array size: Array size is the number of rows that DataStage reads from or writes to the database in a single operation. It is also the number of records pushed from one stage to the next (pipeline parallelism).
Record count: the number of rows that DataStage writes to the database before issuing a commit, with each write happening in chunks of Array size rows.
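To make the relationship between the two settings concrete, here is a minimal sketch using generic Python DB-API calls (fetchmany, executemany, commit). This is not how DataStage works internally; it only models the two knobs. The connection objects, table names, and the "?" parameter style are placeholders and depend on your driver.

```python
ARRAY_SIZE = 100        # rows read/written per database call (Array size)
RECORD_COUNT = 200_000  # rows written to the target between commits (Record count)

def copy_table(src_conn, tgt_conn):
    """Copy rows in arrays of ARRAY_SIZE, committing every RECORD_COUNT rows."""
    src = src_conn.cursor()
    tgt = tgt_conn.cursor()
    src.execute("SELECT * FROM source_table")

    insert_sql = "INSERT INTO target_table VALUES (" + ",".join(["?"] * 400) + ")"
    rows_since_commit = 0

    while True:
        batch = src.fetchmany(ARRAY_SIZE)       # read one array of rows
        if not batch:
            break
        tgt.executemany(insert_sql, batch)      # write one array of rows
        rows_since_commit += len(batch)
        if rows_since_commit >= RECORD_COUNT:   # commit after every RECORD_COUNT rows
            tgt_conn.commit()
            rows_since_commit = 0

    tgt_conn.commit()                           # final commit for any remainder
```

The key point the sketch illustrates: a larger Array size means fewer, bigger round trips (and a bigger in-memory buffer per batch), while Record count only controls how often the target commits.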