Hi
I was recently working on extracting a table with 400 columns from a SQL Server database to load into an Oracle target database. I designed my job with the array size set to 50,000 and the record count set to 200,000. To my surprise, the job could not pull even the top 1,000 records after running for over an hour.
Resolution
It turned out that, with my environment's DataStage settings, the buffer was too small to hold an array of 50,000 records (given that each row has 400 columns). After I reduced the array size to 100, the job ran in about 3 minutes for 1 million records.
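A rough back-of-the-envelope calculation shows why the original setting was a problem. The 50-byte average column width below is an assumption for illustration only; the actual figure depends on the table's data types.

```python
# Rough estimate of how much memory one "array" of rows demands.
# avg_bytes_per_column = 50 is an assumed value for illustration.
columns_per_row = 400
avg_bytes_per_column = 50
bytes_per_row = columns_per_row * avg_bytes_per_column   # ~20 KB per row

for array_size in (50_000, 100):
    buffer_mb = array_size * bytes_per_row / (1024 * 1024)
    print(f"Array size {array_size:>6}: ~{buffer_mb:,.0f} MB per array")

# Array size  50000: ~954 MB per array  -> far larger than a typical stage buffer
# Array size    100: ~2 MB per array    -> fits comfortably
```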
What is Array size: Array size is the number of rows that DataStage reads from or writes to the database in a single operation. It is also the number of records pushed from one stage to the next (pipeline parallelism).
Record count: the number of rows that DataStage writes to the database before issuing a commit, with each write happening in chunks of Array size rows.
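To make the relationship between the two settings concrete, here is a minimal sketch using generic Python DB-API calls (fetchmany, executemany, commit). This is not how DataStage works internally; it only models the two knobs. The connection objects, table names, and the "?" parameter style are placeholders and depend on your driver.

```python
ARRAY_SIZE = 100        # rows read/written per database call (Array size)
RECORD_COUNT = 200_000  # rows written to the target between commits (Record count)

def copy_table(src_conn, tgt_conn):
    """Copy rows in arrays of ARRAY_SIZE, committing every RECORD_COUNT rows."""
    src = src_conn.cursor()
    tgt = tgt_conn.cursor()
    src.execute("SELECT * FROM source_table")

    insert_sql = "INSERT INTO target_table VALUES (" + ",".join(["?"] * 400) + ")"
    rows_since_commit = 0

    while True:
        batch = src.fetchmany(ARRAY_SIZE)       # read one array of rows
        if not batch:
            break
        tgt.executemany(insert_sql, batch)      # write one array of rows
        rows_since_commit += len(batch)
        if rows_since_commit >= RECORD_COUNT:   # commit after every RECORD_COUNT rows
            tgt_conn.commit()
            rows_since_commit = 0

    tgt_conn.commit()                           # final commit for any remainder
```

The key point the sketch illustrates: a larger Array size means fewer, bigger round trips (and a bigger in-memory buffer per batch), while Record count only controls how often the target commits.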