ETL processes include the data transfers in multiple stages. It starts with the transfer of data from legacy source to the staging server, from staging to production database/data warehouse and finally from a data warehouse to data marts. Each step is highly vulnerable and prone to errors or loss of data or incorrect transfer of data. This is where the concept of testing comes into picture in the ETL cycles. The scope of work for any ETL developer does not end with the end of ETL script runs, this is actually the beginning for any developer. A good ETL developer must be able to validate the records and ensure accuracy.
The ETL testing process can be broadly classified into two types:
- OLTP (On-line Transaction Processing)
- OLAP (On-line Analytical Processing)
OLTP is Testing of one particular Database Instance and OLAP involves testing of the whole Datawarehouse. This is the most important statement. OLTP does not imply OLAP.OLTP just ensures correct data transfers from a source to target in one particular database.However, OLAP takes care of the accuracy and performance parameters throughout the data-warehouse.
Challenges faced in ETL Testing:
As mentioned earlier ETL process is full of challenges and prone to errors.At every step, the ETL developers are likely to face a minimum of 5 barriers. Here is the list of a few common challenges in the way of ETL testing :
- Frequent changes in the business requirements leading to change of logic in ETL scripts
- Limited availability of source data
- Not documenting the “source to target” mapping requirement which leads to ambiguous logics
- Delay in output of a complex SQL query leading to slow working rate
- Verifying and validating data comes from different sources with varied formats and structures
- Unstable testing environments
- Huge volume of data to test
Through this article we at Bista Solutions will convey a few important tests everyone needs to perform to validate the ETL processes:
1.Check the Source and Structure of the Data before deciding on the migration Plan:
This step is a prime step in ETL Testing. This step becomes the foundation for the entire ETL process. With growing complexities in data, understanding the structure of the data in source becomes evidently important and prime. After understanding the structure of the one may need to cleanse the data before it actually loaded into the staging area.
2.Ensure that mapping document provided is correct:
The second step is to check if the mapping document provided abides by the business requirements of the client and hence ensure correct mapping of fields from source to target tables.
3.Checking and verifying your ETL scripts :
Your ETL scripts must be smart enough to handle null values in data, it must import or update correct data with proper data types, it would be great if the ETL scripts are automated as well to avoid human interactions and as a result of which introduce errors or bugs.
4.Check for Data Completeness:
Once the data is loaded into the target database the first and most important job is to verify the completeness of data. Also you need to Verify that all the invalid data is either corrected or removed in accordance with requirements.
5.Performance and Scalability:
Completing the migration once is not the end of story. ETL developers must anticipate the growth rate of data and thereby keep the system ready to scale up and give a good performance for the huge amount of data as well.
After all these tests have been performed, the project leads need to get a User acceptance test done from the end users so as to ensure the system fits into their requirements without violating the integrity of the system.They might eventually require to perform regression testing as well if there is new version rollout of the app.
In the ETL processes, One must understand that data accuracy is the key to arrive at important decisions in any business. Having said that, identifying the bugs, performing root cause analysis of each one of them and reporting the bugs at an early stage of software development helps to reduce the cost and time. Before getting into the ETL testing process, you need to check the different systems, their processes, models and business requirements for any inconsistencies or ambiguities. ETL developers also need to do data profiling/data mining in order to understand the trends and patterns of data in a better way and identify any source data bugs.
If you have any query for ETL Testing please drop an email at firstname.lastname@example.org. Also, you can write us through email@example.com and tell us how this blog has helped you.