When developing web applications, or any other software, testing is a crucial part of the process. In many cases, these programs manage and process sensitive or valuable information and need to operate both properly and securely. One of the biggest challenges with application testing is finding useful data for testing. While artificial data sets can be useful for testing the security and edge cases of the application, it’s also necessary to test how the application will perform under “normal” conditions. This requires real data, but using real data for testing raises concerns about data security.
In this article, we’ll dive more into the issues around application testing and potential solutions to the dilemma of testing fidelity and security.
The Need for Good Data
Quality assurance testing falls into two main types. The first is designed to ensure that the application performs correctly under unusual conditions, and the second tests functionality under “normal operating conditions”.
While the first type of testing may require more time and resources, building a dataset for stress-testing an application is often easier than building one for normal use cases. In a stress test, “anything goes”, so testers can use fuzzers and other tools to cover as many potential input cases as possible. It may take a while, but there are few concerns about the data going into the application.
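As a rough illustration of the “anything goes” approach, here is a minimal sketch of a random-input fuzzer in Python. The function under test, `parse_age`, and the helper names are hypothetical and exist only for this example; dedicated fuzzing tools are far more thorough.

```python
import random
import string


def random_inputs(count, max_len=64, seed=0):
    """Yield `count` pseudo-random strings, some empty, some with odd characters."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    alphabet = string.printable + "\u00e9\u4e2d\u0000"
    for _ in range(count):
        length = rng.randint(0, max_len)
        yield "".join(rng.choice(alphabet) for _ in range(length))


def parse_age(text):
    """Hypothetical function under test: parse a user-supplied age."""
    value = int(text.strip())
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value


def fuzz(target, inputs, expected=(ValueError,)):
    """Call `target` on every input and collect inputs that fail unexpectedly."""
    failures = []
    for text in inputs:
        try:
            target(text)
        except expected:
            pass  # rejecting bad input with a known error is acceptable
        except Exception as exc:  # anything else is a finding worth reporting
            failures.append((text, exc))
    return failures


if __name__ == "__main__":
    print(fuzz(parse_age, random_inputs(1000)))
```

Because none of these inputs is real, this kind of test data carries no privacy risk, which is the point the paragraph above makes.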
When testing functionality under normal operating conditions, data quality is key. If the testers cannot accurately simulate the types of data that the application will encounter under normal operating conditions, then the results of the test cannot be trusted. For this reason, testing with real-world data is generally preferable.
Can’t Test on Real PII
The issue with testing with real data arises when dealing with sensitive or personally identifiable information (PII). This is the data that is protected under privacy regulations and carries stiff penalties and reporting requirements if the data is leaked to unauthorized parties.
When performing tests on web applications and other programs that may access the Internet, the possibility of unintentional data leakage is real. Overlooked design and development flaws may mean that the application is releasing sensitive data in an unencrypted format. As a result, an organization may be required to report the breach and may be subject to fines under the EU’s General Data Protection Regulation (GDPR) and other similar data protection regulations.
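One way to catch this kind of leak during testing is to scan what the application sends out for values that look like PII. The sketch below is a simplified illustration in Python: the two patterns (email addresses and US-style Social Security numbers) and the `find_pii` helper are assumptions made for the example, and real detection products use much richer rules.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(payload):
    """Return a sorted list of (kind, matched_text) pairs found in `payload`."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        hits.extend((kind, match) for match in pattern.findall(payload))
    return sorted(hits)


if __name__ == "__main__":
    # A hypothetical response body captured from the application under test.
    body = '{"user": "jane@example.com", "ssn": "123-45-6789"}'
    for kind, value in find_pii(body):
        print(f"possible {kind} in outbound data: {value}")
```

A check like this can run against captured responses or logs in the test suite, so that a leak is flagged before the application reaches production.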
One solution to this problem is to perform all testing on systems in an isolated network. This goes beyond the standard guidance of never testing on production systems and requires that the testing systems not be Internet-accessible at all. Beyond the potential issues regarding the fidelity of these tests (since an isolated environment cannot accurately mimic full Internet access), this approach also runs into privacy regulations.
Under the General Data Protection Regulation, uses of customers’ sensitive data must be disclosed to the data subject (i.e. the person whose sensitive data it is) and, where the organization relies on consent, explicitly approved by them. The regulation also requires clarity about what exactly the user is opting into. With application testing, this can be complicated, since the organization would have to explicitly ask permission to use the data subject’s data for application testing. While this requires some lead time, it does allow the organization to use real data for testing if it takes the appropriate steps to protect it.
Data Masking for Secure, Accurate Testing
When performing application testing, there is a balance to be struck between the security and the realism of the testing. Testers have the choice between completely isolating the testing infrastructure from the Internet, which improves security but may impact testing fidelity, and using an Internet-connected test network. Additionally, they need to choose between using real and artificial data for testing.
While the use of artificial data in an isolated environment may provide the best security arrangement, it can significantly reduce the usefulness of the test. A properly designed testing environment can take advantage of Internet access while still ensuring the security of the data used during the test.
A vital component of a test environment for applications that process sensitive information is the inclusion of an intelligent data masking security solution. While many appliances have the ability to detect (and possibly prevent) the exfiltration of sensitive data, more advanced options offer the ability to perform intelligent detection and masking of sensitive data.
By deploying a data masking solution to protect your database, you can ensure that normal operations have access to the real data while untrusted operations, like application testing, operate on realistic but artificial data. This provides the best of both worlds while testing: the masked data does not need to be protected like true PII, yet it maintains the level of realism necessary for testing to be accurate and useful.
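To make the idea of “realistic but artificial” data more concrete, here is a minimal sketch of field-level masking in Python. It is not how any particular masking product works. The field names, the replacement name lists, and the `salt` parameter are all assumptions chosen for the example.

```python
import hashlib

# Hypothetical pools of replacement values for the sketch.
FIRST_NAMES = ["Alex", "Blake", "Casey", "Drew", "Emery", "Finley"]
LAST_NAMES = ["Archer", "Brooks", "Carver", "Dalton", "Ellis", "Foster"]


def _digest(value, salt):
    """Hash the value with a salt so the same input always masks the same way."""
    return hashlib.sha256((salt + value).encode("utf-8")).digest()


def mask_name(name, salt="test-env"):
    d = _digest(name, salt)
    first = FIRST_NAMES[d[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[d[1] % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_email(email, salt="test-env"):
    token = _digest(email, salt).hex()[:10]
    return f"user_{token}@example.com"  # example.com is reserved for examples


def mask_digits(value, salt="test-env"):
    """Replace every digit, keeping length and punctuation so formats still parse."""
    d = _digest(value, salt)
    out, i = [], 0
    for ch in value:
        if ch in "0123456789":
            out.append(str(d[i % len(d)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_record(record):
    """Mask known sensitive fields; pass every other field through unchanged."""
    maskers = {"name": mask_name, "email": mask_email, "phone": mask_digits}
    return {k: maskers[k](v) if k in maskers else v for k, v in record.items()}


if __name__ == "__main__":
    real = {"name": "Jane Doe", "email": "jane.doe@corp.test",
            "phone": "+1 (555) 010-2030", "plan": "pro"}
    print(mask_record(real))
```

The masking here is deterministic, so the same real value always maps to the same artificial one and relationships between tables survive. Note that salted hashing is only a simple stand-in: if the salt is known, masked values may be guessable, so a production-grade solution would need stronger guarantees.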
Getting the Most Out of Testing
When performing quality assurance testing, having data that is as realistic as possible is a necessity. The greater the deviation of the test data from the truth, the less useful the test.
The main issue with sourcing data for application testing is that realistic testing needs sensitive data, yet that data cannot safely be used. Many applications process data that is classified and protected as PII, but it is difficult or impossible to create a realistic testing environment that also guarantees that the data will be protected at the necessary levels.
A good solution to this problem is the deployment of a data security solution with built-in data masking. This appliance can replace data going to the test environment with plausible but artificial data, allowing you to test your new application in realistic conditions (with full Internet and database access) without jeopardizing the security of your sensitive data.