Dr. Sumeet Gill, Research Supervisor
Data mining techniques to automate software testing
The design of software tests is mostly based on the testers’ expertise, while test automation tools are limited to execution of pre-planned tests only. Evaluation of test outputs is also associated with a considerable effort by human testers who often have imperfect knowledge of the requirements specification. Not surprisingly, this manual approach to software testing results in heavy losses to the world’s economy. The costs of the so-called" catastrophic" software failures (such as Mars Polar Lander shutdown in 1999) are even hard to measure. In this paper, we demonstrate the potential use of data mining algorithms for automated induction of functional requirements from execution data. The induced data mining models of tested software can be utilized for recovering missing and incomplete specifications, designing a minimal set of regression tests, and evaluating the correctness of software outputs when testing new, potentially flawed releases of the system. To study the feasibility of the proposed approach, we have applied a novel data mining algorithm called Info-Fuzzy Network (IFN) to execution data of a general-purpose code for solving partial differential equations. After being trained on a relatively small number of randomly generated input-output examples, the model constructed by the IFN algorithm has shown a clear capability to discriminate between correct and faulty versions of the program.
Automated Software Testing, Regression Testing, Input-Output Analysis, Info-Fuzzy Networks, Finite Element Solver. Introduction:
Data mining commonly involves four classes of tasks: Association rule learning – Searches for relationships between variables. For example a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis. •Clustering – is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data. •Classification – is the task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, naive Bayesian classification, neural networks and support vector machines. •Regression – Attempts to find a function which models the data with the least error. Results validation
The final step of knowledge discovery from data is to verify the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are necessarily valid. It is common for the data mining algorithms to find patterns in the training set which are not present in the general data set. This is called over fitting. To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish spam from legitimate emails would be trained on a training set of sample emails. Once trained, the learned patterns would be applied to the test set of emails on which it had not been trained. The accuracy of these patterns can then be measured from how many emails they correctly classify. A number of statistical methods may be used to evaluate the algorithm such as ROC curves. If the learned patterns do not meet the desired standards, then it is necessary to reevaluate and change the pre-processing and data mining. If the learned patterns do meet...