Map & Measure Software through Software Reverse Engineering

The purpose of software reverse engineering is to map & measure software in order to support the maintenance and evolution of existing software systems. This support can be rendered in many ways, six of which are explicitly addressed here:

  • better comprehend the structure of the systems, that is, the connections between the system parts
  • generate graphical documents depicting the architecture of selected components as well as the architecture of the system as a whole
  • perform impact analysis on proposed changes and corrections
  • calculate the costs of individual maintenance tasks
  • trace the links between semantic levels of an application system, i.e. between the requirements, design, code and test cases
  • export structural information on a system to other repositories

Comprehending the System Architecture

The responsible maintainers should understand the basic structures of the system they are maintaining. These structures are reflected in the relationships between the system entities: components, modules, classes, data objects, user interfaces, system interfaces, reports, etc. If a maintainer has to change a module or class, he should know what data that class encapsulates or that module accesses. He should also know which other modules or classes use the functions of the class or module he wants to change, and on which other modules or classes the target depends. Knowing these dependencies between system entities is a prerequisite not only for changing a system, but also for renovating and migrating it. According to the pertinent literature, more than 60% of the time spent on software maintenance goes into understanding how the system is composed and what it does. To understand what it does, one has to perform a dynamic analysis of the transaction paths through the system and connect them to particular use cases. To understand how it is composed, a static analysis of the system architecture, as captured from the code, is sufficient. Such an analysis reveals the potential interrelationships between the individual parts, which is a step toward understanding the composition of the system as a whole. The goal of a repository tool is to make this understanding possible by documenting the relationships between system entities and making them available interactively. The maintainer poses questions such as "who calls this function?" or "who accesses this table?", and the tool answers them directly, rather than forcing the maintainer to search through documents or the code itself.
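The kind of interactive query described above can be sketched with a small in-memory store of relationship facts. This is a minimal illustration, not the design of any particular repository tool: the class name Repository, the relation names "calls" and "accesses", and the sample entities are all hypothetical. Each fact is indexed in both directions, so "who calls this function?" and "what does this module depend on?" are each a single lookup.

```python
from collections import defaultdict

# Relation names are illustrative; a real repository defines its own schema.
CALLS = "calls"
ACCESSES = "accesses"


class Repository:
    """Hypothetical in-memory store of (source, relation, target) facts
    captured by static analysis of the code."""

    def __init__(self):
        # Index every fact in both directions so that "who calls X?" and
        # "what does X call?" are each answered by a single lookup.
        self._forward = defaultdict(set)
        self._backward = defaultdict(set)

    def add(self, source, relation, target):
        self._forward[(source, relation)].add(target)
        self._backward[(target, relation)].add(source)

    def users_of(self, target, relation):
        """Entities that use `target` via `relation`, e.g. callers of a function."""
        return sorted(self._backward.get((target, relation), ()))

    def dependencies_of(self, source, relation):
        """Entities that `source` uses via `relation`, e.g. tables a module accesses."""
        return sorted(self._forward.get((source, relation), ()))


repo = Repository()
repo.add("OrderEntry", CALLS, "checkCredit")
repo.add("Billing", CALLS, "checkCredit")
repo.add("checkCredit", ACCESSES, "CUSTOMER")
repo.add("Billing", ACCESSES, "INVOICE")

print(repo.users_of("checkCredit", CALLS))     # ['Billing', 'OrderEntry']
print(repo.users_of("CUSTOMER", ACCESSES))     # ['checkCredit']
print(repo.dependencies_of("Billing", CALLS))  # ['checkCredit']
```

In a real tool the facts would be loaded from a parser run over the sources rather than added by hand; the two-way index is simply one convenient way to make both question directions cheap.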

Generating Graphical Documents

To answer specific questions from maintenance engineers, it is better to work interactively with a query-based graphical user interface. However, there are occasions when printable documents are needed, to be projected in presentations or filed in a system documentation folder. For this reason, the repository tool should have a print function that prints selected views of the system architecture. These views could show the composition of individual components, the structure of individual databases, the layout of user interfaces, the relationship of test cases to functions, and the overall structure of the system as a whole. Such views give managers insight into the complexity of the system architecture and support discussions on what to do with the system. Managers tend to grossly underestimate the complexity of their application systems; these printed documents illustrate that complexity and give them a better idea of the nature of the beast they are trying to tame. If for no other purpose, they serve to create an awareness of the difficulties involved in changing and evolving a software system.
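One way to turn a selected view into a printable document is to emit it in the Graphviz DOT language and let Graphviz lay it out. The sketch below assumes the view is given as a list of (source, relation, target) edges; the function name to_dot and the sample component view are hypothetical, and entity names are assumed not to contain double quotes.

```python
def to_dot(edges, title="architecture"):
    """Render (source, relation, target) edges as a Graphviz DOT digraph.

    Assumes entity names contain no double quotes (no escaping is done).
    The text can be laid out with the Graphviz command-line tool, e.g.
    `dot -Tpdf view.dot -o view.pdf`.
    """
    lines = [f'digraph "{title}" {{', "  rankdir=LR;"]  # left-to-right layout
    for source, relation, target in sorted(edges):
        lines.append(f'  "{source}" -> "{target}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


# A hypothetical component view: only the edges of one component are selected.
component_view = [
    ("OrderEntry", "calls", "checkCredit"),
    ("checkCredit", "accesses", "CUSTOMER"),
]
print(to_dot(component_view, title="OrderEntry component"))
```

Selecting the edges (one component, one database, or the whole system) is the repository's job; the rendering step stays the same for every view.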

Performing Impact Analysis

Impact analysis is essential for assessing the costs and risks of making changes or corrections. Before changing any element of a software system, one should know which other elements are affected. Changing any element, whether it is a module, a data interface, a data store or a control procedure, can have an adverse effect on the elements related to it. A change to a method used by other methods may cause errors in those calling methods. A change to a database table will certainly affect the modules that access it directly: they have to be adapted to the new data structure even if only one attribute has been inserted or one data type modified. Changes to user interfaces propagate to the classes behind those user interfaces and further to the data sources. Paths of dependencies always traverse a system architecture, so it is nearly impossible to change anything without initiating a ripple effect on the dependent elements.
Systems can and should be designed to minimize such dependencies, i.e. to limit the impact domain of changes. However, they seldom are, and even if they are, the dependencies tend to grow as a system evolves. A repository tool cannot stop this from happening, but it can document it and make maintainers more aware of the consequences of their actions. Impact analysis documents the impact domain of a proposed change or correction. The maintainer sees which other elements are affected directly and which are affected indirectly, up to a given level of dependency. These are the elements that will have to be retested after the change. The maintenance manager may even decide not to change an element because doing so would have too great an impact on the rest of the system. In that case, the impact analysis will have served a good purpose, namely avoiding unreasonable risk.
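The ripple effect and the "given level of dependency" can be made concrete as a breadth-first search over reverse dependencies. This is a minimal sketch assuming the repository can supply, for each element, the elements that depend on it; the function name impact_domain and the sample graph are hypothetical. Level 1 marks directly affected elements, higher levels indirectly affected ones.

```python
from collections import deque


def impact_domain(dependents, changed, max_level):
    """Breadth-first search over reverse dependencies.

    `dependents` maps each element to the elements that depend on it
    (callers of a method, modules reading a table, classes behind a UI).
    Returns {element: level}: level 1 = directly affected, higher levels
    = indirectly affected through a chain of dependencies.
    """
    levels = {}
    seen = {changed}
    queue = deque([(changed, 0)])
    while queue:
        element, level = queue.popleft()
        if level == max_level:
            continue  # do not follow dependencies beyond the given level
        for dependent in dependents.get(element, ()):
            if dependent not in seen:  # each element is reported once, at its shortest distance
                seen.add(dependent)
                levels[dependent] = level + 1
                queue.append((dependent, level + 1))
    return levels


# Hypothetical reverse-dependency graph taken from a repository.
dependents = {
    "CUSTOMER": ["checkCredit", "CustomerReport"],
    "checkCredit": ["OrderEntry", "Billing"],
    "OrderEntry": ["OrderScreen"],
}
affected = impact_domain(dependents, "CUSTOMER", max_level=2)
for element, level in sorted(affected.items(), key=lambda item: (item[1], item[0])):
    print(level, element)
```

The keys of the result are the candidates for retesting; raising max_level widens the impact domain, which is exactly the trade-off a maintenance manager weighs before approving a change.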

Calculating the Costs of Maintenance Tasks

A fourth reason for using a software repository is to aid in calculating the costs of individual maintenance tasks. A user may want to make a change or add a function, but only if the costs do not exceed the expected benefits, or he may have only a limited budget. Thus, a cost estimate is required for each task. Without a repository, a properly done cost estimation may cost more than the change itself. With a repository, the estimate can be based on the impact domain and on the metrics of the impacted source modules, database tables and interfaces. For this, the repository database should have links to the metric database. After impact analysis has revealed which modules, data objects and interfaces are affected by the proposed change, the user can employ a maintenance cost estimation tool such as SoftCalc to access the metrics for those entities. These may be expressed in source statements, function-points, data-points, object-points or use-case-points. The tool then calculates to what degree each entity is affected and takes that percentage of its size, whereby the raw size of each module, data structure and interface is adjusted by the complexity and the quality of that element. The size of the change is the sum of the adjusted sizes of all impacted system elements. Once the size of the impact domain has been computed, it can be converted into costs using the current maintenance productivity table. Without knowing the size, complexity and internal quality of the impacted elements, it is not possible to arrive at an accurate estimate.
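The estimation steps above (affected share of each size, adjusted by complexity and quality, summed, then converted via productivity) can be sketched as follows. This is not SoftCalc's actual model: the multiplicative adjustment factors, their scale (1.0 = average), the single productivity rate standing in for a productivity table, and all sample numbers are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImpactedEntity:
    name: str
    size: float               # raw size, e.g. statements or function-points
    fraction_affected: float  # share of the entity touched by the change (0..1)
    complexity_factor: float  # assumed scale: >1.0 for above-average complexity
    quality_factor: float     # assumed scale: >1.0 for below-average quality

    def adjusted_size(self):
        # Take the affected share of the raw size, then adjust it
        # by the entity's complexity and quality.
        return (self.size * self.fraction_affected
                * self.complexity_factor * self.quality_factor)


def estimate_cost(entities, size_per_person_month, cost_per_person_month):
    """Sum the adjusted sizes of the impact domain and convert the total
    into effort and cost with a single (hypothetical) productivity rate."""
    change_size = sum(e.adjusted_size() for e in entities)
    effort = change_size / size_per_person_month
    return change_size, effort, effort * cost_per_person_month


impact = [
    ImpactedEntity("checkCredit", 400, 0.25, 1.2, 1.0),
    ImpactedEntity("CUSTOMER", 50, 0.10, 1.0, 1.1),
    ImpactedEntity("OrderEntry", 1200, 0.05, 1.1, 1.3),
]
size, effort, cost = estimate_cost(impact, size_per_person_month=200,
                                   cost_per_person_month=10_000)
print(f"{size:.1f} units, {effort:.2f} person-months, {cost:,.0f}")
```

Here the adjusted sizes are 120, 5.5 and 85.8 units, giving a change size of 211.3 units. A real productivity table would typically vary the rate by metric type and environment rather than use one constant.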

Tracing Links between Semantic Levels of a System

An advantage of the SoftRepo Repository is that it covers all four semantic levels of a software system: the requirement level, the design level, the code level and the testing level. Each level has its own uniquely defined artifacts, except for the design and code levels, which have more or less the same entities. On the design side are the model entities such as components, classes, methods, interfaces, tables, etc.; on the code side are the implemented versions of those entities. The requirement level has its own entity types, such as business processes, business objects, business rules, requirements, actors, user interfaces, logical test cases, etc. At the testing level are the physical test cases, the test data objects, the test procedures and the test scripts. One of the main requirements on a repository is that it can trace the links between these semantic levels. For example, it should be possible to trace a requirement forward to the module in which it is implemented and to the test cases that test it. Going backward, it should be possible to link a module to the requirements it fulfills and a test case to the modules it tests. Backward tracing is an important criterion for regression testing: if a class or module changes, one should know which test cases to run. Forward tracing is important for maintenance: if a requirement changes, one should know which modules and which test cases are affected. For testing, it is equally important to be able to trace the logical test cases to the physical test cases and vice versa.
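Forward and backward tracing across the semantic levels can be sketched as following a chain of typed links in either direction. This is an illustration under assumptions, not the SoftRepo schema: the link kinds "implemented_by", "tested_by" and "realized_by", the identifiers, and the function name trace are hypothetical, and the links are scanned linearly for brevity where a real repository would index them.

```python
# Hypothetical trace links: (source, kind, target).
LINKS = [
    ("REQ-12", "implemented_by", "checkCredit"),
    ("REQ-12", "implemented_by", "OrderEntry"),
    ("REQ-15", "implemented_by", "Billing"),
    ("checkCredit", "tested_by", "TC-101"),
    ("OrderEntry", "tested_by", "TC-102"),
    ("Billing", "tested_by", "TC-103"),
    ("LTC-7", "realized_by", "TC-101"),  # logical -> physical test case
    ("LTC-7", "realized_by", "TC-102"),
]


def trace(links, start, kinds, backward=False):
    """Follow a chain of link kinds from `start`, one level per kind.

    Forward follows links from source to target (requirement -> module
    -> test case); backward follows them from target to source.
    """
    frontier = {start}
    for kind in kinds:
        reached = set()
        for source, link_kind, target in links:
            if link_kind != kind:
                continue
            origin, other = (target, source) if backward else (source, target)
            if origin in frontier:
                reached.add(other)
        frontier = reached
    return sorted(frontier)


# Forward (maintenance): a changed requirement -> affected test cases.
print(trace(LINKS, "REQ-12", ["implemented_by", "tested_by"]))
# Backward: a test case -> modules it tests -> requirements they fulfill.
print(trace(LINKS, "TC-103", ["tested_by", "implemented_by"], backward=True))
# Regression testing: test cases to rerun when a module changes.
print(trace(LINKS, "checkCredit", ["tested_by"]))
# Logical <-> physical test cases.
print(trace(LINKS, "LTC-7", ["realized_by"]))
print(trace(LINKS, "TC-102", ["realized_by"], backward=True))
```

Because every link is stored once and can be walked in both directions, forward and backward tracing come from the same data; only the chain of link kinds and the direction change.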
Due to its significance for software maintenance, link tracing has become a key research subject in software engineering. It is possible to establish links directly between documents, sources and test cases by scanning their contents, but this approach is far less efficient and less accurate than using a repository. Without a repository, automated tracing becomes very difficult; with one, links between semantic levels can be provided within seconds.