Many enterprise software development projects deal with data. Most of these projects will use a relational database as the primary means for data storage and retrieval. The ability of these stores to use relational algebra will allow well written queries against a well-designed schema to execute in a timely manner. Therefore using performance monitoring during unit tests on a local database will more than likely not show any issues.
Applications that provide some type of graphical user interface often display a master-detail view of the data. These master detail views can also be nested where the hierarchies are several steps deep. As an example imagine an application that displays a catalog hierarchy of categories, sub-categories and products. This example would need three levels nested below a root.
Relational databases are made up of relations (tables) containing attributes (columns) and tuples (rows). These tables are flat in structure and do not provide a simple way of extracting the master detail views described above.
There are different patterns to extract the information above, but care must be taken to ensure they are implemented in an efficient manner. One such approach is eager loading. Using the example above, the application would retrieve all categories, all sub-categories and all products in a single request-response. An implementation of this pattern that might be found in a typical enterprise software development project might use the following logic (often referred to as SELECT N+1) to extract that information:
Get all Categories from the database For each Category Get all Sub-categories from the database For each Sub-category Get all products from the database Loop Loop
This implementation will absolutely work and will more than likely provide fast response times during a unit test on a local developer's machine. When the application goes live in a production environment, however, the users might complain that the page using this logic is taking a long time to load. To illustrate what might be going on here let's put some numbers to the data and say that each node in the hierarchy has 10 items. If we walk through the logic now we can see that we are executing 101 (10 x 10 + 1) separate calls to the database. When more items or more levels are added the problem is compounded. The key point here is that each call must go across a network which adds to the response time. The logic above assumes two of the fallacies of distributed computing, latency is 0 and bandwidth is infinite.
Caching aside, a better implementation of this pattern would be to minimize the number of times the application must interact with the database, minimize the amount of data going across the network and attempt to keep some reuse within code. More than likely there is already a method that retrieves sub-categories by a category key and retrieves products by a sub-category key. An easy refactoring effort would be to modify these methods to take a list of keys and to take advantage of the "IN" keyword in SQL. Using the previous hierarchy, the application would now only need one call for each level resulting in a total of 3 queries, regardless of the number of items. Additional levels in the logic would only result in adding to the roundtrips instead of multiplying them.
This simple change can provide orders of magnitude of relief to the application's use of the network and will result in happier users.