The Next Big Thing
Posts about next generation technologies and their effect on business.

Data Virtualization: Essential but Approach with Caution

Data virtualization is the current marketing banner for Enterprise Information
Integration (EII). It is an important adjunct to SOA, but must be undertaken
with caution.


David Linthicum, Linthicum Group, and Bradley Wright, Progress DataDirect,
recently presented an ebizQ webinar on "Putting Your Data to Work for Your
Cloud, BPM, MDM and SOA Project." The thrust of this presentation was that
data virtualization will provide consistent, cross-enterprise access to data
in heterogeneous data stores. This is not a new capability; it was introduced
as EII a number of years ago.


When asked about the difference between data virtualization and EII, Bradley
Wright indicated that data virtualization includes the capability to perform
updates. While not all EII products supported updates, some did, so it appears
the primary difference is marketing. At the same time, as I will discuss
below, data virtualization should not be used for updates.


The fundamental concept of data
virtualization and EII is that data is accessed from multiple, heterogeneous
databases through a virtual database that provides an integrated, consistent
view of data from these multiple sources. 
Queries are expressed in terms of the virtual database schema and
translated as required, and data from multiple sources is transformed and
integrated to provide a response that is consistent with the virtual database
schema.
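
As a rough illustration of this idea (not any vendor's implementation), the sketch below puts a tiny virtual schema over two in-memory SQLite databases. The table names, column names, and the `virtual_parts` function are all hypothetical; the point is only that a query stated in virtual-schema terms is translated for each source and the results are converted to one consistent form.

```python
import sqlite3

FEET_TO_METERS = 0.3048

# Two hypothetical source databases with different schemas and units.
plant_a = sqlite3.connect(":memory:")
plant_a.execute("CREATE TABLE parts (part_no TEXT, length_ft REAL)")
plant_a.executemany("INSERT INTO parts VALUES (?, ?)",
                    [("A-1", 10.0), ("A-2", 2.5)])

plant_b = sqlite3.connect(":memory:")
plant_b.execute("CREATE TABLE items (item_id TEXT, length_m REAL)")
plant_b.executemany("INSERT INTO items VALUES (?, ?)", [("B-7", 1.2)])


def virtual_parts(min_length_m=0.0):
    """Answer a query against the virtual schema (part_id, length_m, source)."""
    rows = []
    # Translate the filter into plant A's units (feet) before querying it,
    # then convert the returned lengths back to meters.
    for part_no, length_ft in plant_a.execute(
        "SELECT part_no, length_ft FROM parts WHERE length_ft >= ?",
        (min_length_m / FEET_TO_METERS,),
    ):
        rows.append({"part_id": part_no,
                     "length_m": round(length_ft * FEET_TO_METERS, 4),
                     "source": "plant_a"})
    # Plant B already stores meters, so only the column names are mapped.
    for item_id, length_m in plant_b.execute(
        "SELECT item_id, length_m FROM items WHERE length_m >= ?",
        (min_length_m,),
    ):
        rows.append({"part_id": item_id,
                     "length_m": length_m,
                     "source": "plant_b"})
    return sorted(rows, key=lambda r: r["part_id"])
```

Calling `virtual_parts(1.0)` returns parts from both sources with lengths in meters, and the caller never sees that one source stores feet.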


SOA increases the importance of
data virtualization because SOA is likely to increase the number and diversity
of databases.  I discussed
this in my blog last year entitled, "Data
Management for SOA
." 
A service should be loosely coupled and its data stores should be hidden
from the service users to maintain flexibility in the implementation of the
service.  This conflicts with
needs for cross-enterprise views of data for planning and decision-making.  Data virtualization can provide
such visibility; however, there are certain realities that must be understood
when using data virtualization.  Loraine
Lawson touched on some limitations in an interview with Peter Tran and Bob
Reary of Composite Software two years ago entitled, "When
Data Virtualization Works - And When It Doesn't
," but there are additional
concerns.


The following paragraphs outline
key limitations of data virtualization that must be considered when setting
expectations and when using data virtualization to obtain composite views.


Data inconsistencies
A data virtualization product can perform data conversions (e.g., feet
to meters), but it can't create data that isn't stored.
For example, if one organization maintains weekly production figures and
another maintains monthly figures, the two measures cannot be reconciled
exactly, because weeks do not align with month boundaries. If one
organization tracks numbers of defects in one set of categories, and another
uses a different set of categories, the figures cannot be compared or added.
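
The weekly versus monthly case can be made concrete with a small sketch. It assumes Monday-start weeks, and the function names are hypothetical. Any week that overlaps two calendar months has a single stored total that cannot be split between those months without the daily figures, which the source never recorded.

```python
from datetime import date, timedelta


def months_touched(week_start):
    """Return the sorted (year, month) pairs that a 7-day week overlaps."""
    days = [week_start + timedelta(days=i) for i in range(7)]
    return sorted({(d.year, d.month) for d in days})


def straddling_weeks(year):
    """List Monday-start weeks beginning in `year` that span two months."""
    d = date(year, 1, 1)
    d += timedelta(days=(7 - d.weekday()) % 7)  # advance to the first Monday
    weeks = []
    while d.year == year:
        if len(months_touched(d)) > 1:
            weeks.append(d)
        d += timedelta(days=7)
    return weeks
```

For instance, the week starting Monday, January 29, 2024 runs into February, so its weekly total belongs partly to each month.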


Such problems are fundamental to
the business, and if it is important to examine such data across the
enterprise, then there must be a transformation initiative to make the data
collection and storage consistent with a common scheme.


Process inconsistencies
Some enterprises have similar business operations in different geographies,
or operations that produce different categories of products or services.
What they do may be similar, but they may have business processes that
cannot be compared.
There may be different stages of production or service delivery that are
of interest to top management. The
different operations may use the same terminology for phases, but the terms are
not applied consistently to the business processes.
This may lead to top management comparing apples to oranges. Such discrepancies
might extend to inconsistent metrics, such as the definition of rework, and to
inconsistencies between sales and the cost of goods sold.


Timing inconsistencies
An enterprise does not operate instantaneously and in lock step. The orders
being received are not the same as the orders being filled or the orders
being shipped.
The engineering change issued by the engineering department may be
delayed until current inventories are consumed.
Payment is not due on orders shipped but not yet delivered. A query that
combines data from different operations will therefore not represent a
consistent view of the enterprise. A consistent view requires defining
cut-off points and allowing time for the various activities and transactions
to reach and record consistent points in their operations.
This is why financial information is not immediately available at the
end of a period.


It is not practical to eliminate
all such inconsistencies or wait to accumulate consistent results.  Users of data virtualization must
understand such limitations when using the composite data.


Resource overload
A data virtualization service will access data from various production
databases.  These databases
are not necessarily configured to handle an increased volume of queries.  Some queries may add unexpected
workload to a database, or the workload from many potential users may be quite
unpredictable.  This ad hoc resource
demand could interfere with mainstream business application performance.  In cloud computing, the resource
may be available on demand, but there could be unacceptable increases in costs.


Update errors
If data virtualization is used to update databases, it will bypass the
applications designed to validate, control and coordinate the updates.  The updates may also be
inconsistent with the current state of the production operations. 
Furthermore, updates normally performed by associated applications may
require coordination and propagation to related operations and applications.  It is very dangerous to bypass the
responsible organizations and their applications to update their databases; it
should not happen.  Any update
should go through the appropriate processes for validation, authorization,
control and coordination that are the responsibility of those business
operations and their applications.
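
One way to enforce this at the connection level is to make the virtualization layer's access to each source read-only. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for a production source, relies on SQLite's `PRAGMA query_only` setting, and the table and function names are hypothetical. Real products and databases would more likely use read-only accounts or grants.

```python
import sqlite3


def open_read_only_source():
    """Build a hypothetical source database and return a query-only connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT, status TEXT)")
    conn.execute("INSERT INTO orders VALUES ('O-100', 'shipped')")
    conn.commit()
    # From here on, SQLite refuses statements that would change data.
    conn.execute("PRAGMA query_only = ON")
    return conn


def run_through_virtual_layer(conn, sql, params=()):
    """Run a statement and return its rows, or None if the source raises an
    error (in this sketch, because writes are disabled)."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error:
        return None
```

With this arrangement a query succeeds, while an attempted `UPDATE` through the virtual layer is refused and must instead go through the owning application.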


I think data virtualization
(a.k.a., enterprise information integration) is an important technology that
should be part of a SOA strategy, but users must adopt it with their eyes wide
open.  It's a long-term
investment, and it is likely there will always be the need to understand and
allow for inconsistencies in the data.

The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.