Data Analysis Gets Hot
Overwhelmed by the volumes of data
we’re producing, organizations are looking for new ways to
turn it into useful information.
Because of the volumes involved, they need
techniques that automate as much of the work as possible. Both large, established vendors and
small start-ups have offerings here, along with some very impressive
customer lists. We’ve
been watching this space with great interest, hoping to spot some
winners, but it’s clear that it’s time to do that watching
more publicly (in the hope that you’ll tell us what you’re
looking at, too).

What Do We Want To Do

Typically, we’re looking for products that combine these features:

Almost all data analysis tools assume
that (1) they will require some set-up (at least to tie them to
their data sources) and customization and (2) their users will
need some training. But not all tools are created equal.
Some are essentially ongoing science projects that will
require vast quantities of professional support; others,
once installed and tested, will be very accessible to ordinary
knowledge workers. You
make a decision based on how much customization you need (and how
much time and money you’re willing to spend).

IBM Has A Rich, Robust Offering

An excellent example of an
“establishment” offering is IBM’s, which takes a soup-to-nuts
approach. It
ranges from their #1 best-selling UDB (DB2) database manager
through their latest experiments in interfaces and XML tools.
These include their DB2 OLAP Server, which allows business
customers to ask questions against a multidimensional set of data,
and DB2 Warehouse Manager (which replaces Visual Warehouse).
IBM offers two tools for federation, that is, for querying several
separate databases as if they were one. DB2 DataJoiner is a
standalone federation engine that supports distributed two-phase
commit and heterogeneous replication; DB2 Relational Connect can
extend DB2 federation to other databases (Access, Informix,
Oracle, SQL Server, and Sybase).

Most approaches to information and data
analysis now recognize that information is not just columns of
data, but also mounds of unstructured text documents. IBM offers
Intelligent Miner for Text, a toolkit for system integrators,
solutions providers and application developers featuring text
analysis tools that automatically identify the language of a
document, create clusters as logical views, categorize and
summarize documents, and extract relevant textual information,
such as proper names and multi-word terms.
It includes the IBM Text Search Engine, the NetQuestion
solution for Internet and intranet text searches, and a Web crawler.
(Find more details at http://www-3.ibm.com/software/data/iminer/fordata/.)
IBM also supports a number of research
projects in this area, including:

Visual Attribute Explorer offers a way
to explore data by showing it via bar charts and coordinate plots,
allowing the analyst to apply constraints and immediately see and
assess results. It
allows for the interactive discovery of both the nature of the
data and the relationships between fields within the data.
It may be used independently or with IBM’s DB2
Intelligent Miner for Data. Visual Attribute Explorer is
downloadable from alphaWorks at http://www.alphaworks.ibm.com/tech/visualexplorer.

IBM’s
Xperanto Project is a consolidated development effort that offers
access to diverse data stores (federation again, as described above)
and supports queries in both SQL and the XQuery language.
Built on DB2 underpinnings, it supports storage, search,
caching, transformation, and replication features as well as
integration with WebSphere. You
can view a demonstration of Xperanto at www.ibm.com/software/data/developer/demos/xperanto/.

Aleri

Aleri
has applied high-speed vector processing to the problem of data
analysis. This allows Aleri customers to work with real-time
transaction processing data and get a continuously updated picture
of their business.

One
common method of data analysis is to build a data cube and then
analyze it by sending queries against the cube’s tables.
It might help to think of Aleri as a virtual n-dimensional
cube. In fact, Aleri
never spends the time and resources to build a physical cube
(many high-volume applications require building many data cubes);
it simply analyzes the vectors (columns or rows) of whatever
n-dimensional cube, of any size, it would otherwise need to create to
answer a particular query. The
virtual cube is built on the fly, in real time.
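
To make the virtual-cube idea concrete, here is a minimal Python sketch. It is our illustration only, not Aleri’s implementation (which we have not seen): the sample data, the field names, and the functions `virtual_cube_query` and `append_transaction` are all hypothetical. It assumes the transactions are held as column vectors and shows a query being answered by scanning just the columns it needs, with no cube built ahead of time.

```python
from collections import defaultdict

# Hypothetical transaction data held as column vectors (one list per field).
columns = {
    "region":  ["East", "West", "East", "West", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "amount":  [100, 250, 75, 300, 125],
}

def virtual_cube_query(columns, dimensions, measure, filters=None):
    """Sum `measure` grouped by `dimensions`, straight from the column vectors.

    Only the cube cells this one query touches are computed; nothing is
    precomputed or stored between queries.
    """
    filters = filters or {}
    totals = defaultdict(int)
    for i in range(len(columns[measure])):
        # Skip rows that fail any equality filter.
        if any(columns[field][i] != value for field, value in filters.items()):
            continue
        # The group key is this row's value in each requested dimension.
        key = tuple(columns[d][i] for d in dimensions)
        totals[key] += columns[measure][i]
    return dict(totals)

def append_transaction(columns, row):
    """Append one new transaction (a value for every field) to the vectors."""
    for field, value in row.items():
        columns[field].append(value)

# Total sales by region, computed on the fly.
by_region = virtual_cube_query(columns, ["region"], "amount")
```

Because nothing is precomputed, a transaction added with `append_transaction` shows up in the very next query, which is the "continuously updated picture" in miniature. The trade-off in a sketch like this is that every query rescans the columns; doing that scan fast enough is presumably where the high-speed vector processing comes in.
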
Aleri is 2.5 years old, and
it thinks it’s ready to go up the ramp and find its market.
With U.S. offices in New York and Chicago and development
and sales in St. Petersburg and Western Europe, Aleri is looking
for $30 million in revenue in 2002. It expects to announce its first partnership deals shortly.

With the interest in data
analysis (and all that data piling up), we suspect there will
be customers eager to see if a real-time solution has
finally arrived.

InfoTame

Aleri is not the only
Russian-developed data analysis software we’ve seen in recent
months. A somewhat
more mature product is InfoTame, with more than 20 installed
customers in its native Russia (including TV stations, government
bodies, and oil companies).

The InfoTame product
examines very large text databases and develops “information
portraits” to help users understand both their content and the
hidden relationships that these document repositories may
contain. This
analysis can be done regardless of the language in which the
documents are written. InfoTame can offer text
summaries, access to the text in context, or graphical depictions
of the patterns it has discerned.

Recently, the
California-based company booked its first U.S. order, from a major
law firm. The firm will
use InfoTame to discover links and associations buried within
large volumes (~50 Gbytes, over 10 million pages) of text data.
InfoTame’s text-analysis software uncovers relationships between
objects that the user does not already know about and therefore
could not find with an ordinary search engine. In
a day and a half of preliminary analysis on sample data, InfoTame
was able to extract over 52,000 text files and uncover many useful
links from only 7% of the total data available. It would take
around 100 days just to load these files manually into a database,
without considering any subsequent analyses.

Wohl Associates wrote a
White Paper about InfoTame’s technology and its business
benefits, which you can find at http://www.infotamecorp.com/InfoTame_White_Paper_by_Amy_Wohl.pdf.
(In the interests of full disclosure, we need to
report that this results in Wohl Associates receiving a small and
illiquid financial interest in the firm.)

Comments or Questions: Send email to
opinions@wohl.com