Xerox Scientists Automate Document Classification

Xerox scientists at the Grenoble research laboratories have created software that models human thinking to classify documents and information, automatically indexing, categorizing, and routing electronic documents.

The software is designed to help businesses keep their e-document collections orderly and easily accessible. Eric Gaussier, a research scientist at the Xerox Research Centre Europe in Grenoble, France, says that the new software can help save time and money and increase productivity. While current categorizing tools treat each subject category independently, the Xerox system, based on patented technologies, uses a hierarchical model that is able to understand the dependency between categories and therefore make a more informed decision.
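To make the contrast between flat and hierarchical categorizing concrete, here is a minimal sketch of top-down classification over a category tree. It is not Xerox's patented method: the tree, the keyword lists, and the simple word-overlap score are all hypothetical, chosen only to show how a sub-category is considered only after its parent category has been selected.

```python
from collections import Counter

# Hypothetical category tree; names and keywords are illustrative assumptions.
TREE = {
    "name": "root",
    "keywords": set(),
    "children": [
        {
            "name": "support",
            "keywords": {"problem", "broken", "refund", "complaint"},
            "children": [
                {"name": "support/billing",
                 "keywords": {"refund", "invoice", "charge"},
                 "children": []},
                {"name": "support/hardware",
                 "keywords": {"broken", "printer", "jam"},
                 "children": []},
            ],
        },
        {
            "name": "sales",
            "keywords": {"price", "quote", "buy", "order"},
            "children": [
                {"name": "sales/pricing",
                 "keywords": {"price", "quote"},
                 "children": []},
                {"name": "sales/orders",
                 "keywords": {"order", "delivery"},
                 "children": []},
            ],
        },
    ],
}


def score(node, tokens):
    """Count how often the node's keywords occur in the document."""
    return sum(tokens[word] for word in node["keywords"])


def classify(text, node=TREE):
    """Walk down the tree, picking the best-scoring child at each level.

    Returns the path of chosen categories, from broadest to most specific.
    The walk stops early when no child matches the document at all.
    """
    tokens = Counter(text.lower().split())
    path = []
    while node["children"]:
        best = max(node["children"], key=lambda child: score(child, tokens))
        if score(best, tokens) == 0:
            break
        path.append(best["name"])
        node = best
    return path
```

Because the walk descends one level at a time, a document about a broken printer is compared with the billing and hardware sub-categories only after the broader support category has won, which is the kind of dependency a flat categorizer ignores.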

  • The system needs little training. Using advanced machine-learning techniques and only a few examples (as few as ten per category), it quickly learns by itself how to classify documents hierarchically into existing categories.

  • It can turn unorganized e-files into cleanly labeled document collections.  

  • The system can learn entirely new categories on its own by detecting new or emerging topics and dynamically suggesting new categories to the people who are using the system. At this task it is as precise as humans.
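The first and third points above, learning from a handful of examples and flagging documents that fit no known category, can be illustrated with a small nearest-centroid sketch. This is an assumption-laden stand-in, not the technique Xerox uses: the bag-of-words vectors, the cosine similarity measure, and the 0.2 threshold are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def train(examples):
    """Build one summed word-count vector per category.

    examples maps a category name to a short list of sample texts.
    Summing instead of averaging is fine here because cosine
    similarity ignores vector length.
    """
    centroids = {}
    for category, texts in examples.items():
        total = Counter()
        for text in texts:
            total.update(vectorize(text))
        centroids[category] = total
    return centroids


def classify_or_flag(text, centroids, threshold=0.2):
    """Return (category, similarity), or (None, similarity) when nothing fits.

    A None category marks the document as a candidate for a new category
    that a person could review. The threshold is an arbitrary assumption.
    Expects at least one trained category.
    """
    vec = vectorize(text)
    best_category, best_sim = max(
        ((cat, cosine(vec, centroid)) for cat, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    if best_sim < threshold:
        return None, best_sim
    return best_category, best_sim


# Usage with two hypothetical categories and two examples each.
centroids = train({
    "billing": ["invoice charge refund", "refund for wrong invoice"],
    "hardware": ["printer paper jam", "printer is broken"],
})
```

With these centroids, `classify_or_flag("refund my invoice", centroids)` picks the billing category, while a document that shares no vocabulary with either category comes back with `None` and would be queued as a possible new topic.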

The categorizer handles documents written in up to 20 languages and can be adapted for specific customer requirements. The software routes documents to the right person based on a pre-set user profile. Incoming mail, for example, can be routed by topic: complaints to customer service and product queries to the appropriate marketing managers.
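Profile-based routing of this kind can be pictured as a lookup from a document's assigned topic to the recipients whose profiles list that topic. The sketch below is illustrative only; the recipient names, topic labels, and fallback mailbox are hypothetical and do not describe the product's actual configuration.

```python
# Hypothetical pre-set profiles: recipient -> topics they want to receive.
PROFILES = {
    "customer-service": {"complaint"},
    "marketing-printers": {"product-query/printers"},
    "marketing-copiers": {"product-query/copiers"},
}

# Hypothetical fallback for documents whose topic matches no profile.
DEFAULT_RECIPIENT = "mailroom"


def route(topic, profiles=PROFILES):
    """Return the sorted list of recipients whose profile includes the topic."""
    recipients = sorted(
        name for name, topics in profiles.items() if topic in topics
    )
    return recipients or [DEFAULT_RECIPIENT]
```

Here `route("complaint")` returns `["customer-service"]`, and a topic that appears in no profile falls back to the default mailbox rather than being dropped.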

The technology is based on linguistic analysis and machine-learning techniques. The software is written in Java and can be deployed on multiple platforms including UNIX, Linux and Windows.

The company expects the technology to be licensed by software vendors or corporations that wish to incorporate it into document systems focused on areas such as customer relationship management, information retrieval and data management.

Entire contents © 2001  by Amy D. Wohl. All rights reserved. Reproduction of this publication in any form without prior written permission is forbidden.