Xerox Scientists Invent Software That Automatically Indexes, Categorizes, Routes Electronic Documents

Scientists at Xerox Corporation have invented powerful software that's clever enough to "read" an electronic document, decide how it should be classified by subject, then route it to the right person's e-mail address or online document management system, all completely automatically.

The software, which is a categorizing tool, is intended to help businesses keep their e-document collections orderly and easily accessible, and it is available for licensing from Xerox.

"A misshelved book in a library might as well be lost. It's the same with documents that haven't been properly categorized; the document itself may have to be recreated," said Eric Gaussier, a research scientist at the Xerox Research Centre Europe in Grenoble, France. "Our new software can help save time and money and increase productivity. It will ensure that documents are properly classified for future retrieval and that the right information gets into the right hands as quickly as possible."

Categorizing tools currently on the market treat each subject category independently of the others and are therefore considered "flat." For example, although it might seem obvious to humans that biochemistry and biophysics are related categories of information, a flat categorization system wouldn't make the connection. The Xerox system, based on patented technologies, instead uses a hierarchical model that understands the dependency between those two categories and can therefore make a more informed decision when classifying a document.
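
The difference between flat and hierarchical categorization can be illustrated with a small sketch. This is not the patented Xerox method, whose details are not given here; it is a minimal keyword-counting example under stated assumptions. The class `HierarchySketch`, the category names, and the keywords are all hypothetical. Each parent category pools the evidence of its children, so related categories such as biochemistry and biophysics share a "life-sciences" parent, and a document with only general evidence stops at that parent instead of being forced into a leaf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: simple keyword counting, not the Xerox model.
class HierarchySketch {
    /** A category node: its own keywords plus child categories. */
    static final class Category {
        final String name;
        final Set<String> keywords;
        final List<Category> children = new ArrayList<>();

        Category(String name, String... keywords) {
            this.name = name;
            this.keywords = new HashSet<>(Arrays.asList(keywords));
        }

        Category add(Category child) {
            children.add(child);
            return this;
        }

        /** Counts document words that match this node or any descendant. */
        int score(List<String> words) {
            int hits = 0;
            for (String w : words) {
                if (keywords.contains(w)) hits++;
            }
            for (Category c : children) hits += c.score(words);
            return hits;
        }
    }

    /** Descends from the root, choosing the best-scoring child at each level. */
    static String classify(Category root, String text) {
        List<String> words = Arrays.asList(text.toLowerCase().split("\\W+"));
        Category current = root;
        while (!current.children.isEmpty()) {
            Category best = null;
            int bestScore = 0;
            for (Category c : current.children) {
                int s = c.score(words);
                if (s > bestScore) { best = c; bestScore = s; }
            }
            if (best == null) break; // no evidence below this level: stop here
            current = best;
        }
        return current.name;
    }

    /** Builds a tiny assumed taxonomy for the example. */
    static Category sampleTree() {
        Category life = new Category("life-sciences", "cell", "protein")
            .add(new Category("biochemistry", "enzyme", "metabolism"))
            .add(new Category("biophysics", "membrane", "diffusion"));
        Category computing = new Category("computing", "software", "algorithm");
        return new Category("root").add(life).add(computing);
    }

    public static void main(String[] args) {
        Category root = sampleTree();
        System.out.println(classify(root, "Enzyme activity in the cell"));
        System.out.println(classify(root, "Protein levels in a cell"));
    }
}
```

In this sketch the first sample sentence reaches the "biochemistry" leaf, while the second, which mentions only general life-science terms, stays at "life-sciences". A flat system has no such intermediate level to fall back on.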

According to data gathered from a pilot test of the software, people found the right documents more often and faster because the software understood relationships between documents and categories.

Anne-Lise Veuthey, a senior researcher at the Swiss Institute of Bioinformatics, an academic nonprofit foundation that researches and develops technology used in biology, participated in the pilot program. "We've found it to be extremely accurate in identifying documents containing the very specific information we need to conduct our research on human genes," Veuthey said.

Technology Highlights
Three integrated functions make the Xerox categorization technology unique:
  • The system can start right away. Using advanced machine-learning techniques, it quickly learns by itself, from only a few examples, how to classify documents hierarchically into existing categories.
  • The technology is easy to use and helps people create a comprehensive way to turn unorganized e-files into cleanly labeled document collections.
  • The system can learn entirely new categories on its own. The categorization technology detects new or emerging topics and dynamically suggests new categories to the people who are using the system.
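
The third function, suggesting new categories, can be pictured with a rough sketch. How the Xerox software actually detects emerging topics is not described here, so the following swaps in a deliberately simple stand-in: it looks at documents that matched no existing category and proposes any unfamiliar term that recurs in several of them. The class `NewTopicSketch`, the method `suggest`, the `minDocs` threshold, and the four-letter minimum term length are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only: a term-frequency heuristic, not the Xerox method.
class NewTopicSketch {
    /**
     * Returns, in alphabetical order, the unknown terms that appear in at
     * least minDocs of the documents that matched no existing category.
     */
    static List<String> suggest(List<String> unmatchedDocs,
                                Set<String> knownTerms,
                                int minDocs) {
        Map<String, Integer> docCounts = new TreeMap<>();
        for (String doc : unmatchedDocs) {
            // A set, so each term counts once per document.
            Set<String> seen =
                new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\W+")));
            for (String term : seen) {
                // Skip short words and terms already tied to a category.
                if (term.length() > 3 && !knownTerms.contains(term)) {
                    docCounts.merge(term, 1, Integer::sum);
                }
            }
        }
        List<String> suggestions = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
            if (e.getValue() >= minDocs) suggestions.add(e.getKey());
        }
        return suggestions;
    }
}
```

The suggestions are only proposals. As the bullet above notes, they are offered to the people using the system, who decide whether a new category is warranted.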
The Right Routing
The Xerox categorizer system can handle documents written in up to 20 languages and can be easily adapted for specific customer requirements. The software intelligently routes documents to the right person based on a pre-set user profile.

"This can be used, for example, to route incoming mail to the person responsible for a given topic and eliminate mail in your inbox you aren't interested in," said Gaussier. "Imagine clients' complaints going directly to the person responsible for handling them and your e-mail inbox containing only what you are interested in."

The categorization technology was developed by XRCE researchers, drawing on their deep expertise in linguistic analysis and machine-learning techniques. The software is written in Java and can be deployed on multiple platforms, including UNIX, Linux and Windows. The company expects the technology to be licensed by software vendors or corporations that wish to incorporate it into document systems focused on areas such as customer relationship management, information retrieval and data management.
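
The profile-based routing described above can be sketched in a few lines of Java. This is an assumed, simplified design, not the product's API: the class `RoutingSketch`, the method `route`, the example addresses, and the category names are all hypothetical. Each recipient's pre-set profile is modeled as a set of categories, and a classified document goes to the first recipient whose profile contains its category.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only: profiles modeled as category sets.
class RoutingSketch {
    /** Returns the first recipient whose profile lists the category, if any. */
    static Optional<String> route(String category,
                                  Map<String, Set<String>> profiles) {
        for (Map.Entry<String, Set<String>> e : profiles.entrySet()) {
            if (e.getValue().contains(category)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty(); // nobody has registered interest
    }

    /** Builds assumed example profiles; insertion order sets priority. */
    static Map<String, Set<String>> sampleProfiles() {
        Map<String, Set<String>> profiles = new LinkedHashMap<>();
        profiles.put("support@example.com", Set.of("complaints", "returns"));
        profiles.put("research@example.com", Set.of("biochemistry", "biophysics"));
        return profiles;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> profiles = sampleProfiles();
        System.out.println(route("complaints", profiles).orElse("unrouted"));
        System.out.println(route("sports", profiles).orElse("unrouted"));
    }
}
```

Returning an empty `Optional` when no profile matches leaves the caller to decide what happens to unrouted documents, for instance holding them in a shared queue.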

 
Contact Us: for questions about Xerox research and innovation, patents or technology licensing, scientific work and related inquiries, please email: xigwebmaster@xerox.com
