Xerox Innovation at Work: TrueMatch

Have you ever had a hardcopy of a document but couldn't find the electronic one? Or perhaps you wanted to be sure you could find the correct electronic version of a "final" draft you had in your hand?

Using sophisticated linguistic technology developed by researchers at Xerox Research Centre Europe, a new Xerox technology called TrueMatch can locate the original electronic file for you in seconds.

Helping to bridge the gap between hardcopy and digital documents, TrueMatch is now out of the labs and at work in the office as a feature of Xerox's FreeFlow SMARTsend® Pro 2.0 scanning software, announced today at AIIM/On Demand, a document imaging conference. SMARTsend Pro allows users to securely search and retrieve documents from Xerox DocuShare® and Microsoft SharePoint repositories.

TrueMatch has advanced search capabilities that let knowledge workers easily find the electronic copy of a hardcopy document when that copy is stored in a repository. People simply run a paper document through a multifunction device and, on command, TrueMatch uses its search-like interface to locate the original electronic file. It can find the exact copy of a document, other versions or related materials.

This capability is especially useful and cost-effective for businesses that deal with information that is constantly being updated. For example, using this technology and a multifunction system, customers in shops or back offices can retrieve and print the most up-to-date sales price lists on demand rather than printing centrally and distributing a large amount of material.

How TrueMatch Works
Once a hardcopy has been scanned and processed through optical character recognition (OCR), TrueMatch analyzes the content of the document. It first extracts the key elements - words or multiword expressions corresponding to possible topics of the document.

TrueMatch next ranks these key elements, taking into account several parameters including the number of times they appear in the input document and their average frequencies in the English language. Top-ranked elements are used to build and run queries. TrueMatch analyzes the documents returned through the queries and compares the results with the input document. To make this comparison, TrueMatch searches for the presence of the key elements in the retrieved documents. When all are present, it will look for finer information, such as the exact word order, to distinguish between a perfect match and a revision.
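The ranking and query-building steps described above can be sketched roughly as follows. This is an illustration only, not Xerox's implementation: the scoring formula (in-document count multiplied by the log of inverse background frequency), the `BACKGROUND_FREQ` table, the `DEFAULT_FREQ` value and the function names are all assumptions made for the example, and the sketch handles single words only, not multiword expressions.

```python
import math
import re
from collections import Counter

# Hypothetical background frequencies (share of running English text).
# Real values would come from a large reference corpus.
BACKGROUND_FREQ = {"the": 0.05, "of": 0.03, "and": 0.028}
DEFAULT_FREQ = 0.00001  # assumed frequency for words missing from the table


def rank_key_elements(text, top_n=3):
    """Rank words by in-document count weighted by rarity in English."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Frequent in the document and rare in the language gives a high score.
    scores = {
        word: count * math.log(1.0 / BACKGROUND_FREQ.get(word, DEFAULT_FREQ))
        for word, count in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:top_n]


def build_query(key_elements):
    """Join the top-ranked elements into a simple conjunctive query string."""
    return " AND ".join(key_elements)


scanned = "Replace the fuser when the toner smears. A worn fuser wastes toner."
top = rank_key_elements(scanned, top_n=2)
print(top)
print(build_query(top))
```

Common words such as "the" score low despite appearing often, so the query is built from the terms most likely to single out the document in a repository.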

As a result, TrueMatch is fast and has a high success rate in identifying the electronic version of the input document - a perfect match - when the input document is distinctive, as technical documents tend to be.

TrueMatch can also match documents even when the input document is only a portion of the entire document needing retrieval. In this case, the longer the input document, the better the results. However, the technology is so "smart" that in almost all cases it is able to find the correct corresponding document even when a one-page fragment is used.

Because errors can be introduced during OCR, TrueMatch has to be smart and flexible enough to consider documents as candidates for perfect matches even if some key elements are not found in the target. To do this, TrueMatch works with a tolerance level during the matching stage. The tolerance level has a default value, but the administrator can tune it if needed. Poor-quality paper documents can lead to high OCR error levels. In such cases, errors can affect more than 5 percent of the characters, which means roughly one word in three may be erroneous. At the same time, if only a portion of the paper input has deteriorated, the remaining part is often sufficient to identify the electronic version and return it to the user.
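A minimal sketch of the two ideas in this paragraph, under stated assumptions. The first function shows why a 5 percent character error rate translates to about one bad word in three: it assumes errors hit characters independently and that a word averages about eight characters, neither of which is stated in the article. The second function illustrates tolerance-based matching followed by a word-order check; the names, the default `tolerance` and `order_threshold` values, and the use of Python's `difflib` for the word-order comparison are all hypothetical choices, not TrueMatch's actual method.

```python
from difflib import SequenceMatcher


def word_error_probability(char_error_rate, word_length):
    """Chance that a word has at least one bad character,
    assuming character errors are independent."""
    return 1.0 - (1.0 - char_error_rate) ** word_length


def classify(key_elements, input_text, candidate_text,
             tolerance=0.25, order_threshold=0.85):
    """Label a retrieved candidate 'perfect match', 'revision' or 'no match'."""
    input_words = input_text.lower().split()
    candidate_words = candidate_text.lower().split()
    vocabulary = set(candidate_words)
    # Key elements absent from the candidate (possibly garbled by OCR).
    missing = [k for k in key_elements if k not in vocabulary]
    if len(missing) > tolerance * len(key_elements):
        return "no match"
    # Finer check: compare word order between input and candidate.
    similarity = SequenceMatcher(None, input_words, candidate_words).ratio()
    return "perfect match" if similarity >= order_threshold else "revision"


# About one word in three is affected at a 5 percent character error rate.
print(round(word_error_probability(0.05, 8), 3))

ocr_text = "replace the fuser when the toner srnears on glossy paper"
keys = ["fuser", "toner", "srnears", "glossy"]  # "srnears" is an OCR error
original = "replace the fuser when the toner smears on glossy paper"
revised = ("replace the fuser and clean the rollers when the toner "
           "smears on glossy paper")
print(classify(keys, ocr_text, original))
print(classify(keys, ocr_text, revised))
```

Raising `tolerance` accepts candidates with more missing key elements, which helps with badly degraded scans at the cost of more false candidates reaching the word-order check.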

 