Indexing and Searching XML Files with Google Desktop

It took me a while to find this tip, so here’s how to get Google Desktop to
index XML files:

I think by default GDS has XML file indexing TURNED OFF in
the registry – it won’t index them at all. This key is in
HKEY_CURRENT_USER\Software\Google\Google Desktop\file_extensions_to_skip.
You’ll have to remove “xml” from that list (I think)

(source: http://groups.google.com/group/Google-Desktop_past-discussions/browse_thread/thread/6059c564cf1bf007/4cb1bf32429413bb)

The ‘xml’ extension was in this list, and I removed it.  Now I wait
until the reindex is complete and try some queries.  Update to follow.
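
If you'd rather script the registry change than edit it by hand, here is a minimal Python sketch. It rests on assumptions I have not verified: that file_extensions_to_skip is a string value under the Google Desktop key, and that its entries are comma-separated (pass a different sep if yours looks different). The function names are mine, not part of GDS. Export the key first so you have a backup.

```python
KEY_PATH = r"Software\Google\Google Desktop"  # key path quoted in the forum post
VALUE_NAME = "file_extensions_to_skip"        # assumed to be a string value under that key


def drop_extension(skip_list, ext, sep=","):
    """Return skip_list with ext removed (case-insensitive); other entries are kept."""
    wanted = ext.lower().lstrip(".")
    kept = [entry for entry in skip_list.split(sep)
            if entry.strip().lower().lstrip(".") != wanted]
    return sep.join(kept)


def update_registry(ext="xml", sep=","):
    """Windows only: rewrite the skip list in the registry without ext."""
    import winreg  # imported here so drop_extension can be tried on any platform

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH, 0, access) as key:
        current, value_type = winreg.QueryValueEx(key, VALUE_NAME)
        if not isinstance(current, str):
            # The separator assumption only makes sense for a plain string value.
            raise TypeError("unexpected registry value type; edit it by hand")
        winreg.SetValueEx(key, VALUE_NAME, 0, value_type,
                          drop_extension(current, ext, sep))
```

Calling update_registry() on the Windows machine makes the change. GDS still has to reindex before XML files show up.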

<update 2006-12-29>

The reindex has taken place, and there are plenty of XML files in the index
now.  I can find them by simply searching for *.xml, but I still can’t
search the contents of an XML file in Google Desktop Search.  Not such a
surprise, since Google Desktop Search’s help indicates it does not support XML
files (http://desktop.google.com/support/bin/answer.py?answer=12634&topic=201).

Before I get too far into what I’m trying next, let me explain my
need.  We use XML to send and receive shipment details with a business
partner.  From time to time, I need to be able to quickly locate a specific
shipment in the hundreds of XML files stored in the archive.  I’m positive
I’m not the only person in this situation.

I found a plug-in at http://www.trivex.net/ which will have
GDS treat XML as plain text, and that should hopefully improve the search. 
Apparently Copernic can be set to do this
from its control panel, and that’s the next stop.  A plain text search is
good enough, but ideally, either GDS or Copernic would support an XQuery for a
specific element or attribute value.
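
Until a desktop search tool offers that kind of query, a short script can approximate it. This Python sketch walks an archive folder and returns the XML files in which any element's text or any attribute value equals the string you give it. The function name and the idea of a single archive folder are my assumptions for illustration. It uses only the standard library and is far less expressive than XQuery.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def files_with_value(archive_dir, value):
    """Return sorted paths of XML files where an element text or attribute equals value."""
    hits = []
    for path in sorted(Path(archive_dir).rglob("*.xml")):
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            continue  # skip malformed files instead of aborting the whole search
        for elem in root.iter():
            text = (elem.text or "").strip()
            if text == value or value in elem.attrib.values():
                hits.append(path)
                break  # one match per file is enough
    return hits
```

For example, files_with_value(r"C:\archive", "PO12345") would list every file holding that exact value. Both the path and the order number here are made up.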

So I’ve installed Larry’s Any Text plug-in (http://www.trivex.net/).  That was darn
easy.  I had to add the XML extension to the config file during
installation, and the plug-in has triggered a reindex.  I’ll let you know
how this one works out.

<update 2006-12-29>

The index updated over lunch, and Larry’s plug-in did the trick.  I can
pull up the XML files I need simply by order number or some other
identifier.  It’s great.  Too bad it’s not an XQuery or something
similar, but it’ll work.  Keep in mind that GDS does not support wildcards,
so you have to enter the entire string you’re looking for, not a partial order
number.  Fortunately, element and attribute values are perceived as whole
words.
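
When all I have is part of an order number, a plain substring scan outside GDS can fill the gap. This is a minimal Python sketch, not anything GDS provides: the function name is hypothetical, and it assumes the files are small enough to read into memory and are mostly UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
from pathlib import Path


def files_containing(archive_dir, fragment):
    """Return sorted XML file paths whose raw text contains fragment anywhere."""
    hits = []
    for path in sorted(Path(archive_dir).rglob("*.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if fragment in text:
            hits.append(path)
    return hits
```

It is slower than an indexed search because it reads every file on each call, but it does match partial strings.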
