These are our contentions:
- Metadata matters.
- Metadata adds worth to data.
- Documents are data.
- Keywords are the essential data in documents.
- Keywords in context create knowledge.
- Documents have worth because they contain knowledge.
- Enterprise search finds keywords.
- Findable keywords yield documents.
- Findable keywords in context yield documents with knowledge.
- Knowledge in documents has worth.
- Metadata is not essential for enterprise search.
- We don’t need metadata.
What’s our point? Before answering that question, we invite understanding of the context: This project is about implementation of enterprise search within a large but not humongous non-profit organization. We’re talking about 170 paid employees, with easily an equal number of volunteers of one kind or another. So let’s say for purposes of context that we have 350+ real people using our networked infrastructure. We have two — count ‘em, two — IT guys. We’re not talking Fortune 500 here. We’re not even talking Fortune 500,000. That’s our world.
Working on this project, we have evaluated what we need from metadata as part of enterprise search implementation. Our conclusion? We don’t need metadata.
Or better said, we don’t need to add metadata for a Google Search Appliance (GSA) to accomplish what we want to accomplish with enterprise search. We could use metadata more — and there are several very impressive features in a GSA that can exploit external metadata and metadata biasing of search results — assuming the organization has the resources to organize and manage metadata. But as a practical matter, do we have the resources to go down that path and, ultimately, do we need it? No.
In fact, as part of this project, we have put a metadata model in place, a simple “labeling” or tagging system. It exploits our SharePoint server installation with a practical (if kludgy) way to add metadata to files saved to a shared document repository. For example, when a user saves a file to a directory in our structural taxonomy, say the Income Maintenance folder…

…a dialog box pops up with a prompt to add one or more optional “LSNC labels” to the file, associating the file with additional folders or categories in our taxonomy:

In the above example, an Excel spreadsheet with unemployment data is being saved to the “Unemployment Insurance” folder, a subfolder under the “Income Maintenance” top-level directory, but is also marked or tagged as “Data-Statistics-GIS” and “Employment.” Even then, this kludge only works with Microsoft applications, which is to say SharePoint doesn’t work as cooperatively with other applications we rely on, like WordPerfect, Adobe Acrobat and others.
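For readers who think in code, the labeling model can be sketched roughly as follows. This is only an illustration of the idea (one physical folder location plus optional extra labels), not how SharePoint stores anything. The class `TaggedFile`, the function `files_in_category`, and the file name `unemployment-data.xlsx` are all hypothetical names made up for this sketch; the folder and label names come from the example above.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class TaggedFile:
    """Hypothetical model: a file has one folder location plus optional labels."""
    path: PurePosixPath
    labels: set[str] = field(default_factory=set)

    def categories(self) -> set[str]:
        # Every folder on the path counts as a category, and so does each label.
        return set(self.path.parent.parts) | self.labels


def files_in_category(files: list[TaggedFile], category: str) -> list[TaggedFile]:
    """Return the files that live under, or are labeled with, the given category."""
    return [f for f in files if category in f.categories()]


# The spreadsheet from the example; the file name itself is an assumption.
spreadsheet = TaggedFile(
    path=PurePosixPath(
        "Income Maintenance/Unemployment Insurance/unemployment-data.xlsx"
    ),
    labels={"Data-Statistics-GIS", "Employment"},
)

# The file now turns up under "Employment" even though it is not stored there.
matches = files_in_category([spreadsheet], "Employment")
```

The point of the sketch is that a label simply gives a file a second (or third) home in the taxonomy without moving or copying it.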
Regardless, is the addition of metadata to documents a good thing? Obviously, yes. Metadata matters. (Taxonomy matters, too… yet to what purpose?) Do you need to add metadata to documents for effective enterprise search, and specifically with a Google Search Appliance? Not really, not for what we are doing. Why not? Because search algorithms have improved to the point where added metadata is no longer needed to get good results.
The poster child for these gains in enterprise search algorithms is, not surprisingly, Google, whose GSA has matured considerably. Google is a verb. Microsoft (or SharePoint) is not. A principal reason is that Google broke out early from the search-engine pack years ago and raised the bar for the quality of search results. Google became what the average person now expects from search. That is why it is a verb. It is what most people do. They Google. Another reason is that Google simplifies search.
In the context of our project, at the scale and with the resources available to even a fairly large non-profit, what is practical or impractical in using metadata? And even if used, does it affect the quality of enterprise search results enough to warrant those additional costs in time and money?
So far, we don’t see it.