Selecting GSA targets – Part One: Four abstract targets
It is, of course, not enough to simply build an enterprise search platform. Sure, you can do what we did on day one, when our Google Search Appliance (GSA) arrived and we gleefully hooked it up to our local Sacramento Office network and did a global target of everything. You know, just to see if our GSA worked. It did. And in short order, as we blew out its one-million file crawl limit, we discovered the obvious: LSNC has a whole lot of documents and other files strewn about on various file servers and desktops, like so much digital flotsam. Needless to say, we did not need a TIG-funded GSA to reveal that fact. To know that, all one has to do is invoke Windows Explorer and peruse one’s local office file server. Enough said.
From the perspective of our enterprise search goals, most of these files do not contain content that has what we refer to as “shared value”: advocacy or other work-related content or information that LSNC staff would search for because they want it or need it to get the job done.
This observation does not suggest that all the other individual documents or files have no worth. They do, but for other purposes. For example, on a practical level, an advocate may have any number of drafts or versions of a document or file, but what the organization will want to target and what users will want to get their hands on is the final or more polished version of that content. And that is likely what the original author will intend to share.
But if the organization targets everything, well, in the broadest sense what those who search will get is a lot of extraneous or incorrect or incomplete content. And a less serious but real-world challenge is the organization’s need to separate the true wheat (even if marginal) from the inevitable digital chaff on local office file servers and desktops. (Oh, come on — you know what we’re talking about here! All those personal photos, MP3s, YouTube videos, recipes from the Food Network, National Geographic wallpapers, long forgotten software downloads, … need I go on?)
There is a separate set of challenges to initially identify existing content that one would want to target with a GSA that has, after all, a set file limit. And then one has to work out practical policies and protocols for how to handle new content to be added to those targets. In upcoming posts, we will document how LSNC has approached both of these challenges.
But for now, here is a macro breakdown of what content we value and are initially targeting with the GSA. It is actually simpler to do than we initially thought it would be:
- Designated document repository master directory structures – that’s a mouthful, but it turns out that’s how we refer to it. We have worked out what we consider to be a basic, workable “taxonomy” for organizing files, to be detailed in an upcoming post. The short version is that both existing and new content that has been identified as valued will reside on project-specific file servers that have purposefully organized directory structures. This will make more sense once we explain (fairly soon) why we are adopting the structures we have worked out and how they will serve the overarching goal of “findability.” Stay tuned.
- Shared intranet content – within LSNC, we refer to our intranet as the “secured network,” the lingua franca here for what other organizations refer to as their intranet. At this juncture, most legal services programs have some sort of intranet structure already in place, with varied user-side implementations to give staff access to its content. (Currently, ours is built out with MediaWiki as the principal content management tool, but it will soon be supplanted by WordPress, Google Sites, or both. I have posted details on that side story at LSNC’s tech blog, Webdogs 2.0.) By historical definition, everything on our existing intranet is valued. It’s fairly lean, mean, to the point, and well organized, and it includes, among other things, in no particular order:
  - Administrative manual
  - Case management manual
  - Development and fund-raising resources
  - LSC policy archive
  - LSNC forms (administrative and case-related)
  - LSNC policy archive
  - MCLE – Training resources and forms
  - Personnel and other shared human resource information
  - Specialized Regional Counsel content (content subject to gatekeeper function)
  - Specialized client content (content targeted for LawHelp access)
- Select LSNC public web content – LSNC is now reaping dramatic benefits from its decade-long focus on using its public web presence to create and share usable content for advocates. We are still in the process of parsing out those portions of the LSNC public content we want to target with the GSA, but these include our rich reservoir of advocate content on CalWORKs (the name of California’s TANF program) and Food Stamps, and special project-specific content that derives from our Race Equity Project and housing and economic development work. The point here is that our enterprise search model will include not just valued content behind our firewall but also select public content that is every bit as valuable to our staff in getting the job done.
- Pika Case Management System – this will likely be the last piece of the enterprise search puzzle for us, but a major chunk of our GSA file limit will be devoted to exploiting the GSA to alter dramatically how LSNC staff search and locate data within Pika. We have already run some initial targeting tests on Pika and we really, really liked what the search results looked like. It is not a technical challenge to target Pika with a GSA, not at all, but there are some significant challenges in sorting out how best to limit the GSA crawl to target precisely what we really want to make searchable, without blowing out our GSA file limit. Once we work out those kinks, we will likely replace the native Pika search functions (which amount to little more than a raw SQL search) with a customized subset of GSA functions.
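To make the "target precisely, stay under the file limit" idea concrete, here is a rough sketch in Python. It is illustrative only: the include and exclude patterns, the sample paths, and the function names are all made up for this example, and this is not actual GSA configuration syntax. It simply shows the logic of keeping what matches a designated target, dropping the digital chaff, and checking the result against a file limit.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration; NOT real GSA crawl configuration.
INCLUDE_PATTERNS = ["*/repository/*", "*/intranet/*"]
EXCLUDE_PATTERNS = ["*.mp3", "*.jpg", "*.exe", "*/drafts/*"]

# The one-million file crawl limit mentioned at the top of this post.
FILE_LIMIT = 1_000_000


def is_target(path: str) -> bool:
    """True if the path matches an include pattern and no exclude pattern."""
    lowered = path.lower()  # patterns above are lowercase, so compare in lowercase
    included = any(fnmatchcase(lowered, p) for p in INCLUDE_PATTERNS)
    excluded = any(fnmatchcase(lowered, p) for p in EXCLUDE_PATTERNS)
    return included and not excluded


def select_targets(paths, limit=FILE_LIMIT):
    """Return the paths worth crawling and whether they fit under the limit."""
    targets = [p for p in paths if is_target(p)]
    return targets, len(targets) <= limit


if __name__ == "__main__":
    # Made-up sample paths, not real LSNC locations.
    sample = [
        "//fs1/repository/housing/final-brief.doc",
        "//fs1/repository/housing/drafts/brief-v2.doc",
        "//fs1/users/jdoe/song.mp3",
        "//fs1/intranet/forms/intake.pdf",
    ]
    targets, fits = select_targets(sample)
    print(targets, fits)
```

In this sketch the exclude list always wins over the include list, which is one reasonable choice when the goal is to keep drafts and personal files out of the results even when they sit inside an otherwise valued directory.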
In the scheme of this project, content is king, knowledge content rules, and the Google Search Appliance is Gandalf, the wizard asking “What do you see? Can you see anything?” Indeed.

