
Our final 15 minutes of Google fame


It was a pretty nice surprise for LSNC several months back to be asked by Google to present a webinar, Advancing Knowledge Sharing with Google: The LSNC Story, focusing on what we accomplished with The Findability Project.

Prior to, but independent of, that webinar, Google interviewed LSNC about The Findability Project and about LSNC’s larger experience of integrating a Google Search Appliance with Google Apps and the Pika case management system. Google has since posted its LSNC case study and is currently featuring it on its Google Enterprise customer solutions site. Sure, it’s a marketing move but, still, it’s great to be included.

Pika and the Google Search Appliance make nice

For those who have followed The Findability Project, I am pleased to report we have surmounted the basic technical problems of targeting our Pika CMS with the Google Search Appliance.

The back story is one I have purposefully repeated whenever I give a presentation about the project, namely, that our Pika Plan A did not work. We encountered code anomalies in Pika that, among other things, cause it to auto-generate new case intakes and case records when it is crawled by the GSA. As a result, we could not use the GSA to crawl the client case content that Pika dynamically generates as web pages. Plan A would have been the easiest, no-brainer way to go, but it was not an option. So Plan B was to have the GSA target the Pika MySQL database directly. Status report: Mission accomplished.

There are GSA capacity issues for us. Our particular GSA has a one million “record” capacity, and that limit covers web pages and database records combined. Database records are not the same thing as client case records. At any given time, we may have some 130,000 to nearly 200,000 client cases in our Pika system (and even more in archival data storage), but from a database perspective these add up to multiple millions of “records,” e.g., various types of time records, case notes, contacts, and so on. Part of the challenge for us was to sort out which of those millions of database records were the ones most needed and useful to our users.
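The capacity squeeze is easy to quantify from the figures above. Counting only Pika case-related database records, and ignoring the web pages that draw on the same one-million budget, the appliance could index on average at most

```latex
\frac{1{,}000{,}000}{200{,}000} = 5
\qquad\text{and}\qquad
\frac{1{,}000{,}000}{130{,}000} \approx 7.7
```

records per client case, at the high and low ends of the caseload respectively. “Multi-millions” of records spread across those cases already works out to more than five per case on average, so indexing every row could never fit. That is why the crawl had to be selective.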

The solution? Using a well-tailored query, we have the GSA do a selective crawl of the Pika MySQL database to return the most commonly sought and used Pika content: case numbers, client names, office designations and case notes… tons of case notes. Technically, the GSA runs the database query, returns the result as an XML feed and indexes that feed. Users’ search terms are then matched against that index, and the matches are returned as viewable HTML.
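Here is a minimal sketch of the “selective query” idea. It is written in Python, with SQLite standing in for Pika’s MySQL database so it runs on its own. The `cases` and `case_notes` tables and their columns are hypothetical and are not Pika’s real schema. The `<records>`/`<record>` XML layout is also made up, not the GSA’s feed format. On the appliance itself, the SQL would be entered in the GSA’s database-crawl configuration, and the GSA produces the feed. The sketch only shows the principle: pick a few searchable columns and turn each row into one XML record.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for part of the Pika MySQL schema; Pika's real
# table and column names will differ.
SCHEMA = """
CREATE TABLE cases (case_id INTEGER PRIMARY KEY, number TEXT,
                    client_name TEXT, office TEXT);
CREATE TABLE case_notes (note_id INTEGER PRIMARY KEY, case_id INTEGER,
                         body TEXT);
"""

# The selective query: only the fields users actually search on, one row
# per case note, rather than every table (time records, contacts, ...).
SELECTIVE_QUERY = """
SELECT c.number, c.client_name, c.office, n.note_id, n.body
FROM case_notes AS n JOIN cases AS c ON c.case_id = n.case_id
ORDER BY n.note_id
"""

def rows_to_xml(conn: sqlite3.Connection) -> str:
    """Run the selective query and serialize each row as one <record>."""
    root = ET.Element("records")
    for number, client, office, note_id, body in conn.execute(SELECTIVE_QUERY):
        rec = ET.SubElement(root, "record", id=str(note_id))
        ET.SubElement(rec, "case_number").text = number
        ET.SubElement(rec, "client").text = client
        ET.SubElement(rec, "office").text = office
        ET.SubElement(rec, "note").text = body
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cases VALUES (1, '90-10-123456', 'John Client', 'Sacramento')")
    conn.execute("INSERT INTO case_notes VALUES (10, 1, 'Called landlord about repairs')")
    print(rows_to_xml(conn))
```

Keeping one record per case note, rather than one per case, lets each note show up as its own hit with in-context text.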

What does the search result look like? A Google search result. The clickable link displays the case number, client name, LSNC office and primary advocate name, e.g., “90-10-123456 ~ John Client ~ Sacramento ~ Jane Advocate.” Below that it displays in-context text with the search terms highlighted in bold, just like a regular Google search result. Clicking the link dynamically displays the actual Pika case note in context. Where there are multiple possible matches within a particular Pika case record, there is a link to display all the “omitted results,” as in regular Google searches, so users can see all possible matches, not just the probable ones. And because clicking through the GSA search result link takes the user to the actual Pika client case record, the user also gets direct access to that record.
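The link text format described above is simple enough to pin down as a small function. This Python sketch illustrates the string assembly only and is not LSNC’s implementation. It assumes the four display fields are available for each indexed record and that blank fields should be skipped. The `PIKA_BASE` host and the `case.php?case_id=` path are hypothetical placeholders for wherever the click-through actually points.

```python
from urllib.parse import urlencode

# Hypothetical Pika host and case-view path; the real values are not
# given in the post.
PIKA_BASE = "https://pika.example.org"

def result_title(case_number: str, client: str, office: str, advocate: str) -> str:
    """Join the four display fields with " ~ ", skipping any that are blank."""
    parts = (case_number, client, office, advocate)
    return " ~ ".join(p.strip() for p in parts if p and p.strip())

def case_link(case_id: int) -> str:
    """Build the click-through link to the Pika case record (path assumed)."""
    return f"{PIKA_BASE}/case.php?{urlencode({'case_id': case_id})}"

print(result_title("90-10-123456", "John Client", "Sacramento", "Jane Advocate"))
# prints: 90-10-123456 ~ John Client ~ Sacramento ~ Jane Advocate
print(case_link(123456))
```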

That’s the name of that tune.

Coda re 2010 TIG Knowledge Management session

Last Wednesday at the 2010 LSC TIG Conference, Chicago-Kent’s Ron Staudt and I did a joint session, Knowledge Management – What It Is, Why It Matters, and (Google) Options For Making What You Know Findable. Ron, of course, was cogent, concise and charismatic, stayed within his presentation window and hit all his marks. Me? Regrettably, after all these years, I still haven’t figured out how to squeeze 10 pounds of cement into a 5-pound bag, and I didn’t even get to several key points I had hoped to make about enterprise search and The Findability Project. To make matters worse on my end, at the beginning of my segment the Flash demo of how LSNC’s enterprise search front end works faltered badly because it displayed so poorly when projected. (More than one person mentioned to me afterwards that they simply could not see what I was describing at the moment. Uh, it seemed like a good idea at the time.)

With those apologies out of the way, allow me to annotate a few points now, to make up for at least some of what I did not cover during the presentation:

The LSNC “portal,” “intranet” and “document repository”

I feel I successfully got across the point that there is a broader sense of “search” at play that is important to grok as an organization works toward enterprise or so-called “universal” search. However, because I ran out the clock, I didn’t get to describe the varied content targets that LSNC has identified as valuable, useful and usable, and therefore wanted to make readily and easily findable. I did mention in passing that The Findability Project originally included a SharePoint component, which we are now abandoning in favor of components of the Google Apps platform, specifically Google Sites.

The LSNC Shared Portal, demo’d but not successfully displayed during the presentation, is not itself part of Google Sites. The portal is a point-of-entry front end built on a WordPress (PHP) installation and designed to complement our Pika 4.0 installation, which is also a PHP application. It is a point of entry but not a strictly controlled one, in the sense that users are not required to go through it to access either Pika or their Google Apps. But the portal is a custom user interface that gives our users quick, efficient access to all the core web-based applications they need to do their work, plus a program calendar and a slew of LSNC-specific newsfeeds. And then there is the portal’s killer app: the enterprise search box, the findability trigger that searches all of the valued, useful, usable shared content. The enterprise search box initially gives you what I described in the session as “horizontal” search. On the (poorly displayed) search result page, our users then have access to “vertical” filtering options.
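The post does not say how the vertical filters are wired up. One common approach on a GSA, offered here only as an assumption, works in two passes. The first (“horizontal”) search runs against a broad collection. Each vertical filter then re-runs the same terms against a narrower collection through the search request’s `site` parameter. The Python sketch below builds such request URLs. The hostname and the `pika_case_notes` collection name are hypothetical. `q`, `site`, `client`, `output` and `proxystylesheet` are standard GSA search-request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical appliance hostname; LSNC's real host is not given in the post.
GSA_HOST = "https://search.example.org"

def gsa_search_url(terms: str, collection: str = "default_collection",
                   frontend: str = "default_frontend") -> str:
    """Build a GSA search request. "Horizontal" search uses the broad
    collection; a "vertical" filter swaps in a narrower collection."""
    params = {
        "q": terms,                   # the user's search terms
        "site": collection,           # which collection to search
        "client": frontend,           # front end (result settings)
        "proxystylesheet": frontend,  # have the GSA render HTML itself
        "output": "xml_no_dtd",
    }
    return f"{GSA_HOST}/search?{urlencode(params)}"

horizontal = gsa_search_url("unlawful detainer")
# "pika_case_notes" is a made-up collection name for a vertical filter.
vertical = gsa_search_url("unlawful detainer", collection="pika_case_notes")
print(parse_qs(urlsplit(vertical).query)["site"])  # ['pika_case_notes']
```

With this design, a filter is just the same query sent to a different collection, so the result page needs no separate search logic per content type.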

And, as illustrated with the search for my personnel information and photo, our users can use the enterprise search box to run special data queries that return specially tailored search results. For example, when I did the demo search for “staff brian,” here is basically what happened. The keyword “staff” triggered the Google Search Appliance (GSA) to activate a OneBox module. The module queried our Pika CMS database and returned the query result as XML, which in turn was processed through XSLT and output for display as HTML.
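On the appliance, trigger matching happens inside the GSA once a module’s definition names “staff” as its trigger, so no code is needed for it. Purely to make the behavior concrete, here is a Python sketch of that decision. It assumes, as in the examples here, that the trigger must be the first word, and it assumes matching ignores case. It illustrates the idea only and is not how the GSA implements triggers.

```python
from typing import Optional

def match_trigger(query: str, trigger: str = "staff") -> Optional[str]:
    """Return the search text after the trigger word, or None if the query
    does not start with the trigger (compared case-insensitively)."""
    words = query.strip().split(None, 1)
    if not words or words[0].lower() != trigger.lower():
        return None
    return words[1] if len(words) > 1 else ""

for q in ("staff brian", "Staff Ukiah", "brian staff", "staff"):
    print(repr(q), "->", repr(match_trigger(q)))
```

Under these assumptions, “staff brian” hands “brian” to the module, while “brian staff” falls through to an ordinary search.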

The other private content areas I described are all now, or soon will be, part of our domain’s Google Sites. All of our organization’s “official” intranet content is now positioned at a Google Sites location, as is our new “shared document repository.” The GSA works very well with the Google Apps platform, and natively integrates with Google Analytics, among other things. Great stuff.

SharePoint issues

My observation at the beginning of my segment that LSNC was the first legal services field program to adopt the Google Apps platform and the first to abandon SharePoint was not intended to be provocative. It was intended to be transparent about what we are doing and why. Unfortunately, I never got around to explaining our organization’s views on SharePoint.

The short version is this: Given what we want and need to do with our shared work and collaboration space, we simply no longer see any advantages to using SharePoint. Zero. Zip. Nada. At the launch of The Findability Project we viewed SharePoint as a key component for hosting, building and sharing content. And SharePoint is a great option for that. It is a very impressive product. But about six months into The Findability Project, Google unleashed Google Sites as part of the Google Apps platform, and for us it was a game changer. Google Apps is free (for non-profits, for the foreseeable future), we don’t have to host, maintain, secure, update or fix it, and Google continues to aggressively improve its features, along with everything else within Google Apps. And we are able to do pretty much everything we need to do with it. True, SharePoint has an enormous mindshare within corporate America. And organizations do need to evaluate whether SharePoint has features or functionality that are unique or indispensable to it. For us, it has none.

Oh, and did I mention that the GSA works natively with Google Apps?

While they are not the reasons we bailed out on SharePoint, here are two pieces questioning what role SharePoint has in your future: Peter Campbell’s article, Why SharePoint Scares Me; and more contrariness from Dion Hinchcliffe, Sharepoint and Enterprise 2.0: The good, the bad, and the ugly.

More self-criticism: What we don’t like about our user interface

Perhaps I spent too much time trying to drive home the importance of usability as a concept and how it relates to findability. I am fascinated by usability concepts and, after years of practical experience now, sobered by how challenging it is to do well. We are very pleased with what we have accomplished with the portal designs shown in the slides (and with the related search result page and Pika CMS designs). But I had also planned to take a few minutes to highlight the remaining problems with our design, along with “usability” thoughts about improving or fixing them. For example, we already plan to change how we use tags on the portal page, and we will soon modify the vertical filtering options on the enterprise search result page, to expand those options and make them more intuitive. I think we have done well. I think we can do better. And we will.

Knowledge management as poetry

I wasn’t entirely irresponsible about keeping within my allotted time. One thing I considered but dropped from my presentation to save time was a dramatic reading of the most famous poem ever about knowledge management. Yes, there is such a thing:

“The Unknown” by Donald Rumsfeld

As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don’t know
We don’t know.

[U.S. Department of Defense news briefing, February 12, 2002]

As far as I can tell, this was someone who never actually grasped basic concepts of findability.

But that’s me. What do I know.

Revised: What the LSNC Shared Portal now looks like

We have now posted a further revised Jing video with audio providing a brief, 4-minute overview of the LSNC Shared Portal. This is the actual intro overview video we circulated internally to provide all staff with a basic visual and feature orientation, before our more extended, in-house live demos to be conducted next week.

It’s not so easy to do a public video demo of our new Pika 4.0 case management system design changes, because of confidentiality issues, but we will post select screenshots reasonably soon so you can get a visual idea of changes we have made to that application.

TIG final evaluation report for The Findability Project

For those interested, here is the recently approved TIG final evaluation report for The Findability Project.

This TIG project was funded for an 18-month period from January 2008 through June 2009. Much of the report will ring familiar to those who have followed the project here, since much of what has already been posted mirrors what a TIG evaluation report requires. Essentially, this public project site let us give others in the legal services community an ongoing, if lagging, report of progress on the project. At the same time, it considerably eased the process of writing up the evaluation report at the end of the project, since we had already written most of it as we went along.

We’re winding things down here, but we will continue to post at least through the next TIG conference in early 2010. Among other things, we will detail how, in finalized form, we are integrating our project’s GSA test front-end functionality into a more expansive shared organizational portal, part of our current deployment of a heavily customized version of Pika 4.0. We have finished the LSNC redesign of Pika 4.0 as well as a new LSNC shared portal “front door” (built on WordPress), both of which are scheduled to be in place and in use by LSNC staff the day after the Labor Day break.

Stay tuned, people!

Getting Google-y with the enterprise

As a coda to yesterday’s post about findability, the pervasiveness of the Google search paradigm, and what that means for the non-profit enterprise, I want to take a moment to focus on a question from the session. It concerned the screenshot of an online post highlighted in one of the slides: “Why Enterprise Search Will Never Be Google-y.” I fear I did a poor job of answering the question of how the author viewed Google enterprise search as different from other types of enterprise search. Mea culpa.

A couple of follow-up observations, to better respond:

As mentioned during the presentation, one point of the slide was to draw attention to The Noisy Channel, a very search geeky, characteristically Google-contrary, but always interesting, worthwhile blog helmed by Daniel Tunkelang, chief scientist at Endeca, a high-end direct competitor with Google in the enterprise market. Agree or not, there is a lot to learn about search from The Noisy Channel. It is one of my must-reads.

The title of Daniel Tunkelang’s highlighted post derives directly from Chris Sherman’s pithy, two-page online article with the same name, Why Enterprise Search Will Never Be Google-y (from the Enterprise Search Sourcebook 2008). The gist of Daniel’s post, and of Chris’ article that prompted it, is this: the “simple search” or “known item” search we all commonly associate with Google (the noun and the verb) shortchanges what enterprise search can or should be for those who use it. The tension between these two enterprise search models is why I highlighted these two paragraphs from Daniel’s post:

The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn’t recommend that anyone try to compete with the GSA on its turf.

But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for “enterprise search” is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.

The irony here is that, contrary to the entertainingly provocative “never will be Google-y” in the title, for some market segments enterprise search is already Google-y. In some respects, Daniel’s post and Chris’ article both actually make the case for, not against, the Google enterprise model, which is to say that for some segments of the enterprise market Google and its search appliance may very well be the way to go. Our experience is that it is a particularly viable way for a non-profit legal services program.

Why do I say that? Even assuming arguendo that Google Search Appliance (GSA) improvements “should be seen in the context of state of the art,” for many organizations this state of the art is a rarefied and unobtainable reality. One has to wonder, after costing out a solution with one of the three major market leaders in enterprise search (Autonomy, Endeca and FAST), whether a Google box doesn’t look pretty damn good and pretty damn doable, given what it does. As Daniel himself observes, “I wouldn’t recommend that anyone try to compete with the GSA on its turf.” Is that turf a real solution for some market segments? Chris invokes a clever if overstated “oil and water” metaphor about the differences between web and enterprise search, but he follows it by suggesting the exact opposite. Some enterprise search segments are well served by the Google paradigm, notably including “intranet search”:

Many organizations are encouraging employees to communicate internally via blogs, or to participate in community-based knowledge repositories such as internal wikis. This is one area where there is a genuine parallel between enterprise information systems and web content, and Google excels at understanding and surfacing this type of content.

Tell me about it.

Findability and the Google search paradigm

Following up on an NTAP presentation I gave last Thursday, Findability and the Google Search Paradigm: Integrating Search as an Organizational Solution, here is a publicly viewable set of the presentation slides, which are in a Google Docs presentation format and include embedded links to a lot of the material I discussed during the presentation. You can find the New York Times article I mentioned about Twitter as an example of “crowd-sourcing” at David Pogue’s post, The Twitter Experiment.

I painted with a broad brush during the presentation. The goal of the presentation was to offer the legal services community a broader view, and an emerging view, of what it means to search, to search on the enterprise, and to suggest what it means to Google search on the enterprise. These are just the slides. While I gave a brief live demonstration of how our GSA installation actually functions when generating and filtering search results, you’ll have to come to the upcoming 2010 LSC Technology Initiative Grants Conference to get a more expansive demonstration and technical explanation of our implementation, including a solution (hopefully) to the problems we’ve had with Pika CMS integration into our enterprise search solution.

As is my bad habit, I went long, so the discussion at slide 72 about the real and imagined obstacles to implementing enterprise search in a non-profit environment got short shrift, and for that I apologize. I promise to do a better job with those issues at the TIG conference. In our experience, getting our “stuff” organized and hammering out practices and protocols took a much larger time commitment on this project than the strictly technical stuff. And then there are the paralysis-against-progress problems that large organizations may experience because, in my view, they mistakenly think they have to have everything about taxonomy, vocabularies, folksonomies and metadata in place first. For example, I have argued here that, with our somewhat novel Google Search Appliance implementation in a non-profit environment, we could do fine for now without relying significantly on metadata to make our project work. Others beg to differ.

In any event, I hope the presentation last Thursday was helpful. Let’s all talk again at TIG in January 2010.

A quick and dirty OneBox using PHP

Arguably the most common, if not the first, Google Search Appliance (GSA) OneBox module that organizations implement is one that returns personnel information or listings of some kind. It is one of the most obviously useful OneBox results one can come up with. As we ramped up to implement our version of it, we were surprised to discover that most publicly available examples or models for creating OneBox modules rely on technologies we do not use (ASP and Java being among the most prevalent). We could not find an example of such a OneBox using simple PHP/MySQL.

Our goal was to build an easily replicable OneBox module that does work with PHP, which we do use. A lot. PHP is at the heart of the Pika CMS as well as our public websites built on WordPress.

Here’s an example of what our OneBox special query result looks like. The first keyword, “staff,” is the trigger, and the second keyword, “ukiah,” is the name of one of our local office locations. The query returns a OneBox result listing all the active staff in that office.

Clicking on a person’s name triggers a new display with a photo of the person and his or her vitals. The module also works with the same trigger followed by a particular staff person’s name.

Most simply put, this OneBox module works by querying the MySQL database “users” table in the Pika CMS, the application used by all our active employees, across all positions, to record their time and work activity. More specifically, the module breaks down into five basic steps:

  • the OneBox module sends a query to a targeted PHP file
  • the PHP code runs a query against the targeted MySQL database
  • the PHP code then outputs the returned data as XML
  • the GSA reads that XML output
  • the GSA then formats that output for display as a search result
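To make the five steps more concrete, here is a Python sketch of steps 1 through 3, the provider side. It is not LSNC’s PHP code. It uses SQLite in place of MySQL so it runs self-contained. Several details are assumptions to check against Google’s OneBox developer documentation: the table and column names, the `query` request parameter, the detail-page URL, and the XML element names (`MODULE_RESULT`, `U`, `Title`, `Field`, loosely modeled on the OneBoxResults format). A single query handles both an office name and a person’s name, matching the two uses described above.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Hypothetical stand-in for Pika's "users" table plus an offices lookup.
# The real Pika schema (and LSNC's PHP/MySQL code) will differ.
SCHEMA = """
CREATE TABLE offices (office_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT,
                    last_name TEXT, office_id INTEGER, enabled INTEGER);
"""

# One query answers both "staff <office>" and "staff <name>", active staff only.
STAFF_QUERY = """
SELECT u.user_id, u.first_name, u.last_name, o.name
FROM users AS u JOIN offices AS o ON o.office_id = u.office_id
WHERE u.enabled = 1
  AND (lower(o.name) = lower(:term)
       OR lower(u.first_name) = lower(:term)
       OR lower(u.last_name) = lower(:term))
ORDER BY u.last_name, u.first_name
"""

def onebox_response(query_string: str, conn: sqlite3.Connection) -> str:
    """Steps 1-3: read the request, query the database, return XML."""
    # Assumption: the user's full search ("staff ukiah") arrives in a
    # "query" parameter on the provider URL.
    raw = parse_qs(query_string).get("query", [""])[0]
    term = " ".join(raw.split()[1:])  # drop the leading trigger word
    root = ET.Element("OneBoxResults")
    ET.SubElement(root, "provider").text = "Staff directory"
    if term:
        for user_id, first, last, office in conn.execute(STAFF_QUERY, {"term": term}):
            result = ET.SubElement(root, "MODULE_RESULT")
            # Hypothetical detail-page URL for the photo/vitals view.
            ET.SubElement(result, "U").text = f"https://intranet.example.org/staff?id={user_id}"
            ET.SubElement(result, "Title").text = f"{first} {last}"
            ET.SubElement(result, "Field", name="office").text = office
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO offices VALUES (1, 'Ukiah'), (2, 'Sacramento')")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)",
                     [(1, "Ann", "Lee", 1, 1), (2, "Bob", "Gone", 1, 0),
                      (3, "Brian", "Sample", 2, 1)])
    print(onebox_response("query=staff+ukiah", conn))  # lists Ann Lee only
```

Steps 4 and 5 (reading the XML and formatting it with the module’s stylesheet) stay on the appliance.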

Within the GSA console, one creates a module by selecting OneBox Modules > Create Module Definition, selecting the Trigger (in our case, “staff”), and then identifying the Provider. In this example, the Provider is the PHP file we created, attached to the module as an External Provider by inserting the URL to the PHP file.

You can download the PHP code and the related GSA stylesheet template used in this example as a ZIP file.

The PHP file is annotated, but select information (host, passwords, etc.) has been edited or removed, for obvious reasons. Looking at the PHP code, in sequence the PHP submits the query, connects to the database, joins data from a combination of data tables in our case management system, then takes the results from the MySQL query and outputs them as XML, i.e., the “OneBoxResults” in the code.

Once the GSA has received the query results as XML, it can publish them through a OneBox Stylesheet Template. One can edit the template by clicking the Edit XSL link at the bottom of the console page for the particular module.

How we organized our targeted Google Sites content

Since we’re on the subject of revisions and updates today, here’s another about how we finalized our Google Sites content.

As noted earlier, The Findability Project planned integration of select Google Sites content as a GSA target. How we created LSNC’s “official” intranet site with Google Sites was covered (briefly) as part of a recent NTAP presentation.

Since that presentation, we have pretty much completed the migration of all our intranet content over to what LSNC calls its “Shared Private Network” (SPN). For those curious, here is a screenshot of the current site’s home page; and here’s a screenshot of the top levels of the sitemap. As you can see, we have worked to keep the hierarchy simple, which means manageable, especially given the number of different folks who are responsible for maintaining its content. For the same reasons, we have also created a large number of Google Sites file cabinet “upload” pages to make those files easier to manage. So far, so good.

What is great about all this is that the GSA easily targets this selected Google Site and returns great results from it. Users can have it both ways, searching either from the GSA front end or, with equal ease, from the native search function within the Google Site itself. It’s all good.

Google Apps, SharePoint and this project

At the outset, let it be acknowledged that SharePoint is a great product. For good reason, many in the legal services community have either adopted or are at least seriously looking at SharePoint as a core component of their network infrastructure. A notable example of this trend from earlier this year is Tom Winter’s video collection of SharePoint Resources for Legal Aid. Impressive.

That said, observant followers of The Findability Project may have noticed our chronic inattention to, and now outright de-emphasis of, SharePoint. There’s a reason. Actually, several reasons.

When we submitted our TIG proposal in 2007, we proposed SharePoint as a key component of the technical specifications for this project. Once we received the grant in 2008, that is exactly how we proceeded as we put together our so-called blunt-instrument build. At the time, we put in place an open-source Google SharePoint connector that plays nicely with the Google Search Appliance (GSA). (We have documented how we configured the SharePoint side of things; we will eventually document how the Google connector configurations work.)

From the get-go we recognized the basic promise of SharePoint, i.e., it offers an array of enterprise platform options for creating and maintaining organizational portals and managing content. All stuff we wanted as we built out our project, moved toward positioning our content in very purposeful ways, and worked out optimal ways for our organization to communicate, share and find content. True, we were less sanguine about SharePoint’s enterprise search features. Not because it is not effective. It is. But we had greater confidence in the algorithms and effectiveness of Google enterprise search, which natively works with most everything Google, and SharePoint does not. But we will put that tribal view aside, for the moment. We give SharePoint its due: Impressive.

That was late 2007, early 2008. This is now, a little more than a year later. What happened in the interim? Google Apps happened … way more, way better Google Apps including an increasingly impressive array of collaboration features … including domain Google Sites … integration of Google Analytics into Google Apps … and then at the end of 2008 some serious happy with the version 5.2 update for the Google Search Appliance, which now integrates with Google Apps, including Google Sites.

Way impressive.

Even though we had SharePoint in place and could have built out our intranet using it, we all but immediately and instinctively moved on to Google Sites once it became available to us in 2008 and, in short order, built things out that way. (See Google Apps Redux for more about how LSNC currently uses Google Apps, including Google Sites.) It is not that SharePoint is not useful to accomplish many of the same things. It is. But at what cost and at what loss in usability?

For a modestly sized non-profit like ours (about 130 employees and two actual IT staff, not wannabes), the Google Apps platform has proven to be a phenomenal, secure, essentially zero-cost, zero-maintenance way to have access to pretty much all the basic collaborative and communication technologies now deemed baselines for the legal services community. (Oh, yeah, the baselines happened in 2008, also.)

And all this stuff works very nicely with the Google Search Appliance. SharePoint, not so much.