Posts tagged findability

Pika and the Google Search Appliance make nice

For those who have followed The Findability Project, I am pleased to report we have surmounted the basic technical problems of targeting our Pika CMS with the Google Search Appliance.

The back story is one I have purposefully repeated whenever giving a presentation about the project, namely, that our Pika Plan A did not work. We encountered code anomalies in Pika that, among other things, cause it to auto-generate new case intakes and case records when it is crawled by the GSA. As a result, we were unable to use the GSA to crawl the Pika client case content dynamically generated as web pages. Plan A would have been the easiest, no-brainer way to go but we were not able to do so. So Plan B was to have the GSA target the Pika MySQL database directly. Status report: Mission accomplished.

There are GSA capacity issues for us, since our particular GSA’s one million “record” capacity means one million web pages or database records, inclusive, and these database records are not the same thing as the count for client case records. At any given time, we may have some 130,000 to nearly 200,000 client cases in our Pika system (and even more in archival data storage), but from a database perspective, these add up to multi-millions of “records,” e.g., various types of time records, case notes, contacts, and so on. Part of the challenge for us was to sort out which pieces of those millions of database records were the ones most needed and useful to our users.

The solution? Using a well-tailored query, we have the GSA do a selective crawl of the Pika MySQL database to return the most commonly sought and used Pika content: case numbers, client names, office designations and case notes… tons of case notes. The basic technical explanation is that the GSA performs a database query, returns the result as an XML feed, and indexes that feed; the user’s search terms are then queried against that index, and the matches are ultimately returned as viewable HTML.
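For the technically curious, here is a minimal Python sketch of the "selective query, returned as XML" idea. It is not our actual configuration: the table names, column names and XML element names are all hypothetical, SQLite stands in for MySQL so the example runs anywhere, and the sample row is made up. The point is only that the query picks the handful of fields users search for and leaves the rest (time records, for instance) out of the feed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: the real Pika tables and columns are not described here.
SCHEMA = """
CREATE TABLE cases (case_id INTEGER PRIMARY KEY, case_number TEXT,
                    client_name TEXT, office TEXT);
CREATE TABLE case_notes (note_id INTEGER PRIMARY KEY, case_id INTEGER, note_text TEXT);
CREATE TABLE time_records (time_id INTEGER PRIMARY KEY, case_id INTEGER, hours REAL);
"""

# Selective query: only the fields users look for; time_records is never touched.
COLUMNS = ("note_id", "case_number", "client_name", "office", "note_text")
SELECTIVE_QUERY = """
SELECT n.note_id, c.case_number, c.client_name, c.office, n.note_text
FROM case_notes AS n
JOIN cases AS c ON c.case_id = n.case_id
ORDER BY n.note_id
"""


def build_feed(conn):
    """Run the selective query and serialize each row as one XML <record>."""
    root = ET.Element("feed")
    for row in conn.execute(SELECTIVE_QUERY):
        record = ET.SubElement(root, "record")
        for name, value in zip(COLUMNS, row):
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def demo_case_db():
    """Build an in-memory database with one made-up case, note and time record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cases VALUES (1, '90-10-123456', 'John Client', 'Sacramento')")
    conn.execute("INSERT INTO case_notes VALUES (1, 1, 'Called client about hearing date.')")
    conn.execute("INSERT INTO time_records VALUES (1, 1, 0.5)")
    return conn


if __name__ == "__main__":
    print(build_feed(demo_case_db()))
```

Keeping the column list next to the query makes it easy to see, and to trim, exactly which database records count against the appliance's capacity.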

What does the search result look like? A Google search result. The clickable link displays the case number, client name, LSNC office and primary advocate name, e.g., “90-10-123456 ~ John Client ~ Sacramento ~ Jane Advocate.” Below that it displays in-context text with the search terms highlighted in bold, essentially like a regular Google search result. Clicking the link dynamically displays the actual Pika case note shown in context. Assuming there are multiple possible matches for a particular Pika case record, there is a link to display all the “omitted results,” akin to how regular Google searches work, so the users can see all possible matches, not just the probable ones. And because clicking through takes the user into Pika itself, the search result link also gives direct access to the particular client case record.
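To make the result layout concrete, here is a small Python sketch of the two pieces described above: the " ~ " separated title line and the bolded search terms in the snippet. The function names are hypothetical and this is only an illustration of the output format; how the GSA itself builds titles and snippets is not something this sketch claims to reproduce.

```python
import html
import re


def result_title(case_number, client_name, office, advocate):
    """Join the four fields with ' ~ ', matching the example title format."""
    return " ~ ".join([case_number, client_name, office, advocate])


def highlight(snippet, terms):
    """Return the snippet as HTML with whole-word matches of each term in <b>."""
    words = [re.escape(term) for term in terms if term]
    if not words:
        return html.escape(snippet)
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    # With one capture group, re.split alternates: plain text, match, plain text, ...
    pieces = []
    for index, part in enumerate(pattern.split(snippet)):
        escaped = html.escape(part)
        pieces.append(f"<b>{escaped}</b>" if index % 2 == 1 else escaped)
    return "".join(pieces)


if __name__ == "__main__":
    print(result_title("90-10-123456", "John Client", "Sacramento", "Jane Advocate"))
    print(highlight("Client called about the hearing & notice.", ["hearing"]))
```

Splitting the raw text first and escaping each piece afterwards keeps the bold tags from landing inside an HTML entity such as `&amp;`.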

That’s the name of that tune.

Legal research and the need to be “more like Google”

A few months back, there was a good amount of copy about Google Scholar features for searching federal and state court decisions: an impressive step up for using Google, at least at a consumer-user level, to find court decisions, but (puhleeeze) not as a tool for serious research of legal consequence. More recently the New York Times ran a feature article about changes afoot in Westlaw and Lexis, both of which “will undergo sweeping changes in a bid to make it easier and faster for lawyers to find the documents they need.” The opening salvo in this clash of the legal research titans occurred this week with the debut of WestlawNext. To hear Westlaw and Lexis talk about it, what they are in part reacting to is the perceived need to be “more like Google.”

Yes, but one’s understanding of that conclusion depends on how one defines or explains what it means to “Google” things. At the recent TIG conference, during the “findability” segment I presented, I made a point stressing the significance of Google as not being “Google” itself, as pervasive as it is in all our lives. Rather, the significance of Google is the dramatic paradigm shift that has occurred in how we search for and use information. Google is a primary agent of this paradigm shift but certainly not the only one. And the connections between specific search paradigms (universal search, vertical search, faceted search, and so on), the relative ease of locating or discovering information, and improvements in user-interface and usability design — all are converging to enhance the findability of what one is looking for.

That said, the impact of all these trends on specialized (re)search tools like Westlaw and Lexis is pretty obvious. If “Wexis” users are demanding their research tools become “more like Google,” what the users are saying is that those companies must make a paradigm shift, or they’ll go to a company that gets it.

Findability slides and video from 2010 TIG conference

I’m not sure what happened with the slides or recording of the Knowledge Management session at the recent 2010 TIG Conference. The session doesn’t show up in the LSC documentation of the event.

In any event, here’s a set of the slides for my “findability” segment, about search paradigms, findability as a concept and what we’ve done to implement enterprise search using a Google one-two punch: the Google Search Appliance in combo with the Google Apps platform. Also, here’s the brief Flash video of our portal front end and search result/filtering examples that I ran during the presentation but that displayed so poorly. The point of the video was to give the audience a real-world feel for how it all works. Again, my apologies for how badly the video displayed in that setting. Lesson learned.

Coda re 2010 TIG Knowledge Management session

Last Wednesday at the 2010 LSC TIG Conference, Chicago-Kent’s Ron Staudt and I did a joint session, Knowledge Management – What It Is, Why It Matters, and (Google) Options For Making What You Know Findable. Ron, of course, was cogent, concise and charismatic and stayed within his presentation window and hit all his marks. Me? Regrettably, after all these years, I still haven’t figured out how to squeeze 10 pounds of cement into a 5-pound bag, and didn’t even get to several key points I had hoped to make about enterprise search and The Findability Project. To make matters worse on my end, at the beginning of my segment the Flash demo of how LSNC’s enterprise search front end works faltered badly since it displayed so poorly when projected. (More than one person mentioned to me afterwards that they were simply not able to see accurately what I was describing at the moment. Uh, it seemed like a good idea at the time.)

With those apologies out of the way, allow me to annotate a few points now to make up for at least a few things that I did not cover during the presentation:

The LSNC “portal,” “intranet” and “document repository”

I feel I successfully got across the point that there is a broader sense of “search” at play that is important to grok, as an organization works toward enterprise or so-called “universal” search. However, because I ran out my clock and didn’t have time to talk at length, I didn’t quite get to describing the varied content targets that LSNC has identified as valuable, useful and usable and therefore all that which we wanted to make readily, easily findable. In going over all that, in passing I mentioned that The Findability Project originally included a SharePoint component which is now being abandoned, in favor of our relying on components of the Google Apps platform, specifically, Google Sites.

The LSNC Shared Portal demo’d but not successfully displayed during the presentation is itself not part of Google Sites. The portal is itself a point-of-entry front end built on a WordPress PHP installation, and designed to complement our Pika 4.0 installation, which is also a PHP application. The portal is a point-of-entry but not a strictly controlled one, in the sense that users are not required to go through it to access either Pika or their Google Apps. But the portal is a custom user-interface that affords our users quick, efficient access to all the core web-based applications they need to do their work, plus a program calendar and a slew of LSNC-specific newsfeeds. And then there is the portal’s killer app: The enterprise search box, the findability trigger that searches all of the valued, useful, usable shared content. The enterprise search box initially gives you what I described in the session as “horizontal” search; at the (poorly displayed) search result page our users then have access to “vertical” filtering options.

And, as illustrated with the search for my personnel information and photo, our users can use the enterprise search box to do special data queries to get specially tailored search results. For example, when I did the demo search for “staff brian,” here’s what was basically happening: Triggered by the keyword “staff,” the Google Search Appliance (GSA) activates a OneBox module that queries our Pika CMS database and returns the query result as XML, which in turn is processed through XSLT and output for display as HTML.
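Here is a rough Python sketch of that keyword-triggered flow, for anyone who wants to see the moving parts in one place. It is an illustration under stated assumptions, not our OneBox configuration: the staff table, its columns, the XML element names and the sample rows are all made up, SQLite stands in for the Pika MySQL database, and a plain Python function stands in for the XSLT step.

```python
import html
import sqlite3
import xml.etree.ElementTree as ET

TRIGGER = "staff"  # the keyword from the "staff brian" example


def parse_trigger(query):
    """Return the text after the trigger keyword, or None for an ordinary search."""
    words = query.split()
    if len(words) >= 2 and words[0].lower() == TRIGGER:
        return " ".join(words[1:])
    return None


def lookup_staff_xml(conn, name):
    """Query the (hypothetical) staff table and return the matches as XML."""
    rows = conn.execute(
        "SELECT full_name, office, phone FROM staff "
        "WHERE full_name LIKE ? ORDER BY full_name",
        (f"%{name}%",),
    )
    root = ET.Element("results")
    for full_name, office, phone in rows:
        person = ET.SubElement(root, "person")
        ET.SubElement(person, "name").text = full_name
        ET.SubElement(person, "office").text = office
        ET.SubElement(person, "phone").text = phone
    return ET.tostring(root, encoding="unicode")


def render_html(xml_text):
    """Stand-in for the XSLT step: turn the XML results into an HTML list."""
    items = []
    for person in ET.fromstring(xml_text).findall("person"):
        fields = [person.findtext(tag, default="") for tag in ("name", "office", "phone")]
        items.append("<li>" + " | ".join(html.escape(f) for f in fields) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


def handle_query(conn, query):
    """Return special-result HTML for a triggered query, else None."""
    name = parse_trigger(query)
    if name is None:
        return None  # no trigger keyword: fall through to the regular search
    return render_html(lookup_staff_xml(conn, name))


def demo_staff_db():
    """Build an in-memory staff table with two made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (full_name TEXT, office TEXT, phone TEXT)")
    conn.execute("INSERT INTO staff VALUES ('Brian Example', 'Sacramento', '555-0100')")
    conn.execute("INSERT INTO staff VALUES ('Jane Advocate', 'Sacramento', '555-0101')")
    return conn


if __name__ == "__main__":
    print(handle_query(demo_staff_db(), "staff brian"))
```

Keeping the lookup and the rendering as separate steps mirrors the XML-then-XSLT split described above: the query result stays presentation-free, and the display can change without touching the query.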

The other private content areas I described are all now, or soon will be, part of our domain’s Google Sites. All of our organization’s “official” intranet content is now positioned at a Google Sites location, as is our new “shared document repository.” The GSA works very well with the Google Apps platform, and natively integrates with Google Analytics, among other things. Great stuff.

SharePoint issues

My observation at the beginning of my segment that LSNC was the first legal services field program to adopt the Google Apps platform and the first to abandon SharePoint was not intended to be provocative. It was intended to be transparent about what we are doing and why. Unfortunately, I never got around to explaining our organization’s views on SharePoint.

The short version is this: Given what we want and need to do with our shared work and collaboration space, we simply no longer see any advantages to using SharePoint. Zero. Zip. Nada. At launch of The Findability Project we viewed SharePoint as a key component for hosting and building and sharing content. And SharePoint is a great option for that. It is a very impressive product. But about six months into The Findability Project, Google unleashed Google Sites as part of the Google Apps platform, and for us it was a game changer. Google Apps is free (for non-profits, for the foreseeable future), we don’t have to host, maintain, secure, update or fix it, and Google continues to aggressively improve its features, along with everything else within Google Apps. And we are able to do pretty much everything we need to be able to do with it. True, SharePoint has an enormous mindshare within corporate America. And organizations do need to evaluate whether SharePoint has features or functionality that are unique or indispensable to them. For us, it has none.

Oh, and did I mention that the GSA works natively with Google Apps?

While these are not the reasons why we have bailed out on SharePoint, here are two views questioning what role SharePoint has in your future: Peter Campbell’s article, Why SharePoint Scares Me; and more contrariness from Dion Hinchcliffe, Sharepoint and Enterprise 2.0: The good, the bad, and the ugly.

More self-criticism: What we don’t like about our user interface

Perhaps I spent too much time trying to drive home the importance of usability as a concept and how it relates to findability. I am fascinated by usability concepts and, after years of practical experience now, sobered by the reality of how challenging it is to do well. We are very pleased with what we have accomplished with our portal (and related search result page and Pika CMS) designs shown in the slides. But I also had planned on taking a few minutes to highlight the remaining problems with our design, and “usability” thoughts about improving or fixing them. For example, we already plan on altering how we use tags as part of the portal page, and will soon be modifying the vertical filtering options on the enterprise search result page, to expand those options and make them more intuitive. I think we have done good. I think we can do better. And we will.

Knowledge management as poetry

I wasn’t entirely irresponsible about keeping within my allotted time. One thing I considered doing, but dropped from my presentation to save time, was giving a dramatic reading of the most famous poem ever about knowledge management. Yes, there is such a thing:

“The Unknown” by Donald Rumsfeld

As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don’t know
We don’t know.

[U.S. Department of Defense news briefing, February 12, 2002]

As far as I can tell, this was someone who never actually grasped basic concepts of findability.

But that’s me. What do I know.

The Findability Project site goes dark

The domain-specific Findability Project site will go dark the first week of December 2009. The formal aspects of the TIG-funded project were completed months ago. We have posted a few items since then, but we are now, in a purposeful way, winding down the public aspects of the project.

The project content will endure but in a different location, here at Webdogs 2.0, the LSNC technology blog where we have long archived all of our public web development projects. The Findability Project is the latest, no doubt not the last, to find its archival home at Webdogs 2.0. For now, we have simply duplicated the site over to a subdirectory there. Eventually all TFP content will be integrated natively into the Webdogs 2.0 site.

From time to time, we will continue this conversation about search, enterprise search, and making organizational content findable, and therefore authentically usable, over at Webdogs 2.0.

Watch the skies, people. Or at least your search patterns. We do.

TFP, out.

"A List Apart" search / usability trifecta

Search is nothing new but it is, paradoxically, the new new within some circles of web design and definitely a core element of any sensible usability construct for web sites and web applications. On that note, A List Apart, the New York Times of web design, today publishes a search cum usability trifecta hitting on several issues I will be alluding to during the upcoming TIG conference, including what to make of your metrics. All the articles are read-worthy:

Revised: What the LSNC Shared Portal now looks like

We have now posted a further revised Jing video with audio providing a brief, 4-minute overview of the LSNC Shared Portal. This is the actual intro overview video we circulated internally to provide all staff with a basic visual and feature orientation, before our more extended, in-house live demos to be conducted next week.

It’s not so easy to do a public video demo of our new Pika 4.0 case management system design changes, because of confidentiality issues, but we will post select screenshots reasonably soon so you can get a visual idea of changes we have made to that application.

Summer hiatus

Many months of fish to fry ahead, including bearing down on LSNC’s customized rebuild of Pika 4.0 and working out a solution for integrating our Google Search Appliance. So, lying low until October 1. See you back here this Fall and in early 2010 at the next Austin TIG, when we serve up the whole enchilada.

Getting Google-y with the enterprise

As a coda to the post yesterday about findability, the pervasiveness of the Google search paradigm, and what that means for the non-profit enterprise, I want to take a moment to put focus on a question during the session about an online post screenshot highlighted in one of the slides: “Why Enterprise Search Will Never Be Google-y.” I fear I did a poor job of answering the question about how it is that the author viewed Google enterprise search as different from other types of enterprise search. Mea culpa.

A couple of follow-up observations, to better respond:

As mentioned during the presentation, one point of the slide was to draw attention to The Noisy Channel, a very search geeky, characteristically Google-contrary, but always interesting, worthwhile blog helmed by Daniel Tunkelang, chief scientist at Endeca, a high-end direct competitor with Google in the enterprise market. Agree or not, there is a lot to learn about search from The Noisy Channel. It is one of my must-reads.

The title of Daniel Tunkelang’s highlighted post derives directly from Chris Sherman’s pithy, two-page online article with the same name, Why Enterprise Search Will Never Be Google-y (from the Enterprise Search Sourcebook 2008.) The gist of Daniel’s post and Chris’ article that prompted it is this: The “simple search” or “known item” search we all commonly associate with Google (the noun and the verb) short changes what enterprise search can or should be for those who use it. The tension between these two enterprise search models is why I highlighted these two paragraphs from Daniel’s post:

The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn’t recommend that anyone try to compete with the GSA on its turf.

But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for “enterprise search” is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.

The irony here is that, contrary to the entertainingly provocative “never will be Google-y” in the title, for some market segments enterprise search is already Google-y. In some respects, Daniel’s post and Chris’ article both actually make the case for, not against, the Google enterprise model, which is to say that for some segments of the enterprise market Google and its search appliance may very well be the way to go. Our experience is that it is a particularly viable way for a non-profit legal services program.

Why do I say that? Even assuming arguendo that Google Search Appliance (GSA) improvements “should be seen in the context of state of the art,” for many organizations this state-of-the-art is a rarified and unobtainable reality. One has to wonder, after costing out a solution with one of the three major market leaders in enterprise search (Autonomy, Endeca and FAST), whether a Google box doesn’t look pretty damn good and pretty damn doable, given what it does. As Daniel himself observes, “I wouldn’t recommend that anyone try to compete with the GSA on its turf.” Is that turf a real solution for some market segments? While Chris invokes a clever if overstated “oil and water” metaphor about the differences between web and enterprise search, he follows it by suggesting the exact opposite: Some enterprise search segments are well served by the Google paradigm, notably including “intranet search” –

Many organizations are encouraging employees to communicate internally via blogs, or to participate in community-based knowledge repositories such as internal wikis. This is one area where there is a genuine parallel between enterprise information systems and web content, and Google excels at understanding and surfacing this type of content.

Tell me about it.

Findability and the Google search paradigm

Following up on an NTAP presentation I gave last Thursday, Findability and the Google Search Paradigm: Integrating Search as an Organizational Solution, here is a publicly viewable set of the presentation slides, which are in a Google Docs presentation format and include embedded links to a lot of the material I discussed during the presentation. You can find the New York Times article I mentioned about Twitter as an example of “crowd-sourcing” at David Pogue’s post, The Twitter Experiment.

I painted with a broad brush during the presentation. The goal of the presentation was to offer the legal services community a broader view, and an emerging view, of what it means to search, to search on the enterprise, and to suggest what it means to Google search on the enterprise. These are just the slides. While I gave a brief live demonstration of how our GSA installation actually functions when generating and filtering search results, you’ll have to come to the upcoming 2010 LSC Technology Initiative Grants Conference to get a more expansive demonstration and technical explanation of our implementation, including a solution (hopefully) to the problems we’ve had with Pika CMS integration into our enterprise search solution.

As is my bad habit, I went long and so the discussion at slide 72 about the real and imagined obstacles to implementing enterprise search in a non-profit environment got short shrift, and for that I apologize. I promise to do a better job with those issues at the TIG conference. In our experience, getting our “stuff” organized and hammering out practices and protocols was a much larger time commitment on this project than the strictly technical stuff. And then there are the paralysis-against-progress problems that large organizations may experience since, in my view, they mistakenly think they have to have everything about taxonomy, vocabularies, folksonomies and metadata in place. For example, I have argued here, with our somewhat novel Google Search Appliance implementation in a non-profit environment, that we could do fine for now without relying significantly on metadata to make our project work. Others beg to differ.

In any event, I hope the presentation last Thursday was helpful. Let’s all talk again at TIG in January 2010.