Posted on: September 10, 2015 in Blog
4 Methods to Help Improve Your Keyword Search Results
This post explains four best practices for developing better functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
In 1985, the Blair and Maron study found that keywords alone identified less than 20 percent of the relevant documents in its test case of roughly 350,000 pages of text. Thirty years and many conversations later, most of the cases we work on still rely on keyword searching as the primary means of reducing document collections to a relevant document-review set. Despite these findings, and despite judicial statements about the inefficacy of keyword searching in eDiscovery, the prevailing practice continues to be search, process, review, produce.
So, given that keywords are not very effective and yet remain the dominant culling tool in eDiscovery, how do we make them function better? There are four keys to developing better functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
Too often, search terms are created in a vacuum, with no insight into what words or phrases actually exist in the dataset. In the past, when processing rates exceeded $2,000 per gigabyte, terms were used to reduce collections prior to processing, thereby saving vast amounts of money. Today the market has changed and processing rates are at historic lows, so there is far less cost pressure to cull before processing. Searching in the review tool affords greater visibility into search results. Case teams have immediate access to search hits and can quickly determine which terms are successful and which terms need refinement. Visibility allows you to see keywords in context rather than as a number on a report.
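To make “keywords in context” concrete, here is a minimal Python sketch of what a review tool does when it shows a hit with its surrounding text. It is an illustration only, not how any particular review platform works; the function name, the sample text, and the context width are all assumptions made for this example.

```python
import re

def keyword_in_context(text, term, width=30):
    """Return each hit of `term` with up to `width` characters of context per side."""
    # Whole-word, case-insensitive match; re.escape guards against regex metacharacters.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets

# Hypothetical document text, for illustration only.
sample = "The falcon budget was approved. Later, Falcon testing slipped by a month."

for snippet in keyword_in_context(sample, "falcon", width=15):
    print(snippet)
```

Reading a handful of snippets like these usually tells you more about whether a term is working than the hit count does.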
All keyword indexes are built from a dictionary, or list of terms, in a particular database. Some review tools expose this dictionary to end users so that you can look up the number of instances of any specific term and the number of documents that contain it. This is a powerful tool because, before negotiating terms with the opposition, you can get an understanding of what terms actually exist. Dictionaries also let you submit fuzzy searches to find terms with variable spellings. Fuzzy searching is a good strategy for commonly misspelled last names. For databases with a high volume of OCR text, it may not be enough to simply search for the name “Johnson.” A dtSearch dictionary will tell you how many variations of the word “Johnson” exist, while allowing you to exclude legitimately different names such as “Johanson” and “Johnston.”
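The dictionary idea can be sketched in a few lines of Python. This is not dtSearch's API or its fuzzy-matching algorithm; it is a stand-in that uses the standard library's difflib similarity score, and the sample documents, function names, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import re
from collections import Counter
from difflib import get_close_matches

def build_dictionary(documents):
    """Map each term to (total occurrences, number of documents containing it)."""
    occurrences = Counter()
    doc_counts = Counter()
    for text in documents:
        terms = re.findall(r"[a-z0-9]+", text.lower())
        occurrences.update(terms)
        doc_counts.update(set(terms))  # count each document once per term
    return {term: (occurrences[term], doc_counts[term]) for term in occurrences}

def fuzzy_variants(dictionary, term, cutoff=0.8):
    """List dictionary terms spelled similarly to `term`, excluding the term itself."""
    term = term.lower()
    matches = get_close_matches(term, list(dictionary), n=20, cutoff=cutoff)
    return [m for m in matches if m != term]

# Hypothetical documents, including OCR-style misspellings.
docs = [
    "Email from Johnson to Johnston regarding the audit",
    "Scanned memo signed by J0hnson and Jonnson",
    "Johanson replied to Johnson",
]

dictionary = build_dictionary(docs)
variants = fuzzy_variants(dictionary, "Johnson")

# Names the case team knows belong to different people (an assumption here).
legitimate = {"johanson", "johnston"}
misspellings = [v for v in variants if v not in legitimate]
print(dictionary["johnson"], variants, misspellings)
```

The point of the exercise is the last step: the fuzzy list is a starting point, and someone who knows the case still has to decide which variants are misspellings and which are different people.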
Although keyword expansion requires the addition of an analytics engine, it can be a powerful tool for creating search term lists to run against the dtSearch index. When you submit a keyword to the expansion engine, the analytics index will identify all the terms that are “conceptually” related to your term. By conceptually related we don’t mean synonyms, but terms that share conceptual meaning within your specific dataset. For example, an R&D team may be working on a project code-named “falcon,” but the product name might be GL4800. The case team probably knows about GL4800, but may not know to also search for “falcon.” Keyword expansion will help you find those conceptually similar terms.
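As a rough illustration of why “falcon” can surface from a search on GL4800, the Python sketch below ranks terms by how concentrated they are in documents that contain the seed term. This is plain document-level co-occurrence, a deliberately simple stand-in and not the model a commercial analytics engine uses; the documents, function name, and thresholds are assumptions made for this example.

```python
import re
from collections import Counter

def expand_keyword(documents, seed, top_n=3, min_docs=2):
    """Rank terms by the share of their documents that also contain `seed`."""
    seed = seed.lower()
    tokenized = [set(re.findall(r"[a-z0-9]+", d.lower())) for d in documents]
    # Document frequency of every term across the whole collection.
    overall = Counter(t for terms in tokenized for t in terms)
    # Document frequency restricted to documents that contain the seed.
    with_seed = Counter(t for terms in tokenized if seed in terms for t in terms)
    scores = {
        term: with_seed[term] / overall[term]
        for term in with_seed
        if term != seed and with_seed[term] >= min_docs
    }
    # Highest score first; ties broken by co-occurrence count, then alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], -with_seed[t], t))[:top_n]

# Hypothetical documents, for illustration only.
docs = [
    "GL4800 prototype passed thermal testing for falcon",
    "falcon schedule slipped; GL4800 tooling is late",
    "lunch order for the team on friday",
    "falcon budget review on friday",
    "the quarterly budget is due",
]

print(expand_keyword(docs, "GL4800"))
```

Note that the fourth document mentions “falcon” without the product name, which is exactly the kind of document the original term list would have missed.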
Finally, it is important to test your terms and to run multiple iterations of them to ensure that your recall is what you expect. Too often, we run terms only to have case teams want to refine them after review has started because of the large number of false positives. Testing and sampling terms is an iterative process. Most review tools allow you to sample search hits so that you can confirm the results of your search terms before finalizing them and building a review. Building this practice into your future reviews may require a little time at the outset, but it will save you time in the end.
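The sampling step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the document IDs, the sample size, and the reviewer calls are hypothetical, and the function names are invented for this example rather than taken from any review tool.

```python
import random

def sample_hits(hit_ids, sample_size, seed=0):
    """Draw a simple random sample of hit documents, without replacement."""
    hits = list(hit_ids)
    rng = random.Random(seed)  # fixed seed so the same sample can be re-pulled
    return rng.sample(hits, min(sample_size, len(hits)))

def estimated_precision(reviewed):
    """Share of sampled hits the reviewer marked relevant.

    `reviewed` maps document ID to True (relevant) or False (false positive).
    """
    if not reviewed:
        return 0.0
    return sum(reviewed.values()) / len(reviewed)

# Hypothetical: 5,000 documents hit on a term; pull 50 for a quick review.
to_review = sample_hits(range(5000), 50)

# Hypothetical reviewer calls on four sampled documents.
calls = {101: True, 102: False, 103: True, 104: False}
print(f"{estimated_precision(calls):.0%} of sampled hits were relevant")
```

A sample of hits only measures false positives. Estimating recall would also require sampling the documents that did not hit, which this sketch leaves out.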
While search terms remain the primary means by which case teams cull and identify documents to review for litigation, the fact is that they are terribly unreliable. That being said, the use of keywords as a review strategy is not going away any time soon. The four keys noted here will help to improve keyword recall and make sure that you are making the most of your terms. The success of your search terms depends on the time you put into developing and testing them before you finalize them.