Posted on: September 10, 2015 in Blog
4 Methods to Help Improve Your Keyword Search Results
This post explains four best practices for developing better functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
In 1985, the Blair and Maron study found that keywords alone identified less than 20 percent of the relevant documents in a test case of roughly 350,000 pages of text. Thirty years and many conversations later, the majority of cases we work continue to rely on keyword searching as the primary means of reducing a document collection to a relevant document-review set. Despite these findings and judicial statements about the inefficacy of keyword searching in eDiscovery, the prevailing practice continues to be search, process, review, produce.
So, given that keywords are not very effective and yet remain the dominant culling tool in eDiscovery, how do we make them function better? There are four keys to developing better functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
Too often, search terms are created in a vacuum, with no insight into what words or phrases actually exist in the dataset. In the past, when processing rates exceeded $2,000 per gigabyte, terms were used to reduce collections prior to processing, thereby saving vast amounts of money. Today the market has changed and processing rates are at historic lows, so it is practical to process first and search afterward. Searching in the review tool affords greater visibility into search results. Case teams have immediate access to search hits and can quickly determine which terms are successful and which need refinement. Visibility allows you to see keywords in context rather than as a number on a report.
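To make "keywords in context" concrete, here is a minimal Python sketch that pulls each hit with a little surrounding text. It is not how any particular review tool works; the function name `keyword_in_context`, the `width` parameter, and the sample text are all assumptions made for illustration.

```python
import re

def keyword_in_context(text, term, width=30):
    """Return each whole-word hit of `term` with up to `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end].strip())
    return snippets

# Hypothetical document text, for illustration only.
sample = "The falcon prototype shipped late. Nobody on the Falcon team flagged the delay."

for snippet in keyword_in_context(sample, "falcon", width=20):
    print(snippet)
```

Reading a handful of snippets like these for each term is usually enough to tell whether the term is hitting on what the case team intended.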
All keyword indexes are built from a dictionary, or list, of the terms in a particular database. Some review tools expose this dictionary to end users so that you can look up how many times any specific term occurs and how many documents contain it. This is a powerful tool because, before negotiating terms with the opposition, you can get an understanding of what terms exist. Dictionaries also let you run fuzzy searches to find terms with variable spellings. Fuzzy searching is a good strategy for commonly misspelled last names. For databases with a high volume of OCR, it may not be enough to simply search for the name “Johnson.” A dtSearch dictionary will tell you how many variations of the word “Johnson” exist, while allowing you to exclude variations that are legitimately different names, such as “Johanson” and “Johnston.”
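The dictionary idea can be sketched in a few lines of Python. This is a simplified stand-in, not dtSearch: it uses the standard library's `difflib` similarity measure rather than dtSearch's own fuzzy matching, and the helper names, the `cutoff` value, and the sample documents are assumptions for illustration.

```python
import re
from collections import Counter
from difflib import get_close_matches

def build_dictionary(documents):
    """Map each term to (total occurrences, number of documents containing it)."""
    occurrences = Counter()
    doc_counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        occurrences.update(tokens)       # every instance of the term
        doc_counts.update(set(tokens))   # at most once per document
    return {term: (occurrences[term], doc_counts[term]) for term in occurrences}

def fuzzy_variants(dictionary, term, cutoff=0.8, exclude=()):
    """List dictionary terms spelled similarly to `term`, minus names known to be different people."""
    candidates = get_close_matches(term.lower(), list(dictionary), n=20, cutoff=cutoff)
    excluded = {name.lower() for name in exclude}
    return [c for c in candidates if c not in excluded]

# Hypothetical documents, including an OCR-style misspelling ("Jchnson").
documents = [
    "Email from Johnson re budget",
    "Jchnson signed the scanned memo",
    "Johnston and Johanson attended",
    "Johnson called Johnsen",
]

dictionary = build_dictionary(documents)
print(dictionary["johnson"])
print(fuzzy_variants(dictionary, "Johnson", exclude=("Johanson", "Johnston")))
```

The variants that survive the exclusion list are candidates to add to the search term itself, after someone has confirmed they refer to the same person.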
Although keyword expansion requires the addition of an analytics engine, it can be a powerful tool for creating search term lists to run against the dtSearch index. When you submit a keyword to the expansion engine, the analytics index will identify all the terms that are “conceptually” related to your term. By conceptually related we don’t mean synonyms, but terms that share conceptual meaning within your specific dataset. For example, an R&D team may be working on a project code-named “falcon,” but the product name might be GL4800. The case team probably knows about GL4800, but may not know to also search for “falcon.” Keyword expansion will help you find those conceptually similar terms.
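A rough feel for keyword expansion can be had from plain co-occurrence counting. The Python sketch below is a deliberately crude substitute for a real analytics index, which uses far more sophisticated modeling; it simply ranks terms by how many documents they share with the seed term. The function names, stopword list, and sample documents are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "for", "is", "on", "in", "was"}

def tokenize(text):
    """Return the set of distinct non-stopword terms in a document."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def related_terms(documents, seed, top_n=5):
    """Rank terms by the number of seed-containing documents they also appear in."""
    seed = seed.lower()
    counts = Counter()
    for text in documents:
        tokens = tokenize(text)
        if seed in tokens:
            counts.update(tokens - {seed})
    return counts.most_common(top_n)

# Hypothetical documents echoing the "falcon" / GL4800 example.
documents = [
    "Falcon test results for the GL4800 housing",
    "GL4800 launch schedule slipped; Falcon team notified",
    "Lunch menu for Friday",
    "Falcon budget review for GL4800",
]

print(related_terms(documents, "GL4800", top_n=3))
```

Raw co-occurrence tends to surface common words alongside the useful ones, which is one reason a dedicated analytics engine does a better job on real data.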
Finally, it is important to test your terms and to run multiple iterations to confirm that recall is what you expect. Too often, we run terms only to have case teams want to refine them after review has started because of the large number of false positives. Testing and sampling terms is an iterative process. Most review tools allow you to sample search hits so that you can confirm the results of your search terms before finalizing them and building a review. Building this practice into your future reviews may require a little time at the outset, but will save you time in the end.
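The sampling step can be sketched in Python as follows, assuming you can export the document IDs that hit on a term and record a reviewer's relevant/not-relevant call for each sampled document. The function names and the data are hypothetical. Note that sampling hits measures the false-positive rate (precision); to get a sense of what the terms miss, the same approach can be applied to a sample of documents that did not hit.

```python
import random

def sample_hits(hit_ids, sample_size, seed=None):
    """Draw a simple random sample, without replacement, of the documents that hit on a term."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    hit_ids = list(hit_ids)
    return rng.sample(hit_ids, min(sample_size, len(hit_ids)))

def estimate_precision(reviewed):
    """Fraction of sampled hits coded relevant. `reviewed` maps doc ID -> True/False."""
    if not reviewed:
        raise ValueError("no reviewed documents to estimate from")
    return sum(reviewed.values()) / len(reviewed)

# Hypothetical example: 1,000 hits for one term, 50 pulled for review.
hits = range(1000)
to_review = sample_hits(hits, 50, seed=1)

# Hypothetical reviewer calls for four sampled documents.
calls = {101: True, 102: False, 103: True, 104: False}
print(estimate_precision(calls))
```

A term whose sample comes back mostly not relevant is a candidate for refinement before the review set is built, and the refined term can be sampled again in the next iteration.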
While search terms remain the primary means by which case teams cull and identify documents to review for litigation, the fact is that they are terribly unreliable. That being said, the use of keywords as a review strategy is not going away any time soon. The four keys noted here will help to improve keyword recall and make sure that you are making the most of your terms. The success of your search terms is predicated upon the time you put into development and testing before you finalize them.