Posted on: September 10, 2015 in Blog
4 Methods to Help Improve Your Keyword Search Results
This post explains four best practices for developing better-functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
In 1985, the Blair and Maron study found that keywords alone identified less than 20 percent of the relevant documents in their test case of roughly 350,000 pages of text. Thirty years and many conversations later, the majority of cases we work on continue to rely on keyword searching as the primary means of reducing document collections to a relevant document-review set. Despite these findings and judicial statements about the inefficacy of keyword searching in eDiscovery, the prevailing practice continues to be search, process, review, produce.
So, given that keywords are not very effective and yet remain the dominant culling tool in eDiscovery, how do we make them work better? There are four keys to developing better-functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
Too often, search terms are created in a vacuum, with no insight into what words or phrases actually exist in the dataset. In the past, when processing rates exceeded $2,000 per gigabyte, terms were used to reduce collections prior to processing, saving vast amounts of money. Today the market has changed and processing rates are at historic lows, so it is often practical to process the collection first and run searches in the review tool, which affords greater visibility into search results. Case teams have immediate access to search hits and can quickly determine which terms are successful and which need refinement. Visibility allows you to see keywords in context rather than as a number on a report.
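To make the difference between a hit count and a hit in context concrete, here is a minimal Python sketch of a keyword-in-context (KWIC) report. It assumes the collection is already available as a plain dictionary mapping document IDs to extracted text; the function name `hit_report`, the sample documents, and the 30-character context window are illustrative assumptions, not features of any particular review tool.

```python
import re

def hit_report(docs, terms, width=30):
    """For each search term, list the matching document IDs and
    keyword-in-context (KWIC) snippets of the surrounding text."""
    report = {}
    for term in terms:
        # Whole-word, case-insensitive match (so "contract" does not hit "contractor")
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        doc_ids, snippets = [], []
        for doc_id, text in docs.items():
            matches = list(pattern.finditer(text))
            if matches:
                doc_ids.append(doc_id)
            for m in matches:
                # Keep `width` characters on either side of the hit
                start = max(0, m.start() - width)
                end = min(len(text), m.end() + width)
                snippets.append((doc_id, text[start:end]))
        report[term] = {"documents": doc_ids, "snippets": snippets}
    return report

# Hypothetical sample documents
docs = {
    "DOC-001": "The supply contract with Acme was signed on Monday.",
    "DOC-002": "Muscles contract in the cold, so stretch before the run.",
    "DOC-003": "The lunch menu for Friday is attached.",
}

report = hit_report(docs, ["contract"])
print(len(report["contract"]["documents"]))  # 2
for doc_id, snippet in report["contract"]["snippets"]:
    print(doc_id, "...", snippet, "...")
```

Both documents count as hits for “contract,” but only the snippets reveal that DOC-002 is a false positive, which is exactly what a bare number on a report hides.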
All keyword indexes are built from a dictionary, or list of terms, for a particular database. Some review tools expose this dictionary to end users so that you can look up the number of instances of any specific term and the number of documents that contain it. This is powerful because, before negotiating terms with opposing counsel, you can understand which terms actually exist. Dictionaries also let you run fuzzy searches to find terms with variable spellings. Fuzzy searching is a good strategy for commonly misspelled last names. For databases with a high volume of OCR text, it may not be enough to simply search for the name “Johnson.” A dtSearch dictionary will tell you how many variations of the word “Johnson” exist, while allowing you to exclude variations that are legitimate, distinct names, such as “Johanson” and “Johnston.”
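The sketch below illustrates the two dictionary ideas above in Python: a term dictionary recording instance and document counts, and a fuzzy lookup that finds spelling variants within a small edit distance. It is a simplified stand-in, assuming lowercase alphanumeric tokenization and Levenshtein distance; it does not reproduce dtSearch's own dictionary or fuzzy-matching rules, and the function names and sample text are hypothetical.

```python
import re
from collections import Counter

def build_dictionary(docs):
    """Map each term to (instance count, document count)."""
    instances, doc_counts = Counter(), Counter()
    for text in docs.values():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        instances.update(tokens)         # every occurrence
        doc_counts.update(set(tokens))   # once per document
    return {t: (instances[t], doc_counts[t]) for t in instances}

def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

def fuzzy_lookup(dictionary, term, max_distance=2, exclude=()):
    """Dictionary terms within max_distance edits of term, minus known legitimate names."""
    term = term.lower()
    skip = {e.lower() for e in exclude}
    return sorted(
        (t, counts) for t, counts in dictionary.items()
        if t not in skip and edit_distance(term, t) <= max_distance
    )

# Hypothetical OCR'd documents
docs = {
    "DOC-001": "Memo from Johnson re: pricing. Johnson approved.",
    "DOC-002": "Scanned letter signed J0hnson and Johnsen.",
    "DOC-003": "Call Johanson and Johnston about the audit.",
}
dictionary = build_dictionary(docs)
print(dictionary["johnson"])  # (2, 1): two instances in one document
print(fuzzy_lookup(dictionary, "Johnson", exclude=("Johanson", "Johnston")))
# [('j0hnson', (1, 1)), ('johnsen', (1, 1)), ('johnson', (2, 1))]
```

The OCR errors “j0hnson” and “johnsen” surface alongside the correct spelling, while the legitimate names passed in `exclude` are dropped from the variant list.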
Although keyword expansion requires the addition of an analytics engine, it can be a powerful tool for creating search term lists to run against the dtSearch index. When you submit a keyword to the expansion engine, the analytics index identifies the terms that are “conceptually” related to it. By conceptually related, we don’t mean synonyms, but terms that share conceptual meaning within your specific dataset. For example, an R&D team may be working on a project code-named “falcon,” but the product name might be GL4800. The case team probably knows about GL4800 but may not know to also search for “falcon.” Keyword expansion will help you find those conceptually similar terms.
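Analytics engines differ in how they compute conceptual relatedness (latent semantic indexing is a common approach). As a much simpler stand-in, the Python sketch below treats two terms as related when they tend to appear in the same documents, scoring each pair by the cosine similarity of their per-document occurrence counts. The function names, stopword list, and sample documents are hypothetical, and a corpus this small only hints at how a real engine behaves on a full collection.

```python
import math
import re
from collections import defaultdict

def term_vectors(docs):
    """For each term, a sparse vector of occurrence counts per document."""
    vectors = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            vectors[token][doc_id] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def expand(docs, seed, top_n=3, stopwords=frozenset()):
    """Rank other terms by how closely their document co-occurrence matches the seed's."""
    vectors = term_vectors(docs)
    seed = seed.lower()
    if seed not in vectors:
        return []
    scores = [(term, cosine(vectors[seed], vec))
              for term, vec in vectors.items()
              if term != seed and term not in stopwords]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, ties alphabetical
    return [term for term, score in scores[:top_n] if score > 0]

# Hypothetical sample documents
docs = {
    "DOC-001": "Falcon test results look good for the GL4800 line.",
    "DOC-002": "Falcon budget review on Tuesday.",
    "DOC-003": "GL4800 launch plan and Falcon milestones.",
    "DOC-004": "Holiday party planning committee.",
}
STOPWORDS = frozenset({"and", "for", "the", "on"})
print(expand(docs, "GL4800", top_n=1, stopwords=STOPWORDS))  # ['falcon']
```

Seeding with “GL4800” ranks “falcon” first because the two keep turning up in the same documents, even though they are not synonyms.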
Finally, it is important to test your terms and run multiple iterations to ensure that your recall (the share of relevant documents your terms actually find) is what you expect. Too often, we run terms only to have case teams want to refine them after review has started because they return a large number of false positives. Testing and sampling terms is an iterative process. Most review tools allow you to sample search hits, so you can confirm the results of your search terms before finalizing them and building a review. Building this practice into future reviews may require a little time at the outset, but it will save time in the end.
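A minimal sketch of the sampling step in Python, assuming reviewers code two random samples: one drawn from the search hits and one from the documents the terms did not hit (often called the null set or elusion sample). The precision of the hits and the rate of relevant documents among the non-hits together give a rough point estimate of recall. The function names and all counts below are hypothetical, and the estimate ignores sampling error; a formal validation protocol would also report confidence intervals.

```python
import random

def draw_sample(doc_ids, size, seed=2015):
    """Simple random sample of documents for reviewers to code."""
    rng = random.Random(seed)  # fixed seed so the same sample can be reproduced
    return rng.sample(sorted(doc_ids), min(size, len(doc_ids)))

def estimate_precision_recall(hit_count, hit_sample_relevant, hit_sample_size,
                              null_count, null_sample_relevant, null_sample_size):
    """Point estimates of precision and recall from two coded samples."""
    precision = hit_sample_relevant / hit_sample_size   # relevant share of the hits
    elusion = null_sample_relevant / null_sample_size   # relevant share of the non-hits
    relevant_found = precision * hit_count
    relevant_missed = elusion * null_count
    recall = relevant_found / (relevant_found + relevant_missed)
    return precision, recall

# Hypothetical search: 10,000 hits and 90,000 non-hits
hits = [f"DOC-{n:05d}" for n in range(1, 10_001)]
sample = draw_sample(hits, 400)  # 400 hit documents go to reviewers

# Suppose reviewers mark 120 of the 400 hits relevant, and 8 of 400 non-hits relevant
precision, recall = estimate_precision_recall(10_000, 120, 400, 90_000, 8, 400)
print(f"precision {precision:.2f}, recall {recall:.3f}")  # precision 0.30, recall 0.625
```

In this made-up scenario, the terms find about 3,000 relevant documents but miss about 1,800, so roughly 62 percent recall with 70 percent of the hits being false positives: a strong signal to refine the terms before review begins.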
While search terms remain the primary means by which case teams cull and identify documents for litigation review, the fact is that they are terribly unreliable. That being said, keywords are not going away as a review strategy any time soon. The four keys noted here will help improve keyword recall and make sure that you are making the most of your terms. The success of your search terms depends on the time you put into developing and testing them before you finalize them.