Posted on: October 07, 2015 in Blog
How to Remove False Positives From Proximity Searches
This post explains how to reduce the number of false positives returned in your keyword search results by using the dtSearch NOT W/ and AND NOT operators.
Counsel conducting discovery often need to remove false positives from their search hits through the use of exclusion criteria. This is typically the case when a search term hits on words contained in an email footer. The NOT W/ or the AND NOT dtSearch operators can be used to exclude most false positives. For example, if we are searching for the word privileged, we can use privileged NOT W/1 “This message is privileged and confidential”. However, other cases can be much more difficult.
Consider the following hypothetical case. Suppose your client is a milk company and wants to search an employee’s email account for any correspondence regarding price skimming. A search for skim* W/15 price yields too many false positives that reference skim milk instead of price skimming. You might consider using (skim* W/15 price) AND NOT milk. However, this construction is over-exclusive: it would filter out every document that mentions milk anywhere, including documents that discuss both price skimming and skim milk. You might also be tempted to use the following invalid construction:
INVALID SEARCH: (skim* NOT W/3 milk) W/15 (price)
Though the above search may appear to return your intended search hits, a NOT W/ proximity operator should not be nested within an additional W/ proximity operator because this results in unclear search syntax and an inaccurate set of search hits. The above string may only return a partial list of hits instead of the full population of hits responsive to your discovery request. Additionally, there is a chance the above string will return a handful of documents that are completely outside the intended search criteria. A more accurate method of discovering our responsive documents will combine the following two search formulas:
CERTAIN HITS: (skim* W/15 price) AND NOT (skim* W/3 milk)
- AND -
POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price) AND (skim* W/3 milk)
The Certain Hits search string will return a subset of documents that meet our search criteria without returning any of the false positives that reference milk. The Potential Hits string will return additional documents meeting our search criteria as well as a limited number of false positives. In the first search above, we verify that skim* is near price and that milk is not near skim*. These hits meet our criteria, and we can prioritize them for review. In the second search above, we verify that a document has an instance of skim* without milk nearby and that an instance of skim* is within 15 words of price. However, because the document also contains skim* near milk somewhere in its text, we cannot be sure which instance of skim* actually falls within 15 words of price. Only a careful review of each document in the Potential Hits group will determine whether or not it is a false positive. Alternatively, you may want to return all potential and certain hits in one search, and that string is below:
ALL CERTAIN AND POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price)
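To make the distinction between the groups concrete, here is a minimal Python sketch of the logic described above. It is a toy model, not dtSearch's engine: it assumes W/N means "within N word tokens", treats a trailing * as a simple prefix match, and uses hypothetical helper names (positions, classify) that do not come from dtSearch.

```python
import re

def positions(tokens, pattern):
    """Indexes of tokens matching a term; a trailing * means prefix match (assumption)."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [i for i, t in enumerate(tokens) if t.startswith(prefix)]
    return [i for i, t in enumerate(tokens) if t == pattern]

def classify(text, term="skim*", context="price", exclude="milk",
             context_dist=15, exclude_dist=3):
    """Return 'certain', 'potential', or 'excluded' for one document."""
    # Crude tokenizer: lowercase words only. Real indexing rules will differ.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    term_pos = positions(tokens, term)
    ctx_pos = positions(tokens, context)
    exc_pos = positions(tokens, exclude)

    def near(i, others, n):
        return any(abs(i - j) <= n for j in others)

    # (term W/15 context): some instance of the term is close to the context word.
    term_near_context = any(near(i, ctx_pos, context_dist) for i in term_pos)
    # (term W/3 exclude): some instance of the term is close to the excluded word.
    term_near_exclude = any(near(i, exc_pos, exclude_dist) for i in term_pos)
    # (term NOT W/3 exclude): some instance of the term has no excluded word nearby.
    term_away_from_exclude = any(not near(i, exc_pos, exclude_dist) for i in term_pos)

    if not (term_near_context and term_away_from_exclude):
        return "excluded"
    # A document that also has the term near the excluded word needs manual review.
    return "potential" if term_near_exclude else "certain"

docs = [
    "We should discuss price skimming before the launch.",
    "The price of skim milk rose again.",
    "Price skimming is risky. Separately, skim milk orders are up.",
]
for doc in docs:
    print(classify(doc), "|", doc)
```

Under these assumptions, the first sample document lands in the certain group, the second is excluded, and the third is a potential hit because it contains both a skim* away from milk and a skim* next to milk. The third document is also the kind that (skim* W/15 price) AND NOT milk would wrongly drop.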
The above case is certainly an exception, but it highlights the need to understand exactly how different dtSearch operators function together and where problems can arise. As advanced as today’s search tools are, they do not always throw errors when search terms contain invalid syntax. If there’s any doubt about how a search term is functioning, it’s always best to have an industry expert assist with formatting and testing search terms. Included below is a resource for your personal use; each string uses generic placeholders that can be replaced with terms specific to your search.
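As one way to reuse the three strings with your own terms, the following Python sketch fills generic placeholders into the same templates shown above. The function name build_searches and its parameter names are hypothetical; only the resulting search strings follow the constructions in this post.

```python
def build_searches(term, exclude, context, exclude_dist=3, context_dist=15):
    """Build the Certain, Potential, and combined search strings for one term.

    term    -- the keyword you want (e.g. "skim*")
    exclude -- the word that signals a false positive (e.g. "milk")
    context -- the word that signals a true hit (e.g. "price")
    """
    near_context = f"({term} W/{context_dist} {context})"
    near_exclude = f"({term} W/{exclude_dist} {exclude})"
    not_near_exclude = f"({term} NOT W/{exclude_dist} {exclude})"
    return {
        "certain": f"{near_context} AND NOT {near_exclude}",
        "potential": f"{not_near_exclude} AND {near_context} AND {near_exclude}",
        "all": f"{not_near_exclude} AND {near_context}",
    }

searches = build_searches("skim*", "milk", "price")
for name, query in searches.items():
    print(f"{name.upper()}: {query}")
```

Called with skim*, milk, and price, this reproduces the three strings from the example. Any generated string should still be tested against a known sample of documents before it is relied on.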