Posted on: September 01, 2015 in Blog
The Three Groups of Discovery Analytics and When to Apply Them
This series defines predictive, structured, and conceptual analytics and explains how to apply each class efficiently during discovery.
Data analytics is everywhere, and nearly all of us use it every day. Sometimes we are aware we are using analytics, sometimes we are not – that’s part of the point. We as humans need tools and systems to synthesize the ever-growing amount of data so we can make better-informed decisions more quickly and with more confidence. Data analytics enables those systems.
Webster’s Dictionary defines data analytics as “a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.”
The primary goal of analytics is to turn data into useful information. Analytics takes many approaches and encompasses diverse techniques under a variety of names across business, science, and social science domains, including Discovery. It is everywhere and not going away anytime soon.
Electronically Stored Information (ESI) is doubling every two years. Analytics is the primary means by which legal professionals can streamline the discovery process to make it affordable, reasonable and proportional for all parties involved.
3 Primary Classes of Analytics for Discovery
Analytics for discovery can be divided into three primary classes: Predictive, Structured, and Conceptual. This series will dive into each class and explain how to apply it efficiently in discovery workflows.
Predictive Analytics (also called “Predictive Coding” or “Technology Assisted Review”) is a workflow that requires a subject matter expert to review a small subset of documents in order to train the system on what the human is looking for until the system can statistically “predict” how the human would code the rest of the collection. Once that is complete, the legal team can make informed decisions on how best to approach the collection for review and determine the total cost implications. Predictive Analytics is based on Machine Learning concepts from the Computer Science world and is a form of first-generation Artificial Intelligence.
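To make the train-then-predict workflow concrete, here is a minimal sketch in Python. It assumes a tiny naive Bayes word-count classifier as a stand-in for the machine learning step; the sample documents, the labels "responsive" and "not_responsive", and the function names are all hypothetical, and commercial predictive coding tools use their own, more sophisticated models.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count words per label from (text, label) pairs coded by the reviewer."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes score for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the share of seed documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one smoothing so unseen words do not zero out a label.
        denom = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical seed set coded by the subject matter expert.
seed_set = [
    ("The merger pricing terms were discussed with the board", "responsive"),
    ("Attached is the revised merger agreement and pricing schedule", "responsive"),
    ("Lunch order for the team offsite on Friday", "not_responsive"),
    ("Reminder that the parking garage closes early on Friday", "not_responsive"),
]

model = train(seed_set)
print(predict(model, "Board approved the merger pricing"))  # responsive
print(predict(model, "Friday lunch reminder"))              # not_responsive
```

In practice the seed set would be far larger, and the team would measure the system's accuracy against additional expert-coded samples before relying on its predictions.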
Predictive Analytics has taken longer to develop and is the newest class of Analytics in use in the Legal and Compliance world. Keyword searching (in which a predetermined set of words is run against all of the documents, and only those documents that “hit” on one or more keywords are reviewed) was a breakthrough technique in the early days of Discovery (circa the early 1980s). Keyword search was advanced through methods such as Boolean searches, which allowed a set of “but not” terms to be combined with the keywords to disambiguate over-inclusive results. For example, a search string would state, “Include (keyword1 OR keyword2 OR keyword3, etc.), but not (excludeword1 OR excludeword2 OR excludeword3, etc.).” These combinations (or “ontologies”) often included proximity limiters (e.g., “Abraham” within/2 words of “Lincoln”).
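The “include, but not, within/N” logic described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names, the sample documents, and the whole-word matching rule are hypothetical and do not reflect the query syntax of any particular review platform.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(tokens, first, second, max_gap):
    """True if `first` and `second` occur within `max_gap` words of each other."""
    first_pos = [i for i, tok in enumerate(tokens) if tok == first]
    second_pos = [i for i, tok in enumerate(tokens) if tok == second]
    return any(abs(i - j) <= max_gap for i in first_pos for j in second_pos)


def matches(text, include, exclude=(), proximity=()):
    """Include (any keyword) but not (any exclude word), plus proximity rules."""
    tokens = tokenize(text)
    vocab = set(tokens)
    if not any(word in vocab for word in include):
        return False
    if any(word in vocab for word in exclude):
        return False
    return all(within(tokens, a, b, gap) for a, b, gap in proximity)


# Hypothetical documents for illustration only.
docs = {
    "doc1": "Abraham Lincoln signed the contract amendment.",
    "doc2": "The Lincoln dealership sent a contract to Abraham last week.",
    "doc3": "Lunch menu: contract catering by Abraham and Lincoln streets.",
}

hits = [
    name
    for name, text in docs.items()
    if matches(
        text,
        include=["contract", "amendment"],
        exclude=["menu"],
        proximity=[("abraham", "lincoln", 2)],
    )
]
print(hits)  # ['doc1']
```

Here doc2 contains the keywords but fails the proximity limiter, and doc3 is dropped by the “but not” term, which is exactly the kind of over-inclusive hit these combinations were built to filter out.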
Studies have shown that ontologies can improve retrieval recall to a range of 50 to 65 percent. However, given the syntax rules that govern ontologies and the ambiguity built into languages, counsel had to engage highly skilled linguists to develop productive combinations that they hoped would cover all the bases. This was expensive and time-consuming. The application of predictive analytics changes all of that by automating the generation of keywords, concepts and attribute content behind the scenes using Artificial Intelligence found in machine learning technology.
Structured analytics deals with textual similarity and is based on syntactic approaches that use the organization of characters in the data as the foundation for the analysis. The goal is to provide better group identification and sorting. One primary example of structured analytics for eDiscovery is Email Thread detection, where analytics organizes the various email messages between multiple people into one conversation. Another primary example is Near Duplicate detection, where analytics identifies documents with similar text, which can then be used in a variety of workflows. Other examples include Language identification and Repeated Content detection.
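As a rough illustration of Near Duplicate detection, the sketch below compares documents by their overlapping three-word sequences (“shingles”) and scores each pair with Jaccard similarity. This is one common textbook approach, assumed here for illustration; the 0.6 threshold, the function names, and the sample emails are hypothetical, and commercial tools may use different similarity measures.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs, threshold=0.6):
    """Return (name, name, score) for every pair at or above the threshold."""
    names = sorted(docs)
    sets = {name: shingles(docs[name]) for name in names}
    pairs = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            score = jaccard(sets[left], sets[right])
            if score >= threshold:
                pairs.append((left, right, round(score, 2)))
    return pairs


# Hypothetical documents for illustration only.
docs = {
    "email_1": "Please review the attached draft agreement before Friday.",
    "email_2": "Please review the attached draft agreement before Monday.",
    "email_3": "The quarterly budget meeting moved to room four.",
}

print(near_duplicates(docs))  # [('email_1', 'email_2', 0.71)]
```

Comparing every pair works for a handful of documents; large collections would need an indexing or hashing step to avoid the pairwise comparison, which is outside the scope of this sketch.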
Structured Analytics has been around since the early 1990s, with Equivio leading the way with their unique Email Thread and Near Dupe Detection functionality. The ability to detect and pull together email conversations, as well as to identify the single email containing all the previous email components (a.k.a. the “inclusive” email), has reduced document review by 20-40% since the beginning. Today, Structured Analytics are offered by many software providers, such as Content Analyst (embedded in kCura’s Relativity and Ipro’s Eclipse) and Microsoft/Equivio Zoom.
Conceptual analytics takes a semantic approach to explore the conceptual content, or meaning, of the data. Approaches such as Clustering, Categorization, Conceptual Search, Keyword Expansion, Themes & Ideas, Intelligent Folders, etc. depend on technology that builds and then applies a conceptual index of the data for analysis.
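To give a feel for Keyword Expansion, the sketch below suggests related terms by counting which words appear in the same documents as a starting keyword. Simple co-occurrence counting is an assumption made for illustration and is a much cruder stand-in for the conceptual index that commercial tools build; the stopword list, function names, and sample documents are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "was", "is"}


def terms(text):
    """Return the set of non-stopword terms in the text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expand_keyword(keyword, docs, top_n=2):
    """Rank terms by how many documents they share with `keyword`."""
    co_counts = Counter()
    for text in docs:
        doc_terms = terms(text)
        if keyword in doc_terms:
            co_counts.update(doc_terms - {keyword})
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(co_counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical documents for illustration only.
docs = [
    "The invoice for the shipment was overdue",
    "Payment of the overdue invoice is pending",
    "Shipment delayed in customs",
    "Invoice payment received",
]

print(expand_keyword("invoice", docs))  # ['overdue', 'payment']
```

A reviewer searching for “invoice” would be prompted to also consider “overdue” and “payment”, which is the basic idea behind expanding a keyword list from the data itself.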
Conceptual analytics traces back to US Government-funded projects to analyze international news and intelligence feeds in the mid-1980s. The technical underpinnings of those projects were declassified so they could be commercialized and brought into non-government markets, with Engenium (purchased by Kroll in 2006) and Content Analyst among the first to offer conceptual search to the legal industry. Content Analyst is the technology that powers Relativity’s and Eclipse’s conceptual functions (Conceptual Search, Keyword Expansion, Categorization, Clustering, etc.).
As stated before, analytics is everywhere and is used for many different functions across various industries. Turning ESI into useful information through analytics allows individuals to make informed decisions that are backed by data. Over the next few weeks, we will cover the role that each class of analytics plays within the Discovery realm. Our next post will focus on Predictive Analytics as a way to significantly lower costs, reduce review time and substantially increase the quality of document review.