Posted on: September 01, 2015 in Blog
The Three Groups of Discovery Analytics and When to Apply Them
This series defines predictive, structured, and conceptual analytics, and will explain how to efficiently apply each class during discovery.
Data analytics is everywhere, and nearly all of us use it every day. Sometimes we are aware we are using analytics and sometimes we are not; that is part of the point. As humans, we need tools and systems to synthesize the ever-growing amount of data so we can make more informed decisions, more quickly and with more confidence. Data analytics enables those systems.
Webster’s Dictionary defines data analytics as “a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.”
The primary goal of analytics is to turn data into useful information. Analytics takes many approaches and encompasses diverse techniques, under a variety of names, across business, science, and social science domains, including Discovery. It is everywhere and is not going away anytime soon.
Electronically Stored Information (ESI) is doubling every two years. Analytics is the primary means by which legal professionals can streamline the discovery process to make it affordable, reasonable and proportional for all parties involved.
3 Primary Classes of Analytics for Discovery
Analytics for discovery can be divided into three primary classes: Predictive, Structured, and Conceptual. This series will dive into each class and explain how to apply it efficiently in discovery workflows.
Predictive Analytics (also called “Predictive Coding” or “Technology Assisted Review”) is a workflow in which a subject matter expert reviews a small subset of documents to train the system on what the reviewer is looking for, until the system can statistically “predict” how that reviewer would code the rest of the collection. Once training is complete, the legal team can make informed decisions on how best to approach the collection for review and determine the total cost implications. Predictive Analytics is based on Machine Learning concepts from the Computer Science world and is a form of first-generation Artificial Intelligence.
Predictive Analytics has taken longer to develop and is the newest class of Analytics for use in the Legal and Compliance world. Keyword searching, in which a pre-determined set of words is run against all of the documents and only those documents that “hit” on one or more keywords are reviewed, was a breakthrough technique in the early days of Discovery (circa the early 1980s). Keyword search was advanced through methods such as Boolean searches, which allowed a set of “but not” terms to be combined with the keywords to disambiguate over-inclusive results. For example, a search string would state, “Include (keyword1 OR keyword2 OR keyword3, etc.), but not (excludeword1 OR excludeword2 OR excludeword3, etc.).” These combinations (or “ontologies”) often included proximity limiters (e.g., “Abraham” within/2 words of “Lincoln”).
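To make the Boolean and proximity logic concrete, here is a minimal Python sketch. The function names (tokenize, within, matches), the sample documents, and the reading of within/2 as “token positions differ by at most two” are our own assumptions for illustration; each review platform defines its own search syntax and proximity semantics.

```python
import re

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(tokens, first, second, max_gap):
    """True if `first` and `second` occur at most `max_gap` positions apart.

    Assumption: order does not matter and the gap is counted in tokens.
    """
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)

def matches(text, include, exclude, proximity=None):
    """Include-any / exclude-any Boolean filter with an optional proximity limiter."""
    tokens = tokenize(text)
    vocab = set(tokens)
    if not vocab & set(include):   # none of the OR'ed keywords present
        return False
    if vocab & set(exclude):       # a "but not" term is present
        return False
    if proximity is not None:
        first, second, max_gap = proximity
        return within(tokens, first, second, max_gap)
    return True

# Hypothetical documents, invented for this example.
docs = {
    "doc1": "Abraham Lincoln signed the merger agreement.",
    "doc2": "Weekly newsletter: merger rumors about Abraham Lincoln.",
    "doc3": "Abraham reviewed the merger terms before flying to Lincoln.",
    "doc4": "Abraham Lincoln attended the picnic.",
}

hits = [
    name for name, text in docs.items()
    if matches(
        text,
        include={"merger", "acquisition"},
        exclude={"newsletter"},
        proximity=("abraham", "lincoln", 2),
    )
]
print(hits)
```

Only doc1 survives all three conditions: doc2 trips the exclusion term, doc3 fails the proximity limiter, and doc4 contains none of the include keywords. The sketch also shows why such strings were hard to tune by hand, since every added term changes which documents fall in or out.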
Studies have shown that ontologies can improve retrieval recall to a range of 50 to 65 percent. However, given the syntax rules that govern ontologies and the ambiguity built into languages, counsel had to engage highly skilled linguists to develop productive combinations that they hoped would cover all the bases. This was expensive and time-consuming. Predictive analytics changes all of that by using machine learning to automate, behind the scenes, the generation of keywords, concepts, and attribute content.
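The train-then-predict loop behind predictive coding can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: we use a small multinomial Naive Bayes classifier as a stand-in for whatever algorithm a commercial tool applies, and the class name NaiveBayesCoder, the labels, and the seed documents are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesCoder:
    """Tiny multinomial Naive Bayes text classifier with add-one smoothing.

    Assumption: a stand-in for the proprietary models used by review tools.
    """

    def fit(self, texts, labels):
        self.word_counts = {}                # label -> Counter of word frequencies
        self.doc_counts = Counter(labels)    # label -> number of training documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.vocab.update(words)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-probability of each label for `text`."""
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Seed set coded by the (hypothetical) subject matter expert.
seed_texts = [
    "pricing agreement with competitor discussed at the meeting",
    "we should align our pricing with the competitor before the bid",
    "lunch menu for the team meeting on friday",
    "reminder to submit your vacation request by friday",
]
seed_labels = ["responsive", "responsive", "not_responsive", "not_responsive"]

coder = NaiveBayesCoder().fit(seed_texts, seed_labels)

# Documents the expert has not reviewed yet.
unreviewed = [
    "competitor pricing for the upcoming bid",
    "vacation request and lunch on friday",
]
predictions = [coder.predict(text) for text in unreviewed]
print(predictions)
```

In this toy run the first unreviewed document is coded responsive and the second not responsive. No one wrote a keyword list: the model inferred which words matter from the expert's coding decisions. A real workflow would use a far larger seed set and would measure the model against a control sample before relying on it.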
Structured analytics deals with textual similarity and is based on syntactic approaches that use the organization of characters in the data as the foundation for the analysis. The goal is to provide better group identification and sorting. One primary example of structured analytics for eDiscovery is Email Thread detection, where analytics organizes the various email messages between multiple people into one conversation. Another primary example is Near Duplicate detection, where analytics identifies documents with similar text so they can be grouped and handled together in review workflows. Other examples include Language identification and Repeated Content detection.
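One common way to measure textual similarity of this kind is to compare overlapping word sequences (“shingles”) between documents. The Python sketch below is an assumption on our part, not the method of any named product: it uses three-word shingles and Jaccard similarity, and the function names, the 0.5 threshold, and the sample documents are invented for illustration.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in a document."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(docs, threshold=0.5):
    """Return (id, id, similarity) for each document pair at or above the threshold."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    ids = sorted(sets)
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            score = jaccard(sets[left], sets[right])
            if score >= threshold:
                pairs.append((left, right, round(score, 2)))
    return pairs

# Hypothetical documents, invented for this example.
docs = {
    "draft_v1": "The quarterly sales report is attached for your review.",
    "draft_v2": "The quarterly sales report is attached for your final review.",
    "party": "Please join us for the holiday party on Friday.",
}

print(near_duplicates(docs))
```

The two drafts share six of nine distinct shingles, so they pair up with a similarity of 0.67, while the party invitation matches nothing. Comparing every pair is fine for a demonstration but scales poorly; large collections typically need hashing or indexing tricks to avoid the all-pairs comparison.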
Structured Analytics has been around since the early 1990s, with Equivio leading the way with its unique Email Thread and Near Dupe Detection functionality. The ability to detect and pull together email conversations, as well as to identify the single email containing all the previous email components (a.k.a. the “inclusive” email), has reduced document review by 20 to 40 percent since the beginning. Today, Structured Analytics is offered by many software providers, such as Content Analyst (embedded in kCura’s Relativity and Ipro’s Eclipse) and Microsoft/Equivio Zoom.
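The idea of an “inclusive” email can be illustrated with a deliberately naive Python sketch. It assumes that replies quote earlier messages verbatim and that a thread can be identified by its subject line once reply and forward prefixes are stripped; real threading engines rely on richer signals. All names and sample messages here are hypothetical.

```python
import re
from collections import defaultdict

def normalize_subject(subject):
    """Strip leading reply/forward prefixes such as 'RE:' or 'FW:'."""
    stripped = re.sub(r"^(\s*(re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()

def normalize_body(body):
    """Drop '>' quote markers and collapse whitespace so quoted text compares equal."""
    lines = [line.lstrip("> ").strip() for line in body.splitlines()]
    return " ".join(" ".join(lines).split()).lower()

def find_inclusive(emails):
    """Group emails into threads and return the inclusive message ids per thread.

    Assumption: an email is inclusive when no other email in the same thread
    contains its full text plus something more.
    """
    threads = defaultdict(list)
    for email in emails:
        threads[normalize_subject(email["subject"])].append(email)

    inclusive = {}
    for subject, members in threads.items():
        bodies = {m["id"]: normalize_body(m["body"]) for m in members}
        inclusive[subject] = [
            m["id"]
            for m in members
            if not any(
                other != m["id"] and bodies[m["id"]] in text and bodies[m["id"]] != text
                for other, text in bodies.items()
            )
        ]
    return inclusive

# Hypothetical messages, invented for this example.
emails = [
    {"id": "e1", "subject": "Q3 budget", "body": "Can you send the Q3 budget?"},
    {"id": "e2", "subject": "RE: Q3 budget",
     "body": "Attached.\n> Can you send the Q3 budget?"},
    {"id": "e3", "subject": "RE: RE: Q3 budget",
     "body": "Thanks!\n> Attached.\n> > Can you send the Q3 budget?"},
    {"id": "e4", "subject": "Lunch", "body": "Lunch at noon?"},
]

print(find_inclusive(emails))
```

Here the three budget messages collapse into one thread whose only inclusive email is e3, so a reviewer could read that single message instead of all three. That is the mechanism behind the review reduction described above.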
Conceptual analytics takes a semantic approach to explore the conceptual content, or meaning, of the data. Approaches such as Clustering, Categorization, Conceptual Search, Keyword Expansion, Themes & Ideas, and Intelligent Folders depend on technology that builds and then applies a conceptual index of the data for analysis.
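As a rough illustration of building an index and then applying it, the Python sketch below performs a simple form of keyword expansion: it records which documents each term appears in, then ranks other terms by how similar their document patterns are. This is a simplification we chose for brevity, not the indexing technology used by any product named in this post; the function names, the stopword list, and the sample documents are assumptions.

```python
import math
import re
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to a Counter of {document position: frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in enumerate(docs):
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word][doc_id] += 1
    return index

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand(term, index, top_n=3, stopwords=frozenset()):
    """Return up to `top_n` terms whose document patterns best match `term`."""
    if term not in index:
        return []
    scored = [
        (other, cosine(index[term], vector))
        for other, vector in index.items()
        if other != term and other not in stopwords
    ]
    scored = [(word, score) for word, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, then A-Z
    return [word for word, _ in scored[:top_n]]

# Hypothetical stopword list and documents, invented for this example.
STOPWORDS = frozenset({"the", "is", "was", "and", "of", "on", "were"})
docs = [
    "The invoice shows the payment was late",
    "Payment of the invoice is overdue",
    "Invoice and payment records were shredded",
    "The picnic is on Saturday",
]

index = build_index(docs)
print(expand("invoice", index, top_n=1, stopwords=STOPWORDS))
```

In this toy collection the closest term to “invoice” is “payment”, because the two words appear in exactly the same documents, while “picnic” never co-occurs and is not suggested. Production conceptual indexes go further by mapping terms and documents into a reduced concept space, so that related terms can match even when they never appear together.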
Conceptual analytics traces back to US Government-funded projects in the mid-1980s to analyze international news and intelligence feeds. The technical underpinnings of those projects were declassified so that they could be commercialized and were brought into non-government markets, with Engenium (purchased by Kroll in 2006) and Content Analyst among the first to offer conceptual search to the legal industry. Content Analyst is the technology that powers the conceptual functions in Relativity and Eclipse (Conceptual Search, Keyword Expansion, Categorization, Clustering, etc.).
As stated before, analytics is everywhere and is used for many different functions across industries. Turning ESI into useful information through analytics allows individuals to make informed decisions that are backed by data. Over the next few weeks, we will cover the functions that each class of analytics plays within the Discovery realm. Our next post will focus on Predictive Analytics as a way to significantly lower costs, reduce review time, and substantially increase the quality of document review.