
Posted on: February 24, 2015


PDF Files Can Wreak Havoc in ESI Processing, Review and Production

This post explains how the PDF format can complicate document review and production workflows, and three key things to keep in mind when using this format.


While it may be easy to create PDF files for day-to-day business use, when it comes to document review and production deliverables, the PDF format complicates things.

3 Things to Consider When Using PDF Format in Review and Production Deliverables

Here are three important considerations to remember when including PDF files in data deliverables:

1. PDF files are considered native files, not image files, by nearly all review applications.

Simply put, PDF files must be processed or converted before they can be loaded into a review tool, unless you don’t mind reviewing directly in Adobe Acrobat or in an application with Acrobat-like functionality. That extra step leads to inefficient workflows and higher review costs.

2. Bad Document Breaks / Boundaries

You can do a lot with PDF files: merge multiple PDFs into a single document; create PDF portfolio documents from multiple sources; create e-books with hyperlinks to external files; convert electronic files, such as Microsoft Office documents, to PDFs “on the fly.” During document processing, review, and production, however, the PDF format can be a nightmare.

How many of us have been on the receiving end of a single, 1,000-page, 100 MB PDF file that turns out to be 100-200 individual documents combined into one giant file? Even if the file loads without errors, how do you review something that large? Some documents will be responsive and some will be privileged, but you cannot code them separately.

D4 often gets requests to perform Logical Document Determination (LDD) on PDF files after they have been loaded into a review platform. Bad document breaks in PDF files are causing a huge uptick in LDD requests, which in turn lead to repeated processing, inefficient reviews, project delays, and additional costs that are often avoidable. When agreeing to a form of production, go with the industry and review platform standard: single-page, Group IV TIFF images with load files instead of PDF. This will significantly streamline the review and production process.
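To illustrate why single-page TIFFs with a load file avoid the document-break problem, here is a minimal Python sketch that writes Opticon-style load-file lines in which the first page of each logical document carries a "Y" break flag. The field order follows the commonly used Opticon layout (image key, volume, path, document break, folder break, box break, page count); the function name, `DOC` prefix, `VOL001` volume label, and `IMAGES\` folder are hypothetical, and your review platform's expected format should be confirmed before relying on it.

```python
from typing import List


def opticon_lines(page_counts: List[int], prefix: str = "DOC",
                  volume: str = "VOL001", start: int = 1) -> List[str]:
    """Build Opticon-style load-file lines for single-page TIFF images.

    page_counts holds the page count of each logical document, in order.
    The prefix, volume label, and IMAGES folder are illustrative assumptions.
    """
    lines = []
    page_no = start
    for count in page_counts:
        for offset in range(count):
            key = f"{prefix}{page_no:07d}"
            first = offset == 0
            # Only the first page of a document gets the break flag and page count.
            doc_break = "Y" if first else ""
            pages = str(count) if first else ""
            lines.append(f"{key},{volume},IMAGES\\{key}.tif,{doc_break},,,{pages}")
            page_no += 1
    return lines


if __name__ == "__main__":
    # A two-page document followed by a one-page document.
    for line in opticon_lines([2, 1]):
        print(line)
```

Because each document boundary is recorded explicitly, every document can be coded as responsive or privileged on its own, which a single merged PDF does not allow.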

3. It is difficult to work with large files.

What can be done with a single, 1,000-page, 100 MB PDF file? Not a lot, and here’s why:

  • Large files are difficult to email. Most email systems enforce file size restrictions, typically around 10 MB, and oversized emails get kicked back to the sender.
  • Large files usually crash native file viewers in review platforms. This leads end-users to mistakenly believe something is wrong with the review application, resulting in time wasted troubleshooting “technical/database” issues.
  • Large files usually take a long time to upload and download. Data deliverables and productions can be delayed, leading to missed deadlines.
  • Large files sometimes crash computer systems. Applications typically get “hung up” because they cannot handle the large file sizes. Many times this causes project and production delays as well as lost work product.
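One way to catch these problems early is to flag oversized PDFs before a deliverable goes out. The following is a minimal Python sketch using only the standard library; the function name is hypothetical, and the default threshold simply reuses the 10 MB email figure mentioned above, so adjust it to whatever limit actually applies in your environment.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed threshold, based on the 10 MB email limit mentioned in the text.
EMAIL_LIMIT_BYTES = 10 * 1024 * 1024


def oversized_pdfs(folder: str, limit: int = EMAIL_LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for PDFs larger than limit, largest first."""
    hits = [
        (p.name, p.stat().st_size)
        for p in Path(folder).rglob("*")  # walk the folder recursively
        if p.is_file() and p.suffix.lower() == ".pdf" and p.stat().st_size > limit
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, size in oversized_pdfs("."):
        print(f"{name}: {size / (1024 * 1024):.1f} MB")
```

A report like this does not fix a large file, but it gives the team a chance to request a different form of production or plan a conversion before deadlines are at risk.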

What is the solution for large PDF files? In my experience at D4, customers who are unable to request new deliverables from their client or the producing party almost always choose to convert the PDFs to single-page, Group IV TIFF images for their in-house or hosted review platform.

Processing and Review Platforms Handle These Types of PDF Formats Differently

Here are a few examples of PDF formats that, if handled improperly, have been known to wreak havoc on processing, document review and production, leading to inefficient workflows and higher review costs.

Pay close attention to PDF files that are/contain:

  • Text-searchable, with poor OCR text
  • Image-only PDF files (which cannot be easily distinguished from searchable PDF files)
  • Secured/password protected
  • Saved/created with a variety of compression schemes that impact processing (especially PDF files created from scans on in-house copiers)
  • Portfolio files
  • Hyperlinks and bookmarks to external content
  • Animations
  • Embedded files
  • Embedded fonts/text
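Some of the characteristics above can be flagged with a rough first-pass check before processing. The sketch below, in Python with only the standard library, scans a PDF's raw bytes for dictionary keys defined in the PDF specification (`/Encrypt`, `/Collection`, `/EmbeddedFiles`, `/Font`). The function and flag names are hypothetical, and this is a heuristic only: keys stored inside compressed object streams will be missed, and it says nothing about OCR quality, so a real workflow would confirm the results with a full PDF parser or processing tool.

```python
import re
from typing import Dict

# Raw-byte markers for keys defined in the PDF specification.
# \b keeps /Encrypt from matching /EncryptMetadata, and /Font from /FontDescriptor.
MARKERS = {
    "encrypted": rb"/Encrypt\b",
    "portfolio": rb"/Collection\b",
    "embedded_files": rb"/EmbeddedFiles\b",
    "has_fonts": rb"/Font\b",
}


def triage_pdf_bytes(data: bytes) -> Dict[str, bool]:
    """Heuristically flag PDF characteristics by scanning uncompressed bytes."""
    flags = {name: re.search(pattern, data) is not None
             for name, pattern in MARKERS.items()}
    # No visible font resources suggests (but does not prove) an image-only PDF.
    flags["possibly_image_only"] = not flags["has_fonts"]
    return flags


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        print(triage_pdf_bytes(handle.read()))
```

Files flagged here are candidates for special handling, such as requesting passwords, extracting portfolio contents, or running OCR, rather than definitive classifications.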

There is a common misconception in the industry: many people treat PDF files the same way they do TIFF images. However, there is one key difference. A PDF is considered a native file, and a TIFF image is considered an image, which is why review platforms handle them differently and offer different functionality for each format.

Keeping that in mind can help save your team and clients many work hours and unnecessary costs.
