E-Discovery in litigation today presents a number of challenges in creating a defensible, efficient, and iterative search protocol. A defensible keyword search protocol should contain, at a minimum, the following ten strategic steps:

1. Define the data you are looking for and determine where it is located.

It’s important to first define and identify the potentially relevant documents that will be needed for a request for production (RFP). However, defining the universe of required documents is not necessarily an easy task. The attorney should know which electronic devices may contain the data, such as network servers, computer workstations, laptops, cellphones, etc., as well as the custodians of the data, retention policies, and recordkeeping practices. To ensure compliance and efficiency in responding to an RFP and to reduce e-discovery costs, maintain an electronically stored information (ESI) “Data Map” that identifies and details the flow of data and how it can be retrieved.

2. De-duplicate & filter.

Simply put, de-duplication replaces duplicate data on a disk with references to a shared copy. When duplicate data is detected, the instance is referenced back to the saved shared copy. Thus, only one copy of each set of identical documents is stored, and the search will be faster and more cost-efficient.
Filter out any unnecessary file extensions. For example, exclude sound files, design files, and any unresponsive system files. Also exclude custodians not relevant to the case and time periods outside the scope of the RFP, and identify any other parameters that will help reduce the volume of data to be searched.
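
As a rough illustration of this step, the Python sketch below filters a collection by file extension, custodian, and date range, then de-duplicates what remains by comparing SHA-256 hashes of file content. The `Doc` record, the excluded extensions, and the function names are assumptions made for the example; commercial e-discovery platforms handle these tasks in their own ways.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical record for one collected file."""
    path: str
    custodian: str
    modified: date
    content: bytes


# Assumed examples of extensions unlikely to be responsive.
EXCLUDED_EXTENSIONS = {".wav", ".mp3", ".dll", ".sys"}


def in_scope(doc, custodians, start, end):
    """Keep a file only if its type, custodian, and date are in scope."""
    extension = os.path.splitext(doc.path)[1].lower()
    return (
        extension not in EXCLUDED_EXTENSIONS
        and doc.custodian in custodians
        and start <= doc.modified <= end
    )


def deduplicate(docs):
    """Keep the first file seen for each distinct content hash.

    Returns the unique files plus a map from each dropped duplicate's
    path to the path of the copy that was kept, so the step can be logged.
    """
    first_seen = {}   # content hash -> path of the retained copy
    unique = []
    duplicates = {}   # dropped path -> retained path
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in first_seen:
            duplicates[doc.path] = first_seen[digest]
        else:
            first_seen[digest] = doc.path
            unique.append(doc)
    return unique, duplicates
```

Hashing file content catches exact duplicates only. Near-duplicates, such as the same memo saved in two formats, would need a different technique.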

3. Understand the limitations of technology.

Know what e-discovery technology tools are capable of (and what they are not). Understand how fast the tools work; how data is captured and indexed; whether embedded data can be searched; whether the tools have the ability to perform searches across metadata or the ability to search important file formats; and any other essential functions in the overall discovery process. Determine whether the processes are understandable and defensible in court.
Keep in mind that search tools cannot search image-format files, such as faxes or PDFs/TIFFs, that contain no detectable textual content and have not been converted to searchable text through optical character recognition (OCR). These documents must be identified and handled separately. Wherever possible, render those documents searchable. Maintenance, licensing issues, available resources and capabilities are also important factors to consider.

4. Consult all relevant persons.

To ensure the most relevant keywords for the search are utilized, all data custodians and any key players in possession of potentially relevant information should be consulted. These persons are most likely to help create a keyword list that will yield the most relevant results. Also, in the absence of a properly configured “Data Map,” these persons can help identify the various devices on which ESI resides.

5. Collaborate with the other side.

Courts not only expect to see collaboration with the other side, they welcome it. Be proactive and discuss keywords you are considering with your adversary in an effort to reach mutually acceptable keywords and search methodology. Doing so will save you time and help avoid arguments about irrelevant searches. The collaborative process might also help you identify search terms that you haven’t previously considered.

6. Address synonymy, misspellings, word variations, and ambiguity.

Searching for words with identical or similar meanings, common misspellings, and word variations is a helpful way to find specific documents. Human language is full of ambiguity and variation. Make use of available tools, such as the website www.dumbtionary.com, which can be used to find the most common misspellings for a given word. Similarly, www.synonymy.com and www.wordhippo.com can be used to find words that are synonymous. Be aware that instant messaging and text messages often contain slang known as “txt-speak.” The key players involved can help identify some of this language commonly used in their environment.
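
One way to put a list of variants to work is to fold them into a single case-insensitive pattern. The Python sketch below assumes a hypothetical variant list for the term “agreement”; in practice the list would come from the custodians and the reference sites mentioned above, and the syntax would depend on the search tool in use.

```python
import re


def build_pattern(variants):
    """Compile one whole-word, case-insensitive pattern from a list of variants."""
    # Longer variants go first so they are tried before shorter ones.
    escaped = sorted((re.escape(v) for v in variants), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


# Hypothetical variants: two misspellings and two synonyms of "agreement".
VARIANTS = ["agreement", "agreemnt", "agrement", "contract", "deal"]
PATTERN = build_pattern(VARIANTS)


def find_hits(text):
    """Return every variant found in the text, lower-cased, in order of appearance."""
    return [match.group(0).lower() for match in PATTERN.finditer(text)]
```

The word boundaries keep “deal” from matching inside “dealer,” which is one small way to cut down on noise hits.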

7. Utilize statistical sampling.

The Sedona Conference has expressed the position that the document review process is well suited to the application of statistical sampling to improve quality and reduce costs. In the case of search terms, it’s desirable to run them against a statistical sample of your data set and the custodians that are most representative of that sample. In utilizing statistical sampling, however, take care with the selection methodology, especially if the ESI collection is incomplete. Sampling costs should be weighed against the costs of more extensive document review due to inefficient keyword selection.
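
To make the idea concrete, the Python sketch below draws a reproducible simple random sample of document identifiers and estimates the rate of relevant documents with a margin of error. It assumes simple random sampling and a normal approximation at roughly 95 percent confidence; the function names, the seed, and the default values are illustrative choices, and the appropriate sampling design for a given matter may differ.

```python
import math
import random


def draw_sample(doc_ids, sample_size, seed=42):
    """Draw a simple random sample without replacement.

    A fixed seed (an assumed value here) lets the same sample be
    drawn again later if the selection has to be verified.
    """
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))


def estimate_rate(relevant_in_sample, sample_size, z=1.96):
    """Estimate the relevance rate and its margin of error.

    Uses the normal approximation; z=1.96 corresponds to roughly
    95 percent confidence.
    """
    rate = relevant_in_sample / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, margin
```

For example, if reviewers find 50 relevant documents in a sample of 400, `estimate_rate(50, 400)` gives a rate of 12.5 percent with a margin of error of a little over 3 percentage points.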

8. Evaluate hits.

Review the results to determine whether the number of relevant documents returned is satisfactory, and identify any files that are not searchable, encrypted, etc. Eliminate any noise hits, and refine and tweak your keywords to increase the number of potentially responsive documents and eliminate non-responsive ones. Test, retest, and make refinements as you go along.

9. Quality Assurance: Review unresponsive documents.

Courts demand quality assurance on keyword searches to ensure all necessary steps have been taken and potentially relevant documents have not been missed. Review a representative sample of the documents deemed unresponsive by keyword searches to confirm their status. If potentially responsive documents are found, then the search methodology utilized must be revisited.
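
A check of this kind might be sketched in Python as follows. The sketch assumes a reviewer has coded each sampled document as responsive or not; the function names are hypothetical, and the default tolerance of zero simply mirrors the rule stated above, that any responsive document found in the sample triggers a second look at the methodology.

```python
import random


def sample_unresponsive(unresponsive_ids, sample_size, seed=7):
    """Draw a reproducible random sample from the documents the keywords did not hit."""
    rng = random.Random(seed)
    return rng.sample(unresponsive_ids, min(sample_size, len(unresponsive_ids)))


def missed_rate(review_results):
    """Share of sampled documents that the reviewer coded as responsive.

    review_results maps a document ID to True (responsive) or False.
    """
    return sum(review_results.values()) / len(review_results)


def needs_revision(review_results, unresponsive_total, max_rate=0.0):
    """Decide whether to revisit the search, and project how many documents were missed.

    max_rate is an assumed tolerance; the default of 0.0 means any
    responsive document in the sample triggers a revision.
    """
    rate = missed_rate(review_results)
    projected_missed = round(rate * unresponsive_total)
    return rate > max_rate, projected_missed
```

If 3 of 200 sampled documents turn out to be responsive and 10,000 documents were deemed unresponsive overall, the projection is roughly 150 missed documents, which is a figure worth recording alongside the decision to refine the keywords.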

10. Document your search strategy.

A defensible search strategy and methodology should be adequately documented, and the choices made should be justifiable. As you work through different steps of the search process, keep a log of your actions in as much detail as possible. This will help you convince the court that the search used the appropriate terms, the appropriate data sets, and produced the highest number of potentially responsive documents. A log will also enable you to replicate your search process for verification, if necessary.
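
A log does not need to be elaborate. The Python sketch below appends one timestamped record per search to a plain text file, one JSON object per line, so the sequence of searches can be read back and replayed. The field names and the file layout are assumptions for illustration, not a required format.

```python
import json
from datetime import datetime, timezone


def log_search(log_path, terms, data_set, hit_count, note=""):
    """Append one search record to the log file and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "terms": list(terms),
        "data_set": data_set,
        "hit_count": hit_count,
        "note": note,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path):
    """Read the log back as a list of records, oldest first."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each record is only ever appended, the file preserves the order in which terms were tried and refined.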
Finally, courts don’t require perfection in e-discovery, but instead look for a reasonable, reliable, and defensible approach. Taking time to craft a defensible keyword search protocol will help protect an attorney from possible sanctions and enable him or her to offer a satisfactory methodology for finding responsive documents for production.

Kyprianou is president of Axiana LLC in Morristown (www.axiana.com), which specializes in computer forensics and e-discovery. He is a certified examiner and a member of the International Society of Forensic Computer Examiners and the Association of Certified Fraud Examiners.