Using Analytics to Clean Out the ESI Garage
by Robert D. Brownstone and Gabriela P. Baron
As time passes and we acquire more "stuff," it gets harder to winnow down our possessions. Who longs to spend the weekend cleaning the garage? It's easier to keep piling things up, with no discipline for storage or removal. Thus, many a garage keeps getting fuller and more chaotic until two distinct problems emerge: difficulty finding a particular object and increased risk of hidden hazards.
The same issues pervade the electronic information management (EIM) environment. The overwhelming volume of data generated daily leads to a similar approach to electronically stored information (ESI) for organizations of all shapes and sizes. Especially as companies grow, they are lulled into the sense that it is easier and better to focus on the urgent matters at hand and let the emails, electronic files and database contents keep stacking up.
But there are many risks inherent in saving all ESI forever. Potentially harmful content resides all over the place, whether it's a "smoking gun" message, or something written and kept so long that it becomes susceptible to misinterpretation when taken out of context years later.
Within an organization that has a save-everything policy, there likely are redundant copies of information, resulting in sourcing and paying for extra storage space. These costs are multiplied by the "rule of three," by which all live data is backed up in at least two places. Moreover, the search for particular information becomes a nearly impossible and expensive chore. Additionally, more personally identifiable information (PII) and sensitive confidential data (e.g., intellectual property and trade secrets) stored at more locations means bigger risks.
The ESI garage model is the information governance (IG) strategy upon which organizations have traditionally relied. Even when an IT department is tasked with the responsibility of managing the data, this strategy falls flat. The primary reason: IT focuses on what it does best, maintaining access to data rather than extracting the most value from data.
The notion of IG is vague, and it is no panacea, but organizations need to start somewhere. An IG initiative should entail the use of advanced analytics and intelligent automated assessments of big data sets to cull out irrelevant data, keep relevant data, and identify PII, intellectual property and other sensitive data that must be kept and segmented in order to ensure data security and privacy.
Savvy C-suites are adopting sound IG policies not only to promote efficiency when locating information, but also to facilitate greater compliance with electronic discovery, data security and privacy legal requirements. IG can help contend, for example, with thorny international legal issues in cross-border data transfers day-to-day, as well as in e-discovery. Furthermore, as corporations look toward the next generation of technology and archiving systems, a solid IG program makes moving data from one system to another, and retrieving it, easier.
Companies implementing effective IG will also benefit from enhanced visibility of corporate data, enabling the use of more in-depth analytics and the discovery of valuable insights and trends to maximize the value of retained data. If data is a crystallization of a moment in time, then IG is the storyteller, piecing together facts and information into a narrative.
Even more significant, IG enables multiple cost savings. In proactive mode, IG-savvy organizations experience lower storage costs for live and backed-up data. In reactive mode - for example, addressing a lawsuit - they will see reduced e-discovery costs. Indeed, because IG and e-discovery have parallel workflows (finding relevant data is always the first step), IG-strong corporations will be in a stronger litigation posture.
WHERE TO BEGIN
Embarking on an IG program is daunting for any organization. In a packed garage, one would start by manually reviewing and organizing what's been tucked away, shelf by shelf, until the space is neat and tidy. With ESI, the concept is similar. Cull through the data in discrete chunks until all of it has been reviewed, and a system is in place for future storage. By tackling small portions at a time, organizations will see results and a return on investment.
A careful, considered approach is key when starting to parse organizational data via this "data remediation" process. As a first step, Legal and Compliance should ensure the organization's IG policies and procedures are sound. Some organizations may need to start by designing and implementing a corporate governance framework, while others will need to update their existing records retention policies and procedures.
This first step is critical from a risk and compliance standpoint because it can guard against future spoliation allegations. The organization's data deletion project must be defensible, meaning it has memorialized reasons for the data destruction, covering what, why, how, by whom and when the ESI was destroyed.
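One way to picture a memorialized deletion decision is as a structured log entry that refuses to be written unless every element is filled in. The following is only a minimal sketch in Python; the `DeletionRecord` class, its field names and the helper functions are hypothetical illustrations, not a format prescribed by any rule or product.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeletionRecord:
    """Hypothetical record of one deletion decision."""
    what: str      # which ESI was destroyed
    why: str       # the policy or reason relied upon
    how: str       # the method of destruction
    by_whom: str   # the person or role who carried it out
    when: str      # ISO 8601 timestamp of the destruction


def record_deletion(what: str, why: str, how: str, by_whom: str,
                    when: Optional[datetime] = None) -> DeletionRecord:
    """Build a record, rejecting any blank element."""
    for label, value in (("what", what), ("why", why),
                         ("how", how), ("by_whom", by_whom)):
        if not value.strip():
            raise ValueError(f"'{label}' must be documented before deletion")
    # Default to the current UTC time when no timestamp is supplied.
    stamp = (when or datetime.now(timezone.utc)).isoformat()
    return DeletionRecord(what, why, how, by_whom, stamp)


def to_log_line(record: DeletionRecord) -> str:
    """Serialize the record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Requiring all five elements up front means a deletion without a documented reason cannot be logged at all, which is the point of defensibility.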
Defensible deletion involves careful consideration of what ESI the organization intends to exercise its discretion to retain or purge, bearing in mind the nuances and contents of ESI. Different file types are used for different job functions. In addition, Legal should ensure retention of ESI that may be subject to a litigation hold or relevant to issues in litigation or government inquiries.
Once the process is clearly defined and memorialized, there are two approaches for data remediation. The first approach is akin to damming a stream. With this approach, the organization must adopt a disciplined plan for newly generated data and information. The second approach is akin to cleaning a swamp. With that approach, companies must cull through existing data troves and purge the excess.
Interestingly, some organizations find the latter approach the easier to implement because most already have at least some applicable e-discovery tools in place. These work to automatically classify ESI using specified criteria, such as date and keywords.
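Classification by date and keyword can be pictured with a short Python sketch. Everything specific here is an assumption made for illustration: the `Item` class, the keyword list, the cutoff date and the three category names are hypothetical and do not come from any particular e-discovery tool. The keyword check runs first so that material possibly subject to a hold is never flagged for purging on age alone.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """Hypothetical stand-in for one email or file."""
    name: str
    last_modified: date
    text: str


# Assumed criteria, chosen only for this example.
HOLD_KEYWORDS = {"litigation hold", "subpoena"}
RETENTION_CUTOFF = date(2008, 1, 1)


def classify(item: Item) -> str:
    """Return 'retain', 'purge-candidate' or 'review' for one item."""
    lowered = item.text.lower()
    # Keywords suggesting a legal hold override the age test.
    if any(keyword in lowered for keyword in HOLD_KEYWORDS):
        return "retain"
    # Anything older than the cutoff is only a candidate for deletion.
    if item.last_modified < RETENTION_CUTOFF:
        return "purge-candidate"
    return "review"
```

The output is deliberately a candidate list rather than a deletion command, leaving the final call to Legal and Compliance.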
THE RIGHT TOOLS
Using the right tools is essential for maximizing efficiency and cost-effectiveness. Some e-discovery analytics can be applied to IG simply by being deployed upstream in the process. Those analytical tools, usually used for making sense of large data sets in incident-response scenarios, include:
• De-duplication: identifying exact copies or similar versions of documents and messages.
• Concept analysis: clustering of e-documents, messages, etc. under substantive topics chosen/created by software.
• Email redundancy: separating the last message from each string (thread).
• Relationship analysis: graphically depicting who knows/communicates with whom.
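Two of the techniques listed above, de-duplication and email redundancy, lend themselves to a brief Python sketch. It is a simplification under stated assumptions: it catches only copies that are identical after whitespace and case are normalized (not "similar versions"), and it assumes each message is a dictionary that already carries a hypothetical `thread_id` and a sortable `sent` timestamp.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(docs: dict[str, str]) -> list[list[str]]:
    """Group document names whose normalized text is identical."""
    groups = defaultdict(list)
    for name, text in docs.items():
        # Collapse whitespace and case so trivial differences are ignored.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def last_message_per_thread(messages: list[dict]) -> dict[str, dict]:
    """Keep only the most recently sent message of each thread."""
    latest: dict[str, dict] = {}
    for msg in messages:
        current = latest.get(msg["thread_id"])
        if current is None or msg["sent"] > current["sent"]:
            latest[msg["thread_id"]] = msg
    return latest
```

Commercial tools go further, for instance by detecting near-duplicates and reconstructing threads from message headers, but the underlying idea is the same: review one representative instead of every copy.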
Another key e-discovery analytical tool is artificial intelligence-based, technology-assisted review, often called "predictive coding," which uses statistical modeling and machine learning. The technology underpinning predictive coding software functions like spam filters and targeted advertising. Predictive coding leverages machine learning and human review of samples in an iterative process, until the team is comfortable with the system's decision-making.
In e-discovery, that person-plus-machine process parses relevant from irrelevant documents. In IG, that same process can parse to-be-retained from to-be-deleted documents.
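The person-plus-machine loop can be illustrated with a toy Python sketch. It is not how any commercial predictive coding product works; it swaps in a small naive Bayes word-count model, the same family of technique long used in spam filters, and the function names and the "retain"/"delete" labels are assumptions made for this example. Humans label a sample, the model scores the rest, and the document the model is least sure about goes back to a human for the next round.

```python
import math
from collections import Counter

LABELS = ("retain", "delete")


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train(labeled: list[tuple[str, str]]):
    """Count words per label from human-reviewed (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["retain"]) | set(word_counts["delete"])
    return word_counts, doc_counts, vocab


def retain_probability(model, text: str) -> float:
    """Estimate the probability that a document should be retained."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in LABELS:
        # Add-one smoothing keeps unseen words from zeroing the score.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denominator = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denominator)
        log_scores[label] = score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["retain"] / (weights["retain"] + weights["delete"])


def most_uncertain(model, unlabeled: list[str]) -> str:
    """Pick the document closest to 50/50 for the next human review."""
    return min(unlabeled, key=lambda t: abs(retain_probability(model, t) - 0.5))
```

Each round, the reviewer's decision on the uncertain document is added to the labeled sample and the model is retrained, which is the iterative part of the process.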
Lawyers and records managers should stay abreast of ESI technologies. Pertinent innovative technologies are evolving from the e-discovery and enterprise content management fields. Savvy e-discovery providers will incorporate ECM technology into their existing review and analysis tools to help organizations save money by tackling ESI for both IG and e-discovery.
DEPLOYMENT: STARTING THE CLEANUP
A data remediation program can begin anywhere the organization prefers. Tools can be deployed as part of a legacy data clean-up project, a litigation hold tracking system, a data loss prevention initiative, a big data analytics project or an enterprise-wide archiving migration plan.
Many organizations prefer to start by tackling unstructured data (e.g., email or instant messaging), because it is riskier than structured data (e.g., database-stored). Individuals often feel freer to express themselves in informal, unstructured environments, and unstructured ESI is more difficult to parse than already automatically classified information.
No matter where the process begins, cull through the ESI first, then move data to new locations after remediation. Before you get to the details of deployment, vet any e-discovery or ECM platform for sufficient scalability to your IG initiative.
Ensure that IG becomes part of the corporate culture. Employees need to be aware of the corporation's records retention and information-management policies just as they are mindful of corporate expectations regarding HR practices, regulatory compliance or confidentiality requirements. Like violations in those areas, amassing large ESI volumes companywide can have a very high ultimate price.
Training on IG should teach managers and staff to rethink how they use data, so that they keep only what is required or needed, and no more. Individuals should be guided by the Legal and Compliance specialists as well as e-discovery specialists conversant in defensible deletion. Training contemporaneous with regime change also provides an opportunity to emphasize the importance of litigation holds.
Once IG becomes embedded in the fabric of corporate culture, organizations will reap the rewards from a cost-savings, risk-mitigation and business-value perspective. While cleaning up decades of ESI is daunting, it only becomes more so as more data is stuffed into the company storage bin. The time to start the clean-up is now.
Robert D. Brownstone is Technology and E-discovery Counsel, Litigation, and co-chair of the Electronic Information Management group at Silicon-Valley headquartered Fenwick & West LLP. He advises clients on a wide range of legal and IT issues. He has also taught e-discovery law and process as adjunct professor at a number of universities, and in 2015 will teach the course at the Brooklyn and University of San Francisco schools of law.
Gabriela P. Baron is the Senior Vice President of Xerox Litigation Services (XLS). She has assisted clients with regulatory investigations, major class actions, employment matters and commercial cases filed in federal and state courts.
Today’s General Counsel, Nov 2014, 22.