Using Analytics to Clean Out the ESI Garage
by Robert D. Brownstone and Gabriela P. Baron
As time passes and we acquire more
"stuff," it gets harder to winnow down our possessions. Who longs to
spend the weekend cleaning the garage? It's easier to keep piling things up,
with no discipline for storage or removal. Thus, many a garage keeps getting
fuller and more chaotic until two distinct problems emerge: difficulty finding
a particular object and increased risk of hidden hazards.
The same issues pervade the
electronic information management (EIM) environment. The overwhelming volume of
data generated daily leads to a similar approach to electronically stored
information (ESI) for organizations of all shapes and sizes. Especially as
companies grow, they are lulled into the sense that it is easier and better to
focus on the urgent matters at hand and let the emails, electronic files and
database contents keep stacking up.
But there are many risks inherent
in saving all ESI forever. Potentially harmful content resides all over the
place, whether it's a "smoking gun" message, or something written and
kept so long that it becomes susceptible to misinterpretation when taken out of
context years later.
Within an organization that has a
save-everything policy, there likely are redundant copies of information,
resulting in sourcing and paying for extra storage space. These costs are
multiplied by the "rule of three," by which all live data is backed
up in at least two places. Moreover, the search for particular information
becomes a near-impossible and expensive chore. Additionally, the more locations
at which personally identifiable information (PII) and sensitive confidential
data (e.g., intellectual property and trade secrets) are stored, the greater
the security and privacy risk.
The ESI garage model is the information
governance (IG) strategy upon which organizations have traditionally relied.
Even when an IT department is tasked with the responsibility of managing the
data, this strategy falls flat. The primary reason: IT focuses on what it does
best, maintaining access to data rather than extracting the most value from
data.
The notion of IG is vague, and it is no
panacea, but organizations need to start somewhere. An IG initiative should
entail the use of advanced analytics and intelligent automated assessments of
big data sets to cull out irrelevant data, keep relevant data, and identify
PII, intellectual property and other sensitive data that must be kept and
segmented in order to ensure data security and privacy.
Savvy C-suites are adopting sound
IG policies not only to promote efficiency when locating information, but also
to facilitate compliance with legal requirements for electronic discovery, data
security and privacy. IG can help contend, for example, with thorny
international legal issues in cross-border data transfers day-to-day, as well
as in e-discovery. Furthermore, as corporations look toward the next generation
of technology and archiving systems, a solid IG program makes it easier to move
data from one system to another and to retrieve it.
Companies implementing effective IG
will also benefit from enhanced visibility of corporate data, enabling the use
of more in-depth analytics and the discovery of valuable insights and trends to
maximize the value of retained data. If data is a crystallization of a moment
in time, then IG is the storyteller, piecing together facts and information
into a narrative.
Even more significant, IG enables
multiple cost savings. In proactive mode, IG-savvy organizations experience
lower storage costs for live and backed-up data. In reactive mode - for
example, addressing a lawsuit - they will see reduced e-discovery costs.
Indeed, because IG and e-discovery have parallel workflows (finding relevant
data is always the first step), IG-strong corporations will be in a stronger litigation
posture.
WHERE TO BEGIN
Embarking on an IG program is
daunting for any organization. In a packed garage, one would start by manually
reviewing and organizing what's been tucked away, shelf by shelf, until the
space is neat and tidy. With ESI, the concept is similar. Cull through the data
in discrete chunks until all of it has been reviewed, and a system is in place
for future storage. By tackling small portions at a time, organizations will
see results and a return on investment.
A careful, considered approach is
key when starting to parse organizational data via this "data
remediation" process. As a first step, Legal and Compliance should ensure
the organization's IG policies and procedures are sound. Some organizations may
need to start by designing and implementing a corporate governance framework,
while others will need to update their existing records retention policies and
procedures.
This first step is critical from a
risk and compliance standpoint because it can guard against future spoliation
allegations. The organization's data deletion project must be defensible,
meaning it has memorialized reasons for the data destruction, covering what,
why, how, by whom and when the ESI was destroyed.
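One way to memorialize those five elements is a structured log entry written at the moment of each purge. The following is a minimal sketch only; the `DeletionRecord` type, its field names and the JSON log format are illustrative assumptions, not a prescribed legal standard.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class DeletionRecord:
    """Hypothetical audit entry covering what, why, how, by whom and when."""
    what: str      # description of the ESI destroyed
    why: str       # policy or retention-schedule basis for the deletion
    how: str       # method of destruction
    by_whom: str   # person or system that carried it out
    when: str      # ISO 8601 timestamp


def record_deletion(what, why, how, by_whom, when=None):
    """Build a record, defaulting the timestamp to the current UTC time."""
    timestamp = (when or datetime.now(timezone.utc)).isoformat()
    return DeletionRecord(what, why, how, by_whom, timestamp)


def to_log_line(record):
    """Serialize a record as one JSON line suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Because the record is frozen, an entry cannot be silently altered after it is created, which suits a log whose purpose is to rebut later spoliation allegations.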
Defensible deletion involves
careful consideration of what ESI the organization intends to exercise its
discretion to retain or purge, bearing in mind the nuances and contents of ESI.
Different file types are used for different job functions. In addition, Legal
should ensure retention of ESI that may be subject to a litigation hold or
relevant to issues in litigation or government inquiries.
Once the process is clearly defined
and memorialized, there are two approaches for data remediation. The first
approach is akin to damming a stream. With this approach, the organization must
adopt a disciplined plan for newly generated data and information. The second
approach is akin to cleaning a swamp. With that approach, companies must cull
through existing data troves and purge the excess.
Interestingly, some organizations
find the latter approach the easier to implement because most already have at
least some applicable e-discovery tools in place. These work to automatically
classify ESI using specified criteria, such as date and keywords.
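Classification by date and keywords can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular e-discovery product: the document fields (`text`, `last_modified`), the labels and the rule that a litigation-hold keyword always overrides age are all hypothetical choices.

```python
from datetime import date


def classify(doc, cutoff, hold_keywords):
    """Label a document 'retain' or 'deletion-candidate'.

    doc is assumed to be a dict with 'text' (str) and 'last_modified' (date).
    A keyword match always wins, so held material is never flagged for purge.
    """
    text = doc["text"].lower()
    if any(keyword.lower() in text for keyword in hold_keywords):
        return "retain"
    if doc["last_modified"] < cutoff:
        return "deletion-candidate"
    return "retain"


# Example: flag anything untouched since the cutoff, unless it mentions a held matter.
old_memo = {"text": "Cafeteria schedule", "last_modified": date(2005, 3, 1)}
print(classify(old_memo, date(2008, 1, 1), ["Acme dispute"]))
```

Items labeled as deletion candidates would still pass through the documented review process before anything is purged.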
THE RIGHT TOOLS
Using the right tools is essential
for maximizing efficiency and cost-effectiveness. Some e-discovery analytics
can be applied to IG simply by being deployed upstream in the process. Those
analytical tools, usually used for making sense of large data sets in
incident-response scenarios, include:
• De-duplication: identifying exact copies or similar versions
of documents and messages.
• Concept analysis: clustering of e-documents, messages,
etc. under substantive topics chosen/created by software.
• Email redundancy: separating out the last message from each
string.
• Relationship analysis: graphically depicting who
knows/communicates with whom.
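Two of the techniques above, de-duplication and email redundancy, lend themselves to a short illustration. The sketch below is a simplification: it detects only exact copies (after normalizing whitespace and case) by hashing, whereas finding similar versions requires fuzzier matching that is not shown. The message fields `thread_id`, `sent` and `body` are assumed names.

```python
import hashlib


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first occurrence of each distinct document, in original order."""
    seen = set()
    unique = []
    for doc in documents:
        digest = fingerprint(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def last_messages(messages):
    """Return the most recently sent message from each email string.

    Each message is assumed to be a dict with 'thread_id', 'sent' and 'body'.
    """
    latest = {}
    for message in messages:
        current = latest.get(message["thread_id"])
        if current is None or message["sent"] > current["sent"]:
            latest[message["thread_id"]] = message
    return list(latest.values())
```

Hashing keeps the comparison cheap even across very large collections, because each document is read once and compared only by its digest.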
Another key e-discovery analytical
tool is artificial intelligence-based, technology-assisted review, often called
"predictive coding," which uses statistical modeling and machine
learning. The technology underpinning predictive coding software functions like
spam filters and targeted advertising. Predictive coding leverages machine
learning and human review of samples in an iterative process, until the team is
comfortable with the system's decision-making.
In e-discovery, that person-plus-machine
process parses relevant from irrelevant documents. In IG, that same process can
parse to-be-retained from to-be-deleted documents.
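The iterative person-plus-machine loop can be illustrated with a toy classifier. The sketch below uses a small naive Bayes model written from scratch as a stand-in; commercial predictive coding tools use their own, more sophisticated models, and the labels, function names and sample texts here are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Fit word counts per label from a list of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def review_round(labeled, sample, human_label):
    """One iteration: a reviewer labels a sample, then the model is retrained."""
    labeled = labeled + [(doc, human_label(doc)) for doc in sample]
    return labeled, train(labeled)


seed = [
    ("quarterly contract renewal terms", "retain"),
    ("lunch menu friday pizza", "delete"),
]
model = train(seed)
print(predict(model, "contract terms attached"))
```

In practice the team would repeat `review_round` on fresh samples, checking the model's calls against the reviewers' own, until the two agree often enough for the team's comfort.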
Lawyers and records managers should
stay abreast of ESI technologies. Pertinent innovative technologies are
evolving from the e-discovery and enterprise content management fields. Savvy
e-discovery providers will incorporate ECM technology into their existing
review and analysis tools to help organizations save money by tackling ESI for
both IG and e-discovery.
DEPLOYMENT: STARTING THE CLEANUP
A data remediation program can
begin anywhere the organization prefers. Tools can be deployed as part of a
legacy data clean-up project, a litigation hold tracking system, a data loss
prevention initiative, a big data analytics project or an enterprise-wide
archiving migration plan.
Many organizations prefer to start
by tackling unstructured data (e.g., email or instant messaging), because it is
riskier than structured data (i.e., database-stored). Individuals often feel
freer to express themselves in informal, unstructured environments, and
unstructured ESI is more difficult to parse than information that has already
been automatically classified.
No matter where the process begins, cull through the ESI first, then move data to
new locations after remediation. Before you get to the details of deployment,
vet any e-discovery or ECM platform for sufficient scalability to your IG
initiative.
Ensure that IG becomes part of the
corporate culture. Employees need to be aware of the corporation's records retention
and information-management policies just as they are mindful of corporate
expectations regarding HR practices, regulatory compliance or confidentiality
requirements. Like violations in those areas, amassing large ESI volumes
companywide can have a very high ultimate price.
Training on IG should teach managers and staff to rethink how they use data, so that
they keep only what is required or needed, and no more. Individuals should be
guided by the Legal and Compliance specialists as well as e-discovery
specialists conversant in defensible deletion. Training contemporaneous with
the change to a new IG regime also provides an opportunity to emphasize the importance of
litigation holds.
Once IG becomes embedded in the
fabric of corporate culture, organizations will reap the rewards from a
cost-savings, risk-mitigation and business-value perspective. While cleaning up
decades of ESI is daunting, it only becomes more so as more data is stuffed
into the company storage bin. The time to start the clean-up is now.
Authors
Robert D. Brownstone is Technology and E-discovery Counsel,
Litigation, and co-chair of the Electronic Information Management group at
Silicon-Valley headquartered Fenwick & West LLP. He advises clients on a
wide range of legal and IT issues. He has also taught e-discovery law and
process as adjunct professor at a number of universities, and in 2015 will
teach the course at the Brooklyn and University of San Francisco schools of
law.
rbrownstone@fenwick.com
Gabriela P. Baron is the Senior Vice President of Xerox
Litigation Services (XLS). She has assisted clients with regulatory
investigations, major class actions, employment matters and commercial cases
filed in federal and state courts.
Gabriela.Baron@xls.xerox.com
Today’s General Counsel, Nov 2014, 22.