Nuix and EDRM republish Enron data set cleansed of more than 10,000 items containing private, health and financial information

Print PDF
Wednesday, 15 May 2013 19:37 Written by DFM News

Nuix, a worldwide provider of information management technologies, and EDRM, the leading standards organisation for the eDiscovery and information governance market, have today republished the EDRM Enron PST Data Set after cleansing it of private, health and personal financial information. Nuix and EDRM have also published the methodology Nuix’s staff used to identify and remove more than 10,000 high-risk items at nuix.com/enron.

The EDRM Enron data set is an industry-standard collection of email data that the legal profession has used for many years for electronic discovery training and testing. It was sourced from the Federal Energy Regulatory Commission’s investigation into collapsed energy firm Enron. In early 2012, the EDRM Enron PST Data Set and the EDRM Enron Data Set v2 became an Amazon Web Services Public Data Set, making them a valuable public resource for researchers across a variety of disciplines.

“Recently, we have been working closely with Nuix to cleanse the data set of private information about the company’s former employees and make the cleansed data set readily available to the community,” said George Socha and Tom Gelbmann, co-founders of EDRM. “These efforts help to protect the privacy of hundreds of individuals and we encourage anyone who finds private data that we did not remove to notify us.”

Using a series of investigative workflows on the EDRM Enron PST Data Set, Nuix consultants Matthew Westwood-Hill and Ady Cassidy identified more than 10,000 high-risk items, including:

· 60 items containing credit card numbers, including departmental contact lists that each contained hundreds of individual credit card numbers
· 572 items containing Social Security or other national identity numbers, covering thousands of individuals in total
· 292 items containing individuals’ dates of birth
· 532 items containing information of a highly personal nature, such as medical or legal matters

Many items contained multiple instances and types of private information. These included departmental contact list spreadsheets with the dates of birth, credit card numbers, Social Security numbers, home addresses and other private details of dozens of staff members.

The investigative team also clearly demonstrated that these items did not stay within the Enron firewall. For example, some staff emailed “convenience copies” of documents containing private data to their personal addresses.

“Nuix and our partners have conducted sweeps for private and credit card data for dozens of corporate customers and we are yet to encounter a data set that did not include some inappropriately stored personal, financial or health information,” said Eddie Sheehy, CEO of Nuix. “The increasing burden of privacy and data breach regulations, combined with the strict requirements of credit card companies, makes this an unacceptable business risk.”

“Using the methodology we are publishing alongside the cleansed EDRM Enron data, organisations can identify private and financial data, find out if it has been emailed outside the firewall and take immediate steps to remediate the risks involved.”

Nuix is currently applying the same methodology to the EDRM Enron Data Set v2, which it will also republish at nuix.com/enron.

Nuix will host a Twitter chat to discuss the release of the cleansed EDRM Enron PST Data Set on Thursday 23 May at 7:00pm BST. Nuix experts will describe the process of identifying unsecured financial, health and personally identifiable information in corporate data. Follow the hashtag #NuixChat and send your questions to @nuix beforehand.
