Finding the needle in the haystack: Startup simplifies legal eDiscovery

February 24, 2014

Searching through millions of corporate documents – online and offline – just to find the few hundred that are relevant to a particular legal issue is as painful as you’d imagine, like trying to find a needle in a haystack. Knowing the frustration all too well, lawyer, engineer and entrepreneur Lachlan James decided to come up with a better solution.

Launching in private beta next month, HaystackHQ aims to simplify the legal eDiscovery process. Relying primarily on large-scale visual analytics, the startup provides a browser-based eDiscovery solution to corporate litigators.

The inspiration

James knows first-hand what it takes to find the ‘needle in the haystack’. When he worked as a corporate litigator in the late 1990s, digging up material relevant to a lawsuit was a painful and tedious process, and back then everything was on paper.

He believes the problem is much worse now that we live in a digital environment: “Everything’s in emails, Word Docs and PDFs … What once involved tens of thousands of documents, now routinely involves hundreds of thousands, millions, or even hundreds of millions of documents. There had to be a better way.”

James is also an engineer, and it was this skillset that helped him build HaystackHQ. In fact, it was while working as an engineering intern at a large multinational company that he built his first search engine, helping people find books, articles and more within the company’s internal library. He has been drawn to technology that makes information easier to find ever since.

Add to this a background in computer graphics and interactive information visualisation. After seven years at a venture capital firm, James jumped to the other side of the desk in 2008 to pursue startup opportunities in visual analytics.

“I had some ideas around visually representing complex and unstructured data like emails, wiki entries, and search results. But how did I know that anyone would ‘get it’ or find it useful? I built an initial web-based version in early 2008 using Yahoo’s BOSS (Build your Own Search Service) search engine API to enable users to visualise and explore large numbers of search results,” says James.

“I created a demo video and put it on YouTube. I had some great initial traction. But after a terrific initial spike, user numbers dropped off significantly. The feedback: yes, people ‘got’ the visualisation; but, no, people did not find utility in it … for web searching anyway.”

During this time, the world was descending into the Great Recession (or Global Financial Crisis, as we call it in Australia). With user numbers falling, raising funds was going to be difficult, so to get by, James did strategy consulting for several Sydney-based businesses, including the Mexican fast-food chain Guzman y Gomez.

It wasn’t until early 2013 that James reluctantly returned to his original idea, this time applying it to eDiscovery in the legal sector. He was reluctant because law firms are paid by the billable hour: the longer a job takes, the more they earn, so they have no real incentive to be more efficient.

“Why would they want a tool that helps them be more efficient? I met with corporate litigators and litigation support managers at law firms. They immediately ‘got it’ and could see the utility. And they wanted to be more efficient – the GFC had brought about a change in attitude in client corporations … Law firms were coming out of the dark ages.”

“Not only that, litigators could see that HaystackHQ’s visual analytics could also help them communicate to clients the nature, scope and quality of their work, that they’d done a good job, that they were unlikely to have missed anything. This was a turning point!”

What makes HaystackHQ disruptive?

James says that most legal review and eDiscovery tools adhere rigidly to text-based representations, and all of them are complicated, “with huge amounts of feature bloat that almost no-one ever uses.”

Basically, legal reviewers receive a mass of documents. They use eDiscovery software to index those documents and then search through them using a combination of keywords, time-slicing and other filters. As with Factiva, users get back a long list of search results that they have to wade through, reading each document and coding it as ‘relevant’ or ‘irrelevant’.

“It takes days and weeks to perform, often with teams of reviewers – including paralegals and junior lawyers. It’s mind-numbing. And assuming that – after reviewing 1000s of documents – you’re able to keep track of what you’re supposed to be looking for and what’s relevant, you’ll adjust your searches and follow a new line of enquiry to try to find what you’re looking for. It’s all pretty hit-and-miss!” says James.
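For readers unfamiliar with this workflow, here is a minimal sketch of keyword search combined with a date filter (‘time-slicing’) over a toy inverted index. The data layout (`doc_id -> (date, text)`) and the `build_index` and `search` functions are hypothetical illustrations; commercial eDiscovery platforms are far more elaborate.

```python
import re
from collections import defaultdict
from datetime import date


def build_index(docs):
    """Build a toy inverted index: lower-cased word -> set of doc ids.

    `docs` is a hypothetical mapping of doc_id -> (sent_date, text).
    """
    index = defaultdict(set)
    for doc_id, (_, text) in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(docs, index, keywords, start, end):
    """Return sorted ids of docs containing every keyword, dated within [start, end]."""
    hits = set(docs)
    for kw in keywords:
        # .get avoids inserting empty entries into the defaultdict
        hits &= index.get(kw.lower(), set())
    return sorted(d for d in hits if start <= docs[d][0] <= end)


docs = {
    "e1": (date(2013, 3, 1), "Merger talks with Acme"),
    "e2": (date(2013, 6, 1), "Acme merger signed."),
    "e3": (date(2013, 3, 2), "Lunch plans"),
}
index = build_index(docs)
print(search(docs, index, ["merger", "acme"], date(2013, 1, 1), date(2013, 4, 30)))  # → ['e1']
```

Even with a tight query, every hit in the returned list still has to be opened, read and coded by hand, which is the bottleneck James describes.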

HaystackHQ, on the other hand, takes those documents, indexes them and learns the relationships and similarities between them. James says it then creates an interactive, colour-coded 2D map that clusters similar documents together, scaling to millions of documents. In response to a search, it also dynamically generates a timeline showing spikes in activity, such as bursts of email traffic.

“Together with the ‘traditional’ list of results, HaystackHQ presents a very dynamic, visually interactive user interface, enabling reviewers to quickly isolate which parts of the document map and/or timeline are most relevant or ‘responsive’ to the particular legal matter,” says James.

“The net result is to deliver (at least) two benefits: reduce the time taken to do eDiscovery; and increase the confidence of reviewers (and their clients) that they are likely to have found most – if not all – responsive or relevant documents, increasing the quality of eDiscovery.”
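The article does not say how HaystackHQ detects spikes on its timeline, so the following is only a minimal sketch, assuming each email carries a send date, of one simple way a timeline could flag unusual activity: count messages per day and mark days that sit well above the average. The `daily_counts` and `find_spikes` functions and the threshold `k` are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def daily_counts(sent_dates):
    """Count emails per calendar day, filling quiet days in the range with 0."""
    counts = Counter(sent_dates)
    start, end = min(counts), max(counts)
    return {
        start + timedelta(days=i): counts.get(start + timedelta(days=i), 0)
        for i in range((end - start).days + 1)
    }


def find_spikes(sent_dates, k=2.0):
    """Return days whose count exceeds mean + k * population standard deviation.

    A deliberately crude z-score rule, used here purely for illustration.
    """
    if not sent_dates:
        return []
    counts = daily_counts(sent_dates)
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(day for day, n in counts.items() if n > threshold)


# Ten quiet days with one email each, plus a burst of 20 extra emails on 5 January.
emails = [date(2013, 1, d) for d in range(1, 11)] + [date(2013, 1, 5)] * 20
print(find_spikes(emails))  # → [datetime.date(2013, 1, 5)]
```

Filling in zero-email days matters: without them, quiet periods would drop out of the baseline and make ordinary days look like spikes.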

He adds that HaystackHQ also acts as a quality-control or early-warning system. Similar documents are located close to one another on the map, so if they are ‘coded’ differently, some relevant and others irrelevant, the reviewer will immediately see the potential inconsistency as black dots (irrelevant documents) next to red dots (relevant documents).
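The early-warning check James describes can be illustrated with a minimal sketch, assuming every document already has 2D map coordinates and a reviewer’s code. The `Doc` type, the `radius` parameter and the pairwise scan are hypothetical, not HaystackHQ’s actual implementation.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Doc:
    doc_id: str
    x: float        # hypothetical 2D map coordinates
    y: float
    relevant: bool  # reviewer's code: True = relevant (red), False = irrelevant (black)


def inconsistent_pairs(docs, radius=1.0):
    """Return id pairs of documents within `radius` of each other but coded differently.

    Brute-force O(n^2) scan for clarity; millions of documents would call
    for a spatial index instead.
    """
    flagged = []
    for a, b in combinations(docs, 2):
        near = math.dist((a.x, a.y), (b.x, b.y)) <= radius
        if near and a.relevant != b.relevant:
            flagged.append((a.doc_id, b.doc_id))
    return flagged


docs = [
    Doc("A", 0.0, 0.0, True),
    Doc("B", 0.5, 0.0, False),    # close to A but coded differently
    Doc("C", 10.0, 10.0, False),
    Doc("D", 10.2, 10.0, False),  # close to C and coded the same
]
print(inconsistent_pairs(docs))  # → [('A', 'B')]
```

Only the A/B pair is flagged: C and D sit together but agree, so there is nothing for the reviewer to re-check.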

Development, business model and marketing

As an ex-VC, James knew the obvious deal-breakers and decided there were several proof points he needed to establish before raising investment capital.

In 2013, $4–5 billion was spent globally on eDiscovery software and services. According to James, eDiscovery software tools account for $1.5 billion of that, a market growing at more than 15 percent per annum. However, there are no figures for the exact size of the niche legal eDiscovery market.

“What is missing from these market estimates is the significant and current ‘non-consumer’ market. It is difficult to estimate its size, but anecdotally it’s likely to be something close to the existing market size again,” he says.

After James conceived of the idea, he spent time talking to a wider audience of eDiscovery practitioners and vendors to learn more about the market. Drawing on earlier versions of his software, he built an initial prototype and took it back to the market.

At this point, it was clear to him that the technology had potential. He brought in former colleague Ricky Robinson, who had more experience in systems software design and development than James, to help out, and the two have been coding ever since.

Not long afterwards, they applied for and were accepted into the Startmate accelerator programme for 2014, and received an ACT Innovation Connect (ICON) grant.

James also recently attended LegalTechNY, along with 15,000 other people. Attending the world’s largest legal technology conference has given him a much better understanding of the market and its needs.

To gain traction, James plans to draw on the advice and blogging reach of advisors who are key opinion leaders, and to attend conferences where the team will have direct access to target customers. HaystackHQ will also use well-trodden online marketing channels such as LinkedIn and Google.

At the moment, they’re still working out their business model. James says HaystackHQ will come in at least two ‘flavours’: “The first is a subscription-based web/fully-hosted version, where clients submit their data to be hosted, indexed and analysed. HaystackHQ will also come in a ‘behind the firewall’ version where clients provide the hardware onto which we install our software solution.”

“Similarly, we may also provide a HaystackHQ appliance. How we charge for the latter two we’re not entirely sure, but it is likely to follow a similar model to the web-based version.”

Though they’re still in the early stages of the business, things are looking positive: they appear to have found the right product-market fit, and now it’s all about finalising and executing.

The website is haystackhq.com. HaystackHQ will be launched in private beta in March.