Patents, Piracy, and Predictions

January 22, 2015

McCormick and Northwestern Law professors collaborate on legal analytics software to predict future patent disputes

By Amanda Morris

This article originally appeared in the Fall 2014 issue of Northwestern Engineering, the magazine of the McCormick School of Engineering and Applied Science.

Samuel Morse revolutionized global communication in 1838 when he patented the telegraph. Within a decade, more than 20,000 miles of telegraph wire spanned the continental United States. By 1866 a transatlantic line stretched all the way to Europe, introducing the world to near-instant messaging.

You might think Morse would have relaxed and basked in the glow of his success. Not so. He constantly watched over his shoulder for “patent pirates” and spent countless hours and a small fortune defending his patents in court.

“I have been so constantly under the necessity of watching the movements of the most unprincipled set of pirates I have ever known,” Morse said in an 1848 letter to a friend, “that all my time has been occupied in defense…”

Telecommunications may have advanced exponentially in the intervening 166 years, but little has changed in the shady world of patent infringement. Inventors today spend millions on insurance to protect their intellectual property. A 2011 American Intellectual Property Law Association survey revealed that patent-related legal costs can range from $650,000 on the low end to a jaw-dropping $5 million.

Peering into the future at Northwestern

What if inventors could see into the future and know whether or not their patents would be infringed? How much money could they save? How much more time would they have to innovate?

The answers may come sooner rather than later thanks to the collaborative effort of McCormick School of Engineering Professor Diego Klabjan and Northwestern Law Professor John O. McGinnis. Together with PhD student Papis Wongchaisuwat, they are developing a new model that not only identifies patents with a high probability of being litigated but also predicts how many years into the future the dispute will occur.

“If you know a patent is more likely to be litigated, then you’ll pay closer attention to possible infringements,” said Klabjan, professor of industrial engineering and management sciences and director of McCormick’s Master of Science in Analytics program. “A patent that is not likely to be litigated might be put aside. It really boils down to how you allocate your resources, like your budget for infringement detection.”

An idea comes to life

Two years ago, McGinnis, professor of law, approached Klabjan for advice about developing a new legal analytics course. McGinnis has long been interested in data analytics and the growing role of machine intelligence in the legal profession. Several conversations later, the two had launched a research collaboration.

“Lawyers make predictions all the time,” McGinnis said. “Will this be a good case? Will we win? Making the correct prediction is a central challenge for the legal practice. If we can make an analytic framework for prediction, then we can see the outcome before going through the costs of litigation.”

The team started by examining publicly available data from the US Patent and Trademark Office. Then they retrieved litigation documents from Lex Machina, a company that tracks patent cases. They matched the data, focusing on the claims section of past patents and associated legal disputes. Klabjan and Wongchaisuwat used text analytics to pore over 10,000 patents to discover patterns and trends in those patents that were disputed and those that were not. They then extracted keywords and phrases that could indicate future litigation.
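The keyword-extraction step can be illustrated with a minimal sketch. The article does not describe the team's actual text analytics, so everything here is an assumption for illustration: the function names, the toy claim texts, and the scoring rule (a smoothed log-odds comparison of how often a word appears in litigated versus non-litigated claims).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_scores(litigated_claims, other_claims):
    """Score each word by its smoothed log-odds of appearing in
    litigated claims versus non-litigated claims (hypothetical rule)."""
    # Count how many documents in each group contain each word.
    lit = Counter(w for doc in litigated_claims for w in set(tokenize(doc)))
    oth = Counter(w for doc in other_claims for w in set(tokenize(doc)))
    n_lit, n_oth = len(litigated_claims), len(other_claims)
    scores = {}
    for word in set(lit) | set(oth):
        # Add-one smoothing keeps both probabilities strictly between 0 and 1.
        p_lit = (lit[word] + 1) / (n_lit + 2)
        p_oth = (oth[word] + 1) / (n_oth + 2)
        scores[word] = (math.log(p_lit / (1 - p_lit))
                        - math.log(p_oth / (1 - p_oth)))
    return scores


def top_keywords(scores, k=3):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy claim texts, invented for illustration only.
litigated = [
    "A wireless method for transmitting data packets",
    "A wireless system comprising a data transmitter",
]
others = [
    "A chair comprising four legs",
    "A method for folding a chair",
]

scores = keyword_scores(litigated, others)
print(top_keywords(scores, 2))
```

Words that appear equally often in both groups, such as "a", score zero, while words concentrated in the litigated group rise to the top. A real system would work from thousands of claims and would likely use phrases as well as single words.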

“We used historical patents to create our prediction model,” Klabjan said. “Then we compared our predictions to what really happened. The actual prediction is for the future, but you don’t know what will happen in the future, so you compare it against recent history.”
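The evaluation Klabjan describes, building on older patents and checking against recent history, resembles what is often called a temporal holdout. The sketch below is only an illustration under stated assumptions: the records, the cutoff year, and the keyword rule standing in for the model are all hypothetical and are not the team's method. It reports the share of truly litigated patents that the rule flags; the article does not say exactly how its 64 percent figure was computed.

```python
def temporal_split(patents, cutoff_year):
    """Split records into older (training) and more recent (test) sets."""
    train = [p for p in patents if p["filed"] < cutoff_year]
    test = [p for p in patents if p["filed"] >= cutoff_year]
    return train, test


def fit_keywords(train):
    """Hypothetical stand-in model: keep words seen in litigated
    training claims but never in non-litigated ones."""
    lit_words, other_words = set(), set()
    for p in train:
        target = lit_words if p["litigated"] else other_words
        target.update(p["claims"].split())
    return lit_words - other_words


def predict(keywords, patent):
    """Flag a patent if its claims contain any learned keyword."""
    return any(word in keywords for word in patent["claims"].split())


def recall(predicted, actual):
    """Share of truly litigated patents that were flagged."""
    hits = sum(1 for p, a in zip(predicted, actual) if p and a)
    total = sum(1 for a in actual if a)
    return hits / total if total else 0.0


# Toy records, invented for illustration only.
patents = [
    {"filed": 2005, "claims": "wireless data transmitter", "litigated": True},
    {"filed": 2006, "claims": "folding chair frame", "litigated": False},
    {"filed": 2007, "claims": "wireless antenna array", "litigated": True},
    {"filed": 2008, "claims": "wooden chair leg", "litigated": False},
    {"filed": 2010, "claims": "wireless data router", "litigated": True},
    {"filed": 2011, "claims": "optical lens coating", "litigated": True},
    {"filed": 2012, "claims": "chair cushion", "litigated": False},
]

train, test = temporal_split(patents, cutoff_year=2010)
keywords = fit_keywords(train)
predicted = [predict(keywords, p) for p in test]
actual = [p["litigated"] for p in test]
print(recall(predicted, actual))
```

Splitting by filing year rather than at random keeps the test honest: the rule never sees anything from the period it is asked to predict.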

After establishing an algorithm that makes accurate predictions, Klabjan built software that automatically analyzes patent documents. So far, the model correctly identifies 64 percent of litigated patents. After filing for a patent, inventors can run the software to estimate when that patent might be litigated.

“Our software might say it’s likely to be litigated in three years,” Klabjan said. “So for the first two years, they know they are fine. They don’t have to do much. When they enter year three, they know they need to start monitoring for potential infringements.”

Refining the model

To further improve their prediction model, Klabjan and his team have started incorporating financial data from the US Securities and Exchange Commission (SEC). They are specifically reviewing the revenue and profits of companies that own patents. According to Klabjan, large companies, such as Microsoft or Apple, are more likely to experience infringement cases because the amount of money involved is significant. SEC filings don’t follow a standard format, so integrating the financial information has been slow. The team had to write custom code to parse and analyze each format.

Klabjan and McGinnis plan to continue collaborating on other legal analytics projects. Right now, the field is ripe for exploration. Few researchers work in legal analytics because legal documents are difficult to access. Klabjan formed an academic partnership with Lex Machina and gained access to the company’s materials. Without that partnership, access could have cost tens of thousands of dollars.

“You can draw patterns from one document, but you cannot claim that the same patterns will occur in other documents,” Klabjan said. “For that you need a large volume of documents, which is hard to obtain.”

“We can use machine intelligence to forecast outcomes of patent litigation,” McGinnis said. “This is the future of law.”