Big Data is one of the key technological developments of today – its impact will have far reaching consequences.
In this blog, I’ll look at the potential impact on the patent system.
1. Big data is mathematics.
I wrote about Big Data before, and the potential effect Big Data will have on Intellectual Property Rights.
One of the reasons why people have difficulty understanding the potential impact of Big Data is that Big Data is, well, really big. But being “big” is actually not the key characteristic of Big Data I want to address. The key characteristic that is relevant for Big Data’s impact on Intellectual Property is its explosive growth.
The amount of data humanity and its machines will create in 2013 is enormous. Really enormous. Astronomical, gargantuan, Brobdingnagian, titanic. The numbers we talk about are in the area of zettabytes (equal to one sextillion, or 10 to the power 21 bytes, or 1 billion terabytes).
Yet, compared to the amount of data we will create in 2020, the amount created in 2013 is tiny. It’s puny, almost negligible – a couple of percent.
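A minimal sketch makes the point concrete. Assuming the world’s data doubles roughly every 18 months (the growth rate cited later in this post – the exact figure is an assumption), the data created in 2013 is only a few percent of what will be created in 2020:

```python
# Assumption: the world's data doubles roughly every 18 months.
DOUBLING_PERIOD_YEARS = 1.5

def relative_size(base_year: int, later_year: int, doubling_years: float) -> float:
    """Data created in base_year as a fraction of data created in later_year."""
    return 0.5 ** ((later_year - base_year) / doubling_years)

fraction = relative_size(2013, 2020, DOUBLING_PERIOD_YEARS)
print(f"2013 data as a share of 2020 data: {fraction:.1%}")  # roughly 4%
```

With a slightly faster doubling time (12 months), the share drops below 1% – either way, 2013’s “astronomical” output becomes a rounding error within a decade.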
Humans are not very good at understanding exponential growth; our brains evolved to deal quickly with potentially dangerous or beneficial situations while we were running around on the savannah, and to manage complicated social relations with about a hundred other individuals in our immediate surroundings. Understanding complex mathematics and exponential growth was not among the skills we needed to survive in those days.
So we don’t easily grasp the importance of Big Data, because it occurs outside our frame of easy reference. Put simply: we tend not to “get” it. One aspect we don’t get is the relation between Big Data and a key aspect of patents: the concept of prior art.
2. Big Data is prior art
The current patent system centers on the concept of novelty. It is an absolutely essential condition that a patent should only be granted for a new invention, something that did not exist before. That which exists before is called “prior art” by patent law. In theory, any knowledge in society that is publicly accessible is (or should be) considered as prior art.
Most of the information and data generated as a result of Big Data are therefore to be considered “prior art”. Some data will not qualify, e.g. because they are held secret – however, there is no reason to assume that any such subset or category will grow faster than the publicly available part of Big Data. Therefore, as a result of the exponential growth of Big Data, it is reasonable to conclude that virtually all of Big Data is to be considered prior art.
This has a profound impact on the patent system. Prior art is growing by 100% every 12 to 18 months (and that time span is gradually getting shorter). Yet the patent system only doubles its output, the number of patents granted, roughly every 20–25 years (and that is in absolute numbers, thanks to hiring more people – the number of patents granted per patent examiner tends to decline rather than increase). So even if we all became patent examiners, it would still take only a limited number of years before Big Data/prior art becomes so big that virtually all patent applications are rejected, because the information they contain is already publicly available. Of course, this assumes that patent examiners look at all possible prior art, and properly reject those patents that are not new.
The conclusion is clear: a patent system based on the concept of “novelty” cannot remain functional in a world that has exponential growth in prior art.
This graph shows an example of what happens when exponentially growing prior art meets the patent system. It plots the total number of patents granted in the US from 1963 to 2012 – the number went up from roughly 50,000 to just under 300,000. The number of patents doubled twice in that period; the first time it took 27 years, the second time 19 years. The “prior data” graph starts with 1 byte of data in 1963, and doubles every 24 months. It is true, of course, that humanity had more than 1 byte of data in 1963, but it is also true that 1 patent > 1 byte of data. Yet I think the overall conclusion remains valid.
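The crossover in that graph can be sketched numerically. The parameters below are taken from the post itself (50,000 grants in 1963, doubling roughly every 23 years on average; 1 byte of “prior data” in 1963, doubling every 24 months); treat the exact crossover year as illustrative only:

```python
# Assumed parameters from the graph described above: patents start at
# 50,000 grants in 1963 and double roughly every 23 years; "prior data"
# starts at 1 byte in 1963 and doubles every 24 months.

def patents_granted(year: int) -> float:
    return 50_000 * 2 ** ((year - 1963) / 23)

def prior_data_bytes(year: int) -> float:
    return 1 * 2 ** ((year - 1963) / 2)

# Find the first year in which the data curve overtakes the patent curve.
year = 1963
while prior_data_bytes(year) < patents_granted(year):
    year += 1
print(year)
```

Even starting from a single byte, the faster-doubling curve overtakes the patent curve within a few decades – which is the whole point: the head start is irrelevant, only the doubling time matters.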
3. Patents are data (but not Big Data).
There’s more. A patent is, in essence, a combination of data and one or more algorithms. Each patent describes a process that uses certain data – a process that, according to the patent examiner, qualifies as a useful and new invention.
So, a patent is data. But not that much data: a typical patent application will be a couple of thousand words – in almost all cases, less than one megabyte of information. That’s not a lot in today’s world. The 300-odd thousand patents issued by the US patent office in 2013 will probably add less than 100 gigabytes of information to the world; that’s a bit more than 10% of the hard drive capacity of the machine I use to write this blog. Again, in the light of the exponential growth of Big Data, the amount of data in patents is, statistically, evolving towards an insignificant fraction of all data in the world.
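A quick back-of-the-envelope check of those figures (the average size per application is an assumption – a few thousand words of text, comfortably under the one-megabyte ceiling mentioned above):

```python
# Rough figures from the discussion above; the average application size
# is an assumption (a few thousand words of text, well under 1 MB).
PATENTS_PER_YEAR = 300_000   # roughly the 2013 US figure
AVG_SIZE_BYTES = 300_000     # assumed average per application

total_gb = PATENTS_PER_YEAR * AVG_SIZE_BYTES / 1e9
print(f"~{total_gb:.0f} GB of patent text per year")
```

Roughly 90 GB per year, against a world producing zettabytes (billions of gigabytes): the patent corpus is, and will remain, a vanishing sliver of the total.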
In addition, a patent is an algorithm. However, a patent is a particular algorithm, in that the algorithm itself is frozen. The patent effectively takes a snapshot of the information of a technological invention (the algorithm and the data) and freezes it in time. Only that particular version of the algorithm is protected by the patent’s exclusive rights. The problem is, in the world of Big Data, frozen algorithms are the least valuable ones.
Big Data has value because it allows analysis and intelligence. This is where we apply algorithms on the data. Those algorithms allow us to make sense of all that data. But, like the search algorithms used by Google at the core of its technology, those algorithms need to be changed constantly. This is a direct, and necessary, consequence of the nature of Big Data.
Big Data is often described by its three characteristics of Volume, Velocity and Variety. Variety also means that the data, their volume, source, and the way they behave, change constantly. And the value of data lies mainly in the potential of access and analysis. This can only happen when the algorithms used for that access and analysis can be updated continuously, as a result of the Variety and the continued modification of data streams.
Therefore, the concept of a frozen algorithm, as contained in a patent, loses most of its potential value. In a way, this reflects the observation that technology cycles are speeding up as a result of the growing importance of information technology as an all-purpose technology. Patents gradually become more and more useless if they can’t be continuously adapted. But such continuous adaptation would be contrary to their core identity as a frozen algorithm, which is granted a monopoly because it is “novel”.
4. How can the patent system react?
There are really only a very limited number of ways in which the patent system can react.
- Increasing the number of patents is an exercise in futility, compared to the exponential growth of prior art. There are simply not enough human patent examiners around.
- Automating patent review is not the right answer either, although one might think it a logical step. There are two problems with automated patent review. The first is political: both patent attorneys and patent examiners would fight the concept of machines doing their work. The second is a logical one: any system of automated patent review will inevitably drive the rejection rate of applications to 100%. A system that is allowed to search automatically for prior art in all of the world’s data will always find information that constitutes “prior art”. That’s the inevitable consequence of the exponential growth of Big Data.
- Extending the scope of patents is the third option. This, again, is a political step. It would amount to allowing patents on technology that is not new (a practice much applied today by most patent offices, even though they shouldn’t), or allowing patents on broad concepts, on ideas, or on business models (rather than on specific technical inventions, as is the case today). However, quite apart from the fact that this would be an economic disaster, we observe that, if there is any movement in patent-granting policy these days, that movement is towards more restrictive criteria, not broader ones.
- Another possibility is to allow patents with “moving target” algorithms – describing the invention vaguely or in its general principles, so that it can apply to many generations of “novelty” arising out of the same principles.
- Or, finally, the solution used by the pharmaceutical industry, which is to re-patent the same invention over and over again, every time slightly modified. Enough to obtain a new patent, but not so much as to require an actual new product. Yet, theoretically, a properly functioning patent office – one really reviewing all prior art – should reject any such patent, given the exponential growth of prior art.
In essence, the patent system’s only reaction to Big Data can be to issue more “bad patents”.
However, as we know, it’s those bad patents, together with the dysfunctional US patent litigation system, that are the root cause of the serious economic damage done by patent trolls (estimated by the White House to be in the range of 300 billion dollars). This, in turn, has started to provoke a political reaction, aimed at making the patent system less dysfunctional and less of a tax on innovation. The only way to do that is to restrict the number of bad patents.
Something will have to give, the patent system or Big Data. My guess is it won’t be Big Data.