Amit Shaked, CEO and co-founder, Laminar.
In 2012, IDC found that 2.8 zettabytes of data had been created and replicated. To give some sense of the scale here: One terabyte is equal to 1,000 gigabytes. One zettabyte is equal to 1 trillion gigabytes.
That massive number took the world decades to reach and barely any time at all to surpass; in fact, IDC estimates that by 2025, there will be 175 zettabytes in the datasphere, a growth rate of almost incomprehensible magnitude.
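To make that growth rate a little more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the two IDC figures quoted above are directly comparable and that growth compounds evenly from 2012 to 2025; both are simplifications, and the function name is purely illustrative.

```python
# Figures quoted in the text (IDC): 2.8 ZB in 2012, 175 ZB projected for 2025.
GB_PER_TB = 1_000
GB_PER_ZB = 1_000_000_000_000  # 1 trillion gigabytes (decimal units)


def implied_annual_growth(start_zb, end_zb, years):
    """Compound annual growth rate that takes start_zb to end_zb over `years`."""
    return (end_zb / start_zb) ** (1 / years) - 1


# Assumption: the two figures measure the same thing and growth is smooth.
growth_multiple = 175 / 2.8
cagr = implied_annual_growth(2.8, 175, 2025 - 2012)

print(f"Terabytes in one zettabyte: {GB_PER_ZB // GB_PER_TB:,}")
print(f"Growth multiple, 2012 to 2025: {growth_multiple:.1f}x")
print(f"Implied annual growth: {cagr:.0%}")
```

Under those assumptions, the datasphere grows more than sixtyfold over 13 years, which works out to roughly 37% compound growth every year.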
The digital data revolution has driven previously unimaginable levels of innovation that have benefited both businesses and consumers. Of course, there are downsides to this arrangement, as anyone working in IT knows—namely, the increased cybersecurity risks.
Without downplaying those risks, though, I’d like to suggest that some of the concern around data and cybersecurity has been misallocated. More worryingly, organizations are exposing essential information on their own, without even realizing they’ve done so.
This growing risk is called the “innovation attack surface.” Rather than cyber adversaries using traditional attack surfaces and exploiting vulnerabilities to gain access to sensitive information illicitly, the innovation attack surface results from the organization’s most highly active cloud users (developers and data scientists) inadvertently creating risk when leveraging data for innovation.
It’s dynamics like this new attack vector that caused the Center for Internet Security (CIS) to recently move data protection up from number 13 on its list of critical security controls to number 3 in Version 8.
One of the greatest risks created by the innovation attack surface is shadow data—i.e., data whose existence or whereabouts are not actively known. Failing to account for shadow data can have disastrous consequences for an organization, yet many organizations have been lax on this front—accepting some quantity of unaccounted-for data as simply the cost of doing business.
This is a dangerous attitude, as evidenced by a number of high-profile breaches connected to shadow data. In fact, in our 2023 State of Public Cloud Data Security Report, for which we surveyed over 500 data security and governance professionals, 68% of respondents listed shadow data as the top challenge for protecting data in the cloud.
What Shadow Data Is
Shadow data exists outside of an organization’s main data store. Think of it this way: If your main data store is a well-organized closet whose contents you check on regularly, your shadow data is a dusty, forgotten corner of your basement where waterlogged boxes have piled up.
However, it is just as sensitive as the data you actively monitor in your main data store. This shadow data floats unacknowledged in the cloud for the simple reason that businesses don’t know where it is and lack the tools and resources to manage and secure it. If not adequately addressed, it’s a crisis waiting to happen.
How We Got Here: Cloud Transformation And Data Democratization
Cloud transformation refers to the dominant trend in enterprise computing in the last decade: the transition (for the vast majority of businesses) from an on-premises app development and deployment system to an agile, distributed and services-led architecture in the cloud.
Data democratization, meanwhile, encompasses the new and exciting ways that businesses of all kinds have begun to make use of data in their day-to-day operations. In 2023, virtually every company is an “information” company in the sense that, on a day-to-day basis, data is helping them optimize their services and make breakthroughs at a faster rate.
These two developments have led to a perilous widening of the “innovation attack surface.” Security teams might put up guardrails, but they cannot stop developers and data scientists from making changes, nor should they have to. It’s a terrible paradox: the more innovation that goes on, the more harm a business is exposed to through unattended shadow data.
Shrinking The Innovation Attack Surface
This is not an abstract concern. In just the last few years, we’ve seen what can happen when shadow data isn’t properly taken care of. In the summer of 2022, a misconfigured Amazon S3 bucket resulted in more than 1.5 million files of airport data being publicly exposed—without the airport’s knowledge. These files included employee personally identifiable information (PII) and other airport-sensitive data. Our research found that 21% of publicly exposed Amazon S3 buckets contain sensitive data.
To shrink the innovation attack surface and address growing concerns about data living in the shadows, security leaders need to adopt strategies that lead to full data visibility. After all, you cannot protect what you cannot see. Accounting for all of the data in a network, both known and unknown, should be the first step security teams take.
Once you know what data is located on your network, it is important to know where it resides, who has access to it and why. Only then can you understand how it is secured.
Then, as a natural next step toward a more data-centric approach to security, your organization should shop around to see which cloud-native tools can be deployed in your network to give continuous visibility into that data and its context.
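As a rough illustration of those questions (what data exists, where it lives, who can reach it), here is a minimal Python sketch. Everything in it is hypothetical: the `DataStore` record, its fields and the sample entries are invented for illustration and do not reflect any particular cloud provider’s API or any vendor’s product. A real inventory would be populated by discovery tooling rather than typed in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical record describing one place where data lives."""
    name: str
    location: str              # where the data resides
    in_inventory: bool         # does the security team know it exists?
    contains_sensitive: bool   # e.g., PII
    publicly_readable: bool
    principals: list = field(default_factory=list)  # who has access


def find_shadow_data(stores):
    """Return stores that exist but are missing from the known inventory."""
    return [s for s in stores if not s.in_inventory]


def risk_findings(stores):
    """Flag unknown stores and sensitive stores exposed to the public."""
    findings = []
    for s in stores:
        if not s.in_inventory:
            findings.append((s.name, "not in inventory (shadow data)"))
        if s.contains_sensitive and s.publicly_readable:
            findings.append((s.name, "sensitive data is publicly readable"))
    return findings


# Invented sample data, for illustration only.
stores = [
    DataStore("customer-db", "prod-account", True, True, False, ["billing-service"]),
    DataStore("analytics-copy", "dev-account", False, True, True, ["data-science-team"]),
    DataStore("public-assets", "prod-account", True, False, True, ["web-frontend"]),
]

shadow = find_shadow_data(stores)
findings = risk_findings(stores)
for name, issue in findings:
    print(f"{name}: {issue}")
```

In this invented example, only the forgotten analytics copy is flagged: it is both missing from the inventory and publicly readable while holding sensitive data, which is the kind of combination described in the airport incident above.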
Adding The Human Element
Outside of technology, it’s important to educate your team on the importance of data-centric approaches. Cybersecurity is a shared responsibility; no single solution can deliver it alone. Encourage best data-sharing practices and educate team members on the signs of abnormal behavior.
Your organization should be able to operate at the speed of innovation without fear of compromise. With the right tools and practices in place, data scientists and developers can shine a light on data in the shadows.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.