Lots of copies, little control: Today, most organizations do not handle data very efficiently. In Canada, CIOs are rallying around a new standard: zero-copy integration. This should eliminate data copying and help organizations comply with regulations like GDPR.
Can Canada solve data proliferation? With zero-copy integration, the country is at least making an attempt. Zero-copy integration is a standard developed by the Data Collaboration Alliance together with Canadian government agencies and CIOs. The standard aims to ensure that organizations can introduce modern digital solutions without turning the management of the associated data into a tangle. Although zero-copy integration is a Canadian initiative, the framework provides an excellent foundation for digital transformation worldwide.
More demand for the same data
The zero-copy integration standard was born in a context where businesses and customers increasingly demand digital experiences that are, ideally, connected to one another. This leads to a growing number of applications that need access to the same data. To make that possible, data from sometimes hundreds of systems is copied to databases tailored to just a few applications. Developers take this approach (often out of necessity) not only for external applications: databases are also copied on a massive scale internally to support various applications.
“This wastes almost half of the IT budget for companies worldwide,” said Dan DeMers, president of the Data Collaboration Alliance. “It’s also the reason citizens and businesses no longer have control over their own data.” Back in 2020, we wrote that 82 percent of businesses have more than 10 copies of databases. In general, 65 percent of all data in databases is ambiguous.
But it doesn’t have to be like that. Today, there are several digital solutions that make it possible to provide certain functions without copying data in bulk. Large and progressive organizations worldwide are already using them. Snowflake, for example, makes its living from a cloud-based solution in which applications, organizations and people, both external and internal, can access data from the same central data set.
Basic principles
Canadians are moving forward with a standard that encourages and frames such practices. “The notion of having to copy data in order to share it needs to go,” said Keith Jansa, executive director of the CIO Strategy Council in Canada. Zero-copy integration takes the form of a set of six principles that organizations, developers, IT administrators, and data architects must follow to avoid unnecessary copying.
- Data centricity and metadata are prioritized over complex code.
- Modularity is preferred over monolithic design.
- Data management should run over a common data architecture, not over application-specific databases.
- Universal access control must be done through the data layer.
- Data governance runs through data products and federated access, not through centralized teams.
- Data is shared, not copied, through permissions-based collaboration.
Get rid of the app database
That deserves a little explanation. With data centricity, zero-copy integration aims to ensure that organizations see data as their most important asset. Data is permanent; applications come and go. The data architecture therefore takes precedence over application development and is in fact separate from it. Data sits in a virtual library, and applications come there to consult it.
Modularity builds on that. A modular environment makes it easier to work with these core datasets. This contrasts with rigid monolithic app development, where it is harder to do without your own (copied) database.
The third point is also an extension of this principle. App-specific databases should no longer exist when applications are modular and data is the primary asset. There shouldn’t be a separate database for each app: all data deserves its place in the central database.
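To make the difference concrete, here is a minimal sketch in Python. All names (`CentralStore`, `BillingApp`, `LegacyShippingApp`) are hypothetical illustrations and not part of the standard. One application reads from the shared store, the other keeps its own snapshot the way an app-specific database would.

```python
import copy


class CentralStore:
    """Hypothetical single source of truth shared by every application."""

    def __init__(self):
        self._records = {}

    def put(self, key, value):
        self._records[key] = value

    def get(self, key):
        return self._records[key]


class BillingApp:
    """Reads customer data directly from the shared store (no private copy)."""

    def __init__(self, store):
        self.store = store

    def invoice_address(self, customer_id):
        return self.store.get(customer_id)["address"]


class LegacyShippingApp:
    """Keeps its own snapshot, as an app-specific database would."""

    def __init__(self, store, customer_ids):
        self.snapshot = {cid: copy.deepcopy(store.get(cid)) for cid in customer_ids}

    def shipping_address(self, customer_id):
        return self.snapshot[customer_id]["address"]


store = CentralStore()
store.put("c1", {"name": "Ada", "address": "1 Old Street"})

billing = BillingApp(store)
shipping = LegacyShippingApp(store, ["c1"])

# The customer moves; only the central record is updated.
store.put("c1", {"name": "Ada", "address": "2 New Avenue"})

print(billing.invoice_address("c1"))    # reads the current central record
print(shipping.shipping_address("c1"))  # still serves the stale snapshot
```

The application without a private database sees the change immediately, while the copy silently drifts out of date until someone synchronizes it.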
Federated Policy
This implies that security and policies must be enforced at the data level. Through the data tier, you grant access to accounts and applications. This prevents a mishmash of applications, each with a large number of accounts for different access rights to what is often the same (copied) data. Because rights for all data are assigned centrally, you as an organization retain both overview and control.
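As a rough illustration of access control living in the data layer rather than in each application, consider the following sketch. The `DataLayer` class, the right names `"read"` and `"write"`, and the application names are assumptions made for this example; the standard does not prescribe any particular API.

```python
class AccessDenied(Exception):
    """Raised when a principal lacks the required right on a dataset."""


class DataLayer:
    """Hypothetical data tier that owns both the data and the access rules."""

    def __init__(self):
        self._datasets = {}  # dataset name -> {key: value}
        self._grants = {}    # (principal, dataset) -> set of rights

    def create_dataset(self, name):
        self._datasets[name] = {}

    def grant(self, principal, dataset, *rights):
        self._grants.setdefault((principal, dataset), set()).update(rights)

    def _check(self, principal, dataset, right):
        if right not in self._grants.get((principal, dataset), set()):
            raise AccessDenied(f"{principal} may not {right} {dataset}")

    def read(self, principal, dataset, key):
        self._check(principal, dataset, "read")
        return self._datasets[dataset][key]

    def write(self, principal, dataset, key, value):
        self._check(principal, dataset, "write")
        self._datasets[dataset][key] = value

    def rights_overview(self):
        """One place to see who may do what, across all datasets."""
        return {pair: sorted(rights) for pair, rights in self._grants.items()}


layer = DataLayer()
layer.create_dataset("customers")
layer.grant("crm_app", "customers", "read", "write")
layer.grant("reporting_app", "customers", "read")

layer.write("crm_app", "customers", "c1", "Ada Lovelace")
print(layer.read("reporting_app", "customers", "c1"))

try:
    layer.write("reporting_app", "customers", "c1", "changed")
except AccessDenied as err:
    print("blocked:", err)

print(layer.rights_overview())
```

Because every read and write passes through the same check, the rights overview is complete by construction: no application holds a separate account list of its own.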
In this structure, it is possible to delegate control of data policy to the experts who know the data best. This can be done by developing a policy under which individual teams can work within their own privileges, for example when rolling out an application. The approach contrasts with centralized access management by a single data team that has ultimate control over every detail. Of course, someone still has ultimate responsibility, but a federated system ensures that teams lower down the rights ladder can still color as they please within the lines.
Finally, the zero-copy standard dictates that all access to data must go through policies and access rights. There is no longer any good reason to provide internal or external rights holders with a data copy that can take on a life of its own. The owner of the data thus retains final control at all times.
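The contrast between sharing through permissions and handing out a copy can be sketched as follows. The classes `SharedDataset` and `DataHandle` and the method `export_copy` are hypothetical names chosen for this illustration only.

```python
class AccessRevoked(Exception):
    """Raised when a party reads after its permission was withdrawn."""


class SharedDataset:
    """Hypothetical dataset whose owner decides who may read it."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._permitted = set()

    def share_with(self, party):
        # Sharing hands out a handle to the live data, not the data itself.
        self._permitted.add(party)
        return DataHandle(self, party)

    def revoke(self, party):
        self._permitted.discard(party)

    def append(self, row):
        self._rows.append(row)

    def read_as(self, party):
        if party not in self._permitted:
            raise AccessRevoked(f"{party} no longer has access")
        return tuple(self._rows)

    def export_copy(self):
        # The practice the standard wants to avoid: once handed over,
        # this list is outside the owner's control.
        return list(self._rows)


class DataHandle:
    """Permission-checked view of a SharedDataset for one party."""

    def __init__(self, dataset, party):
        self._dataset = dataset
        self._party = party

    def rows(self):
        return self._dataset.read_as(self._party)


owner_data = SharedDataset(["order-1", "order-2"])
partner = owner_data.share_with("partner")
exported = owner_data.export_copy()

owner_data.append("order-3")
live_rows = partner.rows()  # the handle sees the current data

owner_data.revoke("partner")
try:
    partner.rows()
except AccessRevoked as err:
    print("blocked:", err)

print(exported)  # the exported copy is stale and cannot be recalled
```

A party can of course still remember what it has already read. The point of the sketch is that ongoing access stays with the owner, whereas an exported copy can neither be updated nor withdrawn.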
Cloud first?
This does imply that zero-copy integration takes a cloud-first approach when it comes to remote access. That is the only way for external parties to access data efficiently. Applications that depend on the central database expect responsiveness. If viewing or modifying data in the central repository means waiting in a queue, the temptation to quickly create a copy for an application-specific database beckons again, and the entire zero-copy integration plan fails.
Zero-copy integration has numerous advantages. Additional copies no longer take up hard drive or SSD space unnecessarily, conflicting versions of data no longer exist, access control is simple, the organization keeps an overview, and compliance audits become feasible.
Canadians see the standard as interesting for businesses, governments and citizens alike. For companies, a modern data architecture offers a flexible way to deal with digital transformation. The thoughtful separation of data and applications makes it easier to create new applications and discard old ones. When zero-copy integration becomes widespread, it will also create an architecture that allows data to be shared with external parties in a secure and controlled manner. Again, Snowflake’s ambitions come to mind: its data cloud lets companies combine external datasets with their own data or, conversely, make their own data available to third parties.
Solid
Looking at individual users, we see zero-copy as a standard that ties in with Tim Berners-Lee’s Solid project. Solid is a data architecture based on data vaults. These vaults form a central data repository that remains under the control of the data owner. In line with the zero-copy principle, (external) applications can use the data, but the vault remains its central storage location.
In this respect, the Flemish government is not lagging behind that of Canada. In Canada, zero-copy integration is being established as a standard that will certainly have broad support from the IT world there, while here in Flanders we are experimenting extensively with Solid. The two initiatives try to solve different facets of the same problem.