The basics of Data Security and Governance
We have been in the era of data lakes for a few years now. The term has been thrown around by numerous industries as a solution to all of our data problems - a place to store all of our data. But how do we prevent failures and make it work? What is the way forward?
Everyone usually has great intentions, and the notion of having all your data sitting in one central place is a noble goal. But are you at risk of building a data swamp, or have you already built one?
We are passionate about data governance and feel it is a key component in the data space, but it can be difficult to understand at times and has been described by many as downright boring. We’re a fun bunch at Vibrato and feel that a good analogy can help explain complex concepts. When talking about data lakes, it was hard not to think of a lake as a habitat for fish (data) to live in, and we linked several concepts to that idea. So if you're ready for a fishing trip, bait your lines, 'cause here we go...
What species of fish do we have?
The difference between a data lake and a data swamp is an understanding of the fish species in our habitat. Think semi-structured (
Knowing the species is great, but we want to drill down further - how do we know what family a fish comes from or, more to the point, what information the data contains?
It's a classification problem, and it really comes down to data being tagged or, to be more specific, having metadata attached to it so that we can understand what information it contains.
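To make the tagging idea a little more concrete, here is a minimal sketch in Python. It is not a real catalogue tool: the `Dataset` class, the `find_by_tag` helper, the dataset names and the tag vocabulary are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset in the lake together with its descriptive metadata."""
    name: str
    data_format: str  # the "species", e.g. "structured" or "semi-structured"
    tags: set = field(default_factory=set)  # the "family", e.g. {"pii"}


def find_by_tag(datasets, tag):
    """Return the names of all datasets carrying the given tag."""
    return [d.name for d in datasets if tag in d.tags]


# Hypothetical datasets, purely for illustration.
lake = [
    Dataset("customer_profiles", "structured", {"pii", "customer"}),
    Dataset("web_clickstream", "semi-structured", {"behavioural"}),
    Dataset("support_tickets", "semi-structured", {"pii", "support"}),
]

# Which fish in the lake belong to the "pii" family?
pii_datasets = find_by_tag(lake, "pii")
```

Once every dataset carries tags like these, a question such as "where does our personal data live?" becomes a simple lookup rather than a fishing expedition.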
We still have a problem: we have thousands of fish and want to know if we are at the right spot in the lake to catch the right one - we need a data
Where are the fish coming from?
When we talk of data lineage, we can envision a multitude of streams converging into a lake. Streams, or to be precise, data pipelines, can be permanent or temporary. Either way, we want to be able to track which fish are coming from which stream, and when, so that we can better understand, and where needed audit, the habitat.
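As a rough sketch of what tracking "when and what fish are coming from which stream" could look like, the Python below keeps a simple in-memory lineage log. The `LineageEvent` and `LineageLog` classes and the pipeline and dataset names are hypothetical; a real lake would normally persist this in a dedicated lineage or catalogue service.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One arrival of data: which stream fed which dataset, and when."""
    pipeline: str
    dataset: str
    record_count: int
    arrived_at: datetime


class LineageLog:
    """An append-only, in-memory log of arrivals into the lake."""

    def __init__(self):
        self._events = []

    def record(self, pipeline, dataset, record_count, arrived_at=None):
        """Log an arrival; defaults the timestamp to the current UTC time."""
        event = LineageEvent(
            pipeline, dataset, record_count,
            arrived_at or datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def sources_of(self, dataset):
        """Return the distinct pipelines that fed a dataset, in first-seen order."""
        seen = []
        for event in self._events:
            if event.dataset == dataset and event.pipeline not in seen:
                seen.append(event.pipeline)
        return seen

    def arrivals_from(self, pipeline):
        """Return every logged arrival from one pipeline, for auditing."""
        return [e for e in self._events if e.pipeline == pipeline]


# Hypothetical streams and datasets, purely for illustration.
log = LineageLog()
log.record("crm_nightly_batch", "customer_profiles", 1200)
log.record("web_events_stream", "web_clickstream", 50000)
log.record("signup_form_stream", "customer_profiles", 35)

upstream = log.sources_of("customer_profiles")
```

The point of the append-only design is auditability: nothing is overwritten, so you can always answer which streams have fed a given part of the habitat.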
Now while we can allow many different species of fish to live in our lake, we want to be able to understand the health of our fish, or more so the quality of our data. A data lake does not only store raw data,
As the fish populations grow in the lake, some fish (some data) are required only for a certain time. Having a data retention capability and
Who do we allow to fish?
Self-service analytics is critical to a successful business and a key aspect in becoming a data-driven organisation, though it does not come without concerns. Do we let everyone fish or trawl in the lake - do we carelessly let everyone do whatever
Having a fishing ranger issue fishing licences, while annoying to the honest people looking to fish, protects the habitat from degradation. In other words, it protects your business from exposure to data leaks and numerous other risks that could cause business disruption, fines and damage to your brand.
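One simple way to picture the fishing licence is as a deny-by-default access check, sketched below in Python. The roles, tags and the `can_read` function are hypothetical and only illustrate the idea; in practice this would be enforced by the access controls of your data platform rather than by hand-rolled code.

```python
# Hypothetical licences: the tags each role is permitted to read.
LICENCES = {
    "analyst": {"behavioural", "support"},
    "data_steward": {"behavioural", "support", "pii"},
}


def can_read(role, dataset_tags):
    """Return True only if the role is licensed for every tag on the dataset."""
    if role not in LICENCES:
        return False  # no licence, no fishing
    tags = set(dataset_tags)
    if not tags:
        return False  # unclassified data is treated as off-limits
    return tags <= LICENCES[role]


# An analyst may read behavioural data, but not anything tagged as PII.
analyst_ok = can_read("analyst", {"behavioural"})
analyst_pii = can_read("analyst", {"pii", "support"})
```

Denying access to unknown roles and untagged datasets is a deliberate choice in this sketch: it means a fish nobody has classified yet cannot be caught by accident.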
Why do we need to protect our fish?
Thinking of the most critical data security concerns, data breaches or more
Historically, there have been cases where both public and commercialised data sets were not properly anonymised, causing leakage of personal and sensitive information. The tradition of masking data with
Today, following best practice and compliance regulations is essential. GDPR is in effect in the EU, and we will potentially see similar laws implemented throughout the world. PII (Personally Identifiable Information) must be encrypted at rest and in transit, and anonymised before being used for analytical purposes, and the approach must
No matter where you are in your data journey, whether you're focused on protecting your data assets and minimising risk exposure or you want to increase efforts in strategic offensive activities, the prerequisites still apply. If you have aspirations for data monetisation, creating data products or applying advanced analytical methods such as AI and Machine Learning to gain competitive advantage, data governance must be considered. It is critical in future-proofing your business and providing a safe place to fish.