The Information Asset team recently conducted a hands-on product evaluation of the Zaloni Bedrock platform. Net-net, we were very impressed with Bedrock and will discuss specific features in this posting.
Bedrock combines a number of features into a single integrated platform for Big Data Governance
Several vendors offer individual pieces of the Big Data Governance puzzle. Bedrock, however, combines data ingestion, enrichment, metadata management, data quality, workflow development, and data masking into a single integrated platform for Big Data Governance on Hadoop.
Bedrock supports managed data ingestion
Bedrock offers an easy-to-use interface that supports file ingestion.
The Bedrock homepage supports file ingestion into HDFS.
As a first step, we created a Landing Zone directory in which we defined the patterns of files that would be ingested into Bedrock.
Create Landing Zone Directories and File Patterns in Bedrock.
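Bedrock configures these file patterns through its UI, so the exact matching rules are internal to the product. Conceptually, though, landing-zone pattern matching works like the following sketch; the pattern strings and file names here are hypothetical examples, not Bedrock's actual configuration:

```python
import fnmatch

# Hypothetical landing-zone patterns; Bedrock defines these through its UI.
patterns = ["Sample_Employee*.csv", "orders_*.txt"]

def matches_landing_zone(filename, patterns):
    """Return True if the file matches any registered ingestion pattern."""
    return any(fnmatch.fnmatch(filename, p) for p in patterns)

print(matches_landing_zone("Sample_Employee_2016.csv", patterns))  # True
print(matches_landing_zone("readme.md", patterns))                 # False
```

Files that match a registered pattern would be picked up for ingestion into HDFS; everything else is ignored.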
We then created workflows to ingest files into HDFS; workflows are key to the entire Bedrock platform, and we will say more about them later in this posting. We can also view the Ingest History for files ingested into HDFS.
Ingest History in Bedrock.
Bedrock also supports integrated metadata management
We can also view the lineage of files in and out of HDFS. Bedrock supports much more complex data lineage within Hadoop but that is beyond the scope of this posting.
File Ingestion lineage with Bedrock.
Bedrock supports data quality management with integrated workflows to quarantine invalid records during data ingestion
Hadoop data quality is a critical but often overlooked task because it is difficult to execute in practice. Bedrock performs data quality checks and operations in three distinct steps:
- Create data quality rules
- Associate data quality rules with entities
- Execute data quality rules
In this example, we used the Sample_Employee.csv file, which contains basic employee information, including null values for Date of Birth.
We defined a Not_Null_DOB data quality rule that checks for null dates of birth based on a pre-defined expression.
Not Null Date of Birth data quality rule in Bedrock.
We then selected the Sample_Employee entity in Bedrock, which would be subject to the data quality checks, and associated the data quality rule with its Date of Birth field.
Select Date of Birth field.
We then created a workflow to check for null values using the Data Quality Action created earlier. We defined a valid data file path for records that pass the data quality rule and can be used for further processing, and an invalid data file path that quarantines records that fail the rule. Data stewards can then conduct manual checks on the quarantined records.
Configure Data Quality Action with valid and invalid data file paths.
After execution of the workflow, we can view files in the valid and invalid file paths. The screenshot below shows valid records where the date of birth is not null.
Valid records where the date of birth is not null.
The following screenshot shows the contents of the invalid data file with null dates of birth.
Invalid records with null dates of birth.
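Bedrock executes this split as a workflow action; the underlying logic amounts to routing each record to a valid or invalid output based on the rule. A minimal sketch of that routing, using hypothetical in-memory rows in place of the actual Sample_Employee.csv file in HDFS:

```python
import csv
import io

# Hypothetical rows standing in for Sample_Employee.csv; Bedrock reads
# the real file from HDFS.
raw = """id,name,date_of_birth
1,Alice,1980-05-01
2,Bob,
3,Carol,1975-11-23
"""

valid, invalid = [], []
for row in csv.DictReader(io.StringIO(raw)):
    # Not_Null_DOB rule: quarantine records whose Date of Birth is empty.
    (valid if row["date_of_birth"].strip() else invalid).append(row)

print(len(valid), len(invalid))  # 2 1
```

Records in the `invalid` list correspond to the quarantined file that data stewards would review manually.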
Bedrock supports data masking and tokenization
Bedrock supports data masking and tokenization as part of an integrated platform. As shown in the screenshot, we created rules to mask the Email ID field. This is as simple as checking the Masked? box and creating a masking pattern. In this case, we entered 4-9:*;14-20:$; to replace the characters at positions 4-9 with '*' and those at positions 14-20 with '$'.
Developing a data masking pattern with Bedrock.
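The pattern syntax above is compact enough to sketch in a few lines. The following is our own illustration of how such a pattern could be parsed and applied; it assumes 1-based, inclusive character positions, which is our reading of the example rather than a documented Bedrock convention, and the email address is hypothetical:

```python
def apply_mask(value, pattern):
    """Apply a Bedrock-style mask pattern such as '4-9:*;14-20:$;'.

    Assumes 1-based, inclusive positions (our assumption, not a
    documented Bedrock convention).
    """
    chars = list(value)
    for spec in filter(None, pattern.split(";")):
        span, mask_char = spec.split(":")
        start, end = (int(n) for n in span.split("-"))
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)

print(apply_mask("jane.doe@example.com", "4-9:*;14-20:$;"))
# jan******exam$$$$$$$
```

Characters outside the masked spans pass through unchanged, which matches the behavior shown in the masked HDFS output later in this posting.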
We then created a workflow in the Design view, added the Token Masking Action, set its properties as shown in the screenshot below, and saved and executed the workflow.
Workflow to execute the masking action.
After executing the workflow, we can view the HDFS files in the Ingest History within Bedrock.
HDFS Ingest History.
Finally, we can view the masked HDFS data in the screenshot below.
Masked HDFS data.