End-to-end data lineage with DAG MetaCenter & Cloudera Navigator

In this blog post, we will discuss how to establish end-to-end data lineage across SQL Server and Hadoop using DAG MetaCenter and Cloudera Navigator. Cloudera Navigator provides lineage within the Hadoop environment itself. However, if you want end-to-end data lineage (including non-Hadoop data sources), then you need to work with an enterprise metadata repository like […]

Importing Cloudera Navigator metadata into Collibra

The Information Asset team has been working with Cloudera Navigator and Collibra. Cloudera Navigator provides rich Hadoop metadata around artifacts like Hive tables and Sqoop jobs. Collibra provides tooling to govern these data artifacts. In this blog post, we will discuss how we imported the metadata from Cloudera Navigator into Collibra so that it can be […]

Hands-on Big Data Governance with Cloudera Navigator

The Information Asset team brought Cloudera Navigator into our Big Data lab. Cloudera Navigator supports metadata capabilities within Hadoop. In Figure 1, we were able to view data lineage that includes a Sqoop job (Supplies), Cloudera, output file (part-m-00000), HDFS file (Contacts.csv), Hive table (contact_details) and a Hive job. Figure 1: Hadoop data lineage with […]

Integrating Oracle Enterprise Metadata Manager with Hadoop

The Information Asset team recently brought Oracle Enterprise Metadata Manager (OEMM) into our big data lab. Although OEMM supports metadata integration with several repositories, we wanted to test the Hadoop integration. Our use case was to harvest the Hive tables from Cloudera’s distribution of Apache Hadoop into OEMM. We first installed the drivers to connect […]

First Take – InfoSphere Stewardship Center

The Information Asset team has been working closely with the IBM InfoSphere tooling at a number of clients. We had a chance to view a demo of the new business process management (BPM) capabilities within IBM InfoSphere Stewardship Center. IBM InfoSphere Stewardship Center is a newly released capability that is integrated with IBM InfoSphere Information Governance […]

Hands-on Big Data Governance with Waterline Data Science

We recently brought Waterline Data Science into the Information Asset Big Data Lab for hands-on testing. Waterline is a VC-funded startup. The company is run by some of my former IBM colleagues, including Alex Gorelik and Oliver Claude, so I was interested in their newly released product. Waterline has positioned itself as the “Amazon of Big […]

Hands-on Big Data Governance with Dataguise DgSecure

The Information Asset team brought Dataguise DgSecure into our Big Data lab. We love the product and are already recommending Dataguise to our clients to support their Big Data Governance programs. Define Policy: As shown in Figure 1, we created a fine-grained policy called Demo_Policy by assembling multiple out-of-the-box expressions for Address, Social Security Number, […]
