Some of our clients’ Data Science teams use Alteryx for Data Blending (Data Munging). I was eager to get my hands on the tool, so I downloaded a trial version of Alteryx Designer. I also wanted to understand how Data Governance might intersect with a day in the life of a data scientist who uses Alteryx to blend data from different sources.
As shown in the screenshot below, I used a sample workflow and data in Alteryx Designer to blend a Transactions.xml file with a Customers.csv file and produce a summary of Sales by Customer Segment. The tool is very powerful in the hands of data scientists.
Alteryx Designer blends transactions and customer data.
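For readers without Alteryx at hand, the same kind of blend can be roughly sketched in plain Python. This is not how Alteryx does it internally, and the sample data and field names below (CustomerID, Amount, Segment) are my own assumptions, since the actual layout of the sample Transactions.xml and Customers.csv files is not reproduced in this post.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-ins for Transactions.xml and Customers.csv.
# The element and column names are assumptions for illustration only.
TRANSACTIONS_XML = """<Transactions>
  <Transaction><CustomerID>1</CustomerID><Amount>120.50</Amount></Transaction>
  <Transaction><CustomerID>2</CustomerID><Amount>80.00</Amount></Transaction>
  <Transaction><CustomerID>1</CustomerID><Amount>29.50</Amount></Transaction>
  <Transaction><CustomerID>3</CustomerID><Amount>200.00</Amount></Transaction>
</Transactions>"""

CUSTOMERS_CSV = """CustomerID,Segment
1,Consumer
2,Corporate
3,Consumer
"""


def sales_by_segment(transactions_xml, customers_csv):
    """Join transactions to customers on CustomerID and total sales per segment."""
    # Build a lookup from customer ID to customer segment.
    reader = csv.DictReader(io.StringIO(customers_csv))
    segments = {row["CustomerID"]: row["Segment"] for row in reader}

    totals = defaultdict(float)
    for txn in ET.fromstring(transactions_xml).iter("Transaction"):
        customer_id = txn.findtext("CustomerID")
        amount = float(txn.findtext("Amount"))
        segment = segments.get(customer_id)
        # Inner-join behaviour: skip transactions with no matching customer.
        if segment is not None:
            totals[segment] += amount
    return dict(totals)


summary = sales_by_segment(TRANSACTIONS_XML, CUSTOMERS_CSV)
for segment, total in sorted(summary.items()):
    print(f"{segment}: {total:.2f}")
```

The sketch treats the blend as an inner join followed by a group-by sum, which is my reading of what the sample workflow produces.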
From a Data Governance perspective, it would be good for the data scientist to view the definitions for key fields in a business glossary. If the data scientist creates a new field, they should be able to add its definition to the business glossary. The data scientist will want lightweight Data Governance that does not create a lot of overhead.
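To make the idea concrete, here is a minimal sketch of what a lightweight glossary interaction could look like from the data scientist's side. The BusinessGlossary class, its methods, and the field names and definitions are all hypothetical; they do not correspond to the API of Alteryx or of any Data Governance product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BusinessGlossary:
    """Hypothetical in-memory glossary mapping field names to definitions."""
    definitions: Dict[str, str] = field(default_factory=dict)

    def lookup(self, field_name: str) -> Optional[str]:
        # Return the definition, or None if the field is not yet documented.
        return self.definitions.get(field_name)

    def add(self, field_name: str, definition: str) -> None:
        # Refuse to silently overwrite an existing, agreed definition.
        if field_name in self.definitions:
            raise ValueError(f"{field_name!r} is already defined")
        self.definitions[field_name] = definition


# Illustrative entries only; the wording of the definitions is assumed.
glossary = BusinessGlossary({
    "Customer Segment": "Grouping of customers used for sales reporting.",
})

# View the definition of a key field before using it in a blend.
existing = glossary.lookup("Customer Segment")

# Document a newly created field at the moment it is created.
glossary.add("Sales by Customer Segment",
             "Sum of transaction amounts per customer segment.")
```

The point of the sketch is the low overhead: one lookup before using a field and one call when a new field is created.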
I am also familiar with Data Governance tools like IBM Information Governance Catalog, Collibra and Informatica Metadata Manager. It will be interesting to see how the functionality of Data Blending and Data Governance tools converges over time.