Saturday, February 14, 2015

Expanding on Embarcadero's ER/Studio Data Architect HTML Reporting for Data Lineage

     I've been using ER/Studio Data Architect for a number of years now for data modeling. There are pluses and minuses to it, like everything, but overall I think its a good tool for what I need it to do. With that said, there is one area I would like to see them go in a different direction...data lineage. They have a tab where you can add data sources and create data lineage diagrams. Which is good..but I think their data lineage diagrams are too clunky, time consuming to build, hard to maintain and difficult to understand. Even on their product page for this you can see how busy it is. Users want data lineage to easily see data ancestry, gauge impact analysis and view transformations along the way. I think these diagrams are ill suited for all of these. However; there is one saving grace with this product in this respect and that is the data lineage feature at the column level in a column's definition. Because of this we can hijack the metadata entered here for a new HTML report that, I feel, will better illustrate data lineage.

     To accomplish this, we're going to need to make use of elements of their HTML report, provided with the product, as well as making use of their application's API. But before getting into the weeds, let's take the 1000ft view and look at the data model example I'm going to use to help explain how all this works. This ER/Studio model represents the databases involved in supporting a data warehouse architecture. You can see we have our staging database, data warehouse and data mart:

Figure 1. Data Model Tab of ER/Studio

     On the Data Lineage tab we have a sample database that will be used as a source of data for the data warehouse databases:

Figure 2. Data Lineage Tab of ER/Studio



     In the source system database we have a table called person that were going to pull data from in this example. It has 4 columns which include a numeric identifier and some descriptions for a person:


Figure 3. Person Table from Source Database



     The first destination for the data in this table will be the Staging database in the stage.Person table. In order to associate these 2 tables for lineage purposes, we need to allow The Source database to be available for lineage to the Staging database. On the Data Model tab right click The Staging database in the tree view and click on Data Movement Properties. This will bring up a form that will allow you to pick available sources for the Staging database. Click on the add button and select the SourceSystem database:




Figure 4. Data Movement Sources/Targets for Staging

      This will make SourceSystem table columns available to be used as a source by tables in the Staging database. In Staging, let's double click on the stage.Person table, then double click the PersonID column. On the bottom of this form you should see a Data Lineage tab(You may need to scroll to the right to see it). On that tab click the edit button to expose the Edit Source Mappings form for the column. This is where we can choose the source(s) for the PersonID column in stage.Person:


Figure 5. Edit Source Mappings for stage.Person
     
     Here we check the PersonID column from the source database person table. You'll notice in the transformation logic text box I put the owner of the table. This is because in the objects offered by the ER/Studio API, the one for data lineage does not contain this attribute. Because we can have multiple tables with the same name in the same database, but with different schemas, we need to give the application a way to uniquely identify this source column. For the transformation description, since we are just doing a straight copy from source, I put "None" for transformations performed. Click OK. This will bring you back to the Data Lineage tab. You'll notice the information on the Direct Sources sub-tab will now be filled in. In addition to this tab you should also see tabs labeled "Direct Targets" and "Secondary Targets". These you can ignore. Embarcadero implicitly expects you to fill in children for the column. For this solution this is not necessary, we will make use of recursion to derive ancestors and children for a particular column. All we need to worry about is entering a column's direct ancestor(s).

     This process can be repeated for the DataWarehouse, sourcing the Staging database for it's sources:

Figure 6. Person Data Vault Model
    
      Here we decompose the person table, using a data vault modeling approach, to create a person hub with associated satellites. For each of these tables we repeat the steps we did for the stage.Person table. Here is an example using the LastName column in the satellite.PersonLastName table:


Figure 7. Edit Source Mappings for satellite.PersonLastName
     Just like the last example we check the column we're sourcing, enter the source column schema name and in this example we have a rtrim transformation happening on the column.

     The last stop for data will be a conformed dimension in our data mart for person:


Figure 8. Person Dimension in DataMart Database
     
     You'll notice a new column has appeared that was not there before, FullName. This is to demonstrate what happens when a column has more than 1 source. On the Edit Source Mapping for this column we have to do something slightly different to record lineage:


Figure 9. Edit Source Mappings for dims.DimPerson.FullName

     The first thing that should stand out is the Transformation Logic text box. Here we have a comma separated list of schemas for the tables selected in the order they are added as sources. We have 3 tables, thus we enter the schemas they belong to in order they appear:


Figure 10. dims.DimPerson Table Editor
   
     Now the hard part is over. We did our data entry and are ready to start getting into how to publish our data lineage. But first, we need to produce an HTML report for this ER/Studio model, because our data lineage reporting solution needs elements produced by this. By right clicking on any of the databases on the Data Model tab we can get a menu that will allow us to produce reports:


Figure 11. Database Menu
     
     This will bring up a wizard that will guide you through generating reports for your model, make sure to select HTML report for your report type:

Figure 12. ER/Studio Data Architect Report Wizard


     When you complete the wizard and the reports are generated you should see this in your target directory:


Figure 13. HTML Report Root Folder

     
     In the C:\DataWarehouseReport directory, create a new subdirectory called DataLineage. This will be our home for our data lineage HTML report. In this directory create 3 files index.htm, top.htm and tree.htm with the following code:

index.htm:


<frameset rows="10%,90%">
<frame src="top.htm" name="top">
<frame src="tree.htm" name="bottom">
</frameset>



top.htm:



<html>

<body  bgcolor="#ffffff" topmargin=0 leftmargin=0 link="#0000cc" vlink="#006666">

<table width="744" cellpadding=1 cellspacing=1 bgcolor="#ffffff" border=0>

<tr valign="top">

<td width="1" align=left valign=center bgcolor="#ffffff"></td>

<td width="80" align=left valign=center bgcolor="#ffffff">

<a href="http://www.embarcadero.com/" target="_top"><img src="../Support/Gif/Embarcadero.gif" alt="Link" border=0></a>

</td>

<td bgcolor="#ffffff" width="600">

&nbsp;

<table width="650" bgcolor="#ffffff" border="0">

<tr bgcolor="31639c">

<td width="20" bgcolor="#ffffff">&nbsp;</td>

<TD ALIGN = LEFT WIDTH="600" bgcolor="#ffffff">

<B><FONT SIZE=5 FACE="Arial" color="darkblue">ER/Studio Data Architect HTML Data Lineage Report</FONT></B></TD>

</tr>

</table>

</td>

</tr>

</table>

</body>

</html>


 tree.htm:



<html>

<head>

<style>

    BODY {background-color: white}

    TD {font-size: 9pt;font-family: Arial;text-decoration: none;white-space: nowrap;}

    A {text-decoration: none;color: black}

</style>

<script src="../Support/ua.js"></script>

<script src="../Support/ftiens4.js"></script>

<script src="TreeNodes.js"></script>

</head>



<body topmargin=16 marginheight=16>

<script>initializeDocument()</script>

</html>
 
     Finally, we build the program that will pull the metadata out of ER/Studio Data Architect and create our data lineage report. For this I created a C# program using the .net 3.5 library since the dll provided by Embarcadero for COM will not work in any higher .net version. I built a C# console application for this, but it can just as easily be refactored for a form based project as well. In your project we need to make sure we add the dll so we have access to the COM objects necessary to pull this off. In visual studio go to Project-->Add Reference-->COM. Make sure ER/Studio Type Library is there:

 
Figure 14. Reference Manager


If not, THIS WILL NOT WORK. Embarcadero tech support can help you get this installed if it wasn't installed with ER/Studio. This program requires 2 arguments, The file path to the dm1 file for your ER/Studio diagram and the file path for where you want to push the data lineage HTML report to:

Figure 15. Command Line Arguments

Here is the code for the program:








using System;
using System.Collections.Generic;
using System.Text;
using ERStudio;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

namespace DataLineage
{
    class DataLineage
    {
        //Erstudio application object
        private ERStudio.Application _app;
        //Erstudio diagram object
        private ERStudio.Diagram _diag;
        //The file path to the erstudio diagram (local path)
        private string _outPutFilePath;

        //Class to hold attributes of column children
        private class Child
        {

            public int key;
            public int child;
            public string childName;
            public string transformDesc;

        }

        //Collection that contains column ancestors
        private Dictionary<int, ERStudio.AttributeObj> _Ancestors = new Dictionary<int, AttributeObj>();
        //Collection that contain column children
        private Lookup<int, Child> _Children;
        //Initial List of children needed to load lookup
        private List<Child> _Child = new List<Child>();

        static void Main(string[] args)
        {
            //The file path to the erstudio diagram (local path)
            string diagramPath = args[0];
            //The file path to the .js file for output
            string filePath = args[1];
            //Pass arguments
            DataLineage my_app = new DataLineage(diagramPath, filePath);

        }
        /// <summary>
        /// This handles the main program control flow
        /// </summary>
        /// <param name="diagramPath">The file path to the erstudio diagram (local path)</param>
        /// <param name="filePath">he file path to the .js file for output</param>
        public DataLineage(string diagramPath, string filePath)
        {
            this._app = OpenApp();
            this._diag = OpenDiagram(diagramPath);
            this._outPutFilePath = filePath;

            //Loads erstudio objects into memory
            LoadERStudioObjects();
            //Prints out the .js file
            PrintJavascriptNodes();
            //Closes ErStudio
            CloseApp();
        }

  
        /// <summary>
        /// Loads columns from the ERStudio model into C# dictionary
        /// </summary>
        private void LoadERStudioObjects()
        {
            Console.WriteLine("Loading ERStudio objects into memory");

            //Loop through models
            foreach (ERStudio.Model md in _diag.Models())
            {
                if (md.Name.ToString() != "Logical")
                {
                    Console.WriteLine("Loading " + md.Name + " tables");

                    //Loop through tables
                    foreach (ERStudio.Entity ent in md.Entities())
                    {
                        Console.WriteLine("Loading " + md.Name + "." + ent.TableName + " attributes ");
                        //Load the columns into a collection if they have lineage
                        foreach (ERStudio.AttributeObj attr in ent.Attributes())
                        {
                            //Load Ancestors
                            if(Convert.ToInt32(attr.DataMovementColumnLinks().Count) > 0)
                            {

                                _Ancestors.Add((md.Name + "." + ent.Owner +"." + ent.TableName + "." + attr.ColumnName).ToLower().GetHashCode(), attr);

                                int i = 0;

                                string[] transformationLogic = Regex.Replace(attr.SourceDirectTransformationLogic.ToString(), @"\r\n?|\n", " ").Split(',');
                                string transformationDesc = Regex.Replace(attr.SourceDirectTransformationDescription.ToString(), @"\r\n?|\n", " ").Replace(@"""", @"\""").Trim();

                                //Load Children
                                foreach (DataMovementColumnLink dcl in attr.DataMovementColumnLinks())
                                {
                                    Child cd = new Child();
                                    string owner = "";
                                    //Protection from wrong schema or no schema entry or no source column entry
                                    if (transformationLogic.Count() >= (1) && dcl.SrcTrgtColumnName !=null && dcl.SrcTrgtColumnName.ToString()!="")
                                    {
                                        try
                                        {
                                            owner = transformationLogic[i].Trim();
                                        }
                                        catch(Exception e)
                                        {
                                            Console.WriteLine("cant write ancestor");
                                        }
                                    }

                                    //Ancestor hash
                                    cd.key = (dcl.SrcTrgtModelName + "." + owner + "." + dcl.SrcTrgtTableName + "." + dcl.SrcTrgtColumnName).ToLower().GetHashCode();
                                    //Child hash
                                    cd.child = (dcl.PhysicalModelName + "." + ent.Owner + "." + dcl.PhysicalTableName + "." + dcl.PhysicalColumnName).ToLower().GetHashCode();
                                    //Child Name
                                    cd.childName = dcl.PhysicalModelName + "."+ ent.Owner + "." + dcl.PhysicalTableName + "." + dcl.PhysicalColumnName;
                                    //Child transformation description
                                    cd.transformDesc = transformationDesc;
                                    _Child.Add(cd);
                                    i++;
                                }
                            }
                        }
                    }

                }

            }

            //Load children into a lookup for faster searches
            _Children = (Lookup<int, Child>)_Child.ToLookup(c => c.key);
        }


        /// <summary>
        /// This method prints out the javascript nodes
        /// </summary>
        private void PrintJavascriptNodes()
        {
            //Create/overwrite existing file
            if (File.Exists(_outPutFilePath))
            {
                File.Delete(_outPutFilePath);
            }

            //Write javascript to file
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(_outPutFilePath, true))
            {

                file.WriteLine(" <!-- This frameset document includes the FolderTree script.  Script found in: http://www.treeview.net Author: Marcelino Alves Martins -->");
                file.WriteLine(" ICONPATH = '../Support/'");
                file.WriteLine(" USEICONS = 1");
                file.WriteLine("{");
                file.WriteLine("foldersTree = gFld('', '')");
                file.WriteLine("foldersTree.iconSrc = ICONPATH + 'Gif/globe.gif'");
                file.WriteLine("Diag_Node = insFld(foldersTree, gFld('Data Lineage', ''))");
                file.WriteLine("Diag_Node.iconSrc = ICONPATH + 'Gif/data_lineage.gif'");
                file.WriteLine("Diag_Node.iconSrcClosed = ICONPATH + 'Gif/data_lineage.gif'");

                //Add database nodes to file
                PrintDatabaseNode(file);
                //Add data source nodes to file
                PrintOtherSources(file);

                file.WriteLine("}");
            }

        }



        /// <summary>
        /// Prints out to specified file the javascript nodes for databases
        /// </summary>
        /// <param name="file">The file to print to</param>
        private void PrintDatabaseNode(System.IO.StreamWriter file)
        {

            foreach (ERStudio.Model md in _diag.Models())
            {


                if (md.Name.ToString() != "Logical")
                {

                    Console.WriteLine("Writing " + md.Name + " node");
                    file.WriteLine("Model_Node_" + md.GUID.ToString() + " = insFld(Diag_Node, gFld(" + "\"" + md.Name + "\"" + ", \"\"))");
                    if (md.Dimensional.ToString() == "True")
                    {
                        file.WriteLine("Model_Node_" + md.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/dimensionalmodel.gif\"");
                        file.WriteLine("Model_Node_" + md.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/dimensionalmodel.gif\"");
                    }
                    else
                    {
                        file.WriteLine("Model_Node_" + md.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/physical.gif\"");
                        file.WriteLine("Model_Node_" + md.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/physical.gif\"");
                    }

                    PrintTableNode(file, md);
                }

            }

        }

        /// <summary>
        /// Prints out javascript nodes for tables for a given database
        /// </summary>
        /// <param name="file">The file to print to</param>
        /// <param name="md">The erstudio database model</param>
        private void PrintTableNode(System.IO.StreamWriter file, ERStudio.Model md)
        {
            Console.WriteLine("Writing table javascript nodes for " + md.Name);


            //Pull all tables that exist in Model md
            List<ERStudio.Entity> results = new List<ERStudio.Entity>();

            foreach (ERStudio.Entity e in md.Entities())
            {
                results.Add(e);
            }

            //Order the list by schema then by table name
            IOrderedEnumerable<ERStudio.Entity> orderedResults = results.OrderBy(s => s.Owner).ThenBy(s => s.TableName);


            foreach (ERStudio.Entity t in orderedResults)
            {
                file.WriteLine("Table_Node_" + t.GUID.ToString() + " = insFld(Model_Node_" + md.GUID.ToString() + ", gFld(" + "\"" + t.Owner + "." + t.TableName + "\"" + ", \"\"))");

                switch (t.DimModelTableType.ToString())
                {
                    case "DIMENSION":
                         file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/dimension.gif\"");
                         file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/dimension.gif\"");
                        break;
                    case "FACT":
                         file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/fact.gif\"");
                         file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/fact.gif\"");
                        break;
                    case "SNOWFLAKE":
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/snowflake.gif\"");
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/snowflake.gif\"");
                        break;
                    case "BRIDGE":
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/bridge.gif\"");
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/bridge.gif\"");
                        break;
                    case "NAVIGATION":
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/navigation.gif\"");
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/navigation.gif\"");
                        break;
                    case "UNDEFINED":
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/undefined.gif\"");
                        file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/undefined.gif\"");
                        break;
                    default:
                         file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/entity.gif\"");
                         file.WriteLine("Table_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/entity.gif\"");
                        break;
                }

               

                PrintColumnNode(file, t, md.Name.ToString());

            }

        }

        /// <summary>
        /// Prints out the javascript nodes for other sources
        /// </summary>
        /// <param name="file">The file to print to</param>
        private void PrintOtherSources(System.IO.StreamWriter file)
        {

            file.WriteLine("Other_Folder_Node = insFld(Diag_Node, gFld(" +"\"Other Sources\"" + ", \"\"))");
            file.WriteLine("Other_Folder_Node.iconSrc = ICONPATH + \"Gif/other.gif\"");
            file.WriteLine("Other_Folder_Node.iconSrcClosed = ICONPATH + \"Gif/other.gif\"");

            //Loop through models
            foreach (ERStudio.DataLineageSource ds in _diag.DataLineageSources())
            {

                Console.WriteLine("Writing " + ds.Name + " node");
                file.WriteLine("Other_Node_" + ds.GUID.ToString() + " = insFld(Other_Folder_Node, gFld(" + "\"" + ds.Name + "\"" + ", \"\"))");
                if (ds.Type.ToString() == "Flat File")
                {
                    file.WriteLine("Other_Node_" + ds.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/flat_file.gif\"");
                    file.WriteLine("Other_Node_" + ds.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/flat_file.gif\"");
                }
                else
                {
                    file.WriteLine("Other_Node_" + ds.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/relational.gif\"");
                    file.WriteLine("Other_Node_" + ds.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/relational.gif\"");
                }

                PrintOtherSourceTableNode(file, ds);
            }


        }

        /// <summary>
        /// Prints out javascript nodes for tables for a given database
        /// </summary>
        /// <param name="file">The file to print to</param>
        /// <param name="ds">The erstudio database model</param>
        private void PrintOtherSourceTableNode(System.IO.StreamWriter file, ERStudio.DataLineageSource ds)
        {
            Console.WriteLine("Writing table javascript nodes for " + ds.Name);


            //Pull all tables that exist in DataLineageSource ds
            List<ERStudio.DataLineageTable> results = new List<ERStudio.DataLineageTable>();

            foreach (ERStudio.DataLineageTable e in ds.DataLineageTables())
            {
                results.Add(e);
            }

            //Order the list by schema then by table name
            IOrderedEnumerable<ERStudio.DataLineageTable> orderedResults = results.OrderBy(s => s.Owner).ThenBy(s => s.Name);


            foreach (ERStudio.DataLineageTable t in orderedResults)
            {
                file.WriteLine("OtherTable_Node_" + t.GUID.ToString() + " = insFld(Other_Node_" + ds.GUID.ToString() + ", gFld(" + "\"" + t.Owner + "." + t.Name + "\"" + ", \"\"))");
                file.WriteLine("OtherTable_Node_" + t.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/entity.gif\"");
                file.WriteLine("OtherTable_Node_" + t.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/entity.gif\"");

                PrintOtherColumnNode(file, t, ds.Name.ToString());

            }

        }

        /// <summary>
        /// Prints out javascript nodes for columns for a given table
        /// </summary>
        /// <param name="file">The file to print to</param>
        /// <param name="t">The erstudio table</param>
        /// <param name="modelName">The name of the database for the table</param>
        private void PrintColumnNode(System.IO.StreamWriter file, ERStudio.Entity t, string modelName)
        {

            Console.WriteLine("Writing " + t.TableName.ToString() + " column javascript nodes");

            //Pull all columns that exist in Table t
            List<ERStudio.AttributeObj> results = new List<ERStudio.AttributeObj>();

            foreach (ERStudio.AttributeObj a in t.Attributes())
            {
                results.Add(a);
            }

            //Order the list by column name
            IOrderedEnumerable<AttributeObj> orderedResults = results.OrderBy(s => s.ColumnName);

            foreach (AttributeObj c in orderedResults)
            {
                int columnHash = (modelName + "." + c.GetParent().Owner + "."+ c.GetParent().TableName + "." + c.ColumnName).ToLower().GetHashCode();


                file.WriteLine("Column_Node_" + c.GUID.ToString() + " = insFld(Table_Node_" + t.GUID.ToString() + ", gFld(" + "\"" + c.ColumnName +"\"" + ", \"\"))");

          
              
                if (c.PrimaryKey.ToString() == "True")
                {

                    if (c.ForeignKey.ToString() == "False")
                    {
                     
                        file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/pkattr.gif\"");
                        file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/pkattr.gif\"");
                    }
                    else
                    {
                       
                        file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/pkfkattr.gif\"");
                        file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/pkfkattr.gif\"");

                    }
                }
                else if (c.ForeignKey.ToString() == "True")
                {
                    file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/foreign.gif\"");
                    file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/foreign.gif\"");
                }
                else
                {
                    file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/attr.gif\"");
                    file.WriteLine("Column_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/attr.gif\"");
                }

                file.WriteLine("Ancestor_Folder_Node_" + c.GUID.ToString() + " = insFld(Column_Node_" + c.GUID.ToString() + ", gFld(" + "\"Ancestors\"" + ", \"\"))");
                file.WriteLine("Ancestor_Folder_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/bluefldr.gif\"");
                file.WriteLine("Ancestor_Folder_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/blueclsd.gif\"");

                //Print ancestors to javascript file
                PrintAncestors(file, c, "Ancestor_Folder_Node_" + c.GUID.ToString());

                file.WriteLine("Child_Folder_Node_" + c.GUID.ToString() + " = insFld(Column_Node_" + c.GUID.ToString() + ", gFld(" + "\"Children\"" + ", \"\"))");
                file.WriteLine("Child_Folder_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/bluefldr.gif\"");
                file.WriteLine("Child_Folder_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/blueclsd.gif\"");

                Console.WriteLine("Searching for children for " + c.ColumnName.ToString());

                //Print children to javascript file
                PrintChildren(file, columnHash, "Child_Folder_Node_" + c.GUID.ToString());

            }

        }


        /// <summary>
        /// Prints out javascript nodes for columns for a given other source table
        /// </summary>
        /// <param name="file">The file to print to</param>
        /// <param name="t">The other source table</param>
        /// <param name="modelName">The name of the database for the table</param>
        private void PrintOtherColumnNode(System.IO.StreamWriter file, ERStudio.DataLineageTable t, string modelName)
        {

            Console.WriteLine("Writing " + t.Name.ToString() + " column javascript nodes");

            //Pull all columns that exist in Table t
            List<ERStudio.DataLineageColumn> results = new List<ERStudio.DataLineageColumn>();

            foreach (ERStudio.DataLineageColumn a in t.DataLineageColumns())
            {
                results.Add(a);
            }

            //Order the list by column name
            IOrderedEnumerable<DataLineageColumn> orderedResults = results.OrderBy(s => s.ColumnName);

            foreach (DataLineageColumn c in orderedResults)
            {
                int columnHash = (modelName + "." + t.Owner + "." + t.Name +"." + c.ColumnName).ToLower().GetHashCode();

                file.WriteLine("OtherColumn_Node_" + c.GUID.ToString() + " = insFld(OtherTable_Node_" + t.GUID.ToString() + ", gFld(" + "\"" + c.ColumnName + "\"" + ", \"\"))");


                if (c.PrimaryKey.ToString() == "True")
                {
                    
                     file.WriteLine("OtherColumn_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/pkattr.gif\"");
                     file.WriteLine("OtherColumn_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/pkattr.gif\"");
                }
                else
                {
                    file.WriteLine("OtherColumn_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/attr.gif\"");
                    file.WriteLine("OtherColumn_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/attr.gif\"");

                }

                file.WriteLine("OtherChild_Folder_Node_" + c.GUID.ToString() +" = insFld(OtherColumn_Node_" + c.GUID.ToString() + ", gFld(" + "\"Children\""+ ", \"\"))");
                file.WriteLine("OtherChild_Folder_Node_" + c.GUID.ToString() +".iconSrc = ICONPATH + \"Gif/bluefldr.gif\"");
                file.WriteLine("OtherChild_Folder_Node_" + c.GUID.ToString() +".iconSrcClosed = ICONPATH + \"Gif/blueclsd.gif\"");

                Console.WriteLine("Searching for children for " + c.ColumnName.ToString());

                //Print children to javascript file
                PrintChildren(file, columnHash, "OtherChild_Folder_Node_" + c.GUID.ToString());

            }

        }


        /// <summary>
        /// Returns a dictionary of ancestor id's for passed collection
        /// </summary>
        /// <param name="links">Collection of ancestors</param>
        /// <returns>Dictionary of ancestor ids</returns>
        private Dictionary<int, object> GenerateAncestorList(DataMovementColumnLinks links)
        {
            Dictionary<int, object> ancestorMarker = new Dictionary<int,object>();
            int j = 0;

            foreach (DataMovementColumnLink ancestor in links)
            {
                ancestorMarker.Add(j, ancestor.GUID);
                j++;

            }
            return ancestorMarker;

        }

        /// <summary>
        /// Prints out ancestors for a given column
        /// </summary>
        /// <param name="file">The file to print to</param>
        /// <param name="c">The erstudio column</param>
        /// <param name="ancestorNodeGUID">The identifier for the parent ancestor node so we can nest</param>
        private void PrintAncestors(System.IO.StreamWriter file, ERStudio.AttributeObj c, string ancestorNodeGUID)
        {

            string ancestorName;
            Dictionary<int, object> ancestorMarker = GenerateAncestorList(c.DataMovementColumnLinks());

            //If there are source column(s) documented
            if (Convert.ToInt32(c.DataMovementColumnLinks().Count) > 0)
            {
                Console.WriteLine("Searching for ancestors for " + c.ColumnName.ToString());

                string[] transformationLogic = Regex.Replace(c.SourceDirectTransformationLogic.ToString(), @"\r\n?|\n", " ").Split(',');
                string transformationDesc = Regex.Replace(c.SourceDirectTransformationDescription.ToString(), @"\r\n?|\n", " ").Replace(@"""", @"\""").Trim();
                int i = 0;

                //Write transformation description
                file.WriteLine("Transform_Ancestor_Node_" + c.GUID.ToString() + " = insFld(" + ancestorNodeGUID + ", gFld(" + "\"Transformation: " + transformationDesc + "\"" + ", \"\"))");
                file.WriteLine("Transform_Ancestor_Node_" + c.GUID.ToString() + ".iconSrc = ICONPATH + \"Gif/Transformation.gif\"");
                file.WriteLine("Transform_Ancestor_Node_" + c.GUID.ToString() + ".iconSrcClosed = ICONPATH + \"Gif/Transformation.gif\"");

                //Loop through ancestry
                while (i < c.DataMovementColumnLinks().Count)
                {
                    //Return object based on id from dictionary
                    DataMovementColumnLink ancestor = c.DataMovementColumnLinks().get_Item(ancestorMarker[i]);

                    //Find current ancestor column object in list
                    ERStudio.AttributeObj result;
                    string owner = "";


                    //Protection from wrong schema or no schema entry
                    if (transformationLogic.Count() >= (i + 1))
                    {
                        owner = transformationLogic[i].Trim();
                    }


                    ancestorName = ancestor.SrcTrgtModelName.ToString() + "."+ owner + "." + ancestor.SrcTrgtTableName.ToString() + "." + ancestor.SrcTrgtColumnName.ToString();

                    Console.WriteLine("Found " + ancestorName);

                    file.WriteLine("Ancestor_Node_" + c.GUID.ToString() + i.ToString() + " = insFld(" + ancestorNodeGUID + ", gFld(" + "\"Ancestor Column: " + ancestorName + "\"" + ", \"\"))");
                    file.WriteLine("Ancestor_Node_" + c.GUID.ToString() + i.ToString() + ".iconSrc = ICONPATH + \"Gif/TransSelectInto.gif\"");
                    file.WriteLine("Ancestor_Node_" + c.GUID.ToString() + i.ToString() + ".iconSrcClosed = ICONPATH + \"Gif/TransSelectInto.gif\"");


                    if (_Ancestors.TryGetValue(ancestorName.ToLower().GetHashCode(), out result))
                    {

                        //If the ancestor of ancestor also has an ancestor
                        if(Convert.ToInt32(result.DataMovementColumnLinks().Count) > 0)
                        {
                            //Recursively find ancestor of current ancestor
                            PrintAncestors(file, result, "Ancestor_Node_" + c.GUID.ToString() + (i).ToString());
                        }

                    }

                    i++;

                }
            }

        }

        /// <summary>
        /// Prints out children for a given column
        /// </summary>
        /// <param name="file">The file to print to</param>
        /// <param name="columnHash">The hashvalue of the column name</param>
        /// <param name="childNodeGUID">The identifier for the parent child node so we can nest</param>
        private void PrintChildren(System.IO.StreamWriter file, int columnHash, string childNodeGUID)
        {
            int i = 0;

            //If there are children column(s) documented
            foreach (Child cd in _Children[columnHash])
            {
                //Write transformation description
                file.WriteLine("Transform_Child_Node_" + ((columnHash.ToString()) + (i.ToString())).Replace("-", "z") + " = insFld(" + childNodeGUID + ", gFld(" + "\"Transformation: " + cd.transformDesc + "\"" +", \"\"))");
                file.WriteLine("Transform_Child_Node_" + ((columnHash.ToString()) + (i.ToString())).Replace("-", "z") + ".iconSrc = ICONPATH + \"Gif/Transformation.gif\"");
                file.WriteLine("Transform_Child_Node_" + ((columnHash.ToString()) + (i.ToString())).Replace("-", "z") + ".iconSrcClosed = ICONPATH + \"Gif/Transformation.gif\"");

                file.WriteLine("Child_Node_" + ((columnHash.ToString()) + (i.ToString())).Replace("-", "z") + " = insFld(" + childNodeGUID + ", gFld(" +"\"Child Column: " + cd.childName + "\"" + ", \"\"))");
                file.WriteLine("Child_Node_" + ((columnHash.ToString()) + (i.ToString())).Replace("-", "z") + ".iconSrc = ICONPATH + \"Gif/TransGroupBy.gif\"");
                file.WriteLine("Child_Node_" + ((columnHash.ToString()) + (i.ToString())).Replace("-", "z") + ".iconSrcClosed = ICONPATH + \"Gif/TransGroupBy.gif\"");


                //Recursively find children of current child
                PrintChildren(file, cd.child, "Child_Node_" + ((columnHash.ToString()) + (i.ToString())).Replace("-", "z"));

                i++;

            }


        }


        /// <summary>
        /// Closes the Erstudio application
        /// </summary>
        private void CloseApp()
        {
            //this.app.Quit();
            _app.CloseDiagram(_diag.FileName);
            _app.Quit();
        }

        /// <summary>
        /// Opens the Erstudio application
        /// </summary>
        /// <returns>An instance of the Erstudio application</returns>
        private ERStudio.Application OpenApp()
        {
            ERStudio.Application app = new ERStudio.Application();
            //app.HideWindow();
            return app;

        }

        /// <summary>
        /// Opens a specific Erstudio diagram
        /// </summary>
        /// <param name="FileName">The file path to the diagram</param>
        /// <returns>An Erstudio diagram object</returns>
        private ERStudio.Diagram OpenDiagram(string FileName)
        {
            ERStudio.Diagram diag = _app.OpenFile(FileName);

            return diag;
        }

        /// <summary>
        /// Opens an Erstudio model
        /// </summary>
        /// <param name="ModelName">The name of the Model</param>
        /// <returns>An Erstudio model object</returns>
        private ERStudio.Model OpenModel(string ModelName)
        {
            ERStudio.Model mod = _diag.Models().Item[ModelName];
            return mod;
        }

    }
}





     An important thing to know is that ER/Studio Data Architect should be closed when kicking this off, or else you will get an exception. Keep in mind that sometimes it may stay in your task manager and the process may need to be killed. After the program finishes running you can open file:///C:/DataWarehouseReport/DataLineage/index.htm to view your data lineage report:


Figure 16. DataWarehouse Data Lineage

     Here is a screenshot showing data lineage for the satellite.PersonFirstName.FirstName column. The tree view shows the line of ancestry for this column, all the way back to the source system, as well as all the children of this column for impact analysis. Its in a collapsible tree that's easy to navigate and easy to understand:

Figure 17. Foreign Key Lineage

     
     You can also derive lineage from foreign keys as well and see the same lineage for that same column in the table where it is a primary key.

Figure 18. Data Mart Data Lineage


     In this screenshot you can see the multiple ancestors for the dims.Person.FullName column, showing that you can access more than 1 ancestor in your report if present:


Figure 19. Staging Data Lineage
          
      This screenshot shows the data lineage for Staging database, where it has 1 ancestor from the source system and a whole lot of children. 

 
Figure 20. Data Source Data Lineage

     The last screenshot shows the children of the source system. So when source system developers tell you they are making changes to the FirstName column in the Person table, you can quickly assess the impact in the tree view.

     The one thing I would love Embarcadero to add to their API would be the SrcTrgtTableOwnerName to the DataMovementColumnLinkClass:


Figure 21. DataMovementColumnLinkClass
     
     That would free up the Transformation Logic text box and the owner could be handled in code behind the scenes. Petition Embarcadero...they may do it, if enough people ask for it.