
Use case

You work in an operations group for a streaming media service company. Your team wants to process web logs from your server farms to obtain operations analytics and to identify maintenance issues.
Your back-end system collects data regarding server access and system load in your server farms. Your team wants to identify the operations that have created the most server load in the past few weeks. You also want to store the data afterward for auditing purposes.
Before your data analysts can begin working with the data, you need to parse it. However, the logs are semi-structured. After server upgrades, the log file structure might change slightly, and some of the information might take a different format. With a standard transformation, these changes would cause data loss or log processing failures.
If the input data contains headers, Intelligent Structure Discovery supports data drift to different locations. If the input data does not contain headers, Intelligent Structure Discovery identifies additional data at the end of the input.
Your initial log files have the following structure:
05967|2014-09-19|04:49:50.476|51.88.6.206|custid=83834785|cntry=Tanzania|city=Mtwango|movie={b1027374-6eec-4568-8af6-6c037d828c66|"Touch of Evil"}|paid=true
01357|2014-11-13|18:07:57.441|88.2.218.236|custid=41834772|movie={01924cd3-87f4-4492-b26c-268342e87eaf|"The Good, the Bad and the Ugly"}|paid=true
00873|2014-06-14|09:16:14.522|134.254.152.84|custid=58770178|movie={cd381236-53bd-4119-b2ce-315dae932782|"Donnie Darko"}|paid=true
02112|2015-01-29|20:40:37.210|105.107.203.34|custid=49774177|cntry=Colombia|city=Palmito|movie={ba1c48ed-d9ac-4bcb-be5d-cf3afbb61f04|"Lagaan: Once Upon a Time in India"}|paid=false
00408|2014-06-24|03:44:33.612|172.149.175.30|custid=29613035|cntry=Iran|city=Bastak|movie={3d022c51-f87f-487a-bc7f-1b9e5d138791|"The Shining"}|paid=false
03568|2015-01-07|11:36:50.52|82.81.202.22|custid=27515249|cntry=Philippines|city=Magallanes|movie={ad3ae2b4-496e-4f79-a6dd-202ec932e0ae|"Inglourious Basterds"}|paid=true
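To make the shape of these lines concrete, the following is a minimal Python sketch of a hand-written parser for the initial structure. It is only an illustration and does not represent how Intelligent Structure Discovery works internally. The function names `split_fields` and `parse_log_line` and the dictionary keys are hypothetical. The sketch assumes that the first four fields are positional, that the remaining fields are key=value pairs, and that a pipe inside braces belongs to the movie field rather than separating fields.

```python
from __future__ import annotations


def split_fields(line: str) -> list[str]:
    """Split on '|' but ignore pipes that appear inside {...}."""
    fields, current, depth = [], [], 0
    for ch in line:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "|" and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_log_line(line: str) -> dict:
    """Parse one log line into a dictionary (hypothetical key names)."""
    fields = split_fields(line.strip())
    record = {
        "number": fields[0],
        "date": fields[1],
        "time": fields[2],
        "ip": fields[3],
    }
    # Remaining fields are key=value pairs; some (cntry, city) may be absent.
    for field in fields[4:]:
        key, _, value = field.partition("=")
        if key == "movie":
            guid, _, title = value.strip("{}").partition("|")
            record["movie"] = {"guid": guid, "value": title.strip('"')}
        else:
            record[key] = value
    return record


if __name__ == "__main__":
    sample = (
        "05967|2014-09-19|04:49:50.476|51.88.6.206|custid=83834785|"
        "cntry=Tanzania|city=Mtwango|"
        'movie={b1027374-6eec-4568-8af6-6c037d828c66|"Touch of Evil"}|paid=true'
    )
    record = parse_log_line(sample)
    print(record["movie"]["value"])  # Touch of Evil
```

Even in this small sketch, the optional cntry and city fields and the nested movie field need special handling, which hints at why a fixed-position parser is fragile for this data.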
After server upgrades, some log files have the following structure:
0448|2015-04-07|01:50:5.35|27.248.247.174|custid=613068|cntry=Iran|city=Sarĕb|movie={50fb37b-621-484e-a565-2b5c1cbdc43|"Network"}|paid=false|ua=Mozilla/5.0 (Windows NT 5.1)
02780|2014-12-28|08:14:58.685|17.2.236.233|custid=731|cntry=Greece|city=Néa Róda|movie={1876aea0-3cb5-4c7a-22f-d33f233210|"Full Metal Jacket"}|paid=true|ua=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)
03353|2015-04-20|21:02:40.532|143.48.11.171|custid=83736441|cntry=Russia|city=Mozhaysk|movie={67272f85-bfc-418a-82ea-a7c4ae6b028a|"Gangs of Wasseypur"}|paid=true|ua=Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X)
04073|2014-10-25|15:33:03.442|87.235.48.100|custid=861028|cntry=Indonesia|city=Lamalera|movie={4a511f3-6367-4017-874e-50a46f5ea567|"Shutter Island"}|paid=false|ua=Mozilla/5.0 (X11; Linux x86_64)
02170|2015-02-1|23:36:40.271|25.14.204.46|custid=1240203|cntry=Albania|city=Lukovë|movie={2047efa-22c6-431c-87d4-ca73af1034|"The Grapes of Wrath"}|paid=false|ua=Mozilla/5.0 (Windows NT 6.1)
The data format varies, and some of the data has drifted to a different location.
The following image shows the data variations:
This image shows differences in the expected input format for log data. The date format differs in different input files, and some data has drifted to a different location.
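The variations described above can be made visible with a short Python sketch that compares each line against the keys and date shape seen in the initial logs. This is an illustrative assumption, not the method that Intelligent Structure Discovery uses. The names `EXPECTED_KEYS`, `DATE_PATTERN`, and `describe_drift` are hypothetical, and the "expected" date shape of four, two, and two digits is assumed from the initial sample.

```python
import re

# Keys observed in the initial log files (assumed baseline for comparison).
EXPECTED_KEYS = ["custid", "cntry", "city", "movie", "paid"]
# Assumed "expected" date shape, taken from the initial sample lines.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def describe_drift(line: str) -> dict:
    """Report which keys are missing or new, and whether the date fits."""
    # A key is a word that directly follows a pipe and precedes '='.
    keys = re.findall(r"\|(\w+)=", line)
    date = line.split("|")[1]
    return {
        "missing": [k for k in EXPECTED_KEYS if k not in keys],
        "extra": [k for k in keys if k not in EXPECTED_KEYS],
        "date_matches": DATE_PATTERN.fullmatch(date) is not None,
    }


if __name__ == "__main__":
    upgraded = (
        "03353|2015-04-20|21:02:40.532|143.48.11.171|custid=83736441|"
        "cntry=Russia|city=Mozhaysk|"
        'movie={67272f85-bfc-418a-82ea-a7c4ae6b028a|"Gangs of Wasseypur"}|'
        "paid=true|ua=Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X)"
    )
    print(describe_drift(upgraded))
    # {'missing': [], 'extra': ['ua'], 'date_matches': True}
```

A report like this only flags the differences. A parser written for the initial layout would still need to be changed by hand to accept them, which is the maintenance effort that an intelligent structure model is meant to avoid.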
Instead of manually creating individual transformations, your team can create an intelligent structure model to determine the relevant data sets. You create an intelligent structure in Intelligent Structure Discovery, which automatically identifies the structure of the data.
The following image shows the intelligent structure that you create:
This image shows the intelligent structure that you create from a web log input file with a hierarchy of nodes. In the top row, table is the parent of element. In the second row, element is the parent of number, datetime, IP, custid, cntry, city, movie, and paid. In the third row, datetime is the parent of date and time, custid is the parent of custid, cntry is the parent of cntry, city is the parent of city, movie is the parent of movie, and paid is the parent of paid. In the fourth row, movie is the parent of movie. In the fourth row, movie is the parent of GUID and value.
When you examine the data, you realize that the first element in the model, number, actually represents the user transaction identification. You change the element name to transactionId.
The following image shows the updated intelligent structure:
This image shows the intelligent structure after you rename the number node to transactionId. The intelligent structure appears in the Visual Model tab, and the output data that relates to each node appears in the Relational Output tab.
After you save the intelligent structure as an intelligent structure model, you create a Structure Parser transformation and assign the model to it. You can add the transformation to a Data Integration mapping with a source, target, and other transformations. After the mapping fetches data from a source connection, such as Amazon S3 input buckets, the Structure Parser transformation processes the data with the intelligent structure model. The transformation passes the web log data to downstream transformations for further processing, and then to a target, such as Amazon S3 output buckets.