Classifier Analysis Example

You are a data steward at a software company that released a new smartphone application. The company wants to understand the public response to the application and the media coverage it receives. The company asks you and your team to analyze social media comments about the application.
You decide to capture data from Twitter feeds that discuss smartphones. You use the Twitter application programming interface to filter the Twitter data stream. You create a data source that contains the Twitter data you want to analyze.
Because the Twitter feeds contain messages in multiple languages, you must identify the language used in each message. You decide to use a Classifier transformation to analyze the languages. You create a mapping that identifies the languages in the source data and writes the Twitter messages to English and non-English data targets.

Create the Classifier Mapping

You create a mapping that reads a data source, classifies the languages in the data, and writes the data to targets based on the languages they contain.
The following image shows the mapping in the Developer tool:
The mapping contains a data source object, a Classifier transformation, a Router transformation, and two data target objects. The mapping writes data to either target object based on the language used in the data.
The mapping you create contains the following objects:
| Object Name | Description |
|---|---|
| Read_tweet_user_lang | Data source. Contains the Twitter messages. |
| Classifier | Classifier transformation. Identifies the languages used in the Twitter messages. |
| Router | Router transformation. Routes the Twitter messages to data target objects according to the languages they contain. |
| Write_en_tweets_out | Data target. Contains Twitter messages in English. |
| Write_other_tweets_out | Data target. Contains non-English-language Twitter messages. |

Input Data Sample

The following data fragment shows a sample of the Twitter data that you analyze in the mapping:
Twitter Message
RT @GanaphoneS3: Faltan 10 minutos para la gran rifa de un iPhone 5...
RT @Clarified: How to Downgrade Your iPhone 4 From iOS 6.x to iOS 5.x (Mac)...
RT @jerseyjazz: The razor was the iPhone of the early 2000s
RT @KrissiDevine: Apple Pie that I made for Thanksgiving. http://t.com/s9ImzFxO
RT @sophieHz: Dan yang punya 2 kupon undian. Masuk dalam kotak undian yang berhadiah Samsung
RT @IsabelFreitas: o galaxy tem isso isso isso e a bateria à melhor que do iPhone
RT @PremiusIpad: Faltan 15 minutos para la gran rifa de un iPhone 5...
RT @payyton3: I want apple cider
RT @wiesteronder: Retweet als je iets van Apple, Nike, Adidas of microsoft hebt!

Data Source Configuration

The data source contains a single port. Each row on the port contains a single Twitter message.
The following table describes the configuration of the data source:
| Port Name | Port Type | Precision |
|---|---|---|
| text | n/a | 200 |

Classifier Transformation Configuration

The Classifier transformation uses a single input port and a single output port. The input port reads the text field from the data source. The output port contains the language identified for each Twitter message in the text field. The Classifier transformation identifies each language with a two-letter ISO language code, such as en for English or es for Spanish.
The following table describes the configuration of the Classifier transformation:
| Port Name | Port Type | Precision | Strategy |
|---|---|---|---|
| text_input | Input | 200 | Classifier1 |
| Classifier_Output | Output | 2 | Classifier1 |
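To make the input and output ports concrete, the following Python sketch maps a message string to a two-letter language code. It is not the classifier model that the Classifier transformation uses. The function name `classify_language` and the stopword lists are hypothetical, chosen only so that the sample messages in this example resolve to plausible codes.

```python
from typing import Dict, Optional, Set

# Hypothetical stopword lists, NOT the Informatica classifier model.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "of", "to", "that", "was", "i", "your", "how", "for", "from"},
    "es": {"para", "la", "de", "un", "una", "el", "los", "que"},
    "pt": {"o", "tem", "isso", "e", "do", "da", "melhor"},
    "nl": {"als", "je", "iets", "van", "hebt", "het", "een"},
    "id": {"yang", "dan", "dalam", "punya", "masuk"},
}


def classify_language(text: str) -> Optional[str]:
    """Return the code whose stopword list matches the most tokens.

    Returns None when no token matches any list.
    """
    # Lowercase each token and trim surrounding punctuation.
    tokens = [t.strip(".,:;!?()\"'").lower() for t in text.split()]
    scores = {
        code: sum(token in words for token in tokens)
        for code, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# text_input -> Classifier_Output
label = classify_language("RT @payyton3: I want apple cider")
```

A stopword count is a deliberately crude stand-in. It shows only the shape of the transformation: one text value in, one two-character code out.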

Router Transformation Configuration

The Router transformation uses two input ports. It reads the Twitter messages from the data source and the language codes from the Classifier transformation. The Router transformation routes the data on the input ports to different output port groups based on a condition that you specify.
The following image shows the Router transformation port groups and port connections:
The mapping contains a data source object, a Classifier transformation, a Router transformation, and two data target objects. The Router transformation is expanded in the mapping editor to display the input ports and two output port groups.
The following table describes the configuration of the Router transformation:
| Port Name | Port Type | Port Group | Precision |
|---|---|---|---|
| Classifier_Output | Input | Input | 2 |
| text | Input | Input | 200 |
| Classifier_Output | Input | Default | 2 |
| text | Input | Default | 200 |
| Classifier_Output | Input | En_Group | 2 |
| text | Input | En_Group | 200 |
You configure the transformation to create data streams for English-language messages and for messages in other languages. To create a data stream, add an output port group to the transformation. Use the Groups options on the transformation to add the port group.
To determine how the transformation routes data to each data stream, you define a condition on a port group. The condition identifies a port and specifies a possible value on the port. When the transformation finds an input port value that matches the condition, it routes the input data to the port group that applies the condition.
Define the following condition on the En_Group:
Classifier_Output='en'
Note: The Router transformation reads data from two objects in the mapping. The transformation can combine the data in each output group because it does not alter the row sequence defined in the data objects.
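The routing logic described above can be sketched in Python. This is an illustration only, not how the Developer tool runs the mapping. The function name `route` is hypothetical, and the sketch assumes that rows matching no group condition fall through to the Default group. Messages and language codes arrive from two objects and are paired by row position, which is safe only because neither object changes the row sequence.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (text, Classifier_Output)


def route(texts: List[str], codes: List[str]) -> Dict[str, List[Row]]:
    """Pair each message with its code by position and assign it to a group."""
    if len(texts) != len(codes):
        raise ValueError("both inputs must contain the same number of rows")

    groups: Dict[str, List[Row]] = {"En_Group": [], "Default": []}
    for text, code in zip(texts, codes):
        # Condition on En_Group: Classifier_Output='en'
        group = "En_Group" if code == "en" else "Default"
        groups[group].append((text, code))
    return groups


texts = [
    "RT @GanaphoneS3: Faltan 10 minutos para la gran rifa de un iPhone 5...",
    "RT @jerseyjazz: The razor was the iPhone of the early 2000s",
    "RT @payyton3: I want apple cider",
]
codes = ["es", "en", "en"]
groups = route(texts, codes)
```

Here `groups["En_Group"]` holds the two English rows in their original order, and `groups["Default"]` holds the Spanish row.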

Data Target Configuration

The mapping contains a data target for English-language Twitter messages and a target for messages in other languages. You connect the ports from each Router transformation output group to a data target.
The following table describes the configuration of the data targets:
| Port Name | Port Type | Precision |
|---|---|---|
| text | n/a | 200 |
| Classifier_Output | n/a | 2 |
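As a rough sketch of what each target receives, the following Python code writes one output group to a delimited target with the two ports listed above. The function name `write_target` and the choice of CSV are assumptions made for illustration. The sketch also treats precision as a maximum character length, which is an assumption here.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# Port names and precisions from the data target table.
TARGET_PORTS = (("text", 200), ("Classifier_Output", 2))


def write_target(rows: Iterable[Tuple[str, str]], handle: TextIO) -> int:
    """Write (text, Classifier_Output) rows to the target and return the row count."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([name for name, _ in TARGET_PORTS])
    count = 0
    for row in rows:
        # Reject values that exceed the precision of their port.
        for value, (name, precision) in zip(row, TARGET_PORTS):
            if len(value) > precision:
                raise ValueError(f"{name} exceeds precision {precision}")
        writer.writerow(row)
        count += 1
    return count


# An in-memory buffer stands in for the Write_en_tweets_out target.
buffer = io.StringIO()
written = write_target([("RT @payyton3: I want apple cider", "en")], buffer)
```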

Classifier Mapping Outcome

When you run the mapping, the Classifier transformation identifies the language of each Twitter message. The Router transformation writes the message text to data targets based on the language classifications.
The following data fragment shows a sample of the English-language target data:
| ISO Language Code | Twitter Message |
|---|---|
| en | RT @Clarified: How to Downgrade Your iPhone 4 From iOS 6.x to iOS 5.x (Mac)... |
| en | RT @jerseyjazz: The razor was the iPhone of the early 2000s |
| en | RT @KrissiDevine: Apple Pie that I made for Thanksgiving. http://t.com/s9ImzFxO |
| en | RT @payyton3: I want apple cider |
The following data fragment shows a sample of the target data identified for other languages:
| ISO Language Code | Twitter Message |
|---|---|
| es | RT @GanaphoneS3: Faltan 10 minutos para la gran rifa de un iPhone 5... |
| id | RT @sophieHz: Dan yang punya 2 kupon undian. Masuk dalam kotak undian yang berhadiah Samsung Champ. |
| pt | RT @IsabelFreitas: o galaxy tem isso isso isso e a bateria à melhor que do iPhone |
| es | RT @PremiusIpad: Faltan 15 minutos para la gran rifa de un iPhone 5... |
| nl | RT @wiesteronder: Retweet als je iets van Apple, Nike, Adidas of microsoft hebt! http://t.co/Je6Ts00H |