GETTING STARTED

This section contains a simple, basic tutorial on how to use HGrid247.

1.Introduction

This section describes the information and tools used to create a new project and workflow in HGrid247. The workflow to build is a basic one: the input file is split into words, the words are filtered by a chosen word, and a count is then calculated to give the number of occurrences of that word.
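As a rough analogy for what this workflow computes (this is not HGrid247 itself; it assumes a Unix shell with GNU-style grep and a local copy of the input, here called input.txt), the same split, filter, and count steps look like this as a pipeline:

# split the text into one word per line, keep only the word "triangle",
# then group identical words and count them (file name and word are illustrative)
grep -oE '\w+' input.txt | grep -x 'triangle' | sort | uniq -c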

2.System Requirements

  • Windows/Linux Operating System
  • Java 1.8
  • 4 GB RAM
  • 10 GB of disk space
  • HGrid247-2.3.5
  • Hadoop system
    • Hadoop 2.x
    • Single-node Hadoop on a virtual machine, or a Hadoop cluster

3.Input Data

In this workflow, the input file is a paragraph of text, which will be filtered based on a chosen word.

Sample data:

The First Input File

 

1. Triangle (Triangle)


Triangular or triangular sometimes written in English Triangle is a form of three sides of a straight line and three corners. Number three angles in a triangle on a plane is 180 degrees. Triangle has various forms;

2. Square (Square)


Square or in English is called Square is a flat two-dimensional wake formed by four ribs of the same length and have four angles, all of which is a right angle. This wake formerly known as squares. The sum of all angle is 360 degrees. Rectangular own formula I will write the next articles.

 

4.Step by Step

The example of a basic workflow is WordCount, which counts the number of times a word appears in a paragraph.

4.1.Create Project

Click File, then click New Project.

Type the project name. Then click Save.

4.2.Create New Workflow

On the left tab, click the project name that we created before. Right-click on the project name under the Source Package, then click New Workflow.

Type the workflow name.

Click Next, then click Finish.

Click the plus sign “+” on the Assembly Process Palette and the Source/Sink Palette to show the assembly icons.

It will appear like this.

To use an assembly, click its icon, then drag and drop it onto the workflow area.

4.3.Create Input

First, click Hfs_source_1 on the Source/Sink Palette, which will be used as the input file, then drag and drop it onto the workflow area.

Right-click on Hfs_source_1, then click Edit Label and change the node label to “input”.

 


Then, click OK.

4.4.Add Transformator_read_record

Add a transformator from the Assembly Process Palette to the workflow area, next to the input.

Right-click on Transformator_1, then click Edit Label and change the node label to “read_record”.

Then, click OK.

Connect the input with read_record by putting the cursor on the input, then clicking and dragging a line to read_record.

Then, double-click on read_record to open it. It will appear like this.

Right-click on OutputRecord to add a field, then click Add Output Field(s).

Type the field name. Then, click OK.

Connect the Input Record and the Output Record with a SplitTranspose. SplitTranspose is under the transpose submenu.

SplitTranspose is used to split a column value into several rows, one per token, based on a string delimiter that is set in its parameter.
Double-click on SplitTranspose. In the Splitter String, type “\\W+”, a regular expression used to separate the line value into individual words, each saved to the output field so that it can be counted. Click OK.
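To illustrate what the splitter does (a shell sketch, not HGrid247 itself): “\\W+” matches one or more non-word characters, so splitting on it is the same as keeping the runs of word characters in between, one word per output row:

# hypothetical example: extract the runs of word characters between \W+ separators,
# producing one word per line, roughly what SplitTranspose emits per record
echo "Triangle has three sides and three corners." | grep -oE '\w+'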

Connect Input Record with SplitTranspose, then continue to Output Record. Put the cursor on the side of the field name, then drag a line to SplitTranspose.

Then, click OK.

4.5.Add Filter

Add Filter_1 to the workflow area to filter the output, and connect it with read_record.

Right-click on Filter_1, then click Edit Label and change the node label to “Filter”.

Double-click on Filter to open it, then set these values:
Field: WORD
Operation: Equal
Value: triangle (in this example, the word to filter on is “triangle”)

Then, click OK.
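To picture what the Equal operation keeps (a shell sketch, not HGrid247; words.txt is a hypothetical file with one word per line, like the records coming out of read_record), only lines that are exactly “triangle” pass through:

# hypothetical input file; keep only lines equal to "triangle",
# analogous to Field = WORD, Operation = Equal, Value = triangle
grep -x 'triangle' words.txt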

4.6.Add Transformator_count

Add Transformator_2 to add a new field, and connect it with Filter.

Right-click on Transformator_2, then click Edit Label and change the node label to “count”.

Open count and add an Output Field: type “count” as the field name and select integer as the data type. This field will be used to count the number of words.
Then connect the “count” field with a ConstantValue. ConstantValue is used to set a fixed value; in this ConstantValue, set the value to 1, so that every record carries a count of 1.

Then, click OK.
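Attaching a constant 1 to every record is the classic word-count pattern: each surviving word contributes a count of 1, and the sums of those 1s per word (computed later by the aggregator) give the totals. A minimal shell sketch of this step, assuming a hypothetical words.txt with one word per line:

# tag every word with a constant count of 1,
# mirroring the ConstantValue = 1 written into the "count" field
awk '{ print $0 "\t" 1 }' words.txt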

4.7.Add GroupBy_Group_word

Add GroupBy_1 to the workflow area to group the records, and connect it with the count transformator.

Right-click on GroupBy_1, then click Edit Label and change the node label to “Group_word”.

Double-click to open it. On the Group Fields, choose the WORD field. This means the data will be grouped by the WORD field.

Then, on the Sort Field, select the COUNT field. In the Order column, select Ascending. This means the data will be sorted from the smallest to the largest count.

Then, click OK.

4.8.Add Aggregator_sum_record

Add an Aggregator_1 assembly and connect it with Group_word.

Right-click on Aggregator_1, then click Edit Label and change the node label to “sum_record”.

Double-click on sum_record. Connect the count field on the Input Record with the count field on the Output Record using Sum.

Then, click OK.
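Taken together, Group_word and sum_record group identical words and add up their constant-1 counts. A shell sketch of the same idea, assuming a hypothetical tagged_words.txt whose lines look like word<TAB>1 (as in the previous sketch):

# sum the count column per word, then sort ascending by the total,
# mirroring GroupBy on WORD, Sum on count, and the Ascending sort order
awk -F'\t' '{ sum[$1] += $2 } END { for (w in sum) print w "\t" sum[w] }' tagged_words.txt | sort -k2,2n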

4.9.Add Output

Add Hfs_sink_1 to write the result, and connect it with sum_record.

Right-click on Hfs_sink_1, then click Edit Label and change the node label to “output”.

4.10.Click the save icon on the menu bar

4.11.Set the workflow directory

4.12.Click MapReduce Generate jar

Click MapReduce Generate jar from the workflow icon.

If it succeeds, it will show information like this.

4.13.Copy the jar file and input file to server

After copying them, put the input file into HDFS with this command:


hadoop fs -put <local_input_file> <pathHDFS>
 

Make sure the filename is written correctly. It is case-sensitive, and the command will give you an error if the filename is wrong.
Example:
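(The local file name and HDFS path below are placeholders only; adjust them to your environment.)

# hypothetical local file and HDFS directory
hadoop fs -put wordcount_input.txt /user/hgrid/input/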

4.14.Run the jar file on the HDFS


hadoop jar <jar_name>.jar <pathHDFS/input_name> <pathHDFS/output_name>
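
A concrete invocation might look like the following (the jar name and HDFS paths are illustrative only):

# hypothetical jar name and HDFS paths; replace with your generated jar and your own directories
hadoop jar wordcount.jar /user/hgrid/input/wordcount_input.txt /user/hgrid/output/wordcount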
 

4.15.Open the Output Directory

After the process is done, open the output directory to check whether the output file exists with this command:


hadoop fs -ls <pathHDFS/output_name>
 

If it appears like the picture above, the output file exists. To check whether the content matches the filter, use this command:


hadoop fs -cat <pathHDFS/output_name/file_name>
 

or


hadoop fs -cat <pathHDFS/output_name/h*>
 

If the output appears, then the process is done.
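With the filter set to “triangle”, the output should contain the filtered word together with its total count, along the lines of the placeholder below (the actual number depends on your input file):

triangle	<count>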

