Retrieving Row Key from HBase table in Talend


I'm new to Talend. I'm trying to read data from HBase, apply some transformations to it in the expression builder (in a Big Data Batch job), and write the output to a file.

Now I want to get the row key of the table and apply transformations to it, like this:

(concat('-',cast(cus.key as string))) as id

Here, key is the row key of the HBase table I'm reading the data from.
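For reference, the intended transformation amounts to decoding the row key bytes as a string and prefixing "-". A minimal plain-Java sketch, assuming the row key is UTF-8 text (HBase's Bytes.toString also decodes UTF-8); RowKeyTransform and toId are hypothetical names, and like SQL concat the method returns null for a null key:

```java
import java.nio.charset.StandardCharsets;

public class RowKeyTransform {

    // Equivalent of concat('-', cast(cus.key as string)) for one row key.
    // Decodes the raw bytes as UTF-8 and prefixes "-"; a null key yields
    // null, as SQL concat does with a null argument.
    static String toId(byte[] rowKey) {
        if (rowKey == null) {
            return null;
        }
        return "-" + new String(rowKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical row key built from a user ID
        byte[] rowKey = "key42".getBytes(StandardCharsets.UTF_8);
        System.out.println(toId(rowKey)); // prints -key42
    }
}
```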

I've also attached a screenshot of the Mapping tab.

enter image description here

When I run the job, the row key of the HBase table should be picked up, the transformation above (cast(cus.key as string)) applied to it, and the result stored in a column named id.

Is there an easy way to get the row key from the HBase table?

Thanks in advance.

Answer 1

First, when you load your data into HBase, you need to create a custom row key (in the tHBaseOutput options).

You can build it from an ID field to make it unique, for example "key"+user_id.

Follow this: Here

At the same time, store the same value ("key"+user_id) in an ordinary column, named for example row_key_technical.

You can then use the row key like any other column of your table: with a tHBaseInput you can retrieve the row key stored in the technical column and do whatever you want with it.

You need to do it in two steps: first load the data with the extra column, then read it back.
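The two-step idea can be simulated in plain Java to show the data flow. This is only a sketch: an in-memory TreeMap stands in for the HBase table (it is not the HBase client API), and write, readIds and the name column are hypothetical; only "key"+user_id and row_key_technical come from the answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TechnicalRowKeyDemo {

    // Step 1 (load job): build the custom row key "key" + user_id and store
    // the same value in an ordinary column, row_key_technical.
    static void write(Map<String, Map<String, String>> table, String userId, String name) {
        String rowKey = "key" + userId;
        Map<String, String> columns = new HashMap<>();
        columns.put("name", name);
        columns.put("row_key_technical", rowKey);
        table.put(rowKey, columns);
    }

    // Step 2 (read job): look only at column values, without touching the
    // row key itself, and transform the copied key like any other column.
    static List<String> readIds(Map<String, Map<String, String>> table) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> columns : table.values()) {
            ids.add("-" + columns.get("row_key_technical"));
        }
        return ids;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the HBase table: row key -> (column -> value).
        // A TreeMap keeps rows sorted by key; this is not the HBase client API.
        Map<String, Map<String, String>> table = new TreeMap<>();
        write(table, "42", "Alice");
        write(table, "7", "Bob");
        System.out.println(readIds(table)); // prints [-key42, -key7]
    }
}
```

The cost of this approach is that the key is stored twice; the benefit is that no component has to be modified.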

I'm not sure this is the only solution, but it is one. Maybe someone has a better one ;)

Answer 2

You can force your tHBaseInput component to fetch the row key of the HBase table. First, go to the folder that contains the tHBaseInput component, for example (Talend Studio 6.2.1 on Windows):

C:\Program Files (x86)\Talend-Studio\studio\plugins\org.talend.designer.components.mrprovider_6.2.1.20160704_1411\components\tHBaseInput

In the tHBaseInput_mrcode_main_only javajet template there is a method, validateResult(), like the one below:

    public boolean validateResult(org.apache.hadoop.hbase.client.Result result,
                    <%=recordStruct%> value) throws IOException {
                org.apache.hadoop.hbase.io.ImmutableBytesWritable rowKey = new org.apache.hadoop.hbase.io.ImmutableBytesWritable();
                rowKey.set(result.getRow());
                lastSuccessfulRow = rowKey.get();

                byte[] rowResult = null;
                String temp = null;

                <%
                for (int i = 0; i < mapping.size(); i++) {
                    Map<String, String> map = mapping.get(i);
                    String family_column= map.get("FAMILY_COLUMN");
                    IMetadataColumn column = mainColumns.get(i);
                    String columnName = column.getLabel();
                    String defaultValue = column.getDefault();
                    String typeToGenerate = JavaTypesManager.getTypeToGenerate(column.getTalendType(), column.isNullable());
                    JavaType javaType = JavaTypesManager.getJavaTypeFromId(column.getTalendType());
                    String patternValue = column.getPattern() == null || column.getPattern().trim().length() == 0 ? null : column.getPattern();
                    boolean isPrimitiveType = JavaTypesManager.isJavaPrimitiveType(javaType, column.isNullable());
                    String toAssign = "value." + columnName;

                    %>

                    rowResult = result.getValue(
                            org.apache.hadoop.hbase.util.Bytes.toBytes(<%=family_column%>),
                            org.apache.hadoop.hbase.util.Bytes.toBytes("<%=column.getOriginalDbColumnName()%>"));
                    temp = org.apache.hadoop.hbase.util.Bytes.toString(rowResult);

Modify the method as follows. Because the new line assigns to value.key, the component's schema needs a String column named key:

    public boolean validateResult(org.apache.hadoop.hbase.client.Result result,
                <%=recordStruct%> value) throws IOException {
            org.apache.hadoop.hbase.io.ImmutableBytesWritable rowKey = new org.apache.hadoop.hbase.io.ImmutableBytesWritable();
            rowKey.set(result.getRow());
            lastSuccessfulRow = rowKey.get();

            byte[] rowResult = null;
            String temp = null;
            // New: copy the HBase row key into the schema column "key"
            value.key = org.apache.hadoop.hbase.util.Bytes.toString(lastSuccessfulRow);
            <%
            for (int i = 0; i < mapping.size(); i++) {
                Map<String, String> map = mapping.get(i);
                String family_column= map.get("FAMILY_COLUMN");
                IMetadataColumn column = mainColumns.get(i);
                String columnName = column.getLabel();
                String defaultValue = column.getDefault();
                String typeToGenerate = JavaTypesManager.getTypeToGenerate(column.getTalendType(), column.isNullable());
                JavaType javaType = JavaTypesManager.getJavaTypeFromId(column.getTalendType());
                String patternValue = column.getPattern() == null || column.getPattern().trim().length() == 0 ? null : column.getPattern();
                boolean isPrimitiveType = JavaTypesManager.isJavaPrimitiveType(javaType, column.isNullable());
                String toAssign = "value." + columnName;

                %>
                // New: "key" is not a real HBase column, so skip the cell lookup for it
                if(!"key".equalsIgnoreCase("<%=column.getOriginalDbColumnName()%>"))
                // ... remainder of the template continues here

Once this is done, delete the file "ComponentsCache.javacache" in C:\Program Files (x86)\Talend-Studio\studio\configuration and restart Talend Open Studio. Your tHBaseInput component will now fetch the row key from the HBase table. This may not be advisable in every case, since it patches a component file inside the Studio installation, but if you use Talend Open Studio to generate jobs and deploy the JARs elsewhere, it can be helpful.
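The effect of the two template changes can be checked without HBase or Talend. The sketch below does not reproduce Talend's generated code; it imitates the patched logic in plain Java. Record, fill and the cells map are hypothetical names: rowKey plays the role of result.getRow(), and cells stands in for result.getValue(family, qualifier), with the family omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ValidateResultSketch {

    // Hypothetical stand-in for the generated record struct.
    static class Record {
        String key;                                          // filled from the row key
        final Map<String, String> columns = new HashMap<>(); // filled from cell values
    }

    // Imitates the patched validateResult() for one HBase row.
    static Record fill(byte[] rowKey, Map<String, byte[]> cells, String[] schemaColumns) {
        Record value = new Record();
        // Change 1: copy the row key into value.key (UTF-8, like Bytes.toString)
        value.key = new String(rowKey, StandardCharsets.UTF_8);
        for (String column : schemaColumns) {
            // Change 2: "key" is not stored as a cell, so skip the lookup for it
            if (!"key".equalsIgnoreCase(column)) {
                byte[] cell = cells.get(column);
                value.columns.put(column, cell == null ? null : new String(cell, StandardCharsets.UTF_8));
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cells = new HashMap<>();
        cells.put("name", "Alice".getBytes(StandardCharsets.UTF_8));
        Record r = fill("key42".getBytes(StandardCharsets.UTF_8), cells, new String[] {"key", "name"});
        System.out.println(r.key + " " + r.columns); // prints key42 {name=Alice}
    }
}
```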

Thanks to my project manager.