HBase MapReduce job using wrong table name in mapper


I have some crawled content in an HBase table (written by Nutch). I have written a MapReduce job that processes one table and writes its stats into a new table. The following is the relevant snippet of the job:

NutchJob job = NutchJob.getInstance(getConf(), "customJob");

// === Map ===
DataStore<String, WebPage> pageStore = StorageUtils.createWebStore(
    job.getConfiguration(), String.class, WebPage.class);
Query<String, WebPage> query = pageStore.newQuery();
// Note: pages without these fields are skipped
query.setFields(StorageUtils.toStringArray(FIELDS));
LOG.info( "Table before mapper: " + job.getConfiguration().get(Nutch.CRAWL_ID_KEY ) );

GoraMapper.initMapperJob(job, pageStore, Text.class, WebPage.class,
        TableCopy.Mapper2.class, true);

job.setNumReduceTasks(1);


job.getConfiguration().set(Nutch.CRAWL_ID_KEY, "txt" );
LOG.info( "Table before reducer: " + job.getConfiguration().get(Nutch.CRAWL_ID_KEY ) );

DataStore<String, WebPage> hostStore = StorageUtils.createWebStore(
        job.getConfiguration(), String.class, WebPage.class);


GoraReducer.initReducerJob(job, hostStore, MarkerUpdateReducer2.class);

job.waitForCompletion(true);

There are two tables: one is given on the command line and the second is hard-coded ("txt" in this case). My intention is to create the reducer datastore with a new table name so that I can store the output there. What actually happens is that the mapper processes the "txt" table, and since there is no data in that table, the job does nothing. The following is the log snippet:

2019-10-15 15:38:05,007 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-15 15:38:07,028 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'a_webpage'Assuming they are the same.
2019-10-15 15:38:07,647 INFO  marker.TableCopy - Table before mapper: a
2019-10-15 15:38:07,738 INFO  marker.TableCopy - Table before reducer: txt
2019-10-15 15:38:07,775 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:08,316 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,401 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,453 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,491 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,604 INFO  marker.TableCopy - map table: txt
2019-10-15 15:38:09,869 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.

I print the table name in the mapper's setup method. As the log above shows, it prints "map table: txt", but the actual input table is "a".
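One possible reading of this behaviour, offered as an assumption and not a confirmed diagnosis: the mapper and reducer stores are both built from the same job configuration, and tasks only see whatever values that configuration holds when the job is submitted. If the store is re-created inside each task from the configuration, the later `set(Nutch.CRAWL_ID_KEY, "txt")` would win for the mapper as well. The sketch below uses a plain `HashMap` in place of the Hadoop configuration; `CRAWL_ID_KEY`, `resolveTable`, and `tableSeenByMapper` are hypothetical names made up for illustration, not Nutch or Gora APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedConfSketch {
    // Hypothetical key standing in for Nutch.CRAWL_ID_KEY.
    public static final String CRAWL_ID_KEY = "crawl.id";

    // Stand-in for a store that is rebuilt inside a task from the job configuration.
    public static String resolveTable(Map<String, String> conf) {
        return conf.get(CRAWL_ID_KEY) + "_webpage";
    }

    // Mimics the driver in the question: one configuration object,
    // with the crawl id overwritten before the job is submitted.
    public static String tableSeenByMapper() {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(CRAWL_ID_KEY, "a");   // value taken from the command line
        // ... mapper side is initialised here against table "a" ...
        jobConf.put(CRAWL_ID_KEY, "txt"); // overwritten for the reducer side
        // The job is submitted now; every task reads this final state.
        return resolveTable(jobConf);
    }

    public static void main(String[] args) {
        System.out.println("map table: " + tableSeenByMapper());
    }
}
```

In this model the mapper resolves `txt_webpage`, which matches the schema name in the warnings above. If the assumption holds, the input and output table names would need to live under separate configuration entries, so that the second assignment does not replace the first.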
