I have some crawled content in an HBase table (via Nutch). I have written a MapReduce job that processes a table and writes its stats to a new table. The following is the code snippet of the MR job.
NutchJob job = NutchJob.getInstance(getConf(), "customJob");
// === Map ===
DataStore<String, WebPage> pageStore = StorageUtils.createWebStore(
    job.getConfiguration(), String.class, WebPage.class);
Query<String, WebPage> query = pageStore.newQuery();
// Note: pages without these fields are skipped
query.setFields(StorageUtils.toStringArray(FIELDS));
LOG.info( "Table before mapper: " + job.getConfiguration().get(Nutch.CRAWL_ID_KEY ) );
GoraMapper.initMapperJob(job, pageStore, Text.class, WebPage.class,
    TableCopy.Mapper2.class, true);
job.setNumReduceTasks(1);
job.getConfiguration().set(Nutch.CRAWL_ID_KEY, "txt" );
LOG.info( "Table before reducer: " + job.getConfiguration().get(Nutch.CRAWL_ID_KEY ) );
DataStore<String, WebPage> hostStore = StorageUtils.createWebStore(
    job.getConfiguration(), String.class, WebPage.class);
GoraReducer.initReducerJob(job, hostStore, MarkerUpdateReducer2.class);
job.waitForCompletion(true);
There are two tables: one given on the command line, and a second that is hard-coded ("txt" in this case). My intention is to create the reducer datastore with a new table name so that I can store data there. What happens instead is that the mapper processes the table "txt", and since there is no data in this table, the job does nothing. The following is the log snippet:
2019-10-15 15:38:05,007 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-15 15:38:07,028 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'a_webpage'Assuming they are the same.
2019-10-15 15:38:07,647 INFO marker.TableCopy - Table before mapper: a
2019-10-15 15:38:07,738 INFO marker.TableCopy - Table before reducer: txt
2019-10-15 15:38:07,775 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:08,316 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,401 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,453 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,491 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,604 INFO marker.TableCopy - map table: txt
2019-10-15 15:38:09,869 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
I print the table name in the setup method. It shows "txt", as in the log line "map table: txt" above, but the actual input table is "a".
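One possible explanation, which I have not confirmed, is that the mapper and the reducer read the crawl ID from the same job Configuration when the tasks run, so the last value set for Nutch.CRAWL_ID_KEY is the one both see. Below is a minimal sketch of that effect. It uses a plain HashMap as a stand-in for the Hadoop Configuration, and the class, key, and method names are hypothetical and not part of the Nutch or Gora API.

```java
import java.util.HashMap;
import java.util.Map;

class SharedConfSketch {
    // Hypothetical key, standing in for Nutch.CRAWL_ID_KEY.
    static final String CRAWL_ID_KEY = "crawl.id";

    // Stand-in for a task resolving its table name from the job
    // configuration at run time (assumed behaviour, not verified).
    static String resolveTable(Map<String, String> conf) {
        return conf.get(CRAWL_ID_KEY) + "_webpage";
    }

    // Mimics the driver: set the crawl ID, "configure" the mapper,
    // overwrite the crawl ID, "configure" the reducer, then let a
    // task look the table up from the single shared configuration.
    static String tableSeenByTasks(String mapperCrawlId, String reducerCrawlId) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CRAWL_ID_KEY, mapperCrawlId);   // value from the command line
        // ... mapper would be initialised here ...
        conf.put(CRAWL_ID_KEY, reducerCrawlId);  // overwritten before reducer setup
        // ... reducer would be initialised here ...
        return resolveTable(conf);               // lookup happens after the overwrite
    }

    public static void main(String[] args) {
        // Both the map side and the reduce side see the last value written.
        System.out.println("table seen by tasks: " + tableSeenByTasks("a", "txt"));
        // prints "table seen by tasks: txt_webpage"
    }
}
```

If this is what is happening, the input and output stores would need their table names kept somewhere other than one shared key, but I do not know what the supported way to do that is.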