I am using version 5.1 of Active Pivot, but plan to upgrade to 5.2. I would like to read data in using the CsvSource and receive real-time updates.
How do I go about using Hadoop with Active Pivot?
718 Views, asked by Hal

There is 1 answer below.
Introduction
This article explains a few ways to read data from Hadoop into Active Pivot. This was tested with Active Pivot 5.1 and 5.2. In short, you have two ways to fill the gap:
- Using a mounted HDFS, which makes your HDFS behave like a common disk
- Using the Hadoop Java API
Using a mounted HDFS
You can mount your HDFS easily with certain Hadoop distributions. (Ex: mounting HDFS with Cloudera CDH 5 was easy to do.)
After doing so, you will have a mount point on your Active Pivot server linked to your HDFS, and it will behave like a common disk. (At least for reading; writing has some limitations.)
For instance, if you had CSV files on your HDFS, you would be able to use the Active Pivot Csv Source directly.
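To illustrate why the Csv Source works unchanged, here is a minimal sketch that reads a CSV through plain Java file APIs, assuming the HDFS is mounted under /mnt/hdfs (a hypothetical mount point; the demo in main uses a temp file so it runs anywhere):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class MountedHdfsRead {

    // Hypothetical mount point: adjust to wherever your HDFS is mounted.
    static final Path MOUNT = Paths.get("/mnt/hdfs/user/quartetfs/data");

    /** The mounted HDFS behaves like a local disk, so plain java.nio works. */
    public static List<String> readCsvLines(Path csv) throws IOException {
        return Files.readAllLines(csv);
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temp file standing in for a file under the mount point.
        Path tmp = Files.createTempFile("trades", ".csv");
        Files.write(tmp, List.of("id,price", "1,100.5"));
        System.out.println(readCsvLines(tmp).size()); // prints 2
    }
}
```

Anything that can read a local file (including the Csv Source) can read through the mount point the same way.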
Using Hadoop Java API
Another way is to use the Hadoop Java API: http://hadoop.apache.org/docs/current/api/
A few main classes to use:
- org.apache.hadoop.fs.FileSystem - used for common operations with Hadoop.
- org.apache.hadoop.conf.Configuration - used to configure the FileSystem object.
- org.apache.hadoop.hdfs.client.HdfsAdmin - can be used to watch for events. (Ex: a new file added to HDFS.)

Note: watching for events is available for Hadoop 2.6.0 and higher. For earlier Hadoop versions you could either build your own watcher or use a mounted HDFS with an existing file watcher.
Dependencies
You will need a few Hadoop dependencies.
Beware: there can be conflicts on Jaxb between the Hadoop dependencies and the Active Pivot ones.
In our pom.xml, the solution was to exclude the Jaxb dependencies from the Hadoop dependencies.
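The pom.xml itself is not reproduced in this copy of the answer; the fragment below is a sketch of what such an exclusion could look like (the hadoop-client artifact, the version, and the excluded Jaxb artifacts are assumptions to adapt to your build):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
  <exclusions>
    <!-- Avoid clashing with the Jaxb version shipped with Active Pivot -->
    <exclusion>
      <groupId>javax.xml.bind</groupId>
      <artifactId>jaxb-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.sun.xml.bind</groupId>
      <artifactId>jaxb-impl</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```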
Properties
You will need to define at least 2 properties:
- the Hadoop address (Ex: hdfs://localhost:9000)
- the HDFS path to your files (Ex: /user/quartetfs/data/)
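For example, these two settings could be kept in a properties file (the key names here are made up for illustration):

```properties
# Assumed key names; use whatever naming your environment config expects
hadoop.address=hdfs://localhost:9000
hdfs.data.path=/user/quartetfs/data/
```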
If your cluster is secured, you will need to figure out how to access it remotely in a secure way.
Example of reading a file from Hadoop
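The original code for this example is not reproduced here; the sketch below shows how a file could be read with the classes above, assuming the address and path from the Properties section (the file name trades.csv is made up). It needs a running HDFS and the Hadoop client jars on the classpath:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {

    /** Drains any InputStream into a list of lines (works for HDFS streams too). */
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Assumed values: replace with your own Hadoop address and HDFS path.
        URI hdfs = URI.create("hdfs://localhost:9000");
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", hdfs.toString());

        try (FileSystem fs = FileSystem.get(hdfs, conf)) {
            // fs.open returns a stream over the HDFS file
            List<String> lines = readLines(fs.open(new Path("/user/quartetfs/data/trades.csv")));
            lines.forEach(System.out::println);
        }
    }
}
```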
Hadoop Source
Once you are able to read from your HDFS, you can write your Hadoop source as you would for any other source.
For instance, you could create a HadoopSource implementing ISource.
You could then start it in your SourceConfig, where you would retrieve your properties from your environment.
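The exact ISource contract differs between Active Pivot versions, so the sketch below deliberately leaves the Active Pivot wiring out: HDFS access is injected as a plain function (in production it would wrap FileSystem.open), and the method names submit and fetchPending are illustrative, not part of any Active Pivot API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/**
 * Illustrative skeleton of a Hadoop-backed source. The Active Pivot side
 * (ISource methods, SourceConfig startup) is version-specific and omitted.
 */
public class HadoopSource {

    /** Paths published by the event watcher, waiting to be loaded. */
    private final BlockingQueue<String> pendingFiles = new LinkedBlockingQueue<>();

    /** Reads one HDFS path into lines; in production, wraps FileSystem.open. */
    private final Function<String, List<String>> reader;

    public HadoopSource(Function<String, List<String>> reader) {
        this.reader = reader;
    }

    /** Called by the event watcher when a new file appears on HDFS. */
    public void submit(String hdfsPath) {
        pendingFiles.add(hdfsPath);
    }

    /** Drains all pending files and returns their rows, ready to feed the cube. */
    public List<String> fetchPending() {
        List<String> paths = new ArrayList<>();
        pendingFiles.drainTo(paths);
        List<String> rows = new ArrayList<>();
        for (String path : paths) {
            rows.addAll(reader.apply(path));
        }
        return rows;
    }
}
```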
Watching for events (Ex: new files)
If you want to retrieve files as they are stored on HDFS, you can create another class that watches for events.
An example would be the following code, in which you would have your own methods handling certain events. (Ex: onCreation() and onAppend().)
What I did in my onCreation() method (not shown) was to store newly created files in a concurrent queue, so my HadoopSource could retrieve several files in parallel.
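A minimal sketch of such a watcher using HdfsAdmin, assuming a cluster at hdfs://localhost:9000. Note it uses the batched inotify API of Hadoop 2.7+ (in 2.6.0, take() returns a single Event rather than an EventBatch), and reading the event stream typically requires HDFS superuser privileges:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class HdfsEventWatcher {

    public static void main(String[] args) throws Exception {
        HdfsAdmin admin =
            new HdfsAdmin(URI.create("hdfs://localhost:9000"), new Configuration());
        DFSInotifyEventInputStream stream = admin.getInotifyEventStream();

        while (true) {
            EventBatch batch = stream.take(); // blocks until events arrive
            for (Event event : batch.getEvents()) {
                switch (event.getEventType()) {
                    case CREATE:
                        onCreation(((Event.CreateEvent) event).getPath());
                        break;
                    case APPEND:
                        onAppend(((Event.AppendEvent) event).getPath());
                        break;
                    default:
                        break;
                }
            }
        }
    }

    static void onCreation(String path) {
        // Ex: push the path into the concurrent queue consumed by the HadoopSource
        System.out.println("created: " + path);
    }

    static void onAppend(String path) {
        System.out.println("appended: " + path);
    }
}
```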
If I have not been clear enough on certain aspects, or if you have questions, feel free to ask.