I am very new to Qubole. We recently migrated Oracle Ebiz data to Salesforce. We have both the Ebiz and the Salesforce data in the Qubole Data Lake. There are some discrepancies between Ebiz and Salesforce. What technology can I use on Qubole to find these discrepancies?
This is the approach I use to compare two tables: aggregate all metrics in both tables, grouped by all dimensions, then compare the results using a FULL JOIN, which returns both the matched and the unmatched records from both tables. This way you can find data that is missing from either table as well as differences in metrics.
For example like this, using Hive:
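A minimal sketch of such a query, not a definitive implementation. The table names (ebiz_orders, sf_orders) and columns (order_date, account_id, amount) are hypothetical placeholders, and the dimensions are assumed to be string columns so that nvl(col, '') is type-safe:

```sql
-- Hypothetical tables and columns; replace them with your own.
-- Assumes order_date and account_id are STRING columns.
WITH e AS (
  -- Aggregate the Ebiz side by all compared dimensions
  SELECT order_date,
         account_id,
         COUNT(*)    AS row_cnt,
         SUM(amount) AS total_amount
  FROM ebiz_orders
  GROUP BY order_date, account_id
),
s AS (
  -- Aggregate the Salesforce side the same way
  SELECT order_date,
         account_id,
         COUNT(*)    AS row_cnt,
         SUM(amount) AS total_amount
  FROM sf_orders
  GROUP BY order_date, account_id
)
SELECT NVL(e.order_date, s.order_date) AS order_date,
       NVL(e.account_id, s.account_id) AS account_id,
       e.row_cnt      AS ebiz_row_cnt,
       s.row_cnt      AS sf_row_cnt,
       e.total_amount AS ebiz_total_amount,
       s.total_amount AS sf_total_amount
FROM e
FULL JOIN s
  -- NVL lets groups with a NULL dimension match each other
  ON  NVL(e.order_date, '') = NVL(s.order_date, '')
  AND NVL(e.account_id, '') = NVL(s.account_id, '')
WHERE e.row_cnt IS NULL                                -- group missing in Ebiz
   OR s.row_cnt IS NULL                                -- group missing in Salesforce
   OR e.row_cnt <> s.row_cnt                           -- row counts differ
   OR NVL(e.total_amount, 0) <> NVL(s.total_amount, 0) -- sums differ
;
```

Because COUNT(*) is never NULL inside a group, a NULL row_cnt on one side means that group exists only in the other table. If the metric is a floating-point column, small rounding differences may show up as mismatches, so comparing rounded values may be preferable.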
You can also skip the filtering in the WHERE clause and simply compare the full result in Excel.
Metrics are anything that can be aggregated. You can also use some dimensions as metrics, for example count(distinct user) as user_cnt together with group by date, site_name.
The query with the full join will show the differences. If some dimensions used in the join condition can be null, use nvl() to match such rows, as in my example (NULL = NULL does not evaluate to true, so such rows would otherwise never join). Of course, do not use too many dimensions in the GROUP BY; you can skip some of them and drill down only after finding discrepancies at the aggregated level.
Once you have found a discrepancy in the aggregates, you can drill down and compare the non-aggregated rows, filtered by some metrics.
See also: https://stackoverflow.com/a/67382947/2700344