The following author has published many papers from different institutes. I can link some of the alias names with a cartesian join. Is it possible to map all aliases to one primary author?
with t1 as
(
select 'jones-william-j' as alias,'jones-w' as last_name_init,'The Institute Of Cancer Research, London'as institute_name,1 as pub_count
union all select 'jones-w-j','jones-w','University Of California',42
union all select 'jones-w-j','jones-w','Western University',2
union all select 'jones-willam-j','jones-w','University Of California',2
union all select 'jones-william','jones-w','University Of California',12
union all select 'jones-william-j','jones-w','University Of California',429
union all select 'jones-william-j','jones-w','University Of San Diego',1
union all select 'jones-william-j','jones-w','Va San Diego Healthcare System',2
union all select 'jones-william-j','jones-w','Western University',7
)
select * from t1;
| alias | last_name_init | institute_name | pub_count |
|---|---|---|---|
| jones-william-j | jones-w | The Institute Of Cancer Research, London | 1 |
| jones-w-j | jones-w | University Of California | 42 |
| jones-w-j | jones-w | Western University | 2 |
| jones-willam-j | jones-w | University Of California | 2 |
| jones-william | jones-w | University Of California | 12 |
| jones-william-j | jones-w | University Of California | 429 |
| jones-william-j | jones-w | University Of San Diego | 1 |
| jones-william-j | jones-w | Va San Diego Healthcare System | 2 |
| jones-william-j | jones-w | Western University | 7 |
The cartesian join give the following 2 authors jones-w-1 and jones-w-2, is it possible to merge these into one author_id jones-w-1?
with t1 as
(
select 'jones-william-j' as alias,'jones-w' as last_name_init,'The Institute Of Cancer Research, London'as institute_name,1 as pub_count
union all select 'jones-w-j','jones-w','University Of California',42
union all select 'jones-w-j','jones-w','Western University',2
union all select 'jones-willam-j','jones-w','University Of California',2
union all select 'jones-william','jones-w','University Of California',12
union all select 'jones-william-j','jones-w','University Of California',429
union all select 'jones-william-j','jones-w','University Of San Diego',1
union all select 'jones-william-j','jones-w','Va San Diego Healthcare System',2
union all select 'jones-william-j','jones-w','Western University',7
)
select distinct
a.last_name_init||'-'||DENSE_RANK() OVER (ORDER BY a.last_name_init,a.institute_name ASC) author_id,
a.alias,
a.last_name_init,
a.institute_name,
a.pub_count
from t1 a,
t1 b
where 1=1
and a.institute_name = b.institute_name
and a.last_name_init = b.last_name_init
and a.alias <> b.alias;
The result is as follows.
| author_id | alias | last_name_init | institute_name | pub_count |
|---|---|---|---|---|
| jones-w-1 | jones-w-j | jones-w | University Of California | 42 |
| jones-w-1 | jones-willam-j | jones-w | University Of California | 2 |
| jones-w-1 | jones-william | jones-w | University Of California | 12 |
| jones-w-1 | jones-william-j | jones-w | University Of California | 429 |
| jones-w-2 | jones-w-j | jones-w | Western University | 2 |
| jones-w-2 | jones-william-j | jones-w | Western University | 7 |
The desired result would be.
| author_id | alias | last_name_init | institute_name | pub_count |
|---|---|---|---|---|
| jones-w-1 | jones-w-j | jones-w | University Of California | 42 |
| jones-w-1 | jones-willam-j | jones-w | University Of California | 2 |
| jones-w-1 | jones-william | jones-w | University Of California | 12 |
| jones-w-1 | jones-william-j | jones-w | University Of California | 429 |
| jones-w-1 | jones-w-j | jones-w | Western University | 2 |
| jones-w-1 | jones-william-j | jones-w | Western University | 7 |