Select query with any two or n parameters

43 Views Asked by At

My Postgres table holds details of places where people visit. I am using a select query to select the locations.

SELECT maid,hlat,hlon,wlat,wlon FROM peopledata WHERE (  ( placecategory ILIKE '%Grocerypharmacydairy%' OR   placecategory ILIKE '%Marketarea%' OR   placecategory ILIKE '%Shoppingmall%'  ) OR( geobehavior ILIKE '%Grocerypharmacydairy_Visitor%' OR   geobehavior ILIKE '%Marketarea_Visitor%' OR   geobehavior ILIKE '%Shoppingmall_Visitor%'  ) OR( gender ILIKE '%both%' ) AND( age between '25' and '60'  )) AND polygonid IN (2333)

In the above query, I have three placecategory and 3 geobehavior parameters. Through a different input, I am also capturing details of how many placecategory or geobehavior should be matched. For example,

placecategory : 2
geobehavior : 3

If the n value for placecategory is 2, I want to select records matching 'any' two of these parameters. For example, it could be

Grocerypharmacydairy AND Marketarea  OR
Grocerypharmacydairy AND Shoppingmall OR
Marketarea AND Shoppingmall

Depending on the number of locations specified, I want to fetch the records that match at least the n-parameters specified.

1

There are 1 best solutions below

0
Julius Tuskenis On

So the question boils down to "how to find an intersection of 2 sets of text" (and find the rank of it).

First we have to extract categories from the placecategory field. For the sake of example lets say it contains "ShoppingMall, SuperMarket, Pharmacy, Bar". PostgreSQL 8.3+ has regexp_split_to_table for that.

SELECT regexp_split_to_table('ShoppingMall, SuperMarket, Pharmacy, Bar', ', ') as category

Returns:

category    |
------------+
ShoppingMall|
SuperMarket |
Pharmacy    |
Bar         |

Now lets assume you want to get the intersection with "Pharmacy, Bar, Barberry"

SELECT 
  count(*) as intersection_rank
FROM
  (SELECT DISTINCT 
     regexp_split_to_table(
       'ShoppingMall, SuperMarket, Pharmacy, Bar', 
       ', ') as val ) as category_from_peopledata
  JOIN 
  (SELECT DISTINCT 
     regexp_split_to_table(
       'Pharmacy, Bar, Barberry', 
       ', ') as val) as required_categories ON category_from_peopledata.val = required_categories.val

The intersection is "Pharmacy, Bar" so the result is as expected:

intersection_rank|
-----------------+
                2|

I use DISTINCT to guard against the duplicates in categories or the required_categories. If you ensure the categories does not have duplicates in the list - you may remove DISTINCT.

You may now use the query in WHERE clause to check the number of categories that matches and filter the peopledata table.

The same principle goes for the geobehavior attribute.

CAUTION

Just be aware, that if you expect the peopledata table to grow big, you will face performance issues. Searching for the records with "Bar" in placecategory will be performed using seqscan.