Selecting 10 items based on N number types

301 Views Asked by At

I am writing a program that has to select 10 items randomly from a database . the criteria is that its type should be based on users choice .

So if the user chooses type 1 the program has to choose 10 items of type 1

if the user chooses type 1 and 2 the program has to choose 5 items from type 1 and 5 items from type 2

if the user chooses type 1 , 2 and 4 the program has to choose 3 items for each type and one of the types must have and extra item to get 10 items and which type gets an extra question has to be random.

So This has to be done for N number of types that the user can choose from .

Any suggestions on the way this can be achieved ?

By the way I am trying to build a Random generator so the result has to be random each time.

Ps: I am coding in python , but any language code is fine .

PS: I am already able to select items using SQL WHERE Clause , That is not the real problem.

the problem is once I have gotten items of each of the selected types , I have to select exactly 10 items with the above proportions , (i.e) : if 4 types are chosen 2 items for each type and and exactly 2 extra items in any two of the types .

3

There are 3 best solutions below

2
On

One way is to scan the DB in a single pass, building up your ten selecions. Here's some pseudo code:

  • Query the whole database for records meeting the user's criteria
  • Build-up the first ten encountered in a results list
  • If any additional records arise, decide randomly whether to include it (the n-th item should be included with a probability of P(10/n)): if random() < 10.0 / n: ...
  • If it it is to be included, randomly choose a previous selection to swap out: i = randrange(10)

In Python:

result = []
for n, row in enumerate(cursor.execute(user_query), 1):
    if n <= 10:
        result.append(row)
    elif random() < 10.0 / n:
        i = randrange(10)
        result[i] = row

Alternatively, if the database non-queryable, is huge, and has a high percentage of records expected to match the user criteria, you could randomly select records from the entire DB until you've found ten unique records that match the criteria:

result = set()
while len(result) < 10:
     record = random.choice(entire_db)
     if record not in result and matches(record, user_criteria):
          result.add(record)

The latter algorithm is one of the two used by Python's own random.sample() function.

If you can run queries though and the number of matching records can fit in memory, the whole thing boils down to:

random.sample(list(cursor.execute(user_query)), 10)
0
On

Basically you need to get it into this form:

SELECT item
FROM items
WHERE type = ...
ORDER BY RANDOM()
LIMIT 10

See Select random from an SQLite table. This should work for PostgreSQL too.

This well let the database do most of the work for you, and all you need to do is figure out how to pass your types in as arguments, which is dependent on your database library.

For multiple, I'd probably just run several queries or dynamically generate one with multiple unions.

You can do that like this:

SELECT item
FROM items
WHERE type = type1
ORDER BY RANDOM()
LIMIT 5
UNION ALL
SELECT item
FROM items
WHERE type = type2
ORDER BY RANDOM()
LIMIT 5

Which can be generated by Python like this:

types = ("type1", "type2", "type3")
limit = 10 // len(types) # be careful, for 11 or more types, this will set the limit to 0
sql = """
    SELECT item
    FROM items
    WHERE type = ?
    ORDER BY RANDOM()
    LIMIT %s
""" % (limit)
unioned_sql = " UNION ALL ".join([sql for i in range(len(types))])
result = cursor.execute(unioned_sql, types)

The syntax might be wrong -- I haven't used the DB API in a while. It should give the idea though.

You mentioned that a problem is that you need to select exactly 10 items, including extras if the proportion doesn't work out to exactly 10, so maybe it would be better to hard-code the limit at 10 (select up to 100 items), and then trim the list after getting it from the DB.

0
On

Some things you might find useful:


Python

random.choice(seq)

Return a random element from the non-empty sequence seq.

random.sample(population, k)

Return a k length list of unique elements chosen from the population
sequence. Used for random sampling without replacement.

Usage:

>>> import random
>>> random.choice([1, 2, 3, 4, 5])  # Choose a random element
3
>>> random.sample([1, 2, 3, 4, 5],  3)  # Choose 3 elements
[4, 1, 5]

SQL

SQL WHERE Clause

The WHERE clause is used to extract only those records that fulfill a specified criterion.

Usage:

SELECT * FROM Items
WHERE Type=1