How can I optimize the code for big QStringList?

258 Views Asked by At

The following code does not seem to finish in tolerable time if list2 contains more than 200,000 entries.

    QStringList list1; 
    QStringList list2=getalist();
    int count=list2.count();
    for(int j=0;j<count;j++)
    {
        QString e=list2[j];
        if(!list1.contains(e,Qt::CaseInsensitive))
        {
            list1<<e;
        }
    }

What is the bottleneck and how can I optimize it?

2

There are 2 best solutions below

1
On BEST ANSWER

The bottleneck is the choice of QStringList as the container for that task. It is a QList<QString> and I believe it uses linear search to implement the function contains.

The best solution would be to use a tree-based container like std::set or a hash based container like QSet (yes, it it hash based contrary to the std::set). From Qt documentation:

QSet is one of Qt's generic container classes. It stores values in an unspecified order and provides very fast lookup of the values. Internally, QSet is implemented as a QHash.

2
On

Your code appears to update list1 to be the union of all unique elements in list1 and list2 subject to a case insensitive compare. So something like (untested)...

QStringList list1;
QStringList list2 = getalist();

struct case_insensitive_compare {
    bool operator() (const QString &l, const QString &r) const
    {
        return l.compare(r, Qt::CaseInsensitive) < 0;
    }
};

std::set<QString, case_insensitive_compare> set1(list1.begin(), list1.end());
set1.insert(list2.begin(), list2.end());
list1 = QStringList(set1.begin(), set1.end());