Database design question regarding performance


I need help deciding on a DB design approach. We're building a translation tool with Hanami (the Ruby web framework) and therefore ROM. We face a design decision: one DB (PostgreSQL) table for all translation records, where each record covers one source and one target language combination, and source and target may be any languages: EN-DE, FR-EN, and so on.

The other possibility would be a DB table per language pair.
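For concreteness, here is a minimal sketch of the two options in SQL; the table and column names are hypothetical, not our actual schema:

    -- Option one: a single table holding every language combination.
    CREATE TABLE translations (
        id          bigserial PRIMARY KEY,
        source_lang char(2) NOT NULL,  -- e.g. 'EN'
        target_lang char(2) NOT NULL,  -- e.g. 'DE'
        source_text text    NOT NULL,
        target_text text    NOT NULL
    );

    -- Option two would instead mean one table per pair:
    -- translations_en_de, translations_fr_en, and so on.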

We currently have about 1,500,000 legacy records. We will not reach 2,000,000 soon, but we still need to plan for that scale.

We are inclined toward the first option, but would it be feasible in terms of querying and performance? The main difference is that with option one the matching language pair must be filtered first, and only then is the corresponding translation string looked up.
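To illustrate what we mean, the option-one lookup would be something along these lines (again with hypothetical column names):

    -- Filter on the language pair and the source string in one statement.
    SELECT target_text
    FROM translations
    WHERE source_lang = 'EN'
      AND target_lang = 'DE'
      AND source_text = 'Hello world';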

Would there be a significant difference in performance between the two options?

Thank you

seba

2 Answers

BEST ANSWER

The first approach will be the most flexible, since you will be able to add language combinations in the future without schema changes. The second approach means adding a table for every language combination, which is both a maintenance nightmare and complex code to query across multiple tables (it can also force dynamically built queries, which tend to perform poorly).
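As a sketch of the difference (names hypothetical): with one table per pair, the table name itself varies at runtime, so SQL has to be assembled dynamically, while the single table keeps the language pair as ordinary, parameterizable data:

    -- Per-pair tables force the table name to change per query:
    --   SELECT target_text FROM translations_en_de WHERE ...
    --   SELECT target_text FROM translations_fr_en WHERE ...
    -- The single-table design uses plain bind parameters instead:
    SELECT target_text
    FROM translations
    WHERE source_lang = $1
      AND target_lang = $2
      AND source_text = $3;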

PostgreSQL should handle 1,500,000 records with ease, provided you have adequate hardware and have done proper performance configuration. I have worked with PostgreSQL tables of 50 million rows, and they perform well.
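Assuming columns like those sketched in the question, a single composite index covering the language pair and the source string is usually enough to keep lookups fast at this scale:

    -- One composite index serves the whole lookup path; queries
    -- filtering only on (source_lang, target_lang) can use its prefix.
    -- For very long source strings, indexing md5(source_text)
    -- instead avoids the btree entry size limit.
    CREATE INDEX idx_translations_lookup
        ON translations (source_lang, target_lang, source_text);

    -- Confirm the planner uses it:
    EXPLAIN ANALYZE
    SELECT target_text
    FROM translations
    WHERE source_lang = 'EN'
      AND target_lang = 'DE'
      AND source_text = 'Hello world';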

ANSWER

You can normalize your DB schema and avoid redundant data.
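For example (table and column names hypothetical), the language codes could live in a small lookup table that the translations table references, so language metadata is stored only once:

    CREATE TABLE languages (
        code char(2) PRIMARY KEY,   -- 'EN', 'DE', 'FR', ...
        name text NOT NULL
    );

    CREATE TABLE translations (
        id          bigserial PRIMARY KEY,
        source_lang char(2) NOT NULL REFERENCES languages (code),
        target_lang char(2) NOT NULL REFERENCES languages (code),
        source_text text    NOT NULL,
        target_text text    NOT NULL
    );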

In many cases it is convenient to have several small tables instead of one huge table, but it depends on your system; there is no single solution that fits every case.

Indexes must also be used responsibly. Do not create an index for every field or combination of fields: although an index saves scanning the entire table, each one consumes disk space and adds overhead to every write operation.
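PostgreSQL keeps usage statistics per index, so candidates for removal can be spotted rather than guessed; a small sketch:

    -- Indexes that have never been scanned since statistics were reset.
    SELECT schemaname, relname, indexrelname, idx_scan
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY relname;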

Another very useful tool is connection pooling. In a system under heavy load, a pool prevents the database connections from being saturated and allows them to be reused.
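When sizing a pool, it helps to compare the server's connection ceiling with what is actually in use; for instance:

    -- The server-side limit the pool must stay under:
    SHOW max_connections;

    -- Current connections grouped by state (active, idle, ...):
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state;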