I'm working on an application that is location specific -- think of it as a store locator where store owners enter their address information and other users can only see nearby stores within a certain range. However, it's a little different in the sense that an exact location is not required, only the city/state is required (weird, I know). I have thought about the schema for storing locations, and have decided on this one.
Locations
id -- int
formatted_address -- varchar(200)
is_point_of_interest -- bool
name -- varchar(100) -- NULL
street_number -- varchar(10) -- NULL
street -- varchar(40) -- NULL
city -- varchar(40)
state -- varchar(40)
state_code -- varchar(3)
postal_code -- varchar(10)
country -- varchar(40)
country_code -- varchar(3)
latitude -- float(10,6)
longitude -- float(10,6)
last_updated_at -- timestamp
Here are some notes about the application:
- I want to keep the door open for international locations
- I plan to use a geocoding service to search for and validate the locations specified by the store owner
- I truly only need the lat/lon, but the other data is necessary for displaying store information
- The formatted_address field will contain the fully formatted address -- e.g., Giants Stadium, 50 NJ-120, East Rutherford, NJ 07073, USA -- to allow for easier searching of stored locations
- There will possibly be a lot of duplicate fields, because each row may have a different level of granularity -- for instance,
123 Main Street, City, State 12345
is different fromMain Street, City, State 12345
because one has a specified street number and the other doesn't
I understand that the schema is not very normalized, but I also do not see the need to normalize it any more because locations are very complex, which is why I'm relying on a stable geocode service (google). Also, I plan to allow freeform text input/search, so theres no need for any dropdown lists.
Does anybody see anything wrong or have any improvements, taking into consideration what I've mentioned? I can see this table growing rather large.
I do not think so. Here is my two-minute synopsis:
This very badly normalized. At least
city
->country
should be moved out to a different table (and normalized from there). I believe postal codes can cross city boundaries though (or I am very badly misremembering); I am not aware of such a city that crosses a state boundary.formatted_address
is an "optimization" and should likely be a computed field: that is, all the data to re-create it should exist elsewhere. (This means that it doesn't need to worried about now.)Happy designing.
The simple "more-normalized" form just doing the above proposed:
Of course, CITIES can be further normalized, and so could a POSTALS table: I don't know enough about postal codes, or the application domain though.
postal_code
acts as part of an implicit compound-surrogate-FK so it's not super terrible as it is there. However, moving it into a separate table could easily allow verification and integrity constraints.Edit: Normalizing a POSTALs table would be best, as only a very samll number of postal codes are valid for a given city: I am not sure the relation between a postal code and a city, though, so I can't recommend how to do this. Perhaps look at existing schemas used?