SQL alchemy - specifying relationships when foreign key may not exist

4.4k Views Asked by At

I need to set up two tables in a database and I'm struggling to decide how to design the tables in SQL Alchemy.

Table 1 contains raw address data, and the source of the address. Raw addresses may appear more than once if they come from different sources.

Table 2 contains geocoded versions of these addresses. Each address appears only once. Addresses should only appear in this table if they appear at least once in Table 1

When new addresses come into the system, they first will be inserted into Table 1. I will then have a script that looks for records in Table 1 that are not in Table 2, geocodes them and inserts them into Table 2.

I have the following code:

class RawAddress(Base):
    __tablename__ = 'rawaddresses'

    id = Column(Integer,primary_key = True)
    source_of_address = Column(String(50))

    #Want something like a foreign key here, but address may not yet exist
    #in geocoded address table
    full_address = Column(String(400)) 


class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'

    full_address = Column(String(400), primary_key = True)
    lat = Column(Float)
    lng = Column(Float)

Is there a way of establishing the relationship between the full_address fields in SQL Alchemy? Or perhaps I've got the design wrong - maybe every time I seen a new raw address I should add it to the GeocodedAddress table, with a flag saying whether it's geocoded or not?

Thanks very much for any help with this.

1

There are 1 best solutions below

0
On BEST ANSWER

Taking into account your comments, the code which would allow such data storage as well as insert/update process should do the job. Few comments before it:

  • Foreign Keys can be NULL, so your FK idea still works.
  • You can define the relationship on any model, and name the other side with backref

Code:

# Model definitions

class RawAddress(Base):
    __tablename__ = 'rawaddresses'

    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))

    full_address = Column(
        ForeignKey('geocodedaddresses.full_address'),
        nullable=True,
    )


class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'

    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)

    raw_addresses = relationship(RawAddress, backref="geocoded_address")

now:

# logic

def get_geo(full_address):
    " Dummy function which fakes `full_address` and get lat/lng using hash(). "
    hs = hash(full_address)
    return (hs >> 8) & 0xff, hs & 0xff


def add_test_data(addresses):
    with session.begin():
        for fa in addresses:
            session.add(RawAddress(full_address=fa))


def add_geo_info():
    with session.begin():
        q = (session
             .query(RawAddress)
             .filter(~RawAddress.geocoded_address.has())
             )
        for ra in q.all():
            print("Computing geo for: {}".format(ra))
            lat, lng = get_geo(ra.full_address)
            ra.geocoded_address = GeocodedAddress(
                full_address=ra.full_address, lat=lat, lng=lng)

and some tests:

# step-1: add some raw addresses
add_test_data(['Paris', 'somewhere in Nevada'])
print("-"*80)

# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)

# step-1: again with 1 new, 1 same
add_test_data(['Paris', 'somewhere in Chicago'])
print("-"*80)

# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)


# check: print all data for Paris geo:
gp = session.query(GeocodedAddress).filter(GeocodedAddress.full_address == 'Paris').one()
assert 2 == len(gp.raw_addresses)
print(gp.raw_addresses)