Using the Database#

Load, query, join, write, and copy tables.


Introduction#

In this tutorial, we’ll make a database connection, load or import tables, and work with them.

To begin, we’ll make a database connection.

import pisces as ps

# connect with Oracle
session = ps.db_connect(user='scott', backend='oracle', server='my.server.edu', 
                        port=1521, instance='mydb')

# connect with sqlite 
session = ps.db_connect(backend='sqlite', instance='/path/to/mydb.sqlite')

You’ll need to substitute in your own username, of course.

db_connect will prompt you for your password and returns a SQLAlchemy Session instance, which manages database transactions. It can also take a single fully-qualified database URL (e.g. oracle://scott:tiger@my.server.edu:1521/mydb or sqlite:///path/to/mydb.sqlite), which is passed to sqlalchemy.create_engine. The returned session is bound to the database through the .bind attribute, which is a SQLAlchemy Engine instance that specifies the database location and drivers.


Loading Tables#

Tables can be loaded in one of three ways:

  1. Imported from your own table definition modules

  2. Arbitrary tables reflected from the database

  3. Arbitrary tables reflected from the database, with enhancements

Number 1 is the recommended way, as it returns a class with useful methods like to_trace for waveform tables, default value instantiation, and easy flat file conversion. Number 2 returns a class with none of the above enhancements. It is good for quick table loading when you have no prior knowledge of the table or you don’t have a table prototype to use for it. Number 3 returns a class with default value instantiation and easy flat file writing, but no other Pisces-specific class methods.

Import your defined tables#

Import tables/classes from modules where you previously defined them, as in the [previous tutorial page](Using the Database). This is the preferred way to load tables, as it offers the most flexibility and functionality.

import from mytables import Site, Wfdisc, Origin

Arbitrary tables#

If you only know the table name (and account), you can load a table using get_tables:

# Oracle account.table syntax
Site, Affil, Origin = ps.get_tables(session.bind, ['user.site', 'user.affiliation', 
                                                    'user.origin'])

# SQLite syntax, no account
Site, Affil, Origin = ps.get_tables(session.bind, ['site', 'affiliation', 'origin'])

Loaded tables are just Python classes defined using SQLAlchemy’s Object Relational Mapper (ORM). Table classes loaded together share a .metadata. MetaData is a Python representation of underlying SQL objects, and can be used to issue create/add/drop statements to the database. To add tables loaded later to the same MetaData, use the metadata= flag with the existing instance from one of the earlier-loaded tables:

Wfdisc, Assoc = ps.get_tables(session.bind, ['user.wfdisc', 'user.assoc'], metadata=Site.metadata)

# check the table names in the metadata
print Site.metadata.tables.keys()

Tables present in your MetaData can be seen in its .tables dictionary:

['user.affiliation',  'user.site',  'user.wfdisc',  'user.origin',  'user.assoc']

Tables must have a primary key#

SQLAlchemy’s ORM needs a primary key for the session to keep track of rows. If a database table is actually just a view of other tables, or a primary key was never specified, we can specify the primary key in a dictionary and the primary_keys= keyword.

try:
    Sitechan, = ps.get_tables(session.bind, ['user.sitechan'])
except ArgumentError:
    # failed because user.sitechan has no primary key.  we can force one with primary_keys
    Sitechan, = ps.get_tables(session.bind, ['user.sitechan'], 
                                primary_keys={'user.sitechan': ['chanid']})

Arbitrary enhanced tables#

When the base= option is used with a Declarative Base class that comes with Pisces, loaded classes get the following enhancements:

  • The default values for all fields in a table row (class instance) are filled whether or not the underlying database defines default values.

  • The class can form its own string representation (flat file row).

  • Instances of the class can iterate over values, and in the correct order.

Note: this will only work if the field names in the table being loaded have been used with this base before.

# load table classes, matching column names with those of a known schema

from pisces.schema.css3 import Base

Site, Affil, Origin = ps.get_tables(session.bind, ['global.site','global.affiliation', 'global.origin'], base=Base)

Base is the parent class for all pre-defined table prototypes in a given schema that comes with Pisces.


Querying Tables#

Querying and editing tables in Pisces is the same as using SQLAlchemy’s ORM. Follow SQLAlchemy’s tutorial to learn more about how this works.

# query all TA stations installed in 2009
q = session.query(Site).filter(Site.ondate.between(2009001, 2009365))
ta_sites = q.filter(Site.sta == Affil.sta).filter(Affil.net == 'TA').all()

# query for a subset of western US earthquakes, sorted by mb
q = session.query(Origin).filter(Origin.lat.between(35, 45)).filter(Origin.mb > 4)
wus_quakes = q.filter(Origin.lon.between(-115, -105)).order_by(Origin.mb).all()

# plot 'em
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
m = Basemap(llcrnrlon=-130, llcrnrlat=25, urcrnrlon=-80, urcrnrlat=60, resolution='i')
m.etopo()

for ista in ta_sites:
    x, y = m(ista.lon, ista.lat)
    m.plot(x, y, 'b^')

for io in wus_quakes:
    x, y = m(io.lon, io.lat)
    m.plot(x, y, 'r*')

plt.title("{} TA stations and {} quakes mb > 4".format(len(ta_sites), len(wus_quakes)))

western US

Query-builders#

Pisces has a set of intuitive query-building functions for common seismic queries in pisces.request.

import pisces.request as req

# repeat the previous query for a subset of western US earthquakes
wus_quakes = req.get_events(session, Origin, region=(-115, -105, 35, 45), mag={'mb': (4, None)})

# to sort by mb, ask for the query back and do your sort
wus_quakes = req.get_events(session, Origin, region=(-115, -105, 35, 45), mag={'mb': (4, None)}, asquery=True).order_by(Origin.mb).all()

Pisces uses NumPy/ObsPy to do distance subsets, which are done out-of-database and can be memory intensive. They can be done in-database, if you have a stored function “xkm_distance”, for example, that calculates lateral distances.

# stations <= 200 km from lat 42, lon -110, out-of-database
sites = req.get_stations(session, Site, km=(42, -110, 0, 200))

# in-database "xkm_distance(lat1, lon1, lat2, lon2)" function
from sqlalchemy import func
sites = req.get_stations(session, Site, asquery=True).filter(func.xkm_distance(Site.lat, Site.lon, 42, -110).between(0, 200)).all()

Editing tables#

# add Albuquerque ANMO and the Chelyabinsk bolide
ANMO = Site(sta='ANMO', lat=34.9459, lon=-106.4572, elev=1.85)
bolide = Origin(orid=1, lat=55.15, lon=61.41, mb=2.7, etype='xm')
session.add_all([ANMO, bolide])
session.commit()

# edit a Site, delete an Origin
session.query(Site).filter(Site.sta == 'MK31').update({'lat': 42.5})
session.query(Origin).filter(Origin.orid = 1001).delete()
session.commit()
session.close()

Note that not all fields/attributes were specified when creating “ANMO” or “bolide”, but their default values are filled in.