A Data Scientist Approach-Running Postgres SQL in Docker

Database is the backbone of any modern application

In this short tutorial, I explain the steps to set up postgresSQL local instance running in docker and using python to interact with the Data base. Below steps have been taken to set up the process.

  1. Setting up docker for postgresql
  2. Connecting postgress with Pgadmin4
  3. Interacting with Postgresql using Python

I have used Ubuntu 18.04 and Python 3.7


1. Postgres in Docker

There are many database frameworks available to choose for data scientists. There are many factors to be considered while choosing the best framework for your intended applications. Some of them are

  • Do your application needs Relational DB?
  • What would be the size of DB and tables count?
  • Where the DB will reside?
  • Is open source matters?
  • Are the tools available to interact with DB from your front end?

Since my application is a dummy one and I wanted to use postgres, choice was easier. The next decision is whether to use docker environment or not. Again it depends on your application, however docker choice is always better when you have plan to scale up your application.

I am using postgres docker developned and maintained by bitnami. Follow below link and set up the postgres running in docker.

https://github.com/bitnami/bitnami-docker-postgresql

Below is my docker-compose.yml file.

docker-compose= “””

version: ‘2’

services: postgresql12: image: ‘bitnami/postgresql:12’ ports: - ‘5432:5432’ volumes: - ‘postgresql_data_12:/bitnami/postgresql12’ environment: - POSTGRESQL_USERNAME=postgres - POSTGRESQL_PASSWORD=pwd0123456789 - POSTGRESQL_DATABASE=my_database - ALLOW_EMPTY_PASSWORD=yes

volumes: postgresql_data_12: driver: local

”“”

To run docker-compose follow below setps.

  1. Install docker if not done before (google it)
  2. Install docker-compose if not done before
  3. Run docker-compose up from the directory where docker-compose.yml file exists.

As you can see, docker is running locally on port 5432. This information is usefull when we connect this istance in python.

2. Connecting to Postgres

Once your docker is up and running, the next task is to connect to running instance. There are many options, below are some of them.

  1. Using PgAdmin4
  2. Using psql Client tool
  3. Using Python

Follow below Article to connect to local postgres instance for option 1 and 2.

https://medium.com/better-programming/connect-from-local-machine-to-postgresql-docker-container-f785f00461a7

Since, I work mostly with python therefore option # 3 is my favourite one.

3. Interacting with Postgres using Python

In this section, we learn how to interact with local running postgres using python.

Code Credits : https://auth0.com/blog/sqlalchemy-orm-tutorial-for-python-developers/

Follow the above link with detailed description of below code line by line.

# To install sqlalchemy
! pip install sqlalchemy

# Import objects
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

from datetime import date

# Settings as shown in docker-compose.yml
engine = create_engine('postgresql://postgres:pwd0123456789@localhost:5432/my_database')
Session = sessionmaker(bind=engine)

Base = declarative_base()

from sqlalchemy import Column, String, Integer, Date, Table, ForeignKey, Boolean
from sqlalchemy.orm import relationship, backref
# from base import Base
movies_actors_association = Table(
    'movies_actors', Base.metadata,
    Column('movie_id', Integer, ForeignKey('movies.id')),
    Column('actor_id', Integer, ForeignKey('actors.id'))
)

class Movie(Base):
    __tablename__ M= 'movies'

    id = Column(Integer, primary_key=True)
    title = Column(String)
    release_date = Column(Date)
    actors = relationship("Actor", secondary=movies_actors_association)

    def __init__(self, title, release_date):
        self.title = title
        self.release_date = release_date

class Actor(Base):
    __tablename__ = 'actors'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    birthday = Column(Date)

    def __init__(self, name, birthday):
        self.name = name
        self.birthday = birthday

class Stuntman(Base):
    __tablename__ = 'stuntmen'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    active = Column(Boolean)
    actor_id = Column(Integer, ForeignKey('actors.id'))
    actor = relationship("Actor", backref=backref("stuntman", uselist=False))

    def __init__(self, name, active, actor):
        self.name = name
        self.active = active
        self.actor = actor

class ContactDetails(Base):
    __tablename__ = 'contact_details'

    id = Column(Integer, primary_key=True)
    phone_number = Column(String)
    address = Column(String)
    actor_id = Column(Integer, ForeignKey('actors.id'))
    actor = relationship("Actor", backref="contact_details")

    def __init__(self, phone_number, address, actor):
        self.phone_number = phone_number
        self.address = address
        self.actor = actor

Inserting Data Into DB

Follow below code block to insert data into our database

from datetime import date

# 2 - generate database schema
Base.metadata.create_all(engine)

# 3 - create a new session
session = Session()

# 4 - create movies
bourne_identity = Movie("The Bourne Identity", date(2002, 10, 11))
furious_7 = Movie("Furious 7", date(2015, 4, 2))
pain_and_gain = Movie("Pain & Gain", date(2013, 8, 23))

# 5 - creates actors
matt_damon = Actor("Matt Damon", date(1970, 10, 8))
dwayne_johnson = Actor("Dwayne Johnson", date(1972, 5, 2))
mark_wahlberg = Actor("Mark Wahlberg", date(1971, 6, 5))

# 6 - add actors to movies
bourne_identity.actors = [matt_damon]
furious_7.actors = [dwayne_johnson]
pain_and_gain.actors = [dwayne_johnson, mark_wahlberg]

# 7 - add contact details to actors
matt_contact = ContactDetails("415 555 2671", "Burbank, CA", matt_damon)
dwayne_contact = ContactDetails("423 555 5623", "Glendale, CA", dwayne_johnson)
dwayne_contact_2 = ContactDetails("421 444 2323", "West Hollywood, CA", dwayne_johnson)
mark_contact = ContactDetails("421 333 9428", "Glendale, CA", mark_wahlberg)

# 8 - create stuntmen
matt_stuntman = Stuntman("John Doe", True, matt_damon)
dwayne_stuntman = Stuntman("John Roe", True, dwayne_johnson)
mark_stuntman = Stuntman("Richard Roe", True, mark_wahlberg)

# 9 - persists data
session.add(bourne_identity)
session.add(furious_7)
session.add(pain_and_gain)

session.add(matt_contact)
session.add(dwayne_contact)
session.add(dwayne_contact_2)
session.add(mark_contact)

session.add(matt_stuntman)
session.add(dwayne_stuntman)
session.add(mark_stuntman)

# 10 - commit and close session
session.commit()
session.close()

Extracting data from DB


# other imports and sections...

# 5 - get movies after 15-01-01
movies = session.query(Movie) \
    .filter(Movie.release_date > date(2015, 1, 1)) \
    .all()

print('### Recent movies:')
for movie in movies:
    print(f'{movie.title} was released after 2015')
print('')

# 6 - movies that Dwayne Johnson participated
the_rock_movies = session.query(Movie) \
    .join(Actor, Movie.actors) \
    .filter(Actor.name == 'Dwayne Johnson') \
    .all()

print('### Dwayne Johnson movies:')
for movie in the_rock_movies:
    print(f'The Rock starred in {movie.title}')
print('')

# 7 - get actors that have house in Glendale
glendale_stars = session.query(Actor) \
    .join(ContactDetails) \
    .filter(ContactDetails.address.ilike('%glendale%')) \
    .all()

print('### Actors that live in Glendale:')
for actor in glendale_stars:
    print(f'{actor.name} has a house in Glendale')
print('')
### Recent movies:
Furious 7 was released after 2015

### Dwayne Johnson movies:
The Rock starred in Pain & Gain
The Rock starred in Furious 7

### Actors that live in Glendale:
Dwayne Johnson has a house in Glendale
Mark Wahlberg has a house in Glendale

Conclusions

In this brief tutorial, we learned how to setup local postgres server using docker and interacting with it using python.

Since, postgres is running in docker, we can deploy this docker to any cloud with minimal changes. We have to replace the localhost with ip address of target machine and everything will work as it is.

Avatar
Amjad Raza, PhD
Quantitative Researcher

My research interests include data driven quantitative decision, machine learning, deep learning, blockchain and forecasting.