Dockerizing a large MSSQL Server database

Andrew Malone
Smartbox engineering
6 min readNov 8, 2019

--

Picture this, you land yourself in a team that use locally or cloud-hosted MSSQL server databases for development. Now imagine all the developers on your team are sharing one hosted development database and all of them are making significant schema or data changes in tandem. That is where it can get messy. You will undoubtedly experience conflicts due to changes made by another while you’re working away on your bug, feature or investigations. It’s at this point you need to get your own development environment and be releasing your changes to testing environments when you’re finished developing.

All aboard the docker train.

Make sure you consider exactly what kind of setup your team will need prior to getting started. For the application I work on, when investigating the best solution for containerizing our database, I determined that my instance cannot be seeded with fake or test data. The developers require real production data(not customer data) to be able to test, debug, develop and investigate issues. This solution posed the following problems.

  1. The database is very large(approx. 40GB) so restoring the database will require some time to create the docker container.
  2. The developer will need to be able to tear down their container and create a new container with a more up-to-date copy of production as they please.

In your case you need to ask yourself a few questions:

  • What is the size of my database?
  • Is the initial setup required to be easy for team members to spin up their environment?
  • Where will I be getting the database file from to create the database in the docker container?
  • Will I need the most up-to-date copy of the production database when creating my container?

Objective

Dockerize a large MSSQL server database from a hosted database backup. Allow a developer to simply create a local development environment that mirrors the production database.

High-level overview of architecture

Get your database backup file

If you need to get a backup of a database and wish to use the .bak file to create a database in your container, I recommend setting up a job that does the following:

  1. Truncate your tables that contain logs or other unnecessary tables information to decrease the database size.
  2. Shrink the database.
  3. Create a backup that is kept somewhere that will be available to your container e.g a shared drive on your network or a cloud-hosted repository.

In my instance, I have a job that is putting a .bak file onto a shared drive that I will be able to retrieve using smbclient.

The Docker project

Docker-compose yaml file

version: '3.7'services:db:
container_name: YOURCONTAINERNAME
build:
context: ./Docker/
args:
- USERNAME=$USERNAME
- PASSWORD=$PASSWORD
- SA_PASSWORD=$SA_PASSWORD
environment:
- SA_PASSWORD=${SA_PASSWORD}
- ACCEPT_EULA=${ACCEPT_EULA}
ports:
- '1433:1433'

As you can see we are using a docker compose file. This is a YAML file that is used for defining and running multi-container docker applications e.g a database and a .NET application both containerised and running within docker.

  • Build is where you define the context (directory with your dockerfile) and the arguments you want to provide to the build.
  • Environment allows you to pass environment variables e.g ACCEPT_EULA which you will need for MSSQL server.
  • Ports are required to be defined so that you can map your host machine to a port on the container. This will be the port used when you want to connect to the DB using SQL Server Management Studio.

When your docker-compose file is created, you will be able to create your container by navigating to the directory where your docker-compose file is located and running the commanddocker-compose up.

Dockerfile

FROM mcr.microsoft.com/mssql/server:2017-latestARG USERNAME
ARG PASSWORD
ARG SA_PASSWORD
RUN apt-get update && apt-get install -y smbclient && \
smbclient //<IP-ADDRESS>/<DIRECTORY>/ -U EXAMPLE\\$USERNAME@example.com%$PASSWORD -c 'cd FULL ; lcd tmp/ ; get PIM.bak'
RUN apt-get update && \
apt-get install -yq curl apt-transport-https && \
# Get official Microsoft repository configuration
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
curl https://packages.microsoft.com/config/ubuntu/16.04/mssql-server-2017.list | tee /etc/apt/sources.list.d/mssql-server.list && \
apt-get update && \
apt-get install -y mssql-server-fts && \
# Cleanup the Dockerfile
apt-get clean && \
rm -rf /var/lib/apt/lists
COPY entrypoint.sh /tmp
COPY restore-database.sql /tmp
CMD exec /bin/bash /tmp/entrypoint.sh $SA_PASSWORD

Here’s the fun stuff. We are doing the following:

1. Getting the latest MSSQL Server image.

2. Passing arguments from our docker compose file into the dockerfile and using them during the process of the container being built.

3. Next, we install SMB client. This will be used to connect to the windows share directory where our .bak file is located and perform a file transfer.

4. We run `smbclient` and define the address of the share drive and the drive within it. We then provide our username and password as arguments and then we run a command to `change directory` into the directory where our backup file is located and also setting the directory we will be placing the file into within the container.

5. Installing Full-Text search for SQL Server.

6. We copy a bash script and a .sql script into the container

7. We then run a command within the container to start up SQL Server and then when it has started, we run the restore script within the container using `SQLCMD`.

Restore database script

RESTORE DATABASE [YOURDB] FROM DISK = '/tmp/<YOURDB>.bak'WITH FILE = 1,MOVE '<YOURDB>' TO '/var/opt/mssql/data/<YOURDB>.mdf',MOVE '<YOURDBData>' TO '/var/opt/mssql/data/<YOURDBData>.ndf',MOVE '<YOURDB_log>' TO '/var/opt/mssql/data/<YOURDB_log>.ldf'GO

This file will be run using SQLCMD in within our container to restore the database

Bash script to Start SQL Server and restore the database

#!/bin/bash#start SQL Serversh -c "echo 'Sleeping 20 seconds before running setup script'sleep 20echo 'Starting setup script'/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P ${1} -i /tmp/restore-database.sqlecho 'Finished setup script'exit" &exec /opt/mssql/bin/sqlservr

One of the problems you may have is that SQL server may not be completely finished starting when you try and run sqlcmd. This is why we include a sleep operation in our command. In our dockerfile we are passing the password of as an argument to the bash script and triggering the restore-database.sql script.

.Env file

We are using a .env file to pass our credentials as variables to the docker-compose file. Docker compose will automatically look for a .env file.

SA_PASSWORD=<YOUR-DB-PASSWORD>ACCEPT_EULA=YUSERNAME=<YOUR-USERNAME>PASSWORD=<YOUR-PASSWORD>

Folder structure

Folder structure of the application
- docker-compose.yml- .env file- Docker(FOLDER)- Dockerfile- entrypoint.sh- restore-database.sql

The folder structure is important. As you defined a build directory in your docker-compose file, you need to ensure that all the files that your dockerfile uses are in the directory you defined in your ‘build’.

Conclusion

When creating your dockerized database I recommend using a small sample database to keep your debugging cycles as short as possible. I used the Microsoft AdventureWorks2017 sample database and in the dockerfile, I copied the sample database into the /tmp folder of my container. When I was ready and successfully restored the sample database in the MSSQL Server container, I switched the restore-database.sql file command over to using the backup file retrieved by smbclient.

Check out the Github repository for the project

https://github.com/atmalone/DockerMSSQLServer

--

--