Data Security and Privacy: Data at-rest encryption approaches

Narayana Pattipati
Published in Myntra Engineering
9 min read · Jun 26, 2023

Introduction

Any organisation that collects, stores and processes sensitive data about users, vendors and employees is mandated by country-specific laws to protect that data. Sensitive data, such as personal health information (PHI) and personally identifiable information (PII), includes financial information, medical records, social security or ID numbers, names, birthdates, and contact information.

While data privacy is focused on defining policies (e.g. who has access to specific data), data protection is focused on the tools, technologies and methods used to enforce those policies and protect sensitive data. Data protection regulations govern how certain data types are collected, stored, transmitted, processed and used.

Data at-rest encryption is one of the key data protection compliance requirements for an organisation to protect privacy by securing sensitive data, including PII and PHI. This blog describes strategies and approaches organisations can adopt to achieve data at-rest encryption, along with their trade-offs.

Data Encryption

Basics of Data Encryption

  • Encryption involves converting human-readable plaintext into incomprehensible text, known as ciphertext.
  • Encryption key is typically a random string of bits generated specifically to convert plaintext into ciphertext and vice-versa. Encryption keys are created with algorithms designed to ensure that each key is unique and unpredictable.
  • Key Vault or Key Management System (KMS) provides key management and access control for the encryption keys, tokens, passwords, certificates, and API keys.
  • Symmetric encryption uses a single key to both encrypt and decrypt the data. It is faster than asymmetric encryption.
  • Asymmetric encryption uses different keys for encryption and decryption (a public/private key pair). It is slower than symmetric encryption.
  • Envelope Encryption is the practice of encrypting plaintext data with a data key (typically known as DEK — data encryption key), and then encrypting the data encryption key using another key (typically known as KEK — Key encryption Key or MEK — Master Encryption Key).
  • Key Rotation is the practice of retiring earlier keys and replacing them with new sets of keys frequently as per information security policies.
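
To make the envelope encryption idea above concrete, here is a minimal sketch in Python that uses only the standard library. The helper names (`toy_encrypt`, `toy_decrypt`, `envelope_encrypt`, `envelope_decrypt`) are hypothetical, and the cipher is a toy keystream built from HMAC-SHA256 that stands in for a real authenticated cipher such as AES-256-GCM. It is meant only to show how the DEK and KEK relate, not to be used for real data.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode. NOT a substitute for AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in for a real cipher: returns nonce || (plaintext XOR keystream)."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Inverse of toy_encrypt: split off the nonce and XOR with the same keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def envelope_encrypt(kek: bytes, plaintext: bytes) -> dict:
    """Encrypt data with a fresh random DEK, then wrap the DEK with the KEK."""
    dek = secrets.token_bytes(32)  # 256-bit data encryption key
    return {
        "ciphertext": toy_encrypt(dek, plaintext),
        "wrapped_dek": toy_encrypt(kek, dek),  # only the wrapped DEK is stored
    }


def envelope_decrypt(kek: bytes, record: dict) -> bytes:
    """Unwrap the DEK with the KEK, then decrypt the data with the DEK."""
    dek = toy_decrypt(kek, record["wrapped_dek"])
    return toy_decrypt(dek, record["ciphertext"])


kek = secrets.token_bytes(32)  # in practice this key would live in a Key Vault / KMS
record = envelope_encrypt(kek, b"phone=+91-0000000000")
recovered = envelope_decrypt(kek, record)
```

Only the wrapped DEK is stored next to the ciphertext, so the KEK never has to leave the Key Vault boundary in a real deployment.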

Different states of data

Data owned by an organisation can be classified into the following three states.

  • Data at-rest
  • Data in-motion / in-transit
  • Data in-use

Data never resides in one state permanently during its life cycle; until it is purged completely, it keeps moving from one state to another. For example, when data is persisted to a database (which in turn stores it on a storage medium), it is said to be in the at-rest state. When a client requests data from a database or wants to persist data to a database, the data is transferred over the network, and its state during the transfer is said to be data in-motion. Data in-use refers to active data which is stored in a non-persistent digital state, typically in computer random-access memory, CPU caches, or CPU registers.

Figure: Different states of data

Data in-motion / in-transit encryption

Data in-motion refers to data that is traversing a network. The data may travel between an application and storage (such as a database, data warehouse or cloud storage), between two services, or between nodes of a distributed system (distributed databases, compute clusters etc.). Data in motion is typically protected by securing the transport layer using SSL / TLS or mTLS.
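
As a small illustration of securing the transport layer, the sketch below builds a client-side TLS context with Python's standard `ssl` module. The function name `make_client_context` and its parameters are hypothetical; the optional client certificate arguments show where mTLS would come in, and the minimum TLS version chosen here is an assumption rather than a recommendation from this blog.

```python
import ssl


def make_client_context(ca_file=None, client_cert=None, client_key=None):
    """Build a TLS client context that verifies the server certificate.

    ca_file, client_cert and client_key are hypothetical file paths.
    Passing a client certificate and key enables mutual TLS (mTLS).
    """
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse legacy SSL and early TLS protocol versions (assumed policy).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert:
        # For mTLS, the client also presents its own certificate to the server.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = make_client_context()
```

The resulting context can be passed to a database driver or HTTP client that accepts an `ssl.SSLContext`, so that the connection to the data store is encrypted in transit.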

Data in-use encryption

Data in-use refers to active data which is stored in a non-persistent digital state typically in computer random-access memory, CPU caches, or CPU registers.

Data at-rest encryption

Data at-rest refers to data persisted in computer storage in any digital form. For example, data persisted in databases, data warehouses, cloud storage (e.g. Azure Blob storage, AWS S3, Google Cloud Storage), archives, tapes or off-site backups is in at-rest state. This data is currently inactive and is not moving between devices or two network points; and no application, service, tool, or user is using this data at the moment.

Typically, the following stores are under the scope of data at-rest encryption:

  • Databases
  • Database backups
  • Data warehouses
  • Blob storage (including hot, cool, archive access tiers)
  • Data Lake (typically built on blob storage in modern data architectures)
  • Archives of the data from the above stores

Requirements for data at-rest encryption

Key functional requirements to be considered:

  • Encrypt data-at-rest irrespective of the type of data store — databases, blob storage, data warehouse, data archives, data backups
  • The encryption keys shall be secured in a Key Vault
  • Key rotation shall be performed as per the security policies
  • Ability to decrypt data which was encrypted with older version of keys (especially backups and snapshots)
  • Database replica instances shall have the data encrypted
  • Database backups / snapshots shall be encrypted (along with the version of keys used to encrypt the data at the time of backup)
  • Data replication between master and replicas (within and across the regions) shall work seamlessly
  • Data archived shall be encrypted
  • The tools for database backup / snapshots and restoration shall work seamlessly
  • The tools for ingesting data into downstream systems such as data platforms shall work seamlessly. The tools used for ingestion (e.g. change data capture (CDC)) are typically open source or third-party tools, and it may not be practical to implement decryption in them.
  • The encrypted data shall be made available to users for debugging with sufficient controls

Key non-functional requirements to be considered:

  • Data at rest encryption shall not have a significant impact on the query performance (latency and throughput). Up to 5–6% degradation is acceptable as encryption and decryption is expected to involve additional compute.
  • Data at rest encryption shall not have any significant impact on the database server scaling (concurrent requests, number of connections etc.)
  • No significant impact on the Database backups and restoration performance
  • No significant impact on the change data capture(CDC) performance while ingesting data into downstream systems for analytics and machine learning
  • The data encrypted at-rest shall always be available for queries (99.99% uptime). There shall never be a scenario where data is unavailable due to corruption or unavailability of encryption keys. This excludes outages due to database downtime or infrastructure downtime.
  • The data encrypted at-rest shall never be corrupted or lost
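
The performance budget in the first non-functional requirement can be checked with simple arithmetic. The sketch below is one way to express it; the function names and the latency numbers are illustrative assumptions, not measurements from any real system.

```python
def degradation_pct(baseline_ms: float, encrypted_ms: float) -> float:
    """Relative latency increase, in percent, after enabling encryption."""
    return (encrypted_ms - baseline_ms) / baseline_ms * 100.0


def within_budget(baseline_ms: float, encrypted_ms: float, budget_pct: float = 6.0) -> bool:
    """True if the degradation stays within the agreed budget (assumed 6% here)."""
    return degradation_pct(baseline_ms, encrypted_ms) <= budget_pct


# Illustrative numbers only: p99 query latency before and after encryption.
ok = within_budget(baseline_ms=20.0, encrypted_ms=21.0)        # 5% slower
too_slow = within_budget(baseline_ms=20.0, encrypted_ms=22.0)  # 10% slower
```

The same check can be applied to throughput, backup duration or CDC lag by feeding in the corresponding before and after measurements.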

Approaches to achieve data at-rest encryption

There are different approaches to achieve data at-rest encryption in an organisation.

Application Layer Encryption (ALE):

  • The application or service encrypts the data and stores the encrypted data in the database and storage.
  • For processing and serving, application fetches the encrypted data from database and storage, decrypts for processing and serving to other applications or services.
  • Application takes care of key management and rotation.

Database Encryption

  • Database encrypts the entire database or parts of it and handles key management and rotation

Filesystem Encryption

  • Selected file systems (files and folders) are encrypted before being stored in the storage.
  • Typically done by the OS with the help of agents that intercept reads and writes and perform encryption and decryption.
  • Keys are managed by the OS.
  • Even if the system is booted, the data cannot be retrieved from the filesystem without the decryption keys.

Disk Encryption

  • Entire disk is encrypted
  • Data is encrypted as it is written to the disk and decrypted as it is read off the disk
  • Cloud providers typically encrypt managed disks (OS, data) using AES Encryption
  • Protects against theft of physical storage media. The data can not be retrieved from storage media without decryption keys as the entire disk is encrypted.
  • However, once the encrypted disks are attached to a VM or K8S pod, the OS or users with access to that VM or pod can still access PII.

Figure: Different approaches to achieve data at-rest encryption

While the encryption algorithm can be the same (e.g. AES-256) across the different approaches, each approach has pros and cons, which are detailed in the later sections against multiple dimensions.

Database and application layer encryption are popular approaches to protect data at-rest. The pros and cons of each of these approaches are discussed in the later sections.

Database encryption

Many databases such as MySQL, Microsoft SQL Server (Azure Data Warehouse), Oracle, Postgres, MongoDB, Cassandra support Transparent Data Encryption (TDE). TDE enables data-at-rest encryption by encrypting files of the database. Data is encrypted automatically, in real time, prior to writing to storage and decrypted when read from storage.

TDE has been around for about a decade and is used as a default security standard for data at-rest encryption to support various compliance requirements such as PCI DSS, GDPR and HIPAA, and to secure PII data.

TDE Key Architecture

TDE implementation can vary from database to database. Typical key architecture is as follows:

  • TDE uses envelope encryption typically
  • The data encryption key (DEK) is used to encrypt data itself (column, table or table space depending on the implementation)
  • The DEK itself is encrypted by Master Encryption Key (MEK) or Key Encryption Key (KEK)
  • The MEK or KEK is stored in a Key Vault
  • The DEK is stored on the database itself
  • The MEK is rotated periodically as per the information security policies, or if keys are compromised. The DEKs are not rotated; they are only re-encrypted with the new MEK, which means the data itself need not be re-encrypted.

Figure: The TDE key architecture (source link)
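
The rotation behaviour in the key architecture above (rotate the MEK, re-encrypt only the DEK, leave the data untouched) can be sketched as follows. This is a simplified illustration, not how any particular database implements TDE: `ToyKeyVault`, `rewrap_dek` and the toy cipher helpers are hypothetical names, and the HMAC-based keystream stands in for a real cipher such as AES.

```python
import hashlib
import hmac
import secrets


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher core (HMAC-SHA256 counter keystream); a stand-in for AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    return nonce + _xor_stream(key, nonce, plaintext)


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    return _xor_stream(key, blob[:16], blob[16:])


class ToyKeyVault:
    """Hypothetical in-memory vault holding versioned master keys (MEKs)."""

    def __init__(self):
        self.meks = {}
        self.current = 0
        self.rotate()

    def rotate(self) -> int:
        """Create a new MEK version; older versions stay available for unwrapping."""
        self.current += 1
        self.meks[self.current] = secrets.token_bytes(32)
        return self.current


def rewrap_dek(vault: ToyKeyVault, wrapped: tuple) -> tuple:
    """Re-encrypt a wrapped DEK under the current MEK; the data is not touched."""
    version, blob = wrapped
    dek = toy_decrypt(vault.meks[version], blob)
    return (vault.current, toy_encrypt(vault.meks[vault.current], dek))


vault = ToyKeyVault()
dek = secrets.token_bytes(32)
wrapped_dek = (vault.current, toy_encrypt(vault.meks[vault.current], dek))
table_ciphertext = toy_encrypt(dek, b"row data containing PII")

vault.rotate()                                  # MEK v1 -> v2
wrapped_dek = rewrap_dek(vault, wrapped_dek)    # only the small DEK blob changes
unwrapped = toy_decrypt(vault.meks[wrapped_dek[0]], wrapped_dek[1])
plaintext = toy_decrypt(unwrapped, table_ciphertext)
```

Because only the wrapped DEK is rewritten, rotation cost is independent of the data size. Keeping older MEK versions in the vault is also what allows backups and snapshots taken before the rotation to be decrypted later.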

Key features of database encryption

  • TDE is completely transparent to the application; no code or schema changes are needed
  • When a user or application queries the database, the database management system (DBMS) fetches encrypted data from storage, decrypts it and responds with plaintext data
  • Key management and rotation are handled by the database
  • Advanced Encryption Standard (AES) is supported by databases (e.g. MySQL, MongoDB, Cassandra)
  • DB encryption keys are secured/protected in Key Vault (MySQL, MongoDB, DataStax DSE)
  • Bin logs (MySQL, Cassandra), Oplog (MongoDB) are also encrypted.
  • Database replication (active or passive), backup / restore and archives are encrypted and work seamlessly.
  • Change Data Capture (CDC) via different tools to ingest data into data platform works seamlessly.

Application Layer encryption (ALE)

Application Layer Encryption is typically a custom implementation in which encryption, decryption, key management, key rotation and providing plaintext data for debugging are all handled by the application.

  • Application performs encryption / decryption of data in its own process space, fetching the data from the database where needed (e.g. for updates)
  • Application secures encryption keys in Key Vault
  • Key rotation will be completely handled by the application
  • A separate SDK is typically needed to handle common encryption, key management and rotation functions, with support for different programming languages such as Java, Go, Python, Node etc.
  • PII columns alone can be selectively encrypted
  • Database agnostic approach
  • Application and schema changes and query tuning are needed
  • Schema changes include the encrypted column, the DEK, the DEK version and other metadata, and hashed columns corresponding to the PII columns. The hashed columns are used for querying data and for indexes.
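
The schema changes described in the last bullet can be sketched with an in-memory SQLite table. The table and column names (`customer`, `email_enc`, `email_hash`, `dek_version`) and the helper functions are hypothetical. Using a keyed hash (HMAC-SHA256) for the hashed column is an assumption made for this sketch; the ciphertext is a placeholder, since encryption itself would be done by the application SDK.

```python
import hashlib
import hmac
import secrets
import sqlite3

# Hypothetical key for the hashed column; in practice it would come from a Key Vault.
INDEX_KEY = secrets.token_bytes(32)


def blind_index(value: str, key: bytes = INDEX_KEY) -> str:
    """Deterministic keyed hash of a normalised PII value, used for equality lookups."""
    normalised = value.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id          INTEGER PRIMARY KEY,
        email_enc   BLOB    NOT NULL,  -- ciphertext of the PII column
        email_hash  TEXT    NOT NULL,  -- hashed column used for querying
        dek_version INTEGER NOT NULL   -- which DEK version encrypted this row
    )
    """
)
# The index goes on the hashed column, because ciphertext cannot be searched directly.
conn.execute("CREATE INDEX idx_customer_email_hash ON customer (email_hash)")


def insert_customer(conn, email: str, email_ciphertext: bytes, dek_version: int) -> None:
    conn.execute(
        "INSERT INTO customer (email_enc, email_hash, dek_version) VALUES (?, ?, ?)",
        (email_ciphertext, blind_index(email), dek_version),
    )


def find_by_email(conn, email: str) -> list:
    """Equality lookup through the hashed column instead of the plaintext value."""
    return conn.execute(
        "SELECT id, email_enc, dek_version FROM customer WHERE email_hash = ?",
        (blind_index(email),),
    ).fetchall()


insert_customer(conn, "Alice@Example.com", b"<ciphertext placeholder>", dek_version=3)
rows = find_by_email(conn, " alice@example.com ")
```

A keyed hash is used here because plain hashes of low-entropy values such as phone numbers or emails can be guessed by hashing candidate values. Note that this layout supports only exact-match queries; range and prefix queries on the PII column would need a different design, which is part of the query tuning cost of ALE.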

Comparison of Database & Application Encryption approaches

As outlined in the requirements section, encryption of data at-rest has a significant impact on the systems that store and process PII data. Hence, it is important to consider the entire lifecycle of PII data and systems that store and process it, instead of restricting the scope to just the application and its corresponding database. Criteria for comparing database and application layer encryption includes technical, functional and nonfunctional capabilities and operationalisation of the selected solution.

Database Encryption vs Application layer encryption (ALE)

Conclusion

Data at-rest encryption can be achieved using one or more of the approaches described above. Database Encryption and Application Layer Encryption (ALE) are two of the most popular approaches. The previous section describes the pros and cons of the two approaches in detail, on multiple criteria.

Organisations need to evaluate the spread of the sensitive data to be protected, systems under scope, technology stack and consider one or more approaches in consultation with Information security teams.

Apart from data at-rest encryption, organisations need to consider some additional, complementary mechanisms to protect sensitive data.

  • Data in-motion encryption to protect sensitive data transmitted over a network
  • Role based access controls for users
  • Service to service authentication and authorization
  • Authentication and authorization of services accessing sensitive data from databases, messaging systems (e.g. Kafka) and cloud storage (e.g. Azure Blob Storage, AWS S3 and Google Cloud Storage)

Thank you Shankar Umamaheshwaran, ajay.sharma@myntra.com for the review and feedback!
