This post continues where S3 Data Loss Prevention with Encryption left off. Encryption helps to avoid accidentally exposing data from S3 to the internet, but it would still be very easy to leak the data you are processing to the internet or to external parties on purpose.

To process data you need to decrypt it, so encryption doesn’t help here. Technically, some processing could be done on encrypted data, but I leave homomorphic encryption out of scope for this post and focus on how to prevent data from being sent out of the VPC/account in which the processing is done.

The first thing is to block (unrestricted) access to the internet. This is easy: just remove the internet gateway from the VPC. Unfortunately, this also blocks access to the AWS S3 API, which is served from public endpoints on the internet, making it impossible to access any S3 bucket from the VPC. Luckily there is a solution for this: S3 endpoint(s), that is, VPC endpoints for S3, restore S3 access.
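As a rough illustration of the endpoint step, the following Python sketch builds the arguments for a gateway-type S3 endpoint as they could be passed to boto3's `create_vpc_endpoint`. The VPC ID, route table ID, and region are placeholders, and the helper name `s3_gateway_endpoint_args` is hypothetical. The actual AWS call is left in a comment so the snippet runs without credentials.

```python
def s3_gateway_endpoint_args(vpc_id, route_table_ids, region):
    """Build keyword arguments for ec2.create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # S3 service names follow the pattern com.amazonaws.<region>.s3
        "ServiceName": f"com.amazonaws.{region}.s3",
        # A gateway endpoint works by adding routes to these route tables
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder identifiers, not real resources.
args = s3_gateway_endpoint_args("vpc-111bbb22", ["rtb-0123456789abcdef0"], "eu-west-1")
print(args["ServiceName"])

# With credentials configured, the call would look roughly like:
#   import boto3
#   boto3.client("ec2", region_name="eu-west-1").create_vpc_endpoint(**args)
```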

The next thing is to configure a bucket policy that allows access only from your VPC. If access is not controlled, one could write a Lambda function outside the VPC to read the decrypted content and copy it anywhere on the internet.

  "Statement": [
    {
      "Sid": "Access-to-specific-VPC-only",
      "Principal": "*",
      "Action": "s3:*",
      "Effect": "Deny",
      "Resource": ["arn:aws:s3:::examplebucket",
                   "arn:aws:s3:::examplebucket/*"],
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpc": "vpc-111bbb22"
        }
      }
    }
  ]
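If more than one VPC needs access, the same statement can be generated rather than edited by hand. The sketch below is one possible way to do that in Python; the function name `vpc_only_bucket_policy` and the second VPC ID are made up for illustration. It wraps the statement in a complete policy document and passes a list of VPC IDs to the condition.

```python
import json


def vpc_only_bucket_policy(bucket, vpc_ids):
    """Return a bucket policy that denies S3 access from outside the given VPCs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Access-to-specific-VPC-only",
                "Principal": "*",
                "Action": "s3:*",
                "Effect": "Deny",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # With several values, StringNotEquals matches only when the
                # request's VPC equals none of them, so listed VPCs are not denied.
                "Condition": {"StringNotEquals": {"aws:sourceVpc": list(vpc_ids)}},
            }
        ],
    }


# Hypothetical bucket and VPC IDs.
policy = vpc_only_bucket_policy("examplebucket", ["vpc-111bbb22", "vpc-333ccc44"])
print(json.dumps(policy, indent=2))
```

The printed JSON could then be attached to the bucket through the console, CLI, or an SDK.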

Things are starting to look pretty safe by now, but we are not done yet. Since EC2 instances or Lambda functions inside the VPC can access S3 (and read decrypted data), they could also write a copy of it into a bucket owned by another AWS account. This cannot be prevented with an IAM policy alone, as you don’t have any control over resources owned by a foreign AWS account; for example, a process inside the VPC could use credentials issued by that account, which your IAM policies do not govern.

But you can attach an endpoint policy to the S3 endpoint and whitelist the permitted buckets. Note that allowing access to S3 in the endpoint policy isn’t sufficient on its own; you must also have IAM and bucket-level permissions.

  "Statement": [
    {
      "Sid": "Access-to-specific-bucket-only",
      "Principal": "*",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::examplebucket",
                   "arn:aws:s3:::examplebucket/*"]
    }
  ]
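Since the whitelist has to name every bucket explicitly, it may be convenient to generate the endpoint policy from a list of bucket names. This is a minimal sketch under that assumption; `endpoint_policy_for_buckets` and the second bucket name are hypothetical.

```python
import json


def endpoint_policy_for_buckets(buckets):
    """Return an S3 endpoint policy that allows access only to the named buckets."""
    resources = []
    for name in buckets:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # objects in the bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Access-to-specific-bucket-only",
                "Principal": "*",
                "Action": "s3:*",
                "Effect": "Allow",
                "Resource": resources,
            }
        ],
    }


# Hypothetical bucket names.
policy = endpoint_policy_for_buckets(["examplebucket", "examplebucket-logs"])
print(json.dumps(policy, indent=2))
```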

Now S3 access has been limited to predefined buckets and external internet access has been denied. Users who don’t have permissions to change the S3 and/or network configuration cannot send data out of the account, but can still read and write from within the VPC.

In summary, I:

  1. Blocked access to the internet from the VPC.
  2. Configured a bucket policy to allow access only from known VPC(s).
  3. Granted access to the S3 API with an S3 endpoint.
  4. Blocked access to foreign buckets via the S3 endpoint using an endpoint policy.
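The combined effect of these steps can be sketched as a toy model in Python. This is not how AWS evaluates policies; it only mirrors the reasoning above, and it assumes IAM permissions would otherwise allow the request. The names `can_reach`, `PROTECTED_BUCKETS`, and `ENDPOINT_ALLOWED_BUCKETS` are invented for the illustration.

```python
# Buckets whose bucket policy denies access from outside the known VPC(s).
PROTECTED_BUCKETS = {"examplebucket"}
# Buckets whitelisted in the S3 endpoint policy.
ENDPOINT_ALLOWED_BUCKETS = {"examplebucket"}


def can_reach(bucket, inside_vpc):
    """Toy model: can a caller reach the bucket, given where the caller runs?"""
    if inside_vpc:
        # No internet gateway, so the endpoint is the only path to S3 and
        # its policy lets only whitelisted buckets through.
        return bucket in ENDPOINT_ALLOWED_BUCKETS
    # Outside the VPC, the bucket policy's Deny blocks the protected buckets.
    return bucket not in PROTECTED_BUCKETS


print(can_reach("examplebucket", inside_vpc=True))    # processing inside the VPC
print(can_reach("foreign-bucket", inside_vpc=True))   # copying data to a foreign bucket
print(can_reach("examplebucket", inside_vpc=False))   # Lambda outside the VPC
```

In this model only the first request succeeds: the data can be read where it is processed, but it cannot be copied to a foreign bucket or read from outside the VPC.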

While the above solution works, it can be difficult to maintain. If you need to grant access to a bucket from all VPCs in an account, or across part of an AWS Organization, the bucket policy becomes hard to maintain. Another challenge is maintaining the endpoint policy’s whitelist of buckets, as there is no way to define “all buckets in a given account”.