
Querying S3 Tables with Snowflake

In this tutorial, you will explore how to connect Snowflake to AWS S3 Tables locally using LocalStack. S3 Tables is a managed Apache Iceberg table catalog that uses S3 storage, providing built-in maintenance features like automatic compaction and snapshot management.

With LocalStack’s Snowflake emulator, you can create catalog integrations that connect to S3 Tables and query Iceberg tables without needing cloud resources. This integration allows you to:

  • Create catalog integrations to connect Snowflake to S3 Tables.
  • Query existing Iceberg tables stored in S3 Tables buckets.
  • Leverage automatic schema inference from external Iceberg tables.

Start your LocalStack container with the Snowflake emulator enabled.

export LOCALSTACK_AUTH_TOKEN=<your_auth_token>
localstack start --stack snowflake

Before configuring Snowflake, you need to create S3 Tables resources using the AWS CLI. This includes a table bucket and a namespace.

Create a table bucket to store your Iceberg tables.

awslocal s3tables create-table-bucket --name my-table-bucket
Output
{
  "arn": "arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket"
}

Create a namespace within the table bucket to organize your tables.

awslocal s3tables create-namespace \
--table-bucket-arn arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket \
--namespace my_namespace
Output
{
  "tableBucketARN": "arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket",
  "namespace": [
    "my_namespace"
  ]
}
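
If you prefer to script this setup or want to confirm the resources exist, you can also inspect them with boto3. The following is a minimal sketch; the endpoint URL and dummy credentials are assumptions based on LocalStack defaults.

import boto3

# Minimal verification sketch: list S3 Tables resources in LocalStack via boto3.
# The endpoint URL and dummy credentials are assumptions based on LocalStack defaults.
s3tables = boto3.client(
    "s3tables",
    endpoint_url="http://localhost.localstack.cloud:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

# List table buckets, then the namespaces inside my-table-bucket
buckets = s3tables.list_table_buckets()
print([b["name"] for b in buckets["tableBuckets"]])

namespaces = s3tables.list_namespaces(
    tableBucketARN="arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket"
)
print([ns["namespace"] for ns in namespaces["namespaces"]])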

To query data from Snowflake using CATALOG_TABLE_NAME, the S3 Tables table must have a defined schema and contain data. Use PyIceberg to create a table with a schema and populate it with data.

First, install the required Python packages:

pip install "pyiceberg[s3fs,pyarrow]" boto3

Create a Python script named setup_s3_tables.py with the following content:

import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, LongType

# Configuration
LOCALSTACK_URL = "http://localhost.localstack.cloud:4566"
S3TABLES_URL = "http://s3tables.localhost.localstack.cloud:4566"
TABLE_BUCKET_NAME = "my-table-bucket"
NAMESPACE = "my_namespace"
TABLE_NAME = "customer_orders"
REGION = "us-east-1"

# Create PyIceberg REST catalog pointing to S3 Tables
catalog = RestCatalog(
    name="s3tables_catalog",
    uri=f"{S3TABLES_URL}/iceberg",
    warehouse=TABLE_BUCKET_NAME,
    **{
        "s3.region": REGION,
        "s3.endpoint": LOCALSTACK_URL,
        "client.access-key-id": "000000000000",
        "client.secret-access-key": "test",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": REGION,
    },
)

# Define table schema
schema = Schema(
    NestedField(field_id=1, name="order_id", field_type=StringType(), required=False),
    NestedField(field_id=2, name="customer_name", field_type=StringType(), required=False),
    NestedField(field_id=3, name="amount", field_type=LongType(), required=False),
)

# Create table in S3 Tables
catalog.create_table(
    identifier=(NAMESPACE, TABLE_NAME),
    schema=schema,
)
print(f"Created table: {NAMESPACE}.{TABLE_NAME}")

# Load the table to get the latest metadata
table = catalog.load_table((NAMESPACE, TABLE_NAME))

# Populate table with sample data
data = pa.table({
    "order_id": ["ORD001", "ORD002", "ORD003"],
    "customer_name": ["Alice", "Bob", "Charlie"],
    "amount": [100, 250, 175],
})
table.append(data)
print("Inserted sample data into table")

# Verify table exists
tables = catalog.list_tables(NAMESPACE)
print(f"Tables in namespace: {tables}")

Run the script to create the table and populate it with data:

python setup_s3_tables.py
Output
Created table: my_namespace.customer_orders
Inserted sample data into table
Tables in namespace: [('my_namespace', 'customer_orders')]
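
Before moving on to Snowflake, you can optionally read the rows back with PyIceberg to confirm the data landed in S3 Tables. The sketch below simply reuses the catalog settings from setup_s3_tables.py:

from pyiceberg.catalog.rest import RestCatalog

# Optional check: scan the table back through the same S3 Tables REST catalog
# configuration used in setup_s3_tables.py.
catalog = RestCatalog(
    name="s3tables_catalog",
    uri="http://s3tables.localhost.localstack.cloud:4566/iceberg",
    warehouse="my-table-bucket",
    **{
        "s3.region": "us-east-1",
        "s3.endpoint": "http://localhost.localstack.cloud:4566",
        "client.access-key-id": "000000000000",
        "client.secret-access-key": "test",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    },
)

table = catalog.load_table(("my_namespace", "customer_orders"))
print(table.scan().to_arrow())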

Connect to the locally running Snowflake emulator using an SQL client of your choice (such as DBeaver). The Snowflake emulator runs on snowflake.localhost.localstack.cloud.

You can use the following connection parameters:

Parameter    Value
Host         snowflake.localhost.localstack.cloud
User         test
Password     test
Account      test
Warehouse    test
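
If you prefer to connect programmatically instead of through a GUI client, the same parameters work with the Snowflake Python connector (pip install snowflake-connector-python). The following is a minimal sketch; depending on your setup, you may also need to pass the emulator port explicitly.

import snowflake.connector

# Minimal sketch: connect to the LocalStack Snowflake emulator using the
# parameters from the table above.
conn = snowflake.connector.connect(
    host="snowflake.localhost.localstack.cloud",
    user="test",
    password="test",
    account="test",
    warehouse="test",
)

# Run a simple sanity query to confirm the connection works
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchone())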

Create a catalog integration to connect Snowflake to your S3 Tables bucket. The catalog integration defines how Snowflake connects to the external Iceberg REST catalog provided by S3 Tables.

CREATE OR REPLACE CATALOG INTEGRATION s3tables_catalog_integration
  CATALOG_SOURCE=ICEBERG_REST
  TABLE_FORMAT=ICEBERG
  CATALOG_NAMESPACE='my_namespace'
  REST_CONFIG=(
    CATALOG_URI='http://s3tables.localhost.localstack.cloud:4566/iceberg'
    CATALOG_NAME='my-table-bucket'
  )
  REST_AUTHENTICATION=(
    TYPE=AWS_SIGV4
    AWS_ACCESS_KEY_ID='000000000000'
    AWS_SECRET_ACCESS_KEY='test'
    AWS_REGION='us-east-1'
    AWS_SERVICE='s3tables'
  )
  ENABLED=TRUE
  REFRESH_INTERVAL_SECONDS=60;

In the above query:

  • CATALOG_SOURCE=ICEBERG_REST specifies that the catalog uses the Iceberg REST protocol.
  • TABLE_FORMAT=ICEBERG indicates the table format.
  • CATALOG_NAMESPACE='my_namespace' sets the default namespace to query tables from.
  • REST_CONFIG configures the connection to the LocalStack S3 Tables REST API endpoint.
  • REST_AUTHENTICATION configures AWS SigV4 authentication for the S3 Tables service.
  • REFRESH_INTERVAL_SECONDS=60 sets how often Snowflake refreshes metadata from the catalog.
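
If you are scripting the setup rather than pasting the statement into a SQL client, it can also be executed through the Python connector connection from the previous sketch. In the sketch below, catalog_integration.sql is a hypothetical file containing the CREATE CATALOG INTEGRATION statement shown above.

# Sketch: run the CREATE CATALOG INTEGRATION DDL programmatically.
# Assumes the statement above is saved to catalog_integration.sql (hypothetical
# file name) and that `conn` is the connection opened in the earlier sketch.
with open("catalog_integration.sql") as f:
    ddl = f.read()

cur = conn.cursor()
cur.execute(ddl)
print("Catalog integration created")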

Create an Iceberg table referencing S3 Tables

Create an Iceberg table in Snowflake that references the existing S3 Tables table using CATALOG_TABLE_NAME. The schema is automatically inferred from the external table.

CREATE OR REPLACE ICEBERG TABLE iceberg_customer_orders
  CATALOG='s3tables_catalog_integration'
  CATALOG_TABLE_NAME='my_namespace.customer_orders'
  AUTO_REFRESH=TRUE;

In the above query:

  • CATALOG references the catalog integration created in the previous step.
  • CATALOG_TABLE_NAME specifies the fully-qualified table name in the format namespace.table_name.
  • AUTO_REFRESH=TRUE enables automatic refresh of table metadata.
  • No column definitions are needed as the schema is inferred from the existing S3 Tables table.

You can now query the Iceberg table like any other Snowflake table. The columns are automatically available from the external table's schema.

SELECT * FROM iceberg_customer_orders;
Output
+----------+---------------+--------+
| order_id | customer_name | amount |
+----------+---------------+--------+
| ORD001   | Alice         |    100 |
| ORD002   | Bob           |    250 |
| ORD003   | Charlie       |    175 |
+----------+---------------+--------+
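
The same query can also be run from Python, for example as part of an automated test, by reusing the connector connection opened earlier:

# Sketch: query the Iceberg table through the Snowflake Python connector,
# reusing the `conn` connection from the earlier sketch.
cur = conn.cursor()
cur.execute("SELECT order_id, customer_name, amount FROM iceberg_customer_orders")
for order_id, customer_name, amount in cur.fetchall():
    print(order_id, customer_name, amount)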

In this tutorial, you learned how to integrate AWS S3 Tables with Snowflake using LocalStack. You created S3 Tables resources, populated a table with data using PyIceberg, configured a catalog integration in Snowflake, and queried Iceberg tables stored in S3 Tables buckets using CATALOG_TABLE_NAME.

The S3 Tables integration enables you to:

  • Query data stored in S3 Tables using familiar Snowflake SQL syntax.
  • Leverage automatic schema inference from external Iceberg catalogs.
  • Develop and test your data lakehouse integrations locally without cloud resources.

LocalStack’s Snowflake emulator combined with S3 Tables support provides a complete local environment for developing and testing multi-platform data analytics workflows.