:package: athenadriver - A fully-featured AWS Athena database driver for Go
:shell: athenareader - A moneywise command line utility to query Athena.
Overview
(This project is a sandbox project and the development status is STABLE.)
athenadriver is a fully-featured AWS Athena database driver for Go developed at Uber Technologies Inc.
It provides a hassle-free way of querying AWS Athena databases with the Go standard
library. It not only provides the basic features of the Athena Go SDK, but also
addresses some of the SDK's limitations and improves and extends it. It also includes
advanced features such as Athena workgroup and tag creation, driver read-only mode, and so on.
The PDF version of AthenaDriver document is available at :scroll:
Features
Besides the basic features provided by Go database/sql, such as error handling, database pooling and reconnection, athenadriver
supports the following features out of the box:
- Support multiple AWS authorization methods :link:
- Full support of Athena Basic Data Types
- Full support of Athena Advanced Types for queries with geospatial identifiers, ML and UDFs
- Full support of ALL Athena Query Statements, including DDL, DML and UTILITY :link:
- Support newly added INSERT INTO...VALUES
- Athena workgroup and tagging support including remote workgroup creation :link:
- Go sql's prepared statement support :link:
- Go sql's DB.Exec() and db.ExecContext() support :link:
- Query cancelling support :link:
- Override default query timeout limits :link:
- Mask columns with specific values :link:
- Database missing value handling :link:
- Read-Only mode - disable database write in driver level :link:
- Moneywise mode :moneybag: - print out query cost(USD) for each query
- Query with Athena Query ID(QID) - (the ultimate money saver! :money_with_wings: )
- Pseudo commands from database/sql interface: get_driver_version, get_query_id, get_query_id_status, stop_query_id, get_workgroup, list_workgroups, update_workgroup, get_cost, get_execution_report etc :link:
- Builtin logging support with zap :link:
- Builtin metrics support with tally :link:
athenadriver can greatly simplify your code. Check out athenareader as an example and as a convenient tool for running Athena queries from the command line.
How to set up/install/test athenadriver
Prerequisites - AWS Credentials & S3 Query Result Bucket
To be able to query AWS Athena, you need an AWS account at Amazon AWS's website. To
give it a shot, a free tier account is enough. You also need an AWS access key ID and secret access key pair.
You can get them from the AWS Security Credentials section of Identity and Access Management (IAM).
If you don't have a pair, please create one. The following is a screenshot from my temporary free tier account:
In addition to AWS credentials, you also need an S3 bucket to store query results. Just go to the
AWS S3 web console page to create one.
In the examples below, the S3 bucket I use is s3://myqueryresults/.
In most cases, you need the following 4 prerequisites:
- S3 output bucket
- access key ID
- secret access key
- AWS region
For more details on athenadriver's support for AWS credentials and the S3 query result bucket, please refer to the section
Support Multiple AWS Authentication Methods.
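Before running any of the examples, it can help to confirm that all four prerequisites are actually available. The following is a minimal sketch using only the standard library, not part of athenadriver. It assumes the values are supplied through environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION are the usual AWS SDK names, while ATHENA_OUTPUT_BUCKET and the helper missingPrereqs are hypothetical names chosen for this illustration.

```go
package main

import (
	"fmt"
	"os"
)

// requiredEnv lists the environment variables this sketch expects.
// ATHENA_OUTPUT_BUCKET is a hypothetical name used only in this example;
// the other three are the standard AWS SDK variable names.
var requiredEnv = []string{
	"ATHENA_OUTPUT_BUCKET",
	"AWS_ACCESS_KEY_ID",
	"AWS_SECRET_ACCESS_KEY",
	"AWS_REGION",
}

// missingPrereqs returns the names of the required variables for which
// getenv yields an empty string. The lookup function is passed in so the
// check can be exercised without touching the real environment.
func missingPrereqs(getenv func(string) string) []string {
	var missing []string
	for _, name := range requiredEnv {
		if getenv(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingPrereqs(os.Getenv); len(missing) > 0 {
		fmt.Println("missing prerequisites:", missing)
		return
	}
	fmt.Println("all four prerequisites are set")
}
```

If you keep your credentials somewhere else (for example in the AWS CLI config files), adapt the lookup function accordingly.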
Installation
Before Go 1.17, go get can be used to install athenadriver:
go get -u github.com/uber/athenadriver
Starting in Go 1.17, installing executables with go get is deprecated. go install may be used instead.
go install github.com/uber/athenadriver@latest
Tests
We provide unit tests and integration tests in the codebase.
Unit Test
All the unit tests are self-contained and pass even in an environment without internet access. Test coverage is 100%.
$ cd $GOPATH/src/github.com/uber/athenadriver/go
✔ /opt/share/go/path/src/github.com/uber/athenadriver [uber|✚ 1…12]
21:35 $ go test -coverprofile=coverage.out github.com/uber/athenadriver/go && \
go tool cover -func=coverage.out |grep -v 100.0%
ok github.com/uber/athenadriver/go 9.255s coverage: 100.0% of statements
Integration Test
All integration tests are under the examples folder.
Please make sure all prerequisites are met so that you can run the code on your own machine.
All the code snippets in the examples folder are fully tested on our machines. For example,
to run a stress and crash test, you can use examples/perf/concurrency.go. Build it first:
$ cd $GOPATH/src/github.com/uber/athenadriver
$ go build examples/perf/concurrency.go
Run it, wait for some output but not all, and unplug your network cable:
$ ./concurrency > examples/perf/concurrency.output.`date +"%Y-%m-%d-%H-%M-%S"`.log
58,13,53,54,78,96,32,48,40,11,35,31,65,61,1,73,74,22,34,49,80,5,69,37,0,79,
2020/02/09 13:49:29 error [38]RequestError: send request failed
caused by: Post https://athena.us-east-1.amazonaws.com/: dial tcp:
lookup athena.us-east-1.amazonaws.com: no such host
...
2020/02/09 13:49:29 error [89]RequestError: send request failed
caused by: Post https://athena.us-east-1.amazonaws.com/: dial tcp:
lookup athena.us-east-1.amazonaws.com: no such host
You can see that RequestError is returned from the code. The active Athena queries failed because the network is down.
Now plug your cable back in and wait for the network to come back. You can see that the program automatically reconnects to Athena and resumes outputting data correctly:
72,25,92,98,15,93,41,7,8,90,81,56,66,2,18,84,87,63,
44,45,82,99,86,3,52,76,71,16,39,67,23,12,42,17,4,
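The shape of such a stress test can be sketched with the standard library alone. The code below is not the actual examples/perf/concurrency.go; it is a hypothetical illustration in which runConcurrently, fakeQuery and errNetworkDown are made-up names, and fakeQuery stands in for a real db.QueryRow(...).Scan(...) call so the sketch runs without AWS access.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// errNetworkDown mimics the RequestError seen when the network is unplugged.
var errNetworkDown = errors.New("RequestError: send request failed")

// runConcurrently starts n goroutines, calls query(i) in each one, and
// returns the indexes that succeeded and the indexes that failed, both sorted.
func runConcurrently(n int, query func(i int) error) (ok, failed []int) {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			err := query(i)
			// Guard the shared result slices against concurrent appends.
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed = append(failed, i)
				return
			}
			ok = append(ok, i)
		}(i)
	}
	wg.Wait()
	sort.Ints(ok)
	sort.Ints(failed)
	return ok, failed
}

func main() {
	// fakeQuery fails for two indexes to imitate queries that were in
	// flight when the network went down.
	fakeQuery := func(i int) error {
		if i == 38 || i == 89 {
			return errNetworkDown
		}
		return nil
	}
	ok, failed := runConcurrently(100, fakeQuery)
	fmt.Println("succeeded:", len(ok), "failed:", failed)
}
```

In a real test, each goroutine would share one *sql.DB handle, which database/sql documents as safe for concurrent use.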
How to use athenadriver
athenadriver is very easy to use. All you need to do is import it in your code and then use the standard Go database/sql package as usual.
import athenadriver "github.com/uber/athenadriver/go"
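database/sql finds a driver by the name the driver registered when its package was imported, so one quick sanity check is to ask database/sql which driver names it knows. The sketch below uses only the standard library. isRegistered is a hypothetical helper, and the name "awsathena" is an assumption about the value of the driver's DriverName constant; in real code, import athenadriver and pass its DriverName constant instead of a string literal.

```go
package main

import (
	"database/sql"
	"fmt"
)

// isRegistered reports whether a database/sql driver with the given name
// has been registered, which normally happens as a side effect of
// importing the driver package.
func isRegistered(name string) bool {
	for _, d := range sql.Drivers() {
		if d == name {
			return true
		}
	}
	return false
}

func main() {
	// "awsathena" is assumed here for illustration. This standalone program
	// does not import athenadriver, so nothing is registered under that name.
	fmt.Println(isRegistered("awsathena"))
}
```

Run as written, without the athenadriver import, the program prints false. Once the import is added it should report the driver as registered.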
The following coding examples demonstrate athenadriver's features and how you should use athenadriver in your Go application.
Please note that the code is for demonstration purposes only, so please follow your own coding style or best practices if necessary.
Get Started - A Simple Query
The following is the simplest example for demonstration purpose. The source code is available at dml_select_simple.go.
package main
import (
	"database/sql"
	drv "github.com/uber/athenadriver/go"
)
func main() {
	// Step 1. Set AWS Credential in Driver Config.
	conf, _ := drv.NewDefaultConfig("s3://myqueryresults/",
		"us-east-2", "DummyAccessID", "DummySecretAccessKey")
	// Step 2. Open Connection.
	db, _ := sql.Open(drv.DriverName, conf.Stringify())
	// Step 3. Query and print results
	var url string
	_ = db.QueryRow("SELECT url from sampledb.elb_logs limit 1").Scan(&url)
	println(url)
}
To make it work for you, please replace OutputBucket, Region, AccessID and SecretAccessKey with your own values. sampledb
is provided by Amazon so you don't have to worry about it.
To build it:
$ go build examples/query/dml_select_simple.go
Run it and you can see output like:
$ ./dml_select_simple
https://www.example.com/articles/553
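The simple example discards every error for brevity. In an application you would normally check them. The following is a minimal sketch of the same open-query-scan flow with error handling, written against database/sql only so that it compiles without extra dependencies. queryOneString is a hypothetical helper, the two-minute timeout is an arbitrary choice, and the placeholder driver name and DSN are not real values: with athenadriver imported as drv you would pass drv.DriverName and conf.Stringify() instead.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// queryOneString opens a handle with the named driver, runs a query that
// returns a single string column, and reports each error instead of
// dropping it.
func queryOneString(driverName, dsn, query string) (string, error) {
	db, err := sql.Open(driverName, dsn)
	if err != nil {
		return "", fmt.Errorf("open: %w", err)
	}
	defer db.Close()

	// Bound the whole query with a timeout (the value is arbitrary).
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	var s string
	if err := db.QueryRowContext(ctx, query).Scan(&s); err != nil {
		return "", fmt.Errorf("query: %w", err)
	}
	return s, nil
}

func main() {
	// The placeholders below are not a registered driver, so this
	// standalone run takes the error branch.
	url, err := queryOneString("placeholder-driver", "placeholder-dsn",
		"SELECT url from sampledb.elb_logs limit 1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```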
Support Multiple AWS Authentication Methods
athenadriver uses access keys (Access Key ID and Secret Access Key) to sign programmatic requests to AWS.
If the AWS_SDK_LOAD_CONFIG environment variable is set, athenadriver uses Shared Config, respects AWS CLI Configuration and Credential File Settings, and gives them higher priority than the values set in athenadriver.Config.
Use AWS CLI Config For Authentication
When the environment variable AWS_SDK_LOAD_CONFIG is set, athenadriver reads aws_access_key_id (AccessID) and aws_secret_access_key (SecretAccessKey)
from ~/.aws/credentials, and region from ~/.aws/config. For details about ~/.aws/credentials and ~/.aws/config, please check here.
But you still need to specify the correct OutputBucket in athenadriver.Config because it is not in the AWS client config.
OutputBucket is critical in Athena. Even if you have a default value set in the Athena web console, you must pass one programmatically or you will get the error:
No output location provided. An output location is required either through the Workgroup result configuration setting or as an API input.
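Because a missing or malformed output location only surfaces when a query runs, it can be convenient to check the value up front. The sketch below is one possible pre-flight check using the standard library. validateOutputBucket is a hypothetical helper and is not part of athenadriver, and the rule it applies (scheme s3 plus a non-empty bucket name) is an assumption about what a usable location looks like.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateOutputBucket checks that s looks like an S3 location such as
// "s3://myqueryresults/": the scheme must be "s3" and a bucket name must
// be present.
func validateOutputBucket(s string) error {
	u, err := url.Parse(s)
	if err != nil {
		return fmt.Errorf("output bucket %q: %w", s, err)
	}
	if u.Scheme != "s3" || u.Host == "" {
		return fmt.Errorf("output bucket %q: want the form s3://bucket/prefix", s)
	}
	return nil
}

func main() {
	for _, s := range []string{"s3://myqueryresults/", "", "myqueryresults"} {
		fmt.Printf("%q -> %v\n", s, validateOutputBucket(s))
	}
}
```

Only the first value passes; the empty string and the bare bucket name are rejected before any request is sent to Athena.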
The sample code below ensures that AWS_SDK_LOAD_CONFIG is set, so athenadriver's AWS Session will be created from the configuration values in the shared config (~/.aws/config) and shared credentials (~/.aws/credentials) files.
Even though we pass dummy values for every parameter of NewDefaultConfig() except OutputBucket, they are overridden by
the values in the AWS CLI config files, so it doesn't really matter.
// To use AWS CLI's Config for authentication
func useAWSCLIConfigForAuth() {
	os.Setenv("AWS_SDK_LOAD_CONFIG", "1")
	// 1. Set AWS Credential in Driver Config.
	conf, err := drv.NewDefaultConfig(secret.OutputBucketProd, drv.DummyRegion,
		drv.DummyAccessID, drv.DummySecretAccessKey)
	if err != nil {
		return
	}
	// 2. Open Connection.
	db, _ := sql.Open(drv.DriverName, conf.Stringify())
	// 3. Query and print results
	var i int
	_ = db.QueryRow("SELECT 456").Scan(&i)
	println("with AWS CLI Config:", i)
	os.Unsetenv("AWS_SDK_LOAD_CONFIG")
}
If your AWS CLI setting is valid like mine, this function should output:
with AWS CLI Config: 456
The above authentication method also works for querying Athena in AWS Lambda. In Lambda, you don't have to provide access ID, key and region, and you don't need AWS CLI config files either. You just need to specify the correct output bucket. Please check the AWS Lambda Go sample code here.
Use athenadriver Config For Authentication
When the environment variable AWS_SDK_LOAD_CONFIG is NOT set, you may explicitly define credentials by passing valid (NOT dummy) accessID, secretAccessKey, region, and outputBucket
into athenadriver.NewDefaultConfig().
The sample code below ensures AWS_SDK_LOAD_CONFIG is not set, then passes four valid parameters into NewDefaultConfig():
// To use athenadriver's Config for authentication
func useAthenaDriverConfigForAuth() {
	os.Unsetenv("AWS_SDK_LOAD_CONFIG")
	// 1. Set AWS Credential in Driver Config.
	conf, err := drv.NewDefaultConfig(secret.OutputBucketDev, secret.Region,
		secret.AccessID, secret.SecretAccessKey)
	if err != nil {
		return
	}
	// 2. Open Connection.
	db, _ := sql.Open(drv.DriverName, conf.Stringify())
	// 3. Query and print results
	var i int
	_ = db.QueryRow("SELECT 123").Scan(&i)
	println("with AthenaDriver Config:", i)
}
The sample output:
with AthenaDriver Config: 123
The full code is here at examples/auth.go.
Use AWS SDK Default Credentials Resolution for Authentication
If the environment variable AWS_SDK_LOAD_CONFIG is NOT set and credentials are not supplied in the athenadriver
configuration, the AWS SDK will look up credentials using its default methodology described here: https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html#specifying-credentials.
Region and OutputBucket still need to be explicitly defined.
The sample code below ensures AWS_SDK_LOAD_CONFIG is not set, then creates an athenadriver config with OutputBucket and Region values set.
// To use AWS SDK Default Credentials
// (renamed so it does not clash with useAthenaDriverConfigForAuth above)
func useAWSSDKDefaultCredentialsForAuth() {
	os.Unsetenv("AWS_SDK_LOAD_CONFIG")
	// 1. Set OutputBucket and Region in Driver Config.
	conf := drv.NewNoOpsConfig()
	conf.SetOutputBucket(secret.OutputBucketDev)
	conf.SetRegion(secret.Region)
	// 2. Open Connection.
	db, _ := sql.Open(drv.DriverName, conf.Stringify())
	// 3. Query and print results
	var i int
	_ = db.QueryRow("SELECT 123").Scan(&i)
	println("with AthenaDriver Config:", i)
}
The sample output:
with AthenaDriver Config: 123
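The three authentication methods can be summarized as a small decision function. This is only a sketch of the precedence described in this section, not athenadriver's implementation. credentialSource is a hypothetical helper, and treating "set" as "non-empty" for AWS_SDK_LOAD_CONFIG is an assumption made for the illustration.

```go
package main

import (
	"fmt"
	"os"
)

// credentialSource names which of the three documented methods applies.
// loadConfig is the value of AWS_SDK_LOAD_CONFIG (assumed "set" when
// non-empty); explicitCreds says whether real, non-dummy access keys were
// put into the athenadriver config.
func credentialSource(loadConfig string, explicitCreds bool) string {
	switch {
	case loadConfig != "":
		// Shared config wins even if keys were also passed explicitly.
		return "AWS CLI shared config and credentials files"
	case explicitCreds:
		return "athenadriver Config"
	default:
		return "AWS SDK default credentials resolution"
	}
}

func main() {
	fmt.Println(credentialSource(os.Getenv("AWS_SDK_LOAD_CONFIG"), false))
}
```

In all three cases the output bucket still has to be supplied through the driver config.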
Full Support of All Data Types
As we said, athenadriver supports all Athena data types.
In the following sample code, we use an SQL statement to SELECT
some simple data of all the advanced types and then print them out.
package main
import (
	"database/sql"
	drv "github.com/uber/athenadriver/go"
)
func main() {
	// 1. Set AWS Credential in Driver Config.
	conf, err := drv.NewDefaultConfig("s3://myqueryresults/",
		"us-east-2", "DummyAccessID", "DummySecretAccessKey")
	if err != nil {
		panic(err)
	}
	// 2. Open Connection.
	dsn := conf.Stringify()
	db, _ := sql.Open(drv.DriverName, dsn)
	// 3. Query and print results
	query := "SELECT JSON '\"Hello Athena\"', " +
		"ST_POINT(-74.006801, 40.70522), " +
		"ROW(1, 2.0), INTERVAL '2' DAY, " +
		"INTERVAL '3' MONTH, " +
		"TIME '01:02:03.456', " +
		"TIME '01:02:03.456 America/Los_Angeles', " +
		"TIMESTAMP '2001-08-22 03:04:05.321 America/Los_Angeles';"
	rows, err := db.Query(query)
	if err != nil {
		panic(err)
	}
	defer rows.Close()
	println(drv.ColsRowsToCSV(rows))
}
Sample output:
"Hello Athena",00 00 00 00 01 01 00 00 00 20 25 76 6d 6f 80