Automate Athena SQL Queries using Python boto3.

  • Like
  • Report this story

In this blog, you will learn how to setup boto3 and automate your data engineering workflow using Athena.
  • What is Amazon Athena?
    • With the help of Amazon Athena, you can query data instantly. Get results in seconds and pay only for the queries you run.
    • It is easy to analyze data in Amazon S3 using SQL.
    • Amazon Athena simply points to your data in Amazon S3, defines the schema, and start querying using standard SQL.
  • S3 file system
    • S3 allows users to easily manage data and access control.
    • You can use it to store and protect any amount of data.
    • It also allows you to query CSV files using Amazon Athena.
    • You can store and query CSV and Parquet files from S3 storage.
  • Boto3 installation
    • Type cmd in the search window
    • Enter the "pip install boto3" command.
    • Open Jupyter notebook and test the installation with the "import boto3" command.
  • Creating Athena tables using boto3
    • Create a new directory in the S3 bucket and place your raw files in it.
      • ex: bucket_name/new_directory/Raw_input_files.csv
    • Before you run your first query, you need to set up the location for query results in S3.
      • Example: s3://query-results-bucket/folder/
    • Run below code to create a table in Athena using boto3.
  • import boto3                                    # python library to interface with S3 and athena.
    s3 = boto3.resource('s3')                       # Passing resource as s3 
    client = boto3.client('athena')                 # and client as athena
    database = 'database_name'                      # Data base name
    query=""" create external table data_base_name.table1 (
    'ID' Int,
    'Name' string,
    'Address' string)
    Location "s3://query-results-bucket/input_folder/";
    s3_output = 's3://query-results-bucket/output_folder/'  # output location
    response = client.start_query_execution(QueryString=query,QueryExecutionContext={  'Database': database},
                             ResultConfiguration={ 'OutputLocation': s3_output})

Athena S3 S3 file System Automate SQL Queries in Athena. Boto3 Create external table in athena python jupyter notebook

Related Stories

  • Likes  0
  • Comments  0
  • Be the first to comment.