Purpose

This application is a small tool that gathers storage statistics about the tables and datasets in a BigQuery GCP project.

The application queries the __TABLES__ metadata table to gather storage statistics about each table in a dataset, looping through all datasets in the project one by one.
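
The per-dataset loop described above can be sketched as follows. This is only an illustration, not the code in main.py: the function names and the dataset names ("sales", "logs") are hypothetical, and the sketch only builds the query strings rather than running them against BigQuery.

```python
from typing import List


def tables_stats_query(project_id: str, dataset_id: str) -> str:
    """Build a query over one dataset's __TABLES__ metadata table.

    Hypothetical helper: the column list is an assumption about which
    __TABLES__ fields are of interest for storage statistics.
    """
    return (
        "SELECT project_id, dataset_id, table_id, row_count, size_bytes "
        f"FROM `{project_id}.{dataset_id}.__TABLES__`"
    )


def all_dataset_queries(project_id: str, dataset_ids: List[str]) -> List[str]:
    """Return one __TABLES__ query per dataset, in the order given."""
    return [tables_stats_query(project_id, d) for d in dataset_ids]


# Placeholder project and dataset names, for illustration only.
for sql in all_dataset_queries("my-project", ["sales", "logs"]):
    print(sql)
```

In a real run, the dataset IDs would come from listing the project's datasets, and each query result would be appended to the stats table.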

PLEASE NOTE: The application stores all of these stats in a table named daily_storage_stats inside a dataset named utils. If they don't exist in the given project, the application creates them for you.

Steps to run this application

  1. Create a service account with the following roles:
    • BigQuery Data Editor
    • BigQuery Data Viewer
    • BigQuery Job User
  2. Create and download a service account key file in JSON format.
  3. Download the source code to your GCE instance or local machine and go to the root of the directory.
  4. Run the following commands to create and activate a virtual environment:
    virtualenv venv
    source venv/bin/activate
  5. Install the dependencies:
    pip install -r requirements.txt
  6. Once the requirements are installed, use the following command to run the application:
    python main.py --project_id=google.com:testdhaval --service_account_file=bqwriter.json
    --project_id = ID of the project whose per-table storage size is being captured
    --service_account_file = location of the key file for your service account
  7. Once the script finishes, you can run the following query:
    select * from `{PROJECT_ID}.utils.daily_storage_stats`
    {PROJECT_ID} is the project_id you provided as an argument when you ran the application.
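
To make the relationship between the command-line flags in step 6 and the query in step 7 concrete, here is a minimal sketch using only the standard library. It is not the actual main.py, whose argument handling is not shown here: parse_args and results_query are hypothetical names, and marking both flags as required is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags described in step 6 (assumed to be required)."""
    parser = argparse.ArgumentParser(
        description="Collect BigQuery storage statistics."
    )
    parser.add_argument(
        "--project_id",
        required=True,
        help="Project whose per-table storage size is captured",
    )
    parser.add_argument(
        "--service_account_file",
        required=True,
        help="Path to the service account JSON key file",
    )
    return parser.parse_args(argv)


def results_query(project_id):
    """Fill {PROJECT_ID} in the step 7 query with the given project ID."""
    return f"select * from `{project_id}.utils.daily_storage_stats`"


# Placeholder values, for illustration only.
args = parse_args(
    ["--project_id=my-project", "--service_account_file=key.json"]
)
print(results_query(args.project_id))
```

Passing an explicit argument list keeps the example self-contained; calling parse_args() with no arguments would read the flags from the real command line instead.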

PLEASE NOTE: You can schedule this application to run nightly as a cron job to capture a daily snapshot of storage usage for every table, and then build a dashboard to analyze this information.