Prerequisites

About us

  • Johan Euphrosine, Developer Programs Engineer
  • Takashi Matsuo, Developer Advocate
  • Nate White, App Engine
  • Jim Caputo, BigQuery

Better App Engine logs analysis

  • Scanning live logs to perform a query is slow
  • Running a MapReduce job is overkill for one-offs
  • Need a simple and efficient way to do interactive queries

How?

  • Read App Engine logs using Logs API
  • Write them as a CSV file to Google Cloud Storage using the Files API
  • Ingest the CSV file using the BigQuery REST API
  • Orchestrate everything using the Pipeline API
  • Demo

Logs API

  • Read logs from Google App Engine infrastructure
  • Request logs
  • Application logs
  • Retention Policy

Level 1: Fetch logs using Logs API

  • Fetch last 5 minutes of request logs
  • Render RequestLog attributes as plain text in CSV format in the response body

Level 1: Fetch logs using Logs API
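
A minimal sketch of such a handler, assuming the Python 2.7 runtime with webapp2; the handler name and the exact set of RequestLog columns are illustrative:

    import csv
    import time
    import StringIO

    import webapp2
    from google.appengine.api.logservice import logservice

    class DumpLogsHandler(webapp2.RequestHandler):
        def get(self):
            # Fetch request logs for the last 5 minutes.
            end = time.time()
            start = end - 5 * 60
            output = StringIO.StringIO()
            writer = csv.writer(output)
            for log in logservice.fetch(start_time=start, end_time=end):
                # A few RequestLog attributes; add more columns as needed.
                writer.writerow([log.start_time, log.ip, log.method,
                                 log.resource, log.status])
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write(output.getvalue())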

Google Cloud Storage

  • Geographically distributed object storage
  • Runs on Google infrastructure
  • Store files in buckets
  • Define ACLs on resources
  • Used as a staging area for loading files into other Google APIs

Level 2: Prerequisites

  • Go to developers.google.com/console
  • In the Team section, add YOUR_APP_ID@appspot.gserviceaccount.com as a teammate
  • Change the permission for YOUR_APP_ID@appspot.gserviceaccount.com to Can edit
  • In the Google Cloud Storage section, click Storage Manager
  • Create a new bucket
  • Edit config.py
  • Replace YOUR_BUCKET_NAME with your bucket name

Level 2: Write logs to Google Cloud Storage

  • Create a file named requests.csv in your bucket
  • Write last 5 minutes of request logs to this file in CSV format
  • Render an HTML link to the file in the response body

Level 2: Write logs to Google Cloud Storage
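
A sketch of the same handler writing to Cloud Storage through the Files API used by the codelab at the time (since deprecated); config.BUCKET_NAME is a hypothetical name for the value set in config.py, and the browser link format is indicative:

    import csv
    import time

    import webapp2
    from google.appengine.api import files
    from google.appengine.api.logservice import logservice

    import config  # assumed to expose BUCKET_NAME

    class WriteLogsHandler(webapp2.RequestHandler):
        def get(self):
            # Create a writable file in the bucket, append CSV rows, finalize it.
            writable = files.gs.create('/gs/%s/requests.csv' % config.BUCKET_NAME,
                                       mime_type='text/csv', acl='project-private')
            end = time.time()
            start = end - 5 * 60
            with files.open(writable, 'a') as f:
                writer = csv.writer(f)
                for log in logservice.fetch(start_time=start, end_time=end):
                    writer.writerow([log.start_time, log.ip, log.method,
                                     log.resource, log.status])
            files.finalize(writable)
            # Link to the finalized object so it can be checked in the browser.
            self.response.out.write(
                '<a href="https://storage.cloud.google.com/%s/requests.csv">'
                'requests.csv</a>' % config.BUCKET_NAME)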

Google BigQuery

  • Interactive analysis on Big Data
  • Process TB of data in seconds
  • Cloud based service, accessible via REST API
  • Publicly released earlier this year

Level 3: Prerequisites

  • Go to bigquery.cloud.google.com
  • Make sure your API Project is selected
  • Create a new dataset
  • Edit config.py
  • Replace YOUR_DATASET_ID with your dataset name
  • Replace YOUR_TABLE_ID with a table name of your choice

Level 3: Ingest logs into BigQuery

  • Use Google API Python Client to build an authenticated BigQuery service instance
  • Insert a new load job for gs://your-bucket-name/requests.csv
  • Check the status for the corresponding job id
  • Query logs on bigquery.cloud.google.com

Level 3: Ingest logs into BigQuery
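
A sketch of the ingestion helpers, assuming the Google API Python Client of that era (apiclient, oauth2client) and service-account authentication; config.PROJECT_ID, config.DATASET_ID, config.TABLE_ID, config.BUCKET_NAME and the CSV schema are illustrative placeholders:

    import httplib2
    from apiclient.discovery import build
    from oauth2client.appengine import AppAssertionCredentials

    import config  # assumed to expose PROJECT_ID, DATASET_ID, TABLE_ID, BUCKET_NAME

    SCOPE = 'https://www.googleapis.com/auth/bigquery'

    def get_bigquery_service():
        # Authenticate as the app's service account (the teammate added above).
        credentials = AppAssertionCredentials(scope=SCOPE)
        http = credentials.authorize(httplib2.Http())
        return build('bigquery', 'v2', http=http)

    def insert_load_job(service):
        # Start a load job for gs://YOUR_BUCKET_NAME/requests.csv.
        job = {'configuration': {'load': {
            'sourceUris': ['gs://%s/requests.csv' % config.BUCKET_NAME],
            'schema': {'fields': [
                {'name': 'start_time', 'type': 'FLOAT'},
                {'name': 'ip', 'type': 'STRING'},
                {'name': 'method', 'type': 'STRING'},
                {'name': 'resource', 'type': 'STRING'},
                {'name': 'status', 'type': 'INTEGER'},
            ]},
            'destinationTable': {'projectId': config.PROJECT_ID,
                                 'datasetId': config.DATASET_ID,
                                 'tableId': config.TABLE_ID},
        }}}
        result = service.jobs().insert(projectId=config.PROJECT_ID,
                                       body=job).execute()
        return result['jobReference']['jobId']

    def job_status(service, job_id):
        # Returns PENDING, RUNNING or DONE; a DONE job may still carry errors.
        job = service.jobs().get(projectId=config.PROJECT_ID,
                                 jobId=job_id).execute()
        return job['status']['state']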

Pipeline API

  • Chain MapReduce operations, API requests, and background tasks
  • Python DSL based on generators (see the sketch below)
  • Relies on the App Engine Task Queue API
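
For instance, a pipeline's run() method is an ordinary function or a generator that yields child pipelines; this toy sketch assumes the Pipeline library bundled with the App Engine MapReduce library (the import path varies with how it is bundled):

    from mapreduce.lib import pipeline

    class AddOne(pipeline.Pipeline):
        def run(self, number):
            # A plain pipeline: the return value becomes its default output.
            return number + 1

    class AddTwo(pipeline.Pipeline):
        def run(self, number):
            # A generator pipeline: each yield schedules a child stage on the
            # task queue; passing the first stage's future into the second
            # makes the second wait for the first to complete.
            once = yield AddOne(number)
            yield AddOne(once)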

Level 4: Orchestrate background operations using Pipeline API

  • Create a MapperPipeline with LogInputReader, FileOutputWriter, and a mapper function that converts request log objects to CSV
  • Create a pipeline that takes gs:// files as arguments and ingests them using a BigQuery load job
  • Create a pipeline that polls the status of the load job until it is done
  • Create a handler that launches the pipeline and redirects to the pipeline UI

Level 4: Orchestrate background operations using Pipeline API
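
A condensed sketch of the orchestration, loosely following the log2bq sample; the pipeline names, mapper location (main.log2csv), shard count, and config.* values are illustrative, get_bigquery_service() and job_status() come from the Level 3 sketch, and the import paths depend on how the MapReduce and Pipeline libraries are bundled:

    import time

    import webapp2
    from mapreduce import base_handler
    from mapreduce import mapreduce_pipeline
    from mapreduce.lib import pipeline
    from mapreduce.lib.pipeline import common

    import config  # assumed to expose BUCKET_NAME, PROJECT_ID, DATASET_ID, TABLE_ID

    def log2csv(request_log):
        # Mapper: emit one CSV line per RequestLog object.
        yield '%s,%s,%s,%s,%s\n' % (request_log.start_time, request_log.ip,
                                    request_log.method, request_log.resource,
                                    request_log.status)

    class Gs2Bq(base_handler.PipelineBase):
        def run(self, files):
            # The mapper writes /gs/... paths; BigQuery load jobs want gs:// URIs.
            uris = [f.replace('/gs/', 'gs://', 1) for f in files]
            service = get_bigquery_service()
            job = service.jobs().insert(projectId=config.PROJECT_ID, body={
                'configuration': {'load': {
                    'sourceUris': uris,  # schema omitted, see the Level 3 sketch
                    'destinationTable': {'projectId': config.PROJECT_ID,
                                         'datasetId': config.DATASET_ID,
                                         'tableId': config.TABLE_ID}}}}).execute()
            yield BqCheck(job['jobReference']['jobId'])

    class BqCheck(base_handler.PipelineBase):
        def run(self, job_id):
            # Poll the load job; re-yield ourselves after a short delay until done.
            state = job_status(get_bigquery_service(), job_id)
            if state in ('PENDING', 'RUNNING'):
                delay = yield common.Delay(seconds=1)
                with pipeline.After(delay):
                    yield BqCheck(job_id)
            else:
                yield common.Return(state)

    class Log2Bq(base_handler.PipelineBase):
        def run(self, start_time, end_time):
            # Map request logs to CSV files in Cloud Storage, then load them.
            files = yield mapreduce_pipeline.MapperPipeline(
                'log2bq', 'main.log2csv',
                'mapreduce.input_readers.LogInputReader',
                'mapreduce.output_writers.FileOutputWriter',
                params={'input_reader': {'start_time': start_time,
                                         'end_time': end_time},
                        'output_writer': {'filesystem': 'gs',
                                          'gs_bucket_name': config.BUCKET_NAME}},
                shards=4)
            yield Gs2Bq(files)

    class StartHandler(webapp2.RequestHandler):
        def get(self):
            # Launch the pipeline and redirect to the Pipeline status UI.
            now = time.time()
            job = Log2Bq(now - 5 * 60, now)
            job.start()
            self.redirect('/_ah/pipeline/status?root=%s' % job.pipeline_id)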

Bonus Level: Query logs using the BigQuery API

  • Create an HTML form with a textarea and a submit button
  • Insert a query job with the content of the textarea (sketched below)
  • Render query results as an HTML table
  • Make it pretty using a CSS framework (Bootstrap, ...)
  • Use the Channel API to get live feedback on the pipeline operations
  • Review code.google.com/p/log2bq
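
A rough sketch of the bonus query handler, reusing get_bigquery_service() from the Level 3 sketch; the /query route, the form markup, and the use of a synchronous query job are illustrative (the BigQuery query response nests each row as {'f': [{'v': ...}, ...]}):

    import webapp2

    import config  # assumed to expose PROJECT_ID

    QUERY_FORM = """
    <form method="post" action="/query">
      <textarea name="query" rows="5" cols="80"></textarea>
      <input type="submit" value="Run query">
    </form>
    """

    class QueryHandler(webapp2.RequestHandler):
        def get(self):
            self.response.out.write(QUERY_FORM)

        def post(self):
            # Insert a query job with the content of the textarea.
            service = get_bigquery_service()
            result = service.jobs().query(
                projectId=config.PROJECT_ID,
                body={'query': self.request.get('query')}).execute()
            # Render the query results as a plain HTML table.
            rows = []
            for row in result.get('rows', []):
                cells = ''.join('<td>%s</td>' % cell.get('v') for cell in row['f'])
                rows.append('<tr>%s</tr>' % cells)
            self.response.out.write('<table>%s</table>' % ''.join(rows))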

yield Thank("You!")