Playing with G Suite MDM mobile device data using Go

Our desktop support & G Suite admin folks needed a simple, fast command-line tool to query basic info about our company’s mobile devices (which are all managed using G Suite’s built-in MDM).

So I wrote one.

Since this tool runs from the command line, it can’t use any interactive or browser-based authentication flow, so we need to authenticate with a service account instead.

Pre-requisites (GCP & G Suite):

  • Create a service account in your G Suite domain (Google Cloud Console –> IAM & Admin –> Service Accounts)
  • Be sure that you have checked the “Enable G Suite Domain-wide Delegation” box as this will give your new service account the appropriate permissions to query the G Suite Admin API
  • Download the service account JSON credentials file. Open it up and find the client_id. Make a note of it!
  • Now we need to authorize the client_id of your service account for the necessary API scopes. In the Admin Console for your G Suite domain (Admin Console –> Security –> Advanced Settings –> Authentication –> Manage API Client Access), add your client_id in the “Client Name” box, and add
    https://www.googleapis.com/auth/admin.directory.device.mobile.readonly
    in the “One or more API scopes” box
  • Click the “Authorize” button when you’re finished

Pre-requisites (Go)
You’ll need to “go get” a few packages:

  • go get -u golang.org/x/oauth2/google
  • go get -u google.golang.org/api/admin/directory/v1
  • go get -u github.com/dustin/go-humanize

Pre-requisites (Environment Variables)
Because it’s never good to store runtime configuration within code, you’ll notice that the code references several environment variables. Set them up to suit your preference, but something like this will do:

export GSUITE_COMPANYID="A01234567"
export SVC_ACCOUNT_CREDS_JSON="/home/rickt/dev/adminsdk/golang/creds.json"
export GSUITE_ADMINUSER="foo@bar.com"

And finally, the code
Gist URL: https://gist.github.com/rickt/199ca2be87522496e83de77bd5cd7db2
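
For orientation, here’s a minimal sketch of the service-account flow the tool uses: load the JSON credentials, impersonate the admin user via domain-wide delegation, then call the Directory API’s mobiledevices.list. This is illustrative only (it is not the gist code) and leans on the environment variables described above:

// mdmquery.go: minimal sketch of the service-account auth + device query flow.
// Illustrative only -- see the gist above for the actual tool.
package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"log"
	"os"

	"golang.org/x/oauth2/google"
	admin "google.golang.org/api/admin/directory/v1"
)

func main() {
	ctx := context.Background()

	// load the service account JSON credentials file
	creds, err := ioutil.ReadFile(os.Getenv("SVC_ACCOUNT_CREDS_JSON"))
	if err != nil {
		log.Fatal(err)
	}

	// build a JWT (two-legged) config scoped to the mobile devices read-only API
	conf, err := google.JWTConfigFromJSON(creds, admin.AdminDirectoryDeviceMobileReadonlyScope)
	if err != nil {
		log.Fatal(err)
	}

	// domain-wide delegation: act as a G Suite admin user
	conf.Subject = os.Getenv("GSUITE_ADMINUSER")

	svc, err := admin.New(conf.Client(ctx))
	if err != nil {
		log.Fatal(err)
	}

	// list the domain's mobile devices by customer/company ID
	devices, err := svc.Mobiledevices.List(os.Getenv("GSUITE_COMPANYID")).Do()
	if err != nil {
		log.Fatal(err)
	}
	for _, d := range devices.Mobiledevices {
		fmt.Printf("%s\t%s\t%s\t%s\n", d.Model, d.Os, d.Status, d.LastSync)
	}
}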

Automated insert of CSV data into Bigquery via GCS bucket + Python

i wanted to try out automatic loading of CSV data into Bigquery, specifically using a Cloud Function that runs whenever a new CSV file is uploaded into a Google Cloud Storage bucket.

it worked like a champ. here’s what i did to PoC:

  1. generate a CSV file with 1000 lines of dummy data via https://www.mockaroo.com/ with their default schema so it looks like:
     $ head -3 testdata.csv
     id,first_name,last_name,email,gender,ip_address
     1,Andres,Berthot,aberthot0@pbs.org,Male,104.204.241.0
     2,Iris,Zwicker,izwicker1@icq.com,Female,61.87.224.4
  2. make a test Google Cloud Storage bucket:
     $ gsutil mb gs://csvtestbucket
  3. install the necessary python bits & pieces:
     $ pip3 install google-cloud-bigquery --upgrade
  4. make a Bigquery dataset:
     $ bq mk --dataset rickts-dev-project:csvtestdataset
  5. make a table within that dataset to match the CSV schema:
     $ bq mk -t csvtestdataset.csvtable \
       id:INTEGER,first_name:STRING,last_name:STRING,email:STRING,gender:STRING,ip_address:STRING
  6. eyeball the table in the Bigquery dataset and verify it is clean and fresh.
  7. now it’s time to write some python. here’s mine:
  8. because you don’t want to hardcode things like bucket/table/dataset names in code, create a yaml file that will store your deployment-specific configuration in environment variables:
     BUCKET: csvtestbucket
     DATASET: csvtestdataset
     TABLE: csvtable
     VERSION: v14
  9. create a requirements.txt file for the necessary imports:
     google-cloud
     google-cloud-bigquery
  10. create a .gcloudignore file so that your yaml or CSV files will not be deployed into GCP:
     *csv
     *yaml
  11. at this point, your folder should look something like this:
     $ ls
     env.yaml  main.py  requirements.txt  testdata.csv
  12. now we are ready to deploy the cloud function. we will add a trigger on the storage bucket that will fire every time a new file is added to the bucket. here we are creating a cloud function named “csv_loader”:
     $ gcloud beta functions deploy csv_loader \
         --runtime=python37 \
         --trigger-resource=gs://csvtestbucket \
         --trigger-event=google.storage.object.finalize \
         --entry-point=csv_loader \
         --env-vars-file=env.yaml
  13. ok, the function is deployed, yay! copy your test data CSV into the test bucket:
     $ gsutil cp testdata.csv gs://csvtestbucket/
     Copying file://testdata.csv [Content-Type=text/csv]...
     - [1 files][ 60.4 KiB/ 60.4 KiB]
     Operation completed over 1 objects/60.4 KiB.
  14. now that we have copied a CSV file into the bucket, the function should fire! check the cloud function logs:
     $ gcloud functions logs read
     [ ... snipped for brevity ... ]
     D      csv_loader  274732139359754  2018-10-22 20:48:27.852  Function execution started
     I      csv_loader  274732139359754  2018-10-22 20:48:28.492  Starting job 9ca2f39c-539f-454d-aa8e-3299bc9f7287
     I      csv_loader  274732139359754  2018-10-22 20:48:28.492  Function=csv_loader, Version=v14
     I      csv_loader  274732139359754  2018-10-22 20:48:28.492  File: testdata2.csv
     I      csv_loader  274732139359754  2018-10-22 20:48:31.022  Job finished.
     I      csv_loader  274732139359754  2018-10-22 20:48:31.136  Loaded 1000 rows.
     D      csv_loader  274732139359754  2018-10-22 20:48:31.139  Function execution took 3288 ms, finished with status: 'ok'

     looks like the function ran as expected!

  15. let’s eyeball the bigquery table again, and see if the row count has changed:
     $ bq show csvtestdataset.csvtable
     Table rickts-dev-project:csvtestdataset.csvtable

        Last modified            Schema            Total Rows   Total Bytes   Expiration   Time Partitioning   Labels
      ----------------- ------------------------ ------------ ------------- ------------ ------------------- --------
       22 Oct 13:48:29   |- id: integer           1000         70950
                         |- first_name: string
                         |- last_name: string
                         |- email: string
                         |- gender: string
                         |- ip_address: string

     great! there are now 1000 rows. looking good.

  16. as a final check, let’s compare the first 3 rows of the CSV
     $ egrep '^[1,2,3],' testdata.csv
     1,Andres,Berthot,aberthot0@pbs.org,Male,104.204.241.0
     2,Iris,Zwicker,izwicker1@icq.com,Female,61.87.224.4
     3,Aime,Gladdis,agladdis2@hugedomains.com,Female,29.55.250.191

     with the first 3 rows of the bigquery table:
     $ bq query 'select * from csvtestdataset.csvtable where id IN (1,2,3)'
     Waiting on bqjob_r6a3239576845ac4d_000001669d987208_1 ... (0s) Current status: DONE
     +----+------------+-----------+---------------------------+--------+---------------+
     | id | first_name | last_name |           email           | gender |  ip_address   |
     +----+------------+-----------+---------------------------+--------+---------------+
     |  1 | Andres     | Berthot   | aberthot0@pbs.org         | Male   | 104.204.241.0 |
     |  2 | Iris       | Zwicker   | izwicker1@icq.com         | Female | 61.87.224.4   |
     |  3 | Aime       | Gladdis   | agladdis2@hugedomains.com | Female | 29.55.250.191 |
     +----+------------+-----------+---------------------------+--------+---------------+

     and whaddyaknow, they match! w00t!

proof of concept: complete!

conclusion: cloud functions are pretty great.

PoC: stream nginx access logs into Bigquery

let’s say you have some servers in a cluster serving vhost foo.com, and you want to put all the access logs from all the webservers for that vhost into Bigquery so you can perform analyses, or you just want all the access logs in one place.

in addition to having the raw weblog data, you also want to keep track of which webserver the hits were served by, and what the vhost (Host header) was.

so, foreach() server, we will install fluentd, configure it to tail the nginx access log, and upload everything to Bigquery for us.

it worked like a champ. here’s what i did to PoC:

    1. Install fluentd
      $ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
      
    2. Create a Bigquery dataset
       $ bq mk --dataset rickts-dev-project:nginxweblogs
      Dataset 'rickts-dev-project:nginxweblogs' successfully created.
    3. Create a JSON schema to handle the weblogs + server hostname + vhost name
      [
        {
          "name": "agent",
          "type": "STRING"
        },
        {
          "name": "code",
          "type": "STRING"
        },
        {
          "name": "host",
          "type": "STRING"
        },
        {
          "name": "method",
          "type": "STRING"
        },
        {
          "name": "path",
          "type": "STRING"
        },
        {
          "name": "referer",
          "type": "STRING"
        },
        {
          "name": "size",
          "type": "INTEGER"
        },
        {
          "name": "user",
          "type": "STRING"
        },
        {
          "name": "time",
          "type": "INTEGER"
        },
        {
          "name": "hostname",
          "type": "STRING"
        },
        {
          "name": "vhost",
          "type": "STRING"
        }
      ]
    4. Create a table in the Bigquery dataset to store the weblog data
      $ bq mk -t nginxweblogs.nginxweblogtable schema.json
      Table 'rickts-dev-project:nginxweblogs.nginxweblogtable' successfully created.
    5. Install the fluentd Google Bigquery plugins
      $ sudo /usr/sbin/td-agent-gem install fluent-plugin-bigquery --no-ri --no-rdoc -V
    6. Configure fluentd to read the nginx access log for this vhost and upload to Bigquery (while also adding the server hostname and vhost name) by creating an /etc/td-agent/td-agent.conf similar to this: https://gist.github.com/rickt/641e086d37ff7453b7ea202dc4266aa5 (unfortunately WordPress won’t render it properly, sorry)

      You’ll note we are using the record_transformer fluentd filter plugin to transform the access log entries with the webserver hostname and webserver virtualhost name before injection into Bigquery.

    7. After making sure that the user fluentd runs as (td-agent by default) has read access to your nginx access logs, start (or restart) fluentd
       $ sudo systemctl start td-agent.service
    8. Now make a call to your vhost (in my case, localhost)
       $ hostname
      hqvm
      $ curl http://localhost/index.html?text=helloworld
      you sent: "helloworld"
    9. Query Bigquery to look for that specific hit, first using the bq command line tool
       $ bq query 'SELECT * FROM nginxweblogs.nginxweblogtable WHERE path = "/index.html?text=helloworld"'
      +-------------+------+------+--------+-----------------------------+---------+------+------+------+----------+--------------------------+
      |    agent    | code | host | method |            path             | referer | size | user | time | hostname |          vhost           |
      +-------------+------+------+--------+-----------------------------+---------+------+------+------+----------+--------------------------+
      | curl/7.47.0 | 200  | ::1  | GET    | /index.html?text=helloworld | -       |   14 | -    | NULL | hqvm     | rickts-dev-box.fix8r.com |
      +-------------+------+------+--------+-----------------------------+---------+------+------+------+----------+--------------------------+
    10. Congratulations, you have just set up your web access logs to be injected into a Bigquery table!

proof of concept: complete!

conclusion: pushing your web access logs into Bigquery is extremely easy, not to mention a smart thing to do.

the benefits increase exponentially as your server + vhost count increases. try consolidating, compressing and analyzing months of access logs from N+ servers in-house and you’ll see the benefits of Bigquery right away.

enjoy!

Slack Translation Bot

stable release of Slack Translator Bot.

http://github.com/rickt/slack-translator-bot

what is Slack Translator Bot? the [as-is demo] code gets you a couple of Slack /slash commands that let you translate from English to Japanese, and vice-versa.

the screenshots below show an example response to a Slack user wanting to translate “the rain in spain falls mainly on the plane” by typing:

[screenshot: the /slash command as typed]

within slack:

[screenshot: the translated response in Slack]

TL;DR/HOW-TO

• you get a google translate api key
• you deploy a go backend app to Google Appengine that responds to…
• a couple of custom Slack /slash commands that you create in Slack
• you profit

http://gist-it.appspot.com/https://github.com/rickt/slack-translator-bot/blob/master/slacktranslatorbot.go

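purely for illustration (this is not the actual slack-translator-bot code linked above), a minimal slash-command handler might look something like this; the /translate route and the TRANSLATE_API_KEY environment variable are made-up names for the sketch:

// sketch: a Slack /slash command handler that translates the submitted text
// from English to Japanese via the Google Translate v2 REST API.
// Illustrative only -- not the actual slack-translator-bot code.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
	"os"
)

// minimal subset of the Translate v2 JSON response
type translateResponse struct {
	Data struct {
		Translations []struct {
			TranslatedText string `json:"translatedText"`
		} `json:"translations"`
	} `json:"data"`
}

func translateHandler(w http.ResponseWriter, r *http.Request) {
	// Slack sends the slash-command payload as a form POST; "text" is what the user typed
	text := r.FormValue("text")

	// call the Translate v2 REST endpoint (API key from the environment)
	q := url.Values{}
	q.Set("key", os.Getenv("TRANSLATE_API_KEY"))
	q.Set("q", text)
	q.Set("source", "en")
	q.Set("target", "ja")
	resp, err := http.Get("https://translation.googleapis.com/language/translate/v2?" + q.Encode())
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer resp.Body.Close()

	var tr translateResponse
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil || len(tr.Data.Translations) == 0 {
		http.Error(w, "translation failed", http.StatusInternalServerError)
		return
	}

	// reply to Slack with JSON; "in_channel" makes the response visible to everyone
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"response_type": "in_channel",
		"text":          fmt.Sprintf("%s = %s", text, tr.Data.Translations[0].TranslatedText),
	})
}

func main() {
	http.HandleFunc("/translate", translateHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}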

Slack Team Directory Bot

stable release of Slack Team Directory Bot.

http://github.com/rickt/slack-team-directory-bot

what is Slack Team Directory Bot? you get a Slack /slash command that lets you search your Slack Team Directory quick as a flash.

the screenshots below show an example response to a Slack user trying to find someone in your accounting department by typing:

[screenshot: the /slash command as typed]

within slack:

[screenshot: the search results returned in Slack]

TL;DR/HOW-TO

• you deploy a go backend app to Google Appengine that responds to…
• a custom Slack /slash command that you create in Slack

http://gist-it.appspot.com/http://github.com/rickt/slack-team-directory-bot/blob/master/slackteamdirectorybot.go
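
again purely for illustration (not the actual slack-team-directory-bot code linked above), a minimal version of the idea: take the slash-command text and match it against names, titles and emails from Slack’s users.list Web API. the /directory route and SLACK_API_TOKEN environment variable are made-up names for the sketch, and newer Slack apps pass the token in an Authorization header rather than a query parameter:

// sketch: a Slack /slash command handler that searches the team directory
// by matching the query against each member's name, title and email,
// using Slack's users.list Web API. Illustrative only.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"strings"
)

// minimal subset of the users.list response
type usersListResponse struct {
	OK      bool `json:"ok"`
	Members []struct {
		Name    string `json:"name"`
		Profile struct {
			RealName string `json:"real_name"`
			Title    string `json:"title"`
			Email    string `json:"email"`
		} `json:"profile"`
	} `json:"members"`
}

func directoryHandler(w http.ResponseWriter, r *http.Request) {
	query := strings.ToLower(r.FormValue("text"))

	// pull the team directory from the Slack Web API
	resp, err := http.Get("https://slack.com/api/users.list?token=" + os.Getenv("SLACK_API_TOKEN"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer resp.Body.Close()

	var ul usersListResponse
	if err := json.NewDecoder(resp.Body).Decode(&ul); err != nil || !ul.OK {
		http.Error(w, "users.list failed", http.StatusInternalServerError)
		return
	}

	// build a plain-text list of anyone whose name, title or email matches the query
	var matches []string
	for _, m := range ul.Members {
		haystack := strings.ToLower(m.Profile.RealName + " " + m.Profile.Title + " " + m.Profile.Email)
		if strings.Contains(haystack, query) {
			matches = append(matches, fmt.Sprintf("%s (%s) <%s>", m.Profile.RealName, m.Profile.Title, m.Profile.Email))
		}
	}

	// slash commands can be answered with plain text; only the requester sees it by default
	fmt.Fprintln(w, strings.Join(matches, "\n"))
}

func main() {
	http.HandleFunc("/directory", directoryHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}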

UPDATED: golang code to query GA data via the Google Core Reporting API using two-legged service account Google OAuth2 authentication

i’ve updated my example Golang code that authenticates with the Core Reporting API using service account OAuth2 (two-legged) authentication to use the newly updated golang.org/x/oauth2 library.

my previous post & full explanation of service account pre-reqs/setup:

http://code.rickt.org/post/142452087425/how-to-download-google-analytics-data-with-golang

full code:
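
for a rough idea of what the two-legged flow looks like with the updated library, here’s a minimal illustrative sketch (it is not the full code, and the credentials path and GA view ID below are placeholders):

// sketch: query the Core Reporting API (v3) using service-account (two-legged)
// OAuth2 via golang.org/x/oauth2/google. Illustrative only; the credentials
// path and GA view (profile) ID below are placeholders.
package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"log"

	"golang.org/x/oauth2/google"
	analytics "google.golang.org/api/analytics/v3"
)

func main() {
	ctx := context.Background()

	// load the service account JSON credentials file
	creds, err := ioutil.ReadFile("/path/to/service-account.json")
	if err != nil {
		log.Fatal(err)
	}

	// two-legged auth: a JWT config built from the service account key,
	// scoped to read-only Analytics access
	conf, err := google.JWTConfigFromJSON(creds, analytics.AnalyticsReadonlyScope)
	if err != nil {
		log.Fatal(err)
	}

	svc, err := analytics.New(conf.Client(ctx))
	if err != nil {
		log.Fatal(err)
	}

	// sessions over the last 7 days for GA view (profile) 12345678
	data, err := svc.Data.Ga.Get("ga:12345678", "7daysAgo", "today", "ga:sessions").Do()
	if err != nil {
		log.Fatal(err)
	}
	for _, row := range data.Rows {
		fmt.Println(row)
	}
}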

have fun!

drop me a line or say hello on twitter if you have any questions.