EctoBigquery

Ecto adapter for Google BigQuery focused on schema management and data seeding: creating and evolving BigQuery datasets and tables through Ecto migrations, and loading initial data via the Seeds helper.

Why this library exists

I work as a data engineer and part of my day-to-day is maintaining BigQuery schemas across multiple environments. For a long time our team used Terraform for that, and while it works well for provisioning infrastructure, it falls short when it comes to schema evolution — ALTER TABLE operations like adding or dropping columns aren't natively supported, which forced us to write workarounds or apply changes manually.

Having spent years working with tools like Active Record and Ecto Migrations, I believe the migration pattern is simply the best way to manage a database schema: changes are versioned, ordered, reproducible, and reversible. That mental model transfers directly to BigQuery, so I built this library to bring it there.

Scope: This library is intentionally limited to DDL operations (mix ecto.create/drop/migrate/rollback) and seeding (INSERT/DELETE via EctoBigquery.Seeds). It does not implement a general-purpose query interface — Repo.all, Repo.insert, Repo.update, and Repo.delete are not supported for application tables. This is a deliberate design decision: the goal of this library is schema and seed management, not runtime data access.

Features

  • mix ecto.create — creates the BigQuery dataset
  • mix ecto.drop — drops the BigQuery dataset
  • mix ecto.migrate — runs DDL migrations (create/drop tables, add/drop/rename columns)
  • mix ecto.rollback — reverts migrations
  • Seeds helper — INSERT and DELETE via BigQuery DML

Installation

Add ecto_bigquery to your dependencies in mix.exs:

def deps do
  [
    {:ecto_bigquery, "~> 0.1"}
  ]
end

Setup

1. Define your Repo

defmodule MyApp.Repo do
  use EctoBigquery.Repo, otp_app: :my_app
end

2. Configure your application

In config/config.exs:

config :my_app, MyApp.Repo,
  project_id: "my-gcp-project",
  dataset_id: "my_dataset",
  goth: MyApp.Goth
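If the target dataset differs per environment, the usual Ecto pattern of per-environment config files applies. A sketch, assuming a separate scratch dataset for development (the my_dataset_dev name is illustrative):

```elixir
# config/dev.exs -- development points at a scratch dataset
import Config

config :my_app, MyApp.Repo,
  project_id: "my-gcp-project",
  dataset_id: "my_dataset_dev",
  goth: MyApp.Goth
```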

3. Add to your supervision tree

# lib/my_app/application.ex
children = [
  {Goth, name: MyApp.Goth, source: {:default, []}},
  MyApp.Repo
]

4. Authentication

EctoBigquery uses Goth for Google Cloud authentication.

Application Default Credentials (recommended for local development):

gcloud auth application-default login

Service Account (recommended for production):

{Goth,
  name: MyApp.Goth,
  source: {:service_account, Jason.decode!(File.read!("credentials.json"))}}

Migrations

Migrations use standard Ecto syntax. Supported operations:

Create table

defmodule MyApp.Repo.Migrations.CreateEvents do
  use Ecto.Migration

  def change do
    create table(:events) do
      add :user_id,    :string,       null: false
      add :event_type, :string
      add :amount,     :decimal
      add :active,     :boolean
      add :metadata,   :map
      timestamps()
    end
  end
end

Alter table

defmodule MyApp.Repo.Migrations.AlterEvents do
  use Ecto.Migration

  def change do
    alter table(:events) do
      add    :contact_email, :string
      remove :legacy_field
    end

    rename table(:events), :amount, to: :total_amount
  end
end

Drop table

defmodule MyApp.Repo.Migrations.DropEvents do
  use Ecto.Migration

  def change do
    drop table(:events)
  end
end
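change/0 works when Ecto can derive the reverse operation automatically. For operations it cannot invert (for example, a bare remove without a column type), the standard Ecto escape hatch of explicit up/0 and down/0 applies. A sketch:

```elixir
defmodule MyApp.Repo.Migrations.RemoveLegacyField do
  use Ecto.Migration

  # Explicit up/down because `remove :legacy_field` alone
  # cannot be reversed without knowing the column type.
  def up do
    alter table(:events) do
      remove :legacy_field
    end
  end

  def down do
    alter table(:events) do
      add :legacy_field, :string
    end
  end
end
```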

Type Mapping

Standard types

Ecto type         BigQuery type
:string           STRING
:integer          INTEGER
:float            FLOAT
:boolean          BOOLEAN
:decimal          NUMERIC
:date             DATE
:time             TIME
:naive_datetime   DATETIME
:utc_datetime     TIMESTAMP
:binary           BYTES
:map              JSON

Column modes

BigQuery columns support three modes, which can be used as an alternative to null: false:

Option            BigQuery mode   SQL generated
mode: :nullable   NULLABLE        (no constraint)
mode: :required   REQUIRED        NOT NULL
mode: :repeated   REPEATED        ARRAY<T>

create table(:analytics) do
  add :user_id,    :string,  mode: :required   # STRING NOT NULL
  add :tags,       :string,  mode: :repeated   # ARRAY<STRING>
  add :notes,      :string,  mode: :nullable   # STRING (explicit)
end

mode: :repeated is a shorthand for {:array, type}. Both are equivalent:

add :tags, :string,          mode: :repeated   # ARRAY<STRING>
add :tags, {:array, :string}                   # ARRAY<STRING>

The generator supports :repeated and :nullable as modifiers:

mix ecto_bigquery.gen.migration create_analytics user_id:string:required tags:string:repeated notes:string:nullable

Special BigQuery types

Ecto type                        BigQuery type
:geography                       GEOGRAPHY
:bignumeric                      BIGNUMERIC
{:array, :string}                ARRAY<STRING>
{:array, :integer}               ARRAY<INTEGER>
{:struct, [field: :type, ...]}   STRUCT<field TYPE, ...>
{:array, {:struct, [...]}}       ARRAY<STRUCT<...>>

create table(:locations) do
  add :point,        :geography
  add :exact_amount, :bignumeric
  add :tags,         {:array, :string}
  add :scores,       {:array, :integer}
  add :address,      {:struct, [street: :string, city: :string, zip: :string]}
  add :contacts,     {:array, {:struct, [name: :string, email: :string]}}
end

This migration generates:

CREATE TABLE `locations` (
  point GEOGRAPHY,
  exact_amount BIGNUMERIC,
  tags ARRAY<STRING>,
  scores ARRAY<INTEGER>,
  address STRUCT<street STRING, city STRING, zip STRING>,
  contacts ARRAY<STRUCT<name STRING, email STRING>>
)

Seeds

Use EctoBigquery.Seeds in priv/repo/seeds.exs:

# Option A: via Repo (Goth must be running in the supervision tree)
EctoBigquery.Seeds.insert(MyApp.Repo, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob",   active: false}
])

# Atomic multi-table insert — rolls back all if any insert fails
EctoBigquery.Seeds.insert_all(MyApp.Repo, [
  {"users",    [%{id: "1", name: "Alice"}]},
  {"products", [%{sku: "P1", price: 9.99}]}
])

EctoBigquery.Seeds.delete(MyApp.Repo, "users", "active = FALSE")
EctoBigquery.Seeds.truncate(MyApp.Repo, "users")

# Option B: explicit connection (useful in standalone scripts)
config     = MyApp.Repo.config()
project_id = Keyword.get(config, :project_id)
dataset_id = Keyword.get(config, :dataset_id)
{token, 0} = System.cmd("gcloud", ["auth", "print-access-token"])
conn = EctoBigquery.Connection.new(String.trim(token))

EctoBigquery.Seeds.insert_with_conn(conn, project_id, dataset_id, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob",   active: false}
])

EctoBigquery.Seeds.delete_with_conn(conn, project_id, dataset_id, "users", "active = FALSE")
EctoBigquery.Seeds.truncate_with_conn(conn, project_id, dataset_id, "users")

Migration generator

mix ecto_bigquery.gen.migration generates migration files with pre-filled column definitions based on the migration name pattern.

# Create table
mix ecto_bigquery.gen.migration create_users name:string age:integer active:boolean

# NOT NULL columns
mix ecto_bigquery.gen.migration create_products sku:string:required price:decimal:required

# Add column
mix ecto_bigquery.gen.migration add_email_to_users email:string:required

# Remove column
mix ecto_bigquery.gen.migration remove_phone_from_users phone

# Drop table
mix ecto_bigquery.gen.migration drop_legacy_data

# Special types
mix ecto_bigquery.gen.migration create_places location:geography amount:bignumeric "tags:array[string]"

Generator type aliases

Alias                    Ecto type
string, text             :string
integer, int             :integer
float                    :float
decimal, numeric         :decimal
boolean, bool            :boolean
date                     :date
datetime, utc_datetime   :utc_datetime
naive_datetime           :naive_datetime
binary                   :binary
map, json                :map
geography                :geography
bignumeric               :bignumeric
array[TYPE]              {:array, TYPE}

Note: STRUCT types must be written manually in the migration file since their nested fields cannot be expressed in a single CLI argument.
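A hand-written migration for a STRUCT column might look like this (the table and field names are illustrative):

```elixir
defmodule MyApp.Repo.Migrations.CreateOrders do
  use Ecto.Migration

  def change do
    create table(:orders) do
      add :id,      :string, mode: :required
      # STRUCT<street STRING, city STRING>
      add :address, {:struct, [street: :string, city: :string]}
      # ARRAY<STRUCT<sku STRING, qty INTEGER>>
      add :items,   {:array, {:struct, [sku: :string, qty: :integer]}}
    end
  end
end
```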

Mix tasks

# Create the BigQuery dataset
mix ecto.create

# Drop the dataset (deletes all tables)
mix ecto.drop

# Run pending migrations
mix ecto.migrate

# Revert the last migration
mix ecto.rollback

# Run seeds
mix run priv/repo/seeds.exs

Out of scope

The following are intentionally not supported:

  • Generic query interface — Repo.all, Repo.insert, Repo.update, and Repo.delete for application tables. Use google_api_big_query directly for runtime data access.
  • DDL transactions — BigQuery does not support transactional DDL. If a migration with multiple changes fails midway, previously applied changes remain permanent.
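For runtime reads via google_api_big_query, a minimal sketch might look like the following. It assumes the google_api_big_query dependency is installed and the MyApp.Goth process from the setup section is running; function and module names come from the generated Google API client:

```elixir
# Fetch an OAuth token from Goth and run a standard-SQL query.
{:ok, %{token: token}} = Goth.fetch(MyApp.Goth)
conn = GoogleApi.BigQuery.V2.Connection.new(token)

{:ok, response} =
  GoogleApi.BigQuery.V2.Api.Jobs.bigquery_jobs_query(
    conn,
    "my-gcp-project",
    body: %GoogleApi.BigQuery.V2.Model.QueryRequest{
      query: "SELECT user_id, event_type FROM `my_dataset.events` LIMIT 10",
      useLegacySql: false
    }
  )

# Rows come back as %{f: [%{v: value}, ...]} entries in response.rows.
```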
