Getting Started Guide

How BlameTrail works

Go from zero to fully monitored in under five minutes. This guide walks you through every step — creating services, connecting your deploy pipeline, and getting Slack alerts when something breaks.

Quick start — four steps

You can be up and running before your coffee gets cold.

Create a service

A service is anything your team runs — an API, a website, a worker process. Give it a name, pick the environment, and optionally link its source repository for richer deploy analysis.

Add a monitor

Tell BlameTrail which URL to check, how often, and what a healthy response looks like. BlameTrail will start pinging it automatically.

Connect your deploys

Copy the webhook URL from your service page and add a curl call as a step in your CI/CD pipeline. If the service is linked to its repository and you send the commit SHA, BlameTrail can also pull commit, PR, and changed-file context automatically.

Set up Slack alerts

Paste a Slack incoming-webhook URL so your team gets notified the instant something breaks — and again when it recovers.

Core concepts

There are seven things to know. Everything else follows from these.

Services

A service represents something your team deploys and runs — for example "Payments API" or "Marketing Site". Everything else (monitors, deploys, incidents) is attached to a service.

Monitors

A monitor is an HTTP health check. You give BlameTrail a URL, pick a method (GET, POST, etc.), set the expected status code (usually 200), and choose how often to check. Every result is stored so you can look back.

Deploys

A deploy is a record of code you pushed to production (or staging). BlameTrail tracks the commit SHA, branch, who deployed, and when. If the service is linked to a repository, BlameTrail can enrich that deploy with commit, PR, and changed-file details.

Incidents

An incident is an active problem. BlameTrail creates one automatically when 3 checks in a row fail (availability incident) or 3 checks in a row are slow (latency incident). Incidents resolve themselves when things recover.
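
The three-in-a-row rule can be sketched in a few lines of Python. This is only an illustration of the idea, not BlameTrail's actual code: the CheckResult type and incident_to_open function are hypothetical names, and the choice to let a failed check reset the slow streak is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

THRESHOLD = 3  # consecutive bad checks needed to open an incident, per the guide


@dataclass
class CheckResult:
    ok: bool    # the response matched the expected status code
    slow: bool  # the response exceeded the monitor's latency threshold


def incident_to_open(results: Iterable[CheckResult]) -> Optional[str]:
    """Scan checks in chronological order and report which incident type, if any, opens."""
    failures = 0
    slow_streak = 0
    for r in results:
        # A passing check resets the failure streak.
        failures = failures + 1 if not r.ok else 0
        # Assumption: only passing-but-slow checks extend the slow streak.
        slow_streak = slow_streak + 1 if (r.ok and r.slow) else 0
        if failures >= THRESHOLD:
            return "availability"
        if slow_streak >= THRESHOLD:
            return "latency"
    return None
```

A single healthy check in the middle of a run resets the counter, which is why an isolated blip never opens an incident.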

Suspect deploys

When an incident opens, BlameTrail looks at recent deploys and scores each one by how close in time it was to the first failure. The top suspect can then be explained with commit messages, PR titles, and relevant changed files so you know where to look first — no more asking "who deployed last?".

Commit analysis

A commit analysis is an AI-powered inspection of the code changes in a commit. BlameTrail fetches the diff from GitHub, classifies each changed file, scores them by likelihood of causing a problem, extracts evidence, and runs an LLM to produce a diagnosis. You can analyze a single commit or a range of commits. Analysis can be triggered manually, from a deploy event, or automatically via GitHub push webhooks.

Notifications

Notifications are Slack messages sent when an incident opens or resolves. They include the service name, monitor, incident type, and the most likely suspect deploy. You can add as many Slack channels as you want.

Step-by-step walkthrough

Follow along from start to finish.

Create a service

Go to Services → New Service and fill in the basics:

Name — something your team already calls it, like "Payments API".
Environment — production, staging, or development.
Type — web, api, worker, database, or internal.
URL (optional) — the base URL of the service, useful for reference.
Repository (optional, recommended) — provider, owner/org, repo name, and optional external ID so BlameTrail can enrich future deploys with code context.

When you save, BlameTrail creates a deploy webhook for this service automatically. You'll see the secret token on the next screen — copy it now, because it's only shown once.

If you linked a repository, the service detail page also shows the connected repo so you can verify that future deploy enrichment will use the right source.

Add a health-check monitor

Go to Monitors → New Monitor and configure the check:

Service — pick the service you just created.
URL — the endpoint to ping, e.g. https://api.example.com/health
Method — usually GET.
Expected status — 200 by default.
Check interval — how often to check, in seconds (default 60).
Latency threshold — responses slower than this (in ms) count as "slow" for latency incidents.

BlameTrail starts checking immediately. You can see results on the monitor detail page within one interval.
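
To make the fields above concrete, here is a rough Python model of a monitor and of how a single response might be judged. MonitorConfig and evaluate are hypothetical names, and the 1000 ms latency threshold is a placeholder because the guide does not state a default. The other defaults (GET, 200, 60 seconds) come from the form described above.

```python
from dataclasses import dataclass


@dataclass
class MonitorConfig:
    url: str
    method: str = "GET"
    expected_status: int = 200        # form default
    interval_seconds: int = 60        # form default
    latency_threshold_ms: int = 1000  # placeholder value, not a documented default


def evaluate(config: MonitorConfig, status: int, latency_ms: float) -> dict:
    """Classify one response as passed or failed, and slow or not."""
    passed = status == config.expected_status
    # Assumption: only a passing response can count as "slow".
    slow = passed and latency_ms > config.latency_threshold_ms
    return {"passed": passed, "slow": slow}


config = MonitorConfig(url="https://api.example.com/health")
print(evaluate(config, status=200, latency_ms=1500))
```

With these placeholder values, a 200 response that takes 1500 ms passes the status check but counts toward a latency incident.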

Connect your deploy pipeline

On your service detail page you'll find a deploy webhook section with a ready-to-use curl command. Add it as the last step in your CI/CD pipeline.

Example — GitHub Actions

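# .github/workflows/deploy.yml
- name: Notify BlameTrail
  env:
    # Pass values through the environment so the shell never splits or re-parses them.
    DEPLOY_TOKEN: ${{ secrets.BLAMETRAIL_DEPLOY_TOKEN }}
    COMMIT_SHA: ${{ github.sha }}
    COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
    BRANCH: ${{ github.ref_name }}
  run: |
    # jq (available on GitHub-hosted runners) escapes quotes and newlines in the commit message.
    jq -n --arg sha "$COMMIT_SHA" --arg msg "$COMMIT_MESSAGE" --arg branch "$BRANCH" \
      '{commit_sha: $sha, commit_message: $msg, branch: $branch,
        deployed_by: "github-actions", environment: "production"}' |
    curl -X POST https://your-domain.com/api/ingest/deploy \
      -H "Content-Type: application/json" \
      -H "X-Deploy-Token: $DEPLOY_TOKEN" \
      -d @-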

The X-Deploy-Token header authenticates the request. You copied this token when you created the service. If you lost it, you can rotate it on the service detail page.

When the service has a linked repository and the deploy payload includes commit_sha, BlameTrail queues background enrichment to fetch the commit message, linked PR details, and the changed files for that deploy. That extra context later shows up on the incident page.

Works with any CI tool

GitHub Actions, GitLab CI, CircleCI, Jenkins, Bitbucket Pipelines — anything that can run a curl command can send deploy events to BlameTrail.
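
If your pipeline runs a script rather than curl, the same request can be built with Python's standard library. This is a minimal sketch: build_deploy_request is a hypothetical helper, and the endpoint path, header name, and field names are copied from the curl example above.

```python
import json
import urllib.request


def build_deploy_request(base_url, token, commit_sha, branch,
                         environment="production", deployed_by="ci"):
    """Build (but do not send) a POST request for the deploy webhook."""
    payload = {
        "commit_sha": commit_sha,
        "branch": branch,
        "deployed_by": deployed_by,
        "environment": environment,
    }
    return urllib.request.Request(
        f"{base_url}/api/ingest/deploy",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Deploy-Token": token,  # the secret shown when the service was created
        },
        method="POST",
    )


request = build_deploy_request("https://your-domain.com", "token-from-ci-secrets",
                               "abc123", "main")
print(request.get_method(), request.full_url)
```

To actually send the event, pass the request to urllib.request.urlopen(request) and read the token from your CI secret store instead of hard-coding it.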

Set up Slack notifications

Go to Integrations and add a Slack incoming webhook:

In Slack, go to Settings → Manage Apps → Incoming Webhooks and create a new webhook for the channel you want alerts in (e.g. #incidents).
Copy the webhook URL — it looks like https://hooks.slack.com/services/T.../B.../...
Paste it into BlameTrail's integration form, give it a name, and save.

You can add multiple Slack channels. When an incident opens or resolves, every active channel gets a message with the service name, monitor, incident type, and the most likely suspect deploy.
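
For a feel of what travels to Slack, here is a hedged sketch of a message body. Slack incoming webhooks accept a JSON object with a text field. The wording and layout below are invented for illustration, and slack_payload is a hypothetical helper, so real BlameTrail messages will look different.

```python
import json


def slack_payload(event, service, monitor, incident_type, suspect_sha=None):
    """Return the JSON body for a Slack incoming webhook (illustrative format only)."""
    verb = "opened" if event == "open" else "resolved"
    lines = [f"Incident {verb}: {service} / {monitor} ({incident_type})"]
    if suspect_sha:
        # Show a short SHA for the most likely suspect deploy.
        lines.append(f"Most likely suspect deploy: {suspect_sha[:7]}")
    return json.dumps({"text": "\n".join(lines)})


print(slack_payload("open", "Payments API", "health check", "availability",
                    "0123456789abcdef"))
```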

Sit back — BlameTrail takes it from here

Here's what happens automatically after setup:

Continuous checks

Your monitor pings the URL on the interval you set. Every response — status code, latency, pass or fail — is recorded.

Incidents open

3 consecutive failures → availability incident. 3 consecutive slow responses → latency incident. No duplicate incidents are created.

Suspects ranked

BlameTrail checks deploys from the last 60 minutes and scores each one by how close it was to the first failure. Top suspects appear on the incident page.
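
One simple way to picture time-proximity scoring is a score that falls from 1.0 to 0.0 across the 60-minute window. The linear decay and the rank_suspects name are assumptions made for this sketch. The guide only says that closer deploys score higher.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=60)  # look-back window stated in the guide


def rank_suspects(deploys, first_failure):
    """deploys is a list of (commit_sha, deployed_at); returns [(sha, score)], best first."""
    ranked = []
    for sha, deployed_at in deploys:
        age = first_failure - deployed_at
        # Ignore deploys after the first failure or older than the window.
        if timedelta(0) <= age <= WINDOW:
            score = 1.0 - age / WINDOW  # hypothetical linear decay
            ranked.append((sha, round(score, 2)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


first_failure = datetime(2024, 1, 1, 12, 0)
deploys = [
    ("aaa", first_failure - timedelta(minutes=45)),
    ("bbb", first_failure - timedelta(minutes=6)),
    ("ccc", first_failure - timedelta(minutes=90)),  # outside the window
]
print(rank_suspects(deploys, first_failure))
```

Under these assumptions the deploy six minutes before the first failure ranks first, and the 90-minute-old deploy is not considered at all.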

Code context added

If the suspect deploy has repository linkage and a commit SHA, BlameTrail fetches the commit message, PR details, and the most relevant changed files in the background.

Summary gets smarter

The incident page can refresh its AI summary with that richer deploy context and show a compact “Why This Deploy Is Suspected” section underneath.

Auto-resolve

When 3 checks pass again, the incident closes itself and your Slack channel gets a resolve notification with the duration.
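
Recovery can be sketched the same way as detection. The snippet below assumes the three passing checks must be consecutive, which the guide does not spell out, and resolve_duration is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional

RECOVERY_THRESHOLD = 3  # passing checks needed to close the incident


def resolve_duration(opened_at, checks) -> Optional[timedelta]:
    """checks is a chronological list of (timestamp, passed) recorded after the incident opened.

    Returns how long the incident lasted, or None if it is still open.
    """
    streak = 0
    for timestamp, passed in checks:
        # Assumption: a failed check resets the recovery streak.
        streak = streak + 1 if passed else 0
        if streak >= RECOVERY_THRESHOLD:
            return timestamp - opened_at
    return None


opened = datetime(2024, 1, 1, 12, 0)
outcomes = [False, True, True, False, True, True, True]
checks = [(opened + timedelta(minutes=i), ok) for i, ok in enumerate(outcomes, start=1)]
print(resolve_duration(opened, checks))
```

The returned duration is the kind of value a resolve notification could report.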

Common questions

Do I need a GitHub token for commit enrichment?

Yes. Go to Settings → Organization and add a GitHub personal access token. The token needs the repo scope for private repositories or public_repo for public ones. Once saved, BlameTrail will use it to fetch commit messages, pull requests, and changed files for every deploy event that includes a commit SHA. The token is stored securely and never returned by the API.

What if I lose my deploy webhook token?

Go to the service detail page and click "Rotate Secret". This generates a new token and invalidates the old one. Update the secret in your CI environment variables.

Can I pause a monitor without deleting it?

Yes — edit the monitor and set it to inactive. BlameTrail will stop checking until you re-enable it.

How does BlameTrail decide which deploy caused an incident?

When an incident opens, BlameTrail first ranks deploys from the previous 60 minutes by how close they were to the first failure. If the top suspect has repository enrichment, BlameTrail then adds supporting evidence like the commit message, PR title, and relevant changed files so the explanation is more concrete without pretending it is certain.

Why did the AI summary change after the incident was already open?

BlameTrail can generate an initial summary from timing and monitor data alone, then refresh it once repository enrichment finishes for the top suspect deploy. That second pass can include commit, PR, and relevant file context, so the summary may become more specific a minute or two later.

Why does it take 3 failures to open an incident?

To avoid false alarms. A single failed check could be a network blip. Three in a row means something is consistently wrong. This keeps noise low and signal high.

Can I manually create or close an incident?

Yes. You can create incidents manually from the Incidents page — provide a title, severity, and optionally an environment and issue context. You can also update the status of any incident (open → acknowledged → resolved) from the incident detail page.

What is commit analysis and how do I trigger it?

Commit analysis is an AI-powered inspection of the code changes in a commit. Go to Analyses → New Analysis, pick a repository, enter the commit SHA, and BlameTrail will fetch the diff, classify files, score suspects, and generate a diagnosis. You can also trigger analysis from a deploy event detail page or automatically via GitHub push webhooks. For multiple commits, use range analysis to analyze an entire base-to-head window.

How do GitHub push webhooks work?

On the repository settings page, you can register a push webhook. BlameTrail generates an HMAC secret — add it to your GitHub repository's webhook settings with the push event enabled. Every push is verified and commits are ingested automatically. If you enable auto-analyze, BlameTrail will also trigger commit analysis on each push without any CI/CD changes.
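
For context, this is roughly what verifying a push looks like. GitHub signs each delivery with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header as sha256= followed by a hex digest. The function below is a generic sketch of that check, not BlameTrail's implementation, and verify_github_signature is a hypothetical name.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a GitHub webhook body against its X-Hub-Signature-256 header value."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking timing information.
    return hmac.compare_digest(expected, signature_header)


secret = b"example-secret"
body = b'{"ref": "refs/heads/main"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_github_signature(secret, body, header))
```

The signature must be computed over the raw request bytes, before any JSON parsing, or the digest will not match.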

Does BlameTrail support SSO / SAML?

Yes — SSO is available on the Enterprise plan. Once enabled, an organization owner or admin can configure an identity provider (Okta, Azure AD, Google Workspace, etc.) from Settings → Organization. Team members then sign in via "Sign in with SSO" on the login page. Enforcement can be toggled so all members must use SSO.

Do Slack notifications go to all channels?

Yes — every active Slack integration in your organization receives incident and resolve notifications. There's no per-service routing yet, so use a shared channel like #incidents.

What payload fields are required for the deploy webhook?

All fields are optional, but commit_sha is the key field if you want repository enrichment. For code-aware incident analysis, link the repository on the service and send at least commit_sha, branch, and environment. The more context you send, the more useful the incident explanation becomes.
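
A small pre-flight check in your pipeline can catch payloads that will not get enriched. This sketch uses the three fields recommended above. The helper name missing_for_enrichment is hypothetical, and BlameTrail itself accepts the payload either way.

```python
RECOMMENDED_FIELDS = ("commit_sha", "branch", "environment")


def missing_for_enrichment(payload: dict) -> list:
    """List the recommended fields that are absent or empty in a deploy payload."""
    return [field for field in RECOMMENDED_FIELDS if not payload.get(field)]


payload = {"commit_sha": "abc123", "deployed_by": "github-actions"}
print(missing_for_enrichment(payload))
```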

Ready to get started?

Create your free account and set up your first monitor in under two minutes.
