Last updated: July 12, 2026

Scanner Documentation

The AI/BI Readiness Scanner is a Databricks Solution Accelerator package for read-only diagnostic analysis of cost-waste indicators, AI/BI readiness, metric reconciliation, query intent, and shadow dependencies.

Scanner-Lite Package Contents

Setup and permission check notebook.
Query pattern clustering notebook.
Metric conflict detection notebook with owner-validation recommendations.
Dashboard dependency notebook.
Shadow dependency notebook.
Genie readiness scoring notebook.
Genie readiness domains and benchmark questions for owner review.
Repeated-query candidates for materialized-view or aggregate-table review, with query counts, dashboard counts, and bytes-read evidence.
Serverless usage-policy attribution gaps, hot-table maintenance candidates, and Lakeflow tier-review findings that require owner validation.
Report generator notebook.
Synthetic BI, dbt, workflow, and ticket metadata examples for private-pilot demos.
Permission guide and troubleshooting guide.
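
As a rough illustration of what query pattern clustering and repeated-query evidence can look like, the sketch below normalizes SQL text into a fingerprint and aggregates query counts and bytes read per pattern. It is not the notebook's actual implementation: the function names, the `(statement_text, bytes_read)` input shape, and the `min_count` threshold are assumptions made for this example, and dashboard counts are left out.

```python
import re
from collections import defaultdict

def fingerprint(sql: str) -> str:
    """Normalize a statement so queries that differ only in literals match."""
    text = sql.strip().rstrip(";").lower()
    text = re.sub(r"'(?:[^']|'')*'", "?", text)     # string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)  # numeric literals
    text = re.sub(r"\s+", " ", text)                # collapse whitespace
    return text

def repeated_query_candidates(rows, min_count=2):
    """rows: iterable of (statement_text, bytes_read) tuples (assumed shape)."""
    groups = defaultdict(lambda: {"query_count": 0, "bytes_read": 0})
    for statement, bytes_read in rows:
        stats = groups[fingerprint(statement)]
        stats["query_count"] += 1
        stats["bytes_read"] += bytes_read
    candidates = [
        {"pattern": pattern, **stats}
        for pattern, stats in groups.items()
        if stats["query_count"] >= min_count  # hypothetical threshold
    ]
    # Largest bytes-read evidence first, for owner review.
    return sorted(candidates, key=lambda c: c["bytes_read"], reverse=True)
```

The output is a list of candidates for review only. It carries counts and bytes read, not dollar figures.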

Separate Private App and Enterprise Scope

The free scanner-lite notebooks do not generate metric-view DDL, validation SQL, data-product contracts, dollar-denominated savings/payback economics, or a 90-day remediation backlog. They also do not include warehouse-sizing, scheduled-job compute, DBR-runtime, Photon-policy, compute-policy, cluster-restart, instance-pool, detailed cost-attribution, or billing-reconciliation detectors. Those richer artifacts and detectors belong to a separately deployed private Databricks App or enterprise diagnostic, require additional telemetry, and must be validated in the customer workspace.

Related Deployment Docs

Connector contracts for BI, workflow, dbt, orchestration, Git, and ticket exports.
Enterprise controls for allowlists, privacy defaults, audit events, and retention posture.
MCP server for read-only agent access to readiness evidence, reconciliation drafts, and grounding context.

Operating Modes

Mode	Inputs	Output
Basic	Query history and information schema where available.	Repeated query patterns and candidate metric conflicts.
Full	Query history, billing, lineage, Lakeflow/jobs tables, and optional BI metadata.	Report sections, dashboard and shadow dependency findings, materialization candidates, policy templates, maintenance candidates, and tier-review findings.
Lite fallback	Manually exported query or BI metadata sample.	Limited candidate findings with lower confidence labels.
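
The table above can be read as a simple decision rule over the inputs a workspace can provide. The sketch below is one hedged way to express that rule. The input keys (`query_history`, `billing`, `lineage`, `lakeflow_jobs`, `manual_export`) and the function name are hypothetical labels chosen for this example, not identifiers from the scanner package.

```python
# Hypothetical input labels; the scanner's own names may differ.
BASIC_INPUTS = {"query_history"}
FULL_INPUTS = {"query_history", "billing", "lineage", "lakeflow_jobs"}

def select_mode(available: set) -> str:
    """Pick the richest operating mode the available inputs support."""
    if FULL_INPUTS <= available:
        return "Full"
    if BASIC_INPUTS <= available:
        # Information schema is used where available but is not required.
        return "Basic"
    if "manual_export" in available:
        return "Lite fallback"
    raise ValueError("No usable inputs: provide query history or a manual export.")
```

For example, `select_mode({"query_history", "information_schema"})` returns `"Basic"`, because billing, lineage, and Lakeflow/jobs inputs are missing.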

Setup Overview

Create or select a customer-owned Unity Catalog output schema.
Grant least-privilege read access to the relevant Databricks system tables.
Import the scanner notebooks into the customer workspace.
Run notebooks in order from 00_setup_and_permissions.py through 06_report_generator.py.
Run the benchmark export to produce Genie validation questions and reviewer answer templates.
Review findings with business, platform, governance, or audit owners.
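
The run-in-order step can be sketched as a small helper that sorts notebooks by their two-digit prefix and stops at the first failure. This is an illustrative sketch only: the helper names are hypothetical, and the `run` callable is supplied by the caller (in a Databricks notebook it could wrap `dbutils.notebook.run`, assuming that fits the workspace setup).

```python
import re
from typing import Callable, Iterable, List

_PREFIX = re.compile(r"^(\d{2})_")

def ordered_notebooks(names: Iterable[str]) -> List[str]:
    """Return names with a two-digit prefix, sorted by that prefix."""
    numbered = []
    for name in names:
        match = _PREFIX.match(name)
        if match:  # files without a numeric prefix are ignored
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]

def run_in_order(names: Iterable[str], run: Callable[[str], None]) -> List[str]:
    """Run each notebook in prefix order; an exception stops the sequence."""
    completed = []
    for name in ordered_notebooks(names):
        run(name)
        completed.append(name)
    return completed
```

Stopping at the first failure keeps later notebooks from running against a setup or permission check that did not pass.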

Private App Background Refresh

This section applies to the separately deployed private Databricks App, not the scanner-lite notebook package.

Use --section light for frequent saved-result refreshes: SQL warehouses, table-maintenance candidates, readiness, and diagnostic snapshots.
Schedule the heavy sections (jobs, serverless, query, and cost) separately so that user-facing tabs do not run long system-table scans inline.
The app serves Diagnostic, Graph, Attention, Serverless, and Optimization Experiments from cached or saved snapshots.

Minimum Basic Permissions

CAN USE permission on a SQL warehouse.
SELECT on system.query.history.
USE CATALOG and USE SCHEMA on the output location.
CREATE TABLE and MODIFY in the output schema.
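
The SQL-grantable permissions above can be written as Unity Catalog GRANT statements. The sketch below generates them for review by an administrator. The function name and the example principal, catalog, and schema are hypothetical. SQL warehouse CAN USE is a workspace permission and is not covered here, and reading system tables may also require USE CATALOG on `system` and USE SCHEMA on `system.query`, depending on how the workspace is configured.

```python
def basic_grants(principal: str, catalog: str, schema: str) -> list:
    """Build GRANT statements for the minimum Basic-mode permissions."""
    for value in (principal, catalog, schema):
        if "`" in value:
            raise ValueError("Backticks are not allowed in identifiers.")
    who = f"`{principal}`"
    target = f"`{catalog}`.`{schema}`"  # customer-owned output schema
    return [
        f"GRANT SELECT ON TABLE system.query.history TO {who}",
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {target} TO {who}",
        f"GRANT CREATE TABLE ON SCHEMA {target} TO {who}",
        f"GRANT MODIFY ON SCHEMA {target} TO {who}",
    ]

# Example with placeholder names; print the statements for review before running them.
for statement in basic_grants("scanner-sp", "main", "scanner_out"):
    print(statement)
```

Generating the statements as text, rather than executing them, leaves the decision to grant with the catalog and schema owners.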

Important Limitations

System table data is not real time, so recent events may not appear immediately.
Lineage is treated as evidence with confidence labels, not as ground truth.
Metric conflict detection produces candidates that require human validation.
Materialized-view candidates use query counts, dashboard counts, and bytes-read evidence; scanner-lite does not calculate dollar savings or payback.
Optimization experiment plans are review-only control/treatment designs; customers decide whether and when to run them.
Serverless usage-policy templates are attribution drafts and should be reviewed against customer budget-policy ownership.
Table maintenance candidates require Query Profile and table-owner validation before enabling predictive optimization or liquid clustering.
Predictive optimization, Unity Catalog Metric Views, Genie, and Databricks MCP are treated as native destinations or control planes, not scanner replacements.
Runtime observability tools remain the system of record for deployed data-quality and agent-quality monitoring; the scanner focuses on pre-deployment assessment and reconciliation backlog generation.
Lakeflow tier-review findings are owner-review prompts and should be validated against required pipeline features before any tier or execution-mode change.
The scanner-lite package does not provide a final audit opinion or legal advice.

Production App Performance Model

Dashboard and diagnostic views are snapshot-first and should not run full system-table scans on page load.
Slow Databricks work should run through scheduled Brickcost refresh jobs or a manual background refresh.
Production deployments should keep refresh-on-read disabled so API reads serve cached or starter snapshots immediately.
The app exposes scan status and export-allowlisted API timing diagnostics for support review.
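
The snapshot-first model described above can be sketched as a small read path that never scans on read unless refresh-on-read is explicitly enabled. This is a minimal illustration, not the app's code: the `SnapshotStore` class, its method names, and the snapshot dictionary shape are assumptions made for this example.

```python
import time
from typing import Callable, Dict, Optional

class SnapshotStore:
    """Serve cached snapshots first; fall back to a starter snapshot."""

    def __init__(self, starter: Dict[str, dict], refresh_on_read: bool = False):
        self._snapshots: Dict[str, dict] = {}
        self._starter = starter
        self.refresh_on_read = refresh_on_read  # keep False in production

    def save(self, view: str, payload: dict, now: Optional[float] = None) -> None:
        """Called by a scheduled or manual background refresh."""
        saved_at = now if now is not None else time.time()
        self._snapshots[view] = {"payload": payload, "saved_at": saved_at, "source": "cached"}

    def read(self, view: str, scan: Optional[Callable[[], dict]] = None) -> dict:
        """Return immediately; only scan inline if refresh-on-read is enabled."""
        if self.refresh_on_read and scan is not None:
            self.save(view, scan())
        if view in self._snapshots:
            return self._snapshots[view]
        return {"payload": self._starter.get(view, {}), "saved_at": None, "source": "starter"}
```

With `refresh_on_read` left at its default of `False`, a page load returns the starter snapshot until a background refresh calls `save`, so the slow scan never sits on the request path.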