Avesys DataForge · Live · Production

Production-quality data — without production-quality risk.

DataForge discovers sensitive data with AI, masks it on the way out, slices it down to the size your team needs, and spins up safe, on-demand environments for dev, test, and analytics. Engineers stop waiting on data. Compliance stops worrying about it.

SQL Server first · Pluggable AI, on-prem or cloud · On-premise, air-gap capable
Avesys DataForge: discovery & masking · v1.0 · production
dbo.Customers → preview row 4 of 1,284,200 · AI · reviewable
row 4 · PRODUCTION
FullName Real Customer
Email real@example.com
NationalID 38172649501
DOB 1986-04-17
ShipAddr 412/8 Center St, City
CardLast4 4782
row 4 · MASKED
FullName Synthetic Name
Email syn.name@example.com
NationalID 61204938771
DOB 1986-04-17
ShipAddr 17/3 Side St, City
CardLast4 2049
FK integrity preserved · format preserved · 27 sensitive columns flagged
Built for
SQL Server estates · Pluggable AI, on-prem or cloud · GDPR / KVKK / PCI · Air-gap deployable
The problem · sound familiar?

Real bugs need real data. Real data needs real protection.

Four sentences that come up on every release cycle — and the answers your team is forced to give without proper tooling for the data itself.

We can't give devs production data.

Compliance won't allow it. Legal won't allow it. But synthetic data doesn't reproduce the bugs, and a half-empty test database doesn't catch the real ones. Development happens against data that doesn't reflect reality — and bugs ship to production anyway.

Head of Platform · Tuesday standup

Refreshing the test environment takes a week.

The DBA pulls a copy, scrubs it, loads it, verifies it. By the time the test environment is fresh, the data is already two sprints stale. Multiply by every team that needs a copy.

DBA · sprint review

We don't actually know where our PII lives.

There's PII in the database somewhere; there always is. But you can't list every column that holds it, and neither can your auditor. That's a problem well before it turns into an incident.

DPO · pre-audit prep

Test data was an Excel file the team passed around.

Each environment looks different. Each developer's local copy looks different again. Bugs that reproduce in staging vanish in dev. Nobody trusts what they see.

QA Lead · retro
What DataForge does

Discover. Mask. Subset. Provision.

Four operations that today live in scripts, tickets, and DBA brains — built into one product. The full lifecycle of non-production data, from "where is the PII" to "the new dev has a working environment."

01 · AI-ASSISTED DISCOVERY

Find every sensitive column. Pluggable AI runtime.

DataForge scans your schemas and flags PII, payment data, national IDs, and patterns you define. Smarter than pure regex; faster than a manual audit. The AI's reasoning is visible and reviewable — run the model on your own hardware for full sovereignty, or plug in your LLM provider of choice (Gemini, Claude, OpenAI) when the data path suits your policy.

dbo.Customers.Email · RFC 5322 · contextual headers · PII · conf 99%
dbo.Customers.NationalID · 11-digit · checksum valid · ID · conf 96%
dbo.Orders.CardPAN_Last4 · 4-digit suffix · PCI scope · PCI · conf 99%
dbo.Customers.DateOfBirth · DATE col · age-derivable · PII · conf 94%
dbo.Orders.ShippingAddr · free-text · street/city tokens · PII · conf 91%
AI on your terms

Choose the runtime: a local model on your hardware for air-gapped sovereignty, or your preferred cloud LLM when the data path is acceptable to your privacy office.

You confirm before masking

Every finding has a confidence score and a reason. You approve, reject, or refine — nothing is masked until you say so.
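To make "confidence score and a reason" concrete, here is a minimal sketch of the rule-based half of discovery only: pattern matching plus checksum validation over sampled values, with confidence taken as the share of samples that satisfy a rule. It is a hypothetical illustration, not DataForge's implementation. The rule set, the simplified email pattern, the `classify` and `tc_kimlik_valid` names, and the assumption that 11-digit IDs are Turkish TC Kimlik numbers (suggested by the KVKK context) are all ours, and the AI-assisted reasoning is not modelled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Simplified email shape check; real RFC 5322 parsing is far more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def tc_kimlik_valid(value: str) -> bool:
    """Assumption: 11-digit IDs follow the Turkish TC Kimlik check-digit rules."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()) or value[0] == "0":
        return False
    d = [int(c) for c in value]
    odd = sum(d[0:9:2])   # digits 1, 3, 5, 7, 9
    even = sum(d[1:8:2])  # digits 2, 4, 6, 8
    if (odd * 7 - even) % 10 != d[9]:
        return False
    return sum(d[:10]) % 10 == d[10]


@dataclass
class Finding:
    column: str
    label: str
    confidence: float  # share of sampled values that satisfied the rule
    reason: str


# Hypothetical rule set: (label, human-readable reason, predicate).
RULES = [
    ("PII", "email-shaped values", lambda v: bool(EMAIL_RE.match(v))),
    ("ID", "11-digit, checksum valid", tc_kimlik_valid),
]


def classify(column: str, samples: list[str]) -> Finding | None:
    """Score sampled values against every rule and keep the best-scoring one."""
    best = None
    for label, reason, predicate in RULES:
        hits = sum(1 for v in samples if predicate(v))
        confidence = hits / len(samples) if samples else 0.0
        if confidence > 0 and (best is None or confidence > best.confidence):
            best = Finding(column, label, confidence, reason)
    return best


finding = classify("dbo.Customers.Email", ["a@example.com", "b@example.com", "n/a"])
print(finding.label, round(finding.confidence, 2))  # PII 0.67
```

Returning a `Finding` rather than masking anything directly mirrors the review step above: a person approves or rejects the finding before any rule is applied.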

02 · MASKING THAT KEEPS USEFULNESS

Realistic substitutes. Same shape. Same relationships.

Replace sensitive values with realistic ones that preserve format, length, and referential integrity. Foreign keys still match. Reports still run. Bugs still reproduce. PII is no longer exposed.

PRODUCTION
name Real Customer
email real@example.com
id 38172649501
card 4782
MASKED · FORMAT-PRESERVING
name Synthetic Name
email syn.name@example.com
id 61204938771
card 2049
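Why do foreign keys still match after masking? One common way to get that property is deterministic masking: the same input always maps to the same output, so a key masked in one table equals the same key masked in another. The sketch below assumes a keyed HMAC-SHA256 keystream and per-character substitution that keeps digits as digits and letters as letters. It is a hypothetical illustration (the `mask` function and the key are ours), not DataForge's algorithm. It is also not true format-preserving encryption such as NIST SP 800-38G FF1: it is one-way and can map two inputs to the same output, which would break unique keys, and it preserves shape only, whereas realistic substitutes like "Synthetic Name" come from dictionaries.

```python
import hashlib
import hmac
import string

# Hypothetical key; in practice it would be a managed secret, never hard-coded.
MASKING_KEY = b"example-masking-key"


def _keystream(key: bytes, value: str, n: int) -> bytes:
    """Derive n pseudo-random bytes from the value with HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < n:
        msg = counter.to_bytes(4, "big") + value.encode("utf-8")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:n]


def mask(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace digits and ASCII letters deterministically; keep length, case, and punctuation."""
    stream = _keystream(key, value, len(value))
    out = []
    for ch, b in zip(value, stream):
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)  # '@', '.', '/', spaces and other separators pass through
    return "".join(out)


# Deterministic: the same key value masks identically in every table,
# so a masked foreign key still matches its masked primary key.
assert mask("38172649501") == mask("38172649501")
print(mask("38172649501"), mask("real@example.com"))
```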
03 · SUBSETTING THAT RESPECTS YOUR DATA

A representative slice. Not the whole 10 TB.

Pull subsets by tenant, customer, date range, or any rule that fits your workload. Smaller environments, faster refreshes, lower storage costs — and the relationships between tables stay intact.

PRODUCTION 10.0 TB
SUBSET 340 GB
WHERE tenant_id IN (42, 71) AND placed_at > '2025-07-01'
JOIN orders → order_lines → products (FK-traced)
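The filter and the FK-traced join above amount to two steps: select seed rows, then close the subset over foreign keys so nothing in it points at a missing row. A minimal in-memory sketch follows, assuming every table has an integer `id` primary key, dates stored as ISO strings (which compare correctly as text), and an explicit set of FKs to follow downward (parent to children) so that a shared parent like `products` does not drag every order back in. The `fk_subset` function and the sample rows are hypothetical; a real engine would run this as SQL against the source rather than in Python.

```python
from __future__ import annotations


def fk_subset(tables: dict[str, list[dict]], fks: list[tuple[str, str, str]],
              root: str, predicate, follow_children: set[tuple[str, str, str]]) -> dict[str, list[dict]]:
    """Select root rows by predicate, then close the subset over foreign keys.

    Each FK is (child_table, child_column, parent_table); parents are keyed by "id".
    """
    rows = {t: {r["id"]: r for r in rs} for t, rs in tables.items()}
    keep = {t: set() for t in tables}
    keep[root] = {pk for pk, r in rows[root].items() if predicate(r)}
    changed = True
    while changed:  # iterate until no table gains a row (fixed point)
        changed = False
        for fk in fks:
            child, col, parent = fk
            if fk in follow_children:
                # Downward: pull in child rows that reference a kept parent row.
                for pk, r in rows[child].items():
                    if r[col] in keep[parent] and pk not in keep[child]:
                        keep[child].add(pk)
                        changed = True
            # Upward: pull in every parent row a kept child references,
            # so no foreign key in the subset is left dangling.
            for pk in list(keep[child]):
                ref = rows[child][pk][col]
                if ref is not None and ref not in keep[parent]:
                    keep[parent].add(ref)
                    changed = True
    return {t: [rows[t][pk] for pk in sorted(keep[t])] for t in tables}


# Illustrative data only.
tables = {
    "orders": [
        {"id": 1, "tenant_id": 42, "placed_at": "2025-08-02"},
        {"id": 2, "tenant_id": 99, "placed_at": "2025-08-03"},
        {"id": 3, "tenant_id": 71, "placed_at": "2025-06-30"},
    ],
    "order_lines": [
        {"id": 10, "order_id": 1, "product_id": 100},
        {"id": 11, "order_id": 2, "product_id": 101},
        {"id": 12, "order_id": 1, "product_id": 102},
    ],
    "products": [{"id": 100}, {"id": 101}, {"id": 102}],
}
fks = [("order_lines", "order_id", "orders"), ("order_lines", "product_id", "products")]
subset = fk_subset(
    tables, fks, "orders",
    lambda r: r["tenant_id"] in (42, 71) and r["placed_at"] > "2025-07-01",
    follow_children={("order_lines", "order_id", "orders")},
)
print({t: [r["id"] for r in rs] for t, rs in subset.items()})
```

Only order 1 passes the filter; its two order lines come along downward, and products 100 and 102 come along upward because those lines reference them.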
04 · EPHEMERAL ENVIRONMENTS ON DEMAND

Use it. Tear it down. The next team gets the same clean start.

Provision a fresh, masked, right-sized database environment for a developer, a tester, or a CI job. Use it for as long as you need. Expire it on a schedule or tear it down on command. No more shared dev databases that nobody is brave enough to refresh.

09:02 · request dev-feat-checkout · subset tenant=42 · 120 GB · QUEUED
09:04 · discover → mask → load · ~6 min · PROVISIONING
09:10 · connection string handed off · expires in 7d · RUNNING
+7d · automatic teardown · storage reclaimed · audit logged · EXPIRED
For developers · QA · CI

Personal namespaces for engineers. Reproducible slices for QA. Fresh, governed copies for every CI run — without the DBA being the bottleneck.

Lifecycle, automated

Create on demand or on a schedule. Refresh, expire, recreate. The whole loop is orchestrated — no spreadsheets, no Slack reminders.
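The timeline above reads naturally as a small state machine: QUEUED, PROVISIONING, RUNNING, EXPIRED, with a time-to-live that starts when the connection string is handed off and an audit entry on every transition. The sketch below is a hypothetical model of that lifecycle (the `Environment` class, `advance`, and `reap_if_expired` are our names, and the date is arbitrary), not DataForge's scheduler; actual provisioning and storage reclamation are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum


class State(Enum):
    QUEUED = "QUEUED"
    PROVISIONING = "PROVISIONING"
    RUNNING = "RUNNING"
    EXPIRED = "EXPIRED"


# Each state has exactly one successor, mirroring the timeline above.
NEXT = {
    State.QUEUED: State.PROVISIONING,
    State.PROVISIONING: State.RUNNING,
    State.RUNNING: State.EXPIRED,
}


@dataclass
class Environment:
    name: str
    ttl: timedelta
    state: State = State.QUEUED
    expires_at: datetime | None = None
    audit: list[str] = field(default_factory=list)

    def advance(self, now: datetime) -> None:
        """Move to the next state and append an audit entry."""
        nxt = NEXT.get(self.state)
        if nxt is None:
            raise ValueError(f"{self.name} is already {self.state.value}")
        if nxt is State.RUNNING:
            self.expires_at = now + self.ttl  # TTL starts at connection-string hand-off
        self.state = nxt
        self.audit.append(f"{now.isoformat()} {self.name} -> {nxt.value}")

    def reap_if_expired(self, now: datetime) -> bool:
        """Tear down a running environment once its TTL has elapsed."""
        if self.state is State.RUNNING and now >= self.expires_at:
            self.advance(now)
            return True
        return False


# Illustrative run using the times from the timeline (the date is arbitrary).
env = Environment("dev-feat-checkout", ttl=timedelta(days=7))
env.advance(datetime(2025, 9, 1, 9, 4))   # QUEUED -> PROVISIONING
env.advance(datetime(2025, 9, 1, 9, 10))  # PROVISIONING -> RUNNING
print(env.expires_at)  # 2025-09-08 09:10:00
```

A periodic job calling `reap_if_expired` on every environment is one simple way to get the "expire on a schedule" behaviour; tearing down on command is just calling `advance` early.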

05 · PIPELINE ORCHESTRATION

Fits into the workflow you already have.

Schedule discoveries. Schedule refreshes. Trigger on commit, on ticket, on deploy. DataForge plugs into your CI/CD — it doesn't ask your team to build a new pipeline from scratch.

TRIGGER · git push · on feat/* branch
DISCOVER · re-scan · delta only · 12s
MASK + SUBSET · apply rules · ~4 min
PROVISION · CI env up · connection string emitted
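As an illustration only, the trigger in this diagram can be expressed as a small rule: a push to a branch matching `feat/*` runs the three downstream stages in order, and anything else runs nothing. The `PIPELINE` structure and `stages_for` function are hypothetical and are not a DataForge configuration format; in practice your CI system would usually do the branch matching and then call DataForge. Note that `fnmatch`'s `*` also matches `/`, so `feat/a/b` would trigger too.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical pipeline definition mirroring the diagram; not a DataForge config format.
PIPELINE = {
    "trigger": {"event": "push", "branches": ["feat/*"]},
    "stages": ["discover (delta only)", "mask + subset", "provision CI env"],
}


def stages_for(event: str, branch: str, pipeline: dict = PIPELINE) -> list[str]:
    """Return the stages to run for a repository event, or [] if it doesn't match."""
    trigger = pipeline["trigger"]
    if event != trigger["event"]:
        return []
    # Case-sensitive glob match, as Git branch names are case-sensitive.
    if not any(fnmatchcase(branch, pattern) for pattern in trigger["branches"]):
        return []
    return list(pipeline["stages"])


for event, branch in [("push", "feat/checkout"), ("push", "main"), ("tag", "feat/x")]:
    print(event, branch, stages_for(event, branch))
```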
Who it's for

Built for the teams that move data between environments.

If your week revolves around the question "can the team have a fresh copy of production?" — DataForge is for you.

01 · DBAs

DBAs who provision environments

Tired of being the bottleneck when every team needs a refreshed copy. DataForge automates the work that fills your inbox.

02 · PLATFORM

Data & platform teams

Need repeatable, governed data pipelines into non-prod — masked the same way every time, regardless of who's asking.

03 · DEV & QA

Development & QA teams

Want realistic data without a week-long ticket. Realistic enough to reproduce bugs. Safe enough that nobody panics.

04 · SEC & COMPLIANCE

Security & compliance

Need to prove sensitive data is protected outside production — with evidence, not assurances. Every mask is logged and attributable.

05 · CISO & DPO

CISOs & DPOs

Answering audit questions about where regulated data lives — and producing the inventory the auditor asked for, on demand.

Use cases · in the wild

The week before DataForge — and the afternoon after.

Three scenarios we hear on every call. Same problem, same time pressure, same audit risk — different outcomes once DataForge is in the loop.

Scenario 01 · Onboarding

A new developer joins the team.

Day one, they need a working environment. DataForge provisions a masked, subsetted copy of production in their isolated namespace. They're coding by lunchtime. The environment expires at the end of their project.

Before · ~2 weeks → After · half a day
Scenario 02 · Repro

QA wants to reproduce a customer bug.

The bug only shows up against a specific data shape. DataForge clones the relevant production slice, masks it, and hands QA a fresh environment. The repro happens this afternoon — not next week.

Before · 5–7 days → After · same afternoon
Scenario 03 · Audit

An auditor asks where payment data lives.

DataForge already discovered every column. You hand the auditor the inventory and the masking rules — with confidence scores and AI reasoning. The audit takes hours instead of weeks.

Before · weeks of forensics → After · hours, with evidence
Why Avesys DataForge

Pluggable AI. On your terms.

Other vendors in this space want your column names and your sample data in their cloud, full stop. Avesys gives you the choice. Run the discovery model on your own hardware for air-gapped sovereignty, or plug in your existing LLM provider when the data path suits your privacy office.

Built by a team that already operates SQL Server in production through Avesys DPM. We understand the constraints of large databases, referential integrity, and operational windows — because we've worked inside them.

01

Pluggable AI runtime

Run the discovery model on your own hardware for full sovereignty, or plug in Gemini, Claude, OpenAI, or your provider of choice. The architecture supports both — the choice is yours, not the vendor's.

02

Built for on-premise SQL Server estates

Production data, masked data, ephemeral environments — none of it routes through an Avesys SaaS in the middle. Critical when the data being processed is the data you can't expose.

03

Designed around real database workloads

Foreign keys, system-versioned tables, columnstore, operational windows. The product was built by DBAs who've subsetted billion-row tables, not a UI team guessing at the workflow.

04

Part of the Avesys platform

One vendor across monitoring (DPM), comparison (Database Compare), version control (SQL Version Control), and data management (DataForge). One operational model. One team to call.

05

Engineered for regulated industries

Retail, manufacturing, finance, healthcare — environments where GDPR/KVKK/PCI obligations make casual data copying a career-ending move. Designed for buyers who can't just send their data to a cloud SaaS.

How it works

Five steps. From connect to running.

From "we'd like to give this team a copy" to "they have a fresh, masked, governed environment" — in a workflow that doesn't depend on the DBA's calendar.

01.

Connect

Point DataForge at your source databases. Read-only by default. No agents on the SQL Server, no production write paths.

⏱ ~15 minutes
02.

Discover

AI-assisted scanning identifies sensitive data across your schemas — with confidence scores and reasoning. You confirm the inventory.

⏱ Minutes per database
03.

Define rules

Masking strategies, subset rules, lifecycle policies — defined once, reusable across environments and teams.

⏱ Hours, then reused forever
04.

Provision

Spin up masked, subsetted environments on demand or on a schedule. Connection strings handed to whoever asked.

⏱ Minutes per environment
05.

Orchestrate

Create, refresh, expire, recreate. The full lifecycle is automated — and every action is logged and attributable.

⏱ Inline with your CI/CD
Pricing

Priced by the data, not the instances.

One number per terabyte of data you process. No per-instance fees. No add-on modules.

Avesys DataForge · per TB, per year
€2,900
Scales with the value, not the inventory.

€2,900 per TB, per year. Volume pricing available for estates above 10 TB. Existing Avesys customers see bundle options across the platform — talk to our team.

AI-assisted sensitive data discovery
Masking, subsetting, and orchestration
Unlimited environments and refresh cycles
All current and future features through your license term
Email support from the engineering team
Volume pricing above 10 TB · Talk to us
Bundle with DPM, SQL Version Control, Database Compare
Already on the platform? Bundle into one license relationship.
Frequently asked

FAQ — the honest answers.

The questions procurement, security, and the DBA team usually arrive with on the first call. Same answers we give them.

Q · 01

What does it cost?

€2,900 per TB, per year. Volume pricing available for estates above 10 TB. Existing Avesys customers can bundle DataForge into one license relationship.

Q · 02

Is DataForge generally available?

Yes — DataForge is live and shipping. Pricing, support, and the full feature set are documented above. Contact us to schedule a demo against your environment.

Q · 03

Is DataForge an ETL tool?

Not in the classic sense. DataForge is purpose-built for getting safe, useful copies of production data into non-production environments: discovery, masking, subsetting, provisioning. If you need a general-purpose ETL/ELT tool, this isn't that.

Q · 04

Where does the AI run?

Wherever your privacy office tells us to. The discovery model can run on your own hardware for full sovereignty, or you can plug in a managed LLM provider (Gemini, Claude, OpenAI). The runtime is configurable per deployment.

Q · 05

Does it work without internet access?

Yes — when you deploy the local AI model. DataForge follows the same on-premise design as the rest of the Avesys platform. Air-gapped installations are supported.

Q · 06

Will it support databases other than SQL Server?

SQL Server is the launch target. Other engines are being evaluated based on customer demand — tell us what you run when we talk.

Q · 07

How is sensitive data discovered?

AI-assisted scanning combined with pattern matching, column metadata, and rules you define. The output is reviewable — every finding has a confidence score and a reason. You confirm before masking is applied.

Q · 08

Does masked data preserve referential integrity?

Yes. Foreign keys still match, substitutions are consistent across tables, and format-preserving masking is the default. Reports still run, bugs still reproduce.

Q · 09

How does it relate to DPM, Database Compare, and SQL Version Control?

They're complementary. DPM monitors live SQL Server. Database Compare diffs environments. SQL Version Control versions schema. DataForge handles the data that flows between environments.

Production-quality data · without production-quality risk

Real data. Real safety. Real environments. Without the wait.

Talk to us about your estate. We'll schedule a demo against your environment with a real engineer — not a sales rep.