Mock vs. Synthetic Data

Mirage currently supports both Mock Data Generation (MDG) and Synthetic Data Generation (SDG). Mock data and synthetic data are alternatives to consider when real-world data is unavailable, restricted, or unsuitable for a given use case. Both act as substitutes for real data, but they differ in how they are generated and in the characteristics of the resulting data.

Mock Data

Available today on Cloak (Web UI and API) and Mirage (Web UI; API coming soon).

Created using rule-based approaches, designed to mimic the structure and format of real-world data, with support for SG-specific entities. Unlike synthetic data generation (SDG), mock data generation (MDG) does not learn from real-world data patterns.

Suitable for scenarios requiring predictable and repeatable datasets that do not have to be highly realistic. For example, it can be used for software testing/development and basic validation tasks.

Sample use cases:

  • Software Testing and Development

  • Training and Education (e.g., Hackathons)

  • Performance Benchmarking: Simulate real-world workloads to assess system performance.

  • Data Preview or Exploratory Data Analysis

Synthetic Data

Available on Mirage today.

Generated using statistical models and generative AI techniques that learn patterns and relationships in real-world data.

Suitable for scenarios where statistical patterns in an existing dataset need to be retained, or where a more realistic dataset is required. For example, it can be used for AI/ML training, sharing of sensitive data, and software testing.

Sample use cases:

  • Software Testing and Development where higher data fidelity is required or where mock data generation is too time-consuming

  • Data Augmentation for AI/ML

  • Sharing sensitive information with other parties for collaboration
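The difference between rule-based and model-based generation can be illustrated with a small Python sketch. This is a toy illustration only, not how Mirage or Cloak work internally: the function names `mock_ages` and `synthetic_ages`, the 18 to 65 range, and the choice of a normal distribution are all assumptions made for the example. The mock generator follows a hand-written rule and never looks at real data, while the synthetic generator first estimates parameters from a real sample and then draws from the fitted distribution.

```python
import random
import statistics


def mock_ages(n, seed=0):
    """Rule-based (mock): draw ages from a fixed, hand-written range.

    No real data is consulted; the same seed always yields the same output.
    """
    rng = random.Random(seed)
    return [rng.randint(18, 65) for _ in range(n)]


def synthetic_ages(real_ages, n, seed=0):
    """Model-based (synthetic): fit a normal distribution to real ages,
    then sample new values from the fitted distribution.

    A normal fit is an assumption for this sketch; real synthetic data
    tools typically use richer statistical or generative models.
    """
    rng = random.Random(seed)
    mu = statistics.mean(real_ages)
    sigma = statistics.stdev(real_ages)
    return [rng.gauss(mu, sigma) for _ in range(n)]
```

Calling `mock_ages(5, seed=1)` twice returns the same list, which is the repeatability that makes mock data convenient for testing. The output of `synthetic_ages` instead tracks the mean and spread of whatever sample is passed in as `real_ages`.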

To optimise your data generation process, consider a hybrid approach:

  • Use mock data generation for columns that don't require high fidelity to the original dataset. This is suitable for fields with simple rules or distributions.

  • Reserve synthetic data generation for columns that need to closely mimic the patterns and relationships in your real data. Read more about column selection in the Step 2 docs.
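A hybrid plan of this kind can be sketched in Python as a per-column dispatch. Everything here is hypothetical and for illustration only: the column names, `COLUMN_PLAN`, `MOCK_RULES`, and `generate_hybrid` are not part of any Mirage API, and the per-column normal fit is a simplified stand-in for a real synthetic data model (it ignores relationships between columns, which a real model would aim to preserve).

```python
import random
import statistics

# Hypothetical plan: which generator handles each column.
COLUMN_PLAN = {
    "customer_id": "mock",          # simple format rule, fidelity not needed
    "signup_channel": "mock",       # simple categorical rule
    "age": "synthetic",             # should follow the real distribution
    "monthly_spend": "synthetic",   # should follow the real distribution
}

# Hand-written rules for the mock columns (assumed for this example).
MOCK_RULES = {
    "customer_id": lambda rng, i: f"CUST-{i:05d}",
    "signup_channel": lambda rng, i: rng.choice(["web", "app", "branch"]),
}


def generate_hybrid(real_rows, n, seed=0):
    """Generate n rows, using rules for mock columns and a fitted
    normal distribution for each synthetic column."""
    rng = random.Random(seed)

    # Learn (mean, stdev) from the real data, for synthetic columns only.
    fitted = {}
    for col, kind in COLUMN_PLAN.items():
        if kind == "synthetic":
            values = [row[col] for row in real_rows]
            fitted[col] = (statistics.mean(values), statistics.stdev(values))

    rows = []
    for i in range(n):
        row = {}
        for col, kind in COLUMN_PLAN.items():
            if kind == "mock":
                row[col] = MOCK_RULES[col](rng, i)
            else:
                mu, sigma = fitted[col]
                row[col] = rng.gauss(mu, sigma)
        rows.append(row)
    return rows
```

Only the columns marked `"synthetic"` are ever read from `real_rows`, so the mock columns can be generated even when the corresponding real values are unavailable or too sensitive to load.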
