Mock vs. Synthetic Data
Mirage currently supports both Mock Data Generation (MDG) and Synthetic Data Generation (SDG). Mock and synthetic data are alternatives to consider when real-world data is unavailable, restricted, or unsuitable for a use case. Both serve as substitutes for real data, but they differ in how they are generated and in their characteristics.
Mock Data Generation (MDG)
Available today on Cloak (Web UI and API) and Mirage (Web UI; API coming soon).
Created using rule-based approaches designed to mimic the structure and format of real-world data, with support for SG-specific entities. Unlike SDG, MDG does not learn from real-world data patterns.
Suitable for scenarios requiring predictable and repeatable datasets that do not have to be highly realistic. For example, it can be used for software testing/development and basic validation tasks.
Sample use cases:
Software Testing and Development
Training and Education (e.g., Hackathons)
Performance Benchmarking: Simulate real-world workloads to assess system performance.
Data Preview or Exploratory Data Analysis

Synthetic Data Generation (SDG)
Available on Mirage today.
Generated using statistical models and generative AI techniques that learn patterns and relationships in real-world data.
Suitable for scenarios where the statistical patterns in an existing dataset need to be retained, or where a more realistic dataset is required. For example, it can be used for AI/ML training, sharing of sensitive data, and software testing.
Sample use cases:
Software Testing and Development where higher data fidelity is required or where Mock Data is too time-consuming
Data Augmentation for AI/ML
Sharing of sensitive information with other parties for collaboration
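
To make the distinction concrete, here is a minimal Python sketch of the two ideas. It is not how Mirage or Cloak generate data. The function names, the age range, and the sample values are all hypothetical, and the "synthetic" side uses a single fitted normal distribution as a stand-in for the statistical and generative models described above.

```python
import random
import statistics


def mock_ages(n, rng, low=18, high=90):
    """Rule-based (mock): any integer in [low, high]; no real data is consulted."""
    return [rng.randint(low, high) for _ in range(n)]


def synthetic_ages(real_ages, n, rng):
    """Toy statistical model (synthetic): fit a normal distribution to the
    real values, then sample new values from the fitted distribution."""
    mu = statistics.mean(real_ages)
    sigma = statistics.stdev(real_ages)
    return [round(rng.gauss(mu, sigma)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "real" column, used only to fit the toy model.
real_ages = [31, 34, 29, 36, 33, 30, 35, 32]

mock = mock_ages(1000, rng)
synth = synthetic_ages(real_ages, 1000, rng)

print("real mean:     ", statistics.mean(real_ages))
print("mock mean:     ", statistics.mean(mock))
print("synthetic mean:", statistics.mean(synth))
```

The mock values always satisfy the rule (an age between 18 and 90) but are unrelated to the real column, while the synthetic values cluster around the real column's mean because they are sampled from a model fitted to it.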
To optimise your data generation process, consider a hybrid approach:
Use mock data generation for columns that don't require high fidelity to the original dataset. This is suitable for fields with simple rules or distributions.
Reserve synthetic data generation for columns that need to closely mimic the patterns and relationships in your real data. Read more about column selection here (link to Step 2 docs).
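
The hybrid approach can be sketched as follows. This is an illustration under stated assumptions, not Mirage's API. The column names, the `REAL_ROWS` sample, and the helper functions are hypothetical, the postal code is a six-digit placeholder rather than a validated value, and a simple least-squares line stands in for a real synthetic data model.

```python
import random
import statistics

# Hypothetical real dataset: (age, monthly_income) pairs whose relationship matters.
REAL_ROWS = [(25, 3200), (30, 4100), (35, 4900), (40, 6100), (45, 6800), (50, 7900)]


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def generate_hybrid(n, seed=0):
    """Generate n rows: mock rules for simple columns, a fitted model for the rest."""
    rng = random.Random(seed)
    ages = [a for a, _ in REAL_ROWS]
    incomes = [i for _, i in REAL_ROWS]

    # "Synthetic" part: learn the age distribution and the age-income relationship.
    slope, intercept = fit_line(ages, incomes)
    residuals = [i - (slope * a + intercept) for a, i in REAL_ROWS]
    noise_sd = statistics.pstdev(residuals)
    age_mu, age_sd = statistics.mean(ages), statistics.stdev(ages)

    rows = []
    for k in range(n):
        age = rng.gauss(age_mu, age_sd)
        income = slope * age + intercept + rng.gauss(0, noise_sd)
        rows.append({
            # Mock columns: simple rules, no reference to the real data.
            "record_id": f"REC-{k + 1:05d}",
            "postal_code": f"{rng.randint(0, 999999):06d}",
            # Synthetic columns: sampled from the model fitted to the real data.
            "age": round(age),
            "monthly_income": round(income),
        })
    return rows


rows = generate_hybrid(500)
print(rows[0])
```

Here `record_id` and `postal_code` only need a valid-looking format, so rules are enough. `age` and `monthly_income` are generated together from the fitted model so that the relationship between them in the real data carries over to the output.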