Mock Data Generation
User Guide for Mock Data Generation Tool
Overview
Mock data and synthetic data are alternatives you can consider when real-world data is unavailable, restricted or unsuitable for a particular use case. While both serve as substitutes for real data, they differ in how they are generated and in their characteristics. Your use case might benefit from one approach or a combination of both:
Mock Data (available on Cloak)
Mock data is created from an explicit list of rules so that it resembles real data in structure and format. It is not derived from real-world data. You can generate SG-localised fields and, to a certain extent, define the distributions and constraints of data fields. However, it is difficult to replicate the underlying statistical attributes of your dataset, so mock data is better suited to use cases that do not require high fidelity to the original dataset.
Synthetic Data (available on Mirage today)
Synthetic data is generated by statistical and ML-based algorithms that learn patterns from real-world data, so you must provide input data on which to train the data generation model. Because it can capture the underlying statistical attributes of your original dataset, synthetic data is suitable for use cases that require high fidelity to the original dataset.
Mock Data Generation
Available on Cloak today with planned migration to Mirage in CY2Q25.
Cloak's Mock Data Generation tool enables the generation of realistic mock data across data categories such as Person, Business and Location.
The tool includes localised mock data fields tailored to Singapore's context, and field groups that maintain specific relationships between a set of data fields. Additionally, the tool allows for the customisation of data distribution for selected fields, as well as the injection of custom text and numeric values.
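Mock data is produced from explicit rules rather than learned from real data. The sketch below is a hypothetical illustration of that idea in plain Python, not Cloak's implementation or API. `sg_mobile_number` stands in for an SG-localised field (it assumes a simplified format of +65 followed by eight digits starting with 8 or 9, and checks nothing beyond that), `dob_and_age` stands in for a field group, `PLAN_WEIGHTS` for a custom distribution, and `CUSTOM_DEPARTMENTS` for injected custom text values. All field names and values are invented for the example.

```python
from __future__ import annotations

import random
from datetime import date, timedelta

# Hypothetical rules for illustration only; these names are not part of Cloak.
PLAN_WEIGHTS = {"Basic": 0.7, "Premium": 0.3}  # custom distribution for one field
CUSTOM_DEPARTMENTS = ["Finance", "Operations", "Technology"]  # injected custom text values


def sg_mobile_number(rng: random.Random) -> str:
    """Format-only SG-style mobile number: '+65 ', then 8 digits starting with 8 or 9."""
    digits = rng.choice("89") + "".join(rng.choice("0123456789") for _ in range(7))
    return f"+65 {digits[:4]} {digits[4:]}"


def dob_and_age(rng: random.Random, today: date) -> tuple[date, int]:
    """Field group: age is derived from date_of_birth, so the two always agree."""
    dob = today - timedelta(days=rng.randint(18 * 365, 65 * 365))
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    return dob, age


def generate_rows(n: int, seed: int = 0, today: date = date(2025, 1, 1)) -> list[dict]:
    """Apply every rule to each row; the same seed reproduces the same rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        dob, age = dob_and_age(rng, today)
        rows.append({
            "id": i + 1,
            # Weighted choice implements the custom distribution.
            "plan": rng.choices(list(PLAN_WEIGHTS), weights=list(PLAN_WEIGHTS.values()))[0],
            "department": rng.choice(CUSTOM_DEPARTMENTS),
            "mobile": sg_mobile_number(rng),
            "date_of_birth": dob.isoformat(),
            "age": age,
        })
    return rows


for row in generate_rows(3):
    print(row)
```

Because every value comes from a rule and a seeded random generator, the rows look plausible in structure and format but carry no statistical relationship to any real dataset, which is the trade-off described in the comparison above.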
The generated mock data can be downloaded in CSV, Excel or JSON format.
Cloak's Mock Data Generation tool was built in collaboration with GovTech's Analytics by Design team, which developed Zorua, a public-facing mock data designer.
Synthetic Data Generation
Available on Mirage today.
Upload your real dataset and obtain AI-generated data that maintains the statistical properties of the original data, while ensuring that your sensitive information is protected. This reduces your attack surface and safeguards against potential data leaks.
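This guide does not describe the models Mirage uses. As a rough intuition for what "maintains the statistical properties of the original data" can mean, the sketch below fits the simplest possible statistical model, a bivariate normal distribution, to two numeric columns and samples new rows from it. It is a deliberately simplified stand-in, not Mirage's method: it preserves only the means, standard deviations and correlation of the two columns, and on its own it offers no formal privacy guarantee. The function names and the toy "real" dataset are hypothetical.

```python
import math
import random
import statistics


def fit_bivariate_normal(xs, ys):
    """Learn the means, standard deviations and Pearson correlation of two columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_bivariate_normal(params, n, seed=0):
    """Draw n synthetic (x, y) rows that reproduce the fitted statistics."""
    mx, my, sx, sy, r = params
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Mix two independent standard normals so that corr(x, y) equals r.
        x = mx + sx * z1
        y = my + sy * (r * z1 + math.sqrt(max(0.0, 1 - r * r)) * z2)
        rows.append((x, y))
    return rows


# Toy stand-in for a "real" dataset: income loosely increases with age.
data_rng = random.Random(42)
ages = [data_rng.uniform(20, 60) for _ in range(500)]
incomes = [1500 + 80 * a + data_rng.gauss(0, 500) for a in ages]

params = fit_bivariate_normal(ages, incomes)
synthetic = sample_bivariate_normal(params, 5000, seed=7)
print("original :", [round(v, 2) for v in params])
print("synthetic:", [round(v, 2) for v in fit_bivariate_normal(*zip(*synthetic))])
```

Production generators typically model many columns, mixed data types and non-linear relationships. The point of the sketch is only the contrast with rule-based mock data: the parameters are learned from the input data, so the generated rows inherit its statistics.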
Have some suggestions or feedback? Or were you unable to find the information you were looking for?
Please get in touch with us at [email protected], and we will be glad to help. 💙