Written Answer to Unanswered Oral Question

Regulations on Input Prompts for Large Language Models to Prevent Disclosure of Confidential Data

Speakers

Dr Tan Wu Meng; Mr Gerald Giam Yean Song; Mrs Josephine Teo (Minister for Communications and Information)

Summary

This question concerns the management of confidential data and the use of large language models (LLMs) within the public sector. Dr Tan Wu Meng and Mr Gerald Giam Yean Song asked about in-house AI development and about measures to prevent the disclosure of data to private or foreign entities. Minister Mrs Josephine Teo replied that the Government adopts a risk-managed approach: highly sensitive data is isolated from the Internet, and open-source models are deployed on internal servers for sensitive tasks. For less-sensitive data, the use of commercial LLMs is governed by service agreements that prohibit data retention and the use of data for model training. Technical safeguards, including data screening and visual security cues, are implemented alongside governance measures to enforce compliance and protect data.

Transcript

26 Dr Tan Wu Meng asked the Minister for Communications and Information whether the Government has plans to develop in-house artificial intelligence capabilities to ensure that input prompts for large language models need not be processed by private firms not under the purview of the Government, or by cloud computing units located in foreign territories or under foreign jurisdiction or control.

27 Mr Gerald Giam Yean Song asked the Minister for Communications and Information (a) when using large language models owned by private or foreign companies, how does the Government ensure that confidential data is not disclosed in the input prompts; (b) whether the Government has signed any non-disclosure agreements (NDAs) with these companies; (c) what are the companies that the Government has signed NDAs with; and (d) how does the Government monitor compliance with such NDAs by these companies.

Mrs Josephine Teo: Large language models (LLMs), such as those powering ChatGPT, have the potential to enhance the delivery of public services and the productivity of public officers. We adopt a risk-managed approach for LLMs, consistent with the existing public sector framework for the handling of classified information when using technologies such as Internet-based applications and the commercial cloud.

Highly sensitive applications and data are not exposed to the Internet. Where use cases involve sensitive data, open-source models may be fine-tuned for use but must be deployed on Government servers and computers.

For use cases involving less-sensitive data, the artificial intelligence models may be owned and managed by commercial and private companies. Our arrangements with these companies are governed by service agreements, which include clauses on data handling and security, such as the non-retention of data and limitations on the use of data to train other products or models. Beyond contractual safeguards, the Government has also implemented technical measures to screen sensitive data, visual cues to remind users of data security practices, and governance measures to enforce compliance.

We continuously reassess the adequacy of our measures as the technology evolves.