Why everyone is talking about AI safety and cybersecurity

By Valeska Bloch, David Rountree, Lauren Holz, Aveline Orban
AI Cyber Data & Privacy Technology, Media & Telecommunications

What to do about it

The Australian Signals Directorate, together with 19 other global partners, has published Guidelines for secure AI system development. This is the latest in a growing pile of orders, declarations and guidance released over the past month, which have sent AI safety and security to the top of the global AI agenda.

This Insight examines the themes emerging from these latest developments and outlines the top nine steps organisations should be taking to address them.

Key takeaways

  • Geopolitical tensions, the evolving cyber threat environment and the accelerated mainstream uptake of generative AI have triggered an intense focus on AI safety and security, and in particular on:
    • the potential for misuse of AI for malicious purposes
    • threats to the safety and security (ie the confidentiality, integrity or availability) of AI systems.
  • Global players, including Australia, are pre-emptively dealing with the most serious safety and security risks (including the risk of foreign interference), while also ensuring they aren't left behind when it comes to the development and deployment of AI.
  • Established cyber-hygiene practices and standards won't be sufficient to safeguard against these emerging AI-related cyber threats. There are nine steps organisations should take to address AI security risks, including ensuring they are up to date, trained and aware of the key regulatory reporting requirements, and are continuing to adapt to the evolving landscape.
  • Just as AI presents threats to safety and security, it also presents enormous opportunities, with AI increasingly being used to defend against cyber threats.

What you need to know

  • Although AI-specific security regulation and industry standards are being developed, organisations can't afford to wait to take action. Regulators, policymakers, shareholders and the public will use existing principles and risk-based frameworks to hold companies to account for failures to address AI-related security risks.
  • Given this, organisations should already be implementing AI governance, risk management, technical and operational measures to ensure that:
    • high-risk AI systems are safe, reliable, secure and resilient[1]
    • organisations are prepared to respond to—and can withstand—serious AI incidents, system weaknesses and malfunctions.
  • Specifically, providers and users of high-risk AI systems should implement measures to ensure such systems:
    • do not present a significant risk of injury or damage to humans, property or the environment
    • are designed to perform consistently and in accordance with their intended purpose
    • protect against any compromise of the confidentiality, integrity or availability of any AI system (eg as a result of data poisoning and other adversarial attacks).
  • Security should be a core requirement throughout the lifecycle of AI systems—from design, to development, to deployment, to operation and maintenance, to retirement.
  • Although existing AI systems already present risks to safety and security, recent developments (including both the Executive Order and the Bletchley Declaration—discussed below) emphasise the particular risk posed by very large, general-purpose AI models that can be used in a wide range of legitimate contexts, but could easily be repurposed for harmful ends—sometimes referred to as general AI, dual-use foundation models or frontier AI. Organisations that develop or use these AI models should take particular care to implement and maintain measures for the security and monitoring of the model in light of this increasing regulatory focus.[2] Organisations should start considering whether an AI model is a general AI model at the design phase.
  • Governments and institutions are collaborating to develop standards to establish security frameworks that leverage unified protocols and best practice. AI security standards under development include the Technical AI Standards, which are part of a collaboration between the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) and the U.S. Government, and ISO/IEC 27090, which will provide guidance for addressing security threats and failures in artificial intelligence systems. In the meantime, organisations should look to NIST's AI Risk Management Framework. The framework has been designed to assist organisations to develop responsible AI practices.
  • Consistent with other recent developments, the Guidelines for secure AI system development call for providers to exercise 'radical transparency' to enable end users to understand the limitations and risks posed by AI system components, and to ensure that end users know how to use components securely. This will help to overcome the risks that can arise when AI system components are designed, developed or integrated by different organisations and when end users have limited visibility into their supply chain.
  • We expect to see more guidance on labelling so that users can easily identify: (i) where content that resembles existing persons, places or events has been artificially created or manipulated; (ii) where users are interacting with an AI system; and (iii) where outputs or outcomes have been generated from a generative AI tool and how and why that may impact individuals or other third parties.


'Artificial Intelligence must be safe and secure. Meeting this goal requires robust, reliable, repeatable and standardized evaluations of AI systems, as well as policies, institutions and, as appropriate, other mechanisms to test, understand and mitigate risks from these systems before they are put to use'.

- US Executive Order


What are the AI safety and cybersecurity risks?

AI safety and security typically seeks to address two areas of risk:

1. The misuse of AI for malicious purposes—eg the use of AI to:
  • plan and carry out cyberattacks, including by generating malicious code and facilitating highly persuasive social engineering (eg through voice cloning and phishing emails that closely mimic the impersonated sender)
  • amplify disinformation, erode trust in information and influence societal debate (including through the use of deepfakes)
  • increase scams, fraud, impersonation and data harvesting
  • compile information about physical attacks, including chemical, biological, radiological and nuclear weapons, to aid in weapons development.[3]
2. Threats to the safety and security (ie the confidentiality, integrity or availability) of AI systems—eg:
  • attempts to manipulate or corrupt training data sets so that the AI model produces incorrect outputs (ie data poisoning, which is a form of adversarial machine learning). Types of data poisoning include availability poisoning (which involves model misclassification on testing samples and reduction in model accuracy that renders it unusable), targeted poisoning (which is localised to a very small number of samples, making it hard to detect), backdoor poisoning (which embeds backdoors into training examples that trigger the misclassification of samples), and model poisoning (which modifies the model to inject malicious code into it).[4]
  • AI model flaws, eg model inversion (in which a threat actor uses a model’s outputs to infer sensitive information about the model, such as its training data or architecture), model tampering (in which a threat actor manipulates the model’s parameters to generate inaccurate or biased results), and backdoors embedded in models (which cause a model to produce a threat actor’s desired output when a trigger is introduced into the model’s input data).
  • compromised supply chains, such as the exploitation of vulnerable workforces responsible for the human labelling of AI datasets (ie label poisoning and input attacks). In a recently published paper, the Cyber Security Cooperative Research Centre notes that poor workers and corrupt officials in developing nations may be particularly vulnerable to coercion by malicious parties who could employ financial incentives to use such a workforce to manipulate the labelling of training data.[5]
  • inadequate backup, disaster recovery and incident response strategies to respond to security incidents, which can lead to business interruption, loss of critical information, legal exposure, and financial and reputational damage.
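
To make the data poisoning risk described above more concrete, the following is a minimal sketch of label flipping, one simple form of poisoning, against a toy nearest-centroid classifier. Everything in it is hypothetical and chosen purely for illustration: the data points, the function names (train_centroids, predict, accuracy, flip_labels) and the choice of classifier are assumptions, not anything drawn from the Guidelines or the papers cited in this Insight. Real attacks and real models are far more complex.

```python
from statistics import mean

# Hypothetical one-dimensional training and test data: (feature value, label).
TRAIN = [(0.0, 0), (0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
         (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1)]
TEST = [(0.2, 0), (1.2, 0), (1.8, 0), (2.2, 0),
        (2.8, 1), (3.2, 1), (3.8, 1), (4.8, 1)]


def train_centroids(samples):
    """Return {label: mean feature value} for a list of (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Predict the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, value) == label for value, label in samples)
    return correct / len(samples)


def flip_labels(samples, targets, new_label):
    """Simulate an attacker relabelling the chosen training values."""
    return [(value, new_label if value in targets else label)
            for value, label in samples]


# Train once on clean data, and once after three class-1 samples are relabelled as 0.
clean_acc = accuracy(train_centroids(TRAIN), TEST)
poisoned_train = flip_labels(TRAIN, targets={3.0, 3.5, 4.0}, new_label=0)
poisoned_acc = accuracy(train_centroids(poisoned_train), TEST)

print(f"clean accuracy:    {clean_acc:.2f}")
print(f"poisoned accuracy: {poisoned_acc:.2f}")
```

In this toy setup, relabelling three of the ten training samples shifts the decision boundary enough that accuracy on the untouched test set falls from 1.00 to 0.75. The test data was never modified, which is part of what can make poisoning hard to spot after the fact.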

Recent global developments in AI safety and security

Over the past month, AI safety and security has shot to the top of the global AI agenda.

  • On 26 October 2023, in advance of an AI Safety Summit, UK Prime Minister Rishi Sunak announced a plan to position the UK as the global leader for AI safety.
  • On 30 October, President Biden rolled out 'the most significant action ever taken by any government to advance the field of AI safety': an Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. This followed the voluntary commitments the White House secured in July 2023 from leading AI companies, including Amazon, Anthropic, Google, Inflection, Meta, Microsoft and OpenAI, on the safe and responsible development and use of AI, investment in cybersecurity and information sharing to advance public trust in AI.
  • In response to the October Executive Order, NIST put out a call on 1 November for participants in a new consortium supporting development of innovative methods for evaluating AI systems to improve safety and trustworthiness of AI.
  • That same day, 29 countries—including Australia—signed the Bletchley Declaration, recognising that this is 'a unique moment to act and affirm the need for the safe development of AI'.
  • Negotiations in connection with the EU AI Act recently hit a roadblock after a handful of EU member states expressed concerns that the proposed regulation of foundation models could harm innovation, and a new model for self-regulation through codes of conduct has subsequently been proposed by these member states. The EU is also exploring adapting civil liability rules to compensate those who have suffered harm from an AI system or the use of AI.
  • In an address to the National Press Club on 22 November 2023, the Federal Minister for Communications, Michelle Rowland, announced that the Federal Government is commencing consultation on reforms to the Basic Online Safety Expectations regime. This would introduce new expectations for services using generative AI to 'proactively minimise the extent to which AI can be used to produce unlawful and harmful material'.[6] This follows the registration in September 2023 by the eSafety Commissioner of the Internet Search Engine Services Online Safety Code. The Code, which sets out specific measures for generative AI to minimise and prevent the generation of synthetic class 1A material, such as child sexual exploitation material, will come into effect from March 2024. The Minister also announced that Australia and the United Kingdom are collaborating on an online safety and security memorandum of understanding (MOU) that will respond to technology-facilitated issues like misinformation and threats to privacy across 'emerging technologies like generative AI'. The MOU is expected to secure collaboration between these governments to enable information and learnings to be shared to facilitate regulatory interventions.
  • The Australian Government recently made commitments in its 2023-2030 Australian Cyber Security Strategy released on 22 November 2023 to:
    • explore and invest in national initiatives to promote the safe and responsible use of AI in the face of evolving cybersecurity threats
    • develop a framework to assess national security risks in supply chains and, in particular, risks presented by products and services entering the Australian market from overseas.
  • On 27 November, the Australian Signals Directorate, together with 19 other global partners, published Guidelines for secure AI system development.

These latest developments underscore the tension nations face between the need to pre-emptively deal with the most serious safety and security risks (including the risk of foreign interference), while also ensuring they aren't left behind when it comes to the development and deployment of AI.

Nine steps organisations should take to address AI safety and security risks

Organisations should…
1 ensure that AI systems are subject to cyber risk and impact assessments that consider both the internal architecture and design of the AI system and its components, as well as the intended (and potential) application context.
2 ensure that high-risk AI systems:
  • incorporate security-by-design and security-by-default principles at each level and throughout their lifecycle
  • are independently evaluated, performance tested in a real-world context, and monitored (including post-deployment), to ensure they 'function as intended, are resilient against misuse or dangerous modifications, are ethically developed and operated in a secure manner, and are compliant with applicable…laws and policies'[7]
  • are trained against specific types of adversarial / evasion attacks and unauthorised knowledge distillation (ie the capturing and redeployment of knowledge from an AI model (or a group of AI models) into another model)
  • are designed to include tripwires and controls to override, reverse or halt the output or operation of AI systems.
3 assess and monitor the security of their AI supply chains and require suppliers to adhere to robust security and other applicable standards.
4 start considering whether an AI model is a general AI, dual-use foundation model or other high-risk model at the design phase.
5 ensure that developers, trainers, operators and users of AI systems undertake role-based training.
6 understand regulatory reporting requirements in relation to serious AI incidents, system weaknesses or malfunctioning systems.
7 create AI incident response plans and playbooks.
8 run red teaming and other war gaming exercises (including with the board) that incorporate AI-related incidents (eg disinformation about the company or its management spread by deepfakes, unintended consequences of AI that they use or operate, and cyberattacks undertaken using AI-generated malicious code). For more on cyber war games, see Why every company should have a structured cyber simulation program.
9 continuously adapt organisational security training and incident response measures with reference to AI-specific threats, in particular the use of deepfakes to perpetrate highly sophisticated social engineering attacks on staff.
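
As an illustration of the 'tripwires and controls' mentioned in step 2, the following is a minimal sketch of a wrapper that withholds outputs that breach a rule and halts the system after repeated breaches, until a human resets it. It is one possible pattern under stated assumptions, not a method prescribed by the Guidelines or the Executive Order. The names GuardedModel, TripwireHalted, toy_model and contains_blocked_term are all hypothetical, and the 'model' is a stand-in function rather than a real AI system.

```python
class TripwireHalted(RuntimeError):
    """Raised when a call is made to a system that has been halted."""


class GuardedModel:
    """Wrap a model callable with a simple tripwire and manual override."""

    def __init__(self, model, is_violation, max_violations=3):
        self.model = model                # any callable: prompt -> output
        self.is_violation = is_violation  # callable: output -> bool
        self.max_violations = max_violations
        self.violations = 0
        self.halted = False

    def halt(self):
        """Stop the system; can also be called directly by an operator."""
        self.halted = True

    def reset(self):
        """Re-enable the system, eg after human review of the incident."""
        self.violations = 0
        self.halted = False

    def __call__(self, prompt):
        if self.halted:
            raise TripwireHalted("system halted pending human review")
        output = self.model(prompt)
        if self.is_violation(output):
            self.violations += 1
            if self.violations >= self.max_violations:
                self.halt()
            return None  # withhold the offending output
        return output


def toy_model(prompt):
    """Stand-in for a real model: simply upper-cases the prompt."""
    return prompt.upper()


def contains_blocked_term(output):
    """Hypothetical rule: flag any output containing the word SECRET."""
    return "SECRET" in output


guarded = GuardedModel(toy_model, contains_blocked_term, max_violations=2)
results = [guarded("hello"), guarded("share the secret"), guarded("secret again")]

try:
    guarded("hello")
    blocked_after_halt = False
except TripwireHalted:
    blocked_after_halt = True

print(results, guarded.halted, blocked_after_halt)
```

Here the second breach trips the halt, so even a harmless request is refused until an operator calls reset(). In practice, the rule, the threshold and the decision about who may reset the system would be matters for an organisation's own governance and risk assessment.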

Because the AI landscape is changing constantly, organisations need to stay vigilant and up to date on what is required of them, and on how to manage and implement each change as it arrives.


  1. NIST AI Risk Management Framework: AI systems should 'not under defined conditions, lead to a state in which human life, health, property, or the environment is endangered' (Source: ISO/IEC TS 5723:2022). Safety risks that pose a potential risk of serious injury or death call for 'the most urgent prioritization and most thorough risk management process'. NIST: 'security includes resilience but also encompasses protocols to avoid, protect against, respond to, or recover from attacks'.

  2. Biden's Executive Order would require companies to make ongoing disclosures to the U.S. Federal Government when developing or 'demonstrating an intent to develop' AI systems that qualify as 'dual-use foundation models'.

  3. UK Department for Science, Innovation and Technology, Safety and Security Risks of Generative Artificial Intelligence to 2025, 25 October 2023, p 6.

  4. Falk, R and Brown, A, Poison the Well: AI, data integrity and emerging cyber threats, Cyber Security Cooperative Research Centre.

  5. Falk, R and Brown, A, Poison the Well: AI, data integrity and emerging cyber threats, Cyber Security Cooperative Research Centre.

  6. The recently proposed Combatting Misinformation and Disinformation Bill will have a similar nexus to the adoption of AI in Australia.

  7. Biden Administration, Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, 30 October 2023, section 2(a).