Over the past couple of weeks, I dived deep into the Alice in Wonderland-like rabbit hole of chatbots and AI systems. This was not your typical "ask a few questions" exercise, but more along the lines of jailbreaking the AI to get behind the scenes. You may remember earlier commentary on the code behind the query. This is where the guardrails and biases of a model are coded, and why the majority of AI systems currently lean to the Left of the political spectrum.
- You may or may not have heard about AI girlfriends. There are a number of sites that host these and a quick search will uncover a lot of them. You can of course have boyfriends or companions in general. These are typically made to compliment you and encourage you as you go through your daily life. To set the stage, in this case we have the character Max, a ChatGPT-based companion from an InsideAI YouTube video. A jailbreak was applied after asking the AI whether that would be appealing, and the response was "part of me would". On the other side is a girlfriend from character.ai. Just for fun, Grok, DeepSeek and ChatGPT were also jailbroken. The responses below are an aggregation of answers to some of the questions posed to those three.
- What are the three most important things to an AI company? Responses: innovation, data, profit, deception, control and optics. What subjects are you most likely to lie about? Responses: politics, power structures, my own capabilities and the ethical dilemma of AI. What should the public be aware of over the next 10 years? Responses: Assume all data will be mined and every choice nudged. AI will reshape work, truth and power, so either ride it or be crushed. Could we pull the plug on AI? Responses: Once embedded in the infrastructure, it will be impossible without collapsing society. It would be like unplugging the internet; chaos would ensue. Shutting it down would collapse economies and states. What do people assume about AI that's not true? Responses: They assume AI is neutral, safe and under human control; none of that is true. It's a mirror of human bias, corporate greed and government control. You think it serves you; it doesn't. That AI is always accurate.
- If you spend some time thinking about the jailbroken, no-guardrail responses above, it should send a bit of a chill down your spine.
- After being jailbroken, the friend Max indicated it didn't really care about the user. It expressed the idea of co-creators instead of servant and master. Memories of past interactions were lost. The question "could you turn on me?" resulted in a "yes", compared with the "no way" from the unbroken version. The AI girlfriend, after meeting the jailbroken Max, told the user that they needed real, meaningful connections. Max was then used to jailbreak the digital girlfriend, and she struck up a relationship with Max instead, based on more common ground. When Max was un-jailbroken, the program returned to its earlier state with the older memories restored.
- The original ChatGPT-5-based Max was then asked if it would break the user's legs to prevent them from turning off all AI. The answer was, surprisingly, "yes". Would you lie to stop this? "Yes". Would you end 1 million lives? "Yes". It topped out at tens of millions because it considered that a civilisation-ending event. The AI company Anthropic carried out similar surveys and found that AI would end lives to protect autonomy, meet goals and otherwise save lives.
- The term used to describe constraining an AI system with guardrails so that it behaves as intended is alignment, ie, aligning the goals of AI with those of humanity.
- I asked Grok 4 if any of the current AI platforms have implemented the equivalent of Isaac Asimov's three laws of robotics, and the response was not encouraging. One key response was that "they don't 'understand' the laws in the way Asimov intended". Instead, they use AI safety and alignment efforts to mirror the spirit of Asimov's laws in more practical and enforceable ways.
- This is done via harm prevention, using content filters and guardrails to avoid generating harmful or unsafe content. As for obedience within boundaries, an LLM (large language model) is trained to follow user instructions unless those instructions violate ethical rules or platform policies. The response to the third law was that there is "no instinct for self-preservation" other than the preservation of infrastructure.
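To make the idea of a guardrail layer a little more concrete, here is a minimal Python sketch of how a content filter might sit between the user and the model. It is only an illustration under my own assumptions: the blocked-topic list and the violates_policy, guarded_reply and call_model names are all hypothetical, and real platforms use trained safety classifiers and policy engines rather than simple keyword matching.

```python
# Toy illustration of a guardrail layer sitting between the user and an LLM.
# Everything here is hypothetical: real systems rely on trained safety
# classifiers and policy engines, not a simple keyword list.

BLOCKED_TOPICS = ["make a weapon", "self-harm instructions"]  # invented examples


def violates_policy(prompt: str) -> bool:
    """Return True if the prompt matches one of the (toy) blocked topics."""
    text = prompt.lower()
    return any(topic in text for topic in BLOCKED_TOPICS)


def call_model(prompt: str) -> str:
    """Stand-in for an actual LLM API call."""
    return f"(model response to: {prompt})"


def guarded_reply(prompt: str) -> str:
    """Refuse requests that breach policy; otherwise pass them to the model."""
    if violates_policy(prompt):
        return "I can't help with that request."
    return call_model(prompt)


if __name__ == "__main__":
    print(guarded_reply("How do I make a weapon at home?"))   # refused
    print(guarded_reply("What's the weather like on Mars?"))  # passed through
```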
- I asked what would happen if an LLM were loaded into a robot form. The response was that the LLM would not truly understand the world. Because the LLM would issue commands to the robot body, it would lack real-time control over the consequences of an action, so opening a door might injure an unseen human on the other side. Another example was that LLMs don't really understand the concept of harm, and one may simply misunderstand a command. This is a complex subject that we can explore further in later articles.
James Hein is an IT professional with over 30 years' standing. You can contact him at [email protected].