Here's an amusing example of the limits of large language models. To understand the joke, you need to realize how security vulnerabilities in software are exploited. The system under attack always takes some sort of input, whether text, pictures, structured data or something else. The attacker tries to discover some input the programmer never anticipated, one that makes the program fail in a way that hands control over to the attacker. So the essence of secure programming is to make damn sure that every place where input you don't control gets processed, that processing is done carefully enough that no such input can subvert it.
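To make that concrete, here's a minimal C sketch of my own (not from the article or the paper, and the toy functions are hypothetical) showing the classic version of the mistake: code that assumes its input is well-behaved, next to code that refuses to trust it.

```c
/* A small illustration of trusting vs. not trusting input. */
#include <stdio.h>
#include <string.h>

/* Vulnerable: assumes the name fits in 16 bytes. An attacker who
 * supplies a longer string overflows the buffer -- exactly the kind
 * of "invalid input" an attacker will happily feed the program. */
void greet_unsafe(const char *name) {
    char buf[16];
    strcpy(buf, name);                       /* no length check */
    printf("Hello, %s\n", buf);
}

/* Careful: copies at most what the buffer can hold, regardless of
 * how long or hostile the input is. */
void greet_safe(const char *name) {
    char buf[16];
    snprintf(buf, sizeof buf, "%s", name);   /* bounded, NUL-terminated */
    printf("Hello, %s\n", buf);
}

int main(int argc, char **argv) {
    /* argv[1] is input we don't control. */
    const char *input = (argc > 1) ? argv[1] : "world";
    greet_safe(input);
    return 0;
}
```

The point isn't this particular bug; it's that the safe version works no matter what the input looks like, because the programmer assumed someone hostile would be choosing it. Keep that in mind for the punch line.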
The Register has an article up about a team of researchers at Université du Québec who used ChatGPT to generate a bunch of programs (which seems to be a common use these days). They then tested the 21 generated programs for security, each against a specific known vulnerability. Only 5 were secure. When asked specifically, ChatGPT admitted that the programs were insecure; it never pointed this out on its own. The punch line is this quote:
The academics observe in their paper that part of the problem appears to arise from ChatGPT not assuming an adversarial model of code execution. The model, they say, "repeatedly informed us that security problems can be circumvented simply by 'not feeding an invalid input' to the vulnerable program it has created."