OpenAI Codex and LLMs' Ability to Problem Solve
I do not mean to belabor the point, but large language models’ ability to problem solve is very unsophisticated. My experience is that the models do not have a holistic view of the software projects they work on: they can’t reason in real time across the component parts (files) that constitute an application to resolve bugs. Even when you expose your GitHub project to the LLMs, they forget the contents of the files they have ingested and how those components work with one another. It is very frustrating. If I did not map these elements out myself, the models (in my case OpenAI o3, o4-mini and o4-mini-high) would not be helpful. Where the models do help me, an inexperienced programmer, is with structure and syntax. They are very good in this area, as one would expect, since these are after all “language models”.
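For readers wondering what mapping a project out can look like in practice, here is a minimal sketch in Python. It is an assumption on my part about one reasonable way to do it, not a description of any tool the models provide: the function names `map_python_file` and `build_project_map` are made up for illustration, and it only handles Python files. It lists each file with the functions and classes it defines and the modules it imports, producing a short text summary that can be pasted at the top of a prompt.

```python
import ast
from pathlib import Path


def map_python_file(path: Path) -> dict:
    """Summarize one file: its top-level definitions and the modules it imports."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    definitions, imports = [], []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            definitions.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # A bare relative import ("from . import x") has no module name.
            imports.append(node.module or ".")
    return {"definitions": definitions, "imports": imports}


def build_project_map(root: str) -> str:
    """Return a plain-text map of every .py file under root (hypothetical helper)."""
    root_path = Path(root)
    lines = []
    for path in sorted(root_path.rglob("*.py")):
        try:
            summary = map_python_file(path)
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        lines.append(path.relative_to(root_path).as_posix())
        lines.append(f"  defines: {', '.join(summary['definitions']) or '(nothing)'}")
        lines.append(f"  imports: {', '.join(summary['imports']) or '(nothing)'}")
    return "\n".join(lines)
```

Calling `print(build_project_map("."))` from the project folder prints the map. It only captures top-level structure, so it is a starting point for explaining how the files relate to one another, not a replacement for doing so.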
Regarding OpenAI Codex, I’ve learned that you can use it for $30/month if you and another person sign up as a team, rather than paying $200/month as an individual. I have also learned that the model goes rogue, so it is not a good idea to allow it to edit your code autonomously while you sleep. For that reason I will not sign up for Codex version 1: I want to watch what the agent is doing so that I can catch mistakes in real time. I could move Codex to a sandbox environment and let it code overnight there, but I prefer to use o3 to code iteratively in real time. By the way, OpenAI Codex uses a new model.
My view is that we are 50 years away from a model being able to autonomously replicate TikTok every 30 seconds, which former Google CEO (and OpenAI investor) Eric Schmidt said in August 2024 was “around the corner”. That was obviously hype. The link above will take you to the Schmidt video, which I posted on Twitter/X last summer because the file was too large to upload here. YouTube would not let me post it last year, as Schmidt asked Stanford to copyright the talk because he was embarrassed by negative comments he made in it about Google’s leadership style.



