If you’re a programmer who has been using OpenAI, you have likely run into the boundaries of what it can and cannot do. So let me make a bold statement: it cannot replace software engineers.
If you ask an AI to build a simple web application, it will immediately reply that building an application is a complex process and that it cannot do so as an AI language model. However, it won’t have a problem spitting out random snippets of code aimed at the type of application you want to build. But if you are technically savvy, you know that a few snippets of code do not make an application.
Now, ask the AI to do it again in a new window without the previous context. It will repeat the same guardrail, insisting it cannot build an application, yet keep producing random snippets of code. There is also a high chance that the code will now be in a completely different language: the first response I got was in Python, while the second was in Node.js.
I purposely did not specify the language because, if you are not as adept, you might assume that all programming languages are equal. This is like saying all languages are the same as English. The reason there is still no universal translator is that translating English into another language, or vice versa, is never a one-to-one conversion of each word. Each language has implicit rules unique to it, and the same goes for programming languages.
Additionally, different programming languages cannot all run in the same environment without heavy configuration. That nuance disappears when AI hands you prompts and snippets of code and makes it all look easy. The AI can only give you a function or two at any point, never the whole application and all the plumbing that comes with it. If we compare it to building a house, it can only give you a window, a door, or a brick at a time, never a whole house in one go.
In the current iteration of GPT-3.5 or GPT-4 at the time of this writing, the context window is limited to around 16k tokens per request.
I asked ChatGPT how large a web application is and how many lines of code it averages. The response: it varies greatly depending on factors such as the size and complexity of the application, the programming language used, the framework employed, and the coding practices followed. It is difficult to give an exact average, as it can range from a few hundred lines for small, simple web applications to several thousand or even tens of thousands for larger, more complex ones.
Let’s assume we need 100,000 lines of code. Let’s also assume that each OpenAI call can generate around 1000 lines of code at a time (rounded off for simplicity), with a 16k token size limit in mind. It would take 100 calls just to get the same number of lines of code. However, from the previous point, we already know that without context, each call will generate random snippets of code that won’t be plug-and-play with the previous set of code. This means heavy debugging would be required to make it work.
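A quick back-of-the-envelope calculation makes the scale concrete. The numbers below are the same rough assumptions used above, not measured figures:

```python
# Back-of-the-envelope math for the scenario above. Both numbers are
# rough assumptions from the text, not measurements.
lines_needed = 100_000     # assumed size of the application
lines_per_call = 1_000     # assumed usable lines of code per response

calls_needed = lines_needed // lines_per_call
print(f"API calls just to emit the code once: {calls_needed}")  # 100
```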
Your workflow would start looking like this early on:
1st snippet from AI + (2nd snippet from AI + debugging of the 1st) + (3rd snippet from AI + debugging of the 2nd + debugging of the 1st) + and so on. The cost of the API calls and the time it would take to debug would still amount to something similar to hiring an engineer, which entirely defeats the purpose. Additionally, you would be taking on technical debt right out of the gate.
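To see why this compounds, here is a toy sketch of that workflow, assuming, purely for illustration, that each new snippet costs one unit of effort to generate and one unit per previously integrated snippet to re-debug:

```python
# Toy model of the workflow above: every new snippet forces a debugging
# pass over all the snippets already integrated. Effort units are arbitrary.
def total_effort(snippets: int, gen_cost: float = 1.0, debug_cost: float = 1.0) -> float:
    effort = 0.0
    for i in range(1, snippets + 1):
        effort += gen_cost                # generate the i-th snippet
        effort += debug_cost * (i - 1)    # re-debug the previous i-1 snippets
    return effort

print(total_effort(100))  # 5050.0: 100 generation steps but 4,950 debugging passes
```

Generation stays linear; the debugging term grows roughly quadratically, which is exactly where the "cheap" approach stops being cheap.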
Exponential Bug Fixing Costs

Now let’s say the AI was able to spit out the application and you were able to piece it together one snippet at a time after extensive debugging. There is a high chance that this code would follow close to zero coding conventions from one snippet to the next. And knowing that there is no such thing as perfect software, what happens when bugs start appearing?
When users inevitably encounter issues, your Frankenstein code would experience edge cases that the AI did not expect or you did not prompt it to expect. How do you then start fixing those bugs? Do you send all 100,000 lines of code and say “fix this bug”? If you want to send a smaller snippet, how do you know which lines of code to send? The time it takes to figure this out would be much higher than if someone had written the code with proper coding conventions.
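For a sense of scale, sending everything is not even possible within the context window discussed earlier. Assuming, very roughly, ten tokens per line of code (an assumption, not a measurement):

```python
# Would 100,000 lines of code fit into one 16k-token prompt?
lines_of_code = 100_000
tokens_per_line = 10        # rough assumption; real code varies widely
context_window = 16_000

total_tokens = lines_of_code * tokens_per_line   # ~1,000,000 tokens
print(total_tokens / context_window)             # ~62x larger than the window
```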
Software has developed significantly since its inception. It has become more secure and more distributed, driven by the volume of malicious attacks and the need to serve a large number of users. How then do we expect AI to communicate with such distributed systems? If my backend server sits in San Francisco, or wherever my cloud provider’s nearest region happens to be, how do we expect AI to connect to it and build software around it?
There are three options: give the AI direct access to your servers and credentials, hand your proprietary code and infrastructure details over to a public model, or build and host a model of your own.
The first two options are security hazards, while the last is simply not feasible if you are a small company and would take far more time than just building what you need in the first place.
Netflix alone runs hundreds of microservices to enable streaming for hundreds of millions of people. Even if we had access to all the backend servers, we would need a deep understanding of the entire context of all those microservices in order to generate complex features that can affect the whole architecture.
For example, let’s say I want to generate a feature that prevents password sharing. To do this, the AI would need to pull login sessions from one microservice, device and location data from another, and account and plan details from yet another.
Or it could be more complex (or much simpler), with data spread across multiple locations. The point is, I don’t know how the distribution operates unless Netflix is willing to give proprietary data over to a public model. This does not make sense for such big conglomerates. It only makes sense for them to train their own models, which is why we see other big tech companies building their own models now.
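To make that dependency concrete, here is a purely hypothetical sketch of what such a check might look like. None of the service names, URLs, or fields below correspond to real Netflix systems; the point is only that the data lives behind several internal services, and a model with no access to any of them cannot write this feature for you:

```python
# Hypothetical sketch only: every service name, URL, and field below is
# made up to illustrate data spread across multiple internal services.
import requests

SESSIONS_SERVICE = "https://sessions.internal.example/api"   # fictional
DEVICES_SERVICE = "https://devices.internal.example/api"     # fictional
ACCOUNTS_SERVICE = "https://accounts.internal.example/api"   # fictional


def looks_like_sharing(account_id: str) -> bool:
    """Flag an account whose recent sessions span more households than its plan allows."""
    sessions = requests.get(f"{SESSIONS_SERVICE}/sessions", params={"account": account_id}).json()
    devices = requests.get(f"{DEVICES_SERVICE}/devices", params={"account": account_id}).json()
    plan = requests.get(f"{ACCOUNTS_SERVICE}/plan", params={"account": account_id}).json()

    # Map each device to the household it was last seen in, then count how
    # many distinct households the recent sessions cover.
    device_household = {d["device_id"]: d["household_id"] for d in devices}
    households = {device_household.get(s["device_id"]) for s in sessions}
    households.discard(None)

    return len(households) > plan.get("allowed_households", 1)
```

Even this toy version depends on knowing which service owns sessions, which owns devices, and how plans are modeled, and none of that context appears in any public repository.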
Java, a language widely used in banking systems, saw Oracle tighten its licensing terms not long before ChatGPT was released. Stack Overflow, a forum for all engineers, banned AI-generated answers and has since announced OverflowAI, built around its own data. Training AI and building models both require engineers.
The reason people worry so much about this is that we know for a fact that AI can produce accurate, good-quality snippets of code. But that’s because it has access to most open-source code repositories. The keyword here is “open.”
Public repositories like GitHub host billions of lines of code. However, if you take a closer look, you’ll see that most publicly accessible codebases are small mini-projects or interview questions from different engineers. Companies are not necessarily hosting their proprietary data publicly (although there have been leaks of sensitive codebases). This is a conversation for another time.
Big software applications and legacy systems do not sit publicly in any open-source repository, and AI models do not have free access to them. So you can imagine what kind of suggestions the model will keep giving you.
Now, I did not say no one will get replaced; I said AI cannot replace engineers.
A more likely scenario would be that AI would make engineers 10 times more efficient, and the need for hundreds of engineers would likely diminish. Why hire 100 engineers when 10 could do the job? We saw this happen on Twitter when Elon Musk laid off 80% of the workforce. This would also mean that engineers would start evolving from constantly learning different web frameworks and languages to expanding their skills to include data engineering and data science, to accommodate big tech companies' need to train their own models.
The learning curve to join software engineering would now be higher than ever because new engineers in the industry have a lot to catch up on. However, this would also mean that the demand (and the pay) for those who keep up will be much higher. Engineers who are not able to keep up will become obsolete much faster than ever before.
It will be cutthroat in the software engineering world, but then again, it always has been.