The evolution of AI-supported software development: From rule-based tools to agent-based systems

Software development is currently undergoing a fundamental transformation. Automated tools have been supporting developers for decades – but what began with simple rule-based tools has accelerated dramatically in recent years. Modern AI systems can now not only suggest code, but also develop, test and debug it independently.

This article traces this development: from the first deterministic tools to the breakthrough of machine learning and today’s agent-based systems. We look not only at the technical possibilities, but also at the practical challenges and our own experiences from everyday project work at Be Shaping the Future.

1. The beginnings: deterministic development tools

The history of automated support in software development began long before the use of AI models. Even in the early decades of computer science, developers recognised the need to automate repetitive tasks and detect errors at an early stage.

The first systematic approaches to automated code analysis emerged with tools such as Lint, which was developed in 1978 for the C programming language. These tools are purely rule-based: they check source code against a fixed catalogue of rules and conventions. A linter recognises unused variables, missing brackets or violations of naming conventions, for example. Modern representatives such as ESLint, Pylint, and SpotBugs (the successor to FindBugs) have refined this approach and can also recognise more complex patterns – such as problematic code structures or frequent sources of error.
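The rule-based principle is easy to illustrate. Here is a minimal sketch in Python – a toy rule, not how ESLint or Pylint are actually implemented – that flags local variables which are assigned but never read, using only the standard `ast` module:

```python
import ast

def find_unused_assignments(source: str) -> list[str]:
    """Toy lint rule: report names that are assigned but never read."""
    tree = ast.parse(source)
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)   # name is written
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)       # name is read
    return sorted(assigned - used)

code = """
def total(items):
    unused = 42
    result = sum(items)
    return result
"""
print(find_unused_assignments(code))  # ['unused']
```

The check is deterministic: the same input always produces the same finding, which is exactly the predictability described below – and also the limitation, since the tool only ever sees what its rules encode.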

At the same time, tools for automated code transformations were developed. Refactoring refers to the structured transformation of code to improve its quality and readability without changing its behaviour. Modern development environments (IDEs) offer sophisticated refactoring functions based on a deep understanding of programming language grammar.
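The "without changing its behaviour" requirement can be made concrete with a small example. The same calculation before and after an "extract variable"-style refactoring (an illustrative Python snippet, not actual IDE output); both versions return identical results:

```python
def price_before(net: float, qty: int) -> float:
    # Before: magic number and a duplicated expression
    return net * qty + net * qty * 0.19

def price_after(net: float, qty: int) -> float:
    # After: intent made explicit, behaviour unchanged
    VAT_RATE = 0.19
    subtotal = net * qty
    return subtotal + subtotal * VAT_RATE

# The refactoring is valid precisely because this holds for all inputs:
assert price_before(10.0, 3) == price_after(10.0, 3)
```

An IDE performs such transformations safely because it parses the code into a syntax tree first, rather than manipulating text.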

The strength of these deterministic approaches lies in their predictability and speed: they operate according to fixed rules and deliver reproducible results. However, their capabilities are limited by the explicitly programmed rules.

2. The rise of machine learning in software development

2.1 The first wave: code completion with ML (2010s)

The rise of machine learning heralded a new phase in development tools. While traditional auto-completion features in IDEs were based on simple word lists or, at best, type information, researchers and companies began experimenting with statistical models for code prediction in the early 2010s.

Microsoft integrated ML-based suggestions into Visual Studio IntelliSense early on, but the real breakthroughs came with specialised products. Kite, launched in 2016, used local machine learning models to offer context-sensitive code completions. TabNine, launched in 2019, used similar approaches and offered support for numerous programming languages.

These tools marked a paradigm shift: instead of explicitly programmed rules, models learned statistical patterns from large code repositories. They could suggest idiomatic expressions and mimic the development style of the project. However, their limitations lay in their restricted context – typically limited to a few lines or the current file – and their lack of semantic understanding, i.e. insight into the purpose of the programmed code.

2.2 GitHub Copilot as a paradigm shift (2021)

In June 2021, GitHub and OpenAI introduced GitHub Copilot, which was based on a code-specialised variant of GPT-3. Copilot marked a qualitative leap in several respects.

First, Copilot dramatically expanded the context it took into account. While earlier tools were limited to a few lines, Copilot could consider the entire current file and parts of the project structure. This enabled suggestions that were not only syntactically correct but also semantically meaningful in the project context.

Second, Copilot went beyond mere completion to true code generation. Developers could describe functionality in comments, and Copilot generated the corresponding code. This worked not only for simple utility functions, but also for more complex algorithms or the integration of APIs.

Third, Copilot was based on a fundamental breakthrough in AI: transformer architectures and large language models (LLMs). These models had developed an implicit understanding of code semantics through training on huge amounts of source code. They were able to recognise patterns that were too complex to be explicitly programmed.

Nevertheless, Copilot also had clear limitations. It acted purely passively – it made suggestions, but did not execute code or use external tools. The context was expanded, but still limited. Complex cross-project refactorings or interactions with build systems were beyond its capabilities.

2.3 The chat-based era (2022–2023)

The launch of ChatGPT in November 2022 paved the way for a range of new applications for large language models. For software development, this meant a transition from passive completion to active conversation about code.

Large language models such as ChatGPT and Claude quickly became popular tools for developers. They could not only generate code, but also explain, debug and restructure it. Developers could ask complex questions and receive detailed, context-sensitive answers.

Products such as GitHub Copilot Chat, introduced in 2023, integrated this chat functionality directly into IDEs. Cursor, an evolution of VS Code with deep AI integration, offered similar features with even stronger context awareness.

This phase brought a new quality of interaction: developers could work iteratively with the AI, refine suggestions and discuss alternative approaches. The AI evolved from a pure tool to a pair programming partner.

Nevertheless, fundamental limitations remained. The systems could suggest code, but not execute it. They could describe tests, but not run them. Integration into the actual development environment was limited to reading and writing text. True autonomous action was not yet possible.

3. Agent-based systems: The next stage of evolution

3.1 What characterises agent-based systems?

Agent-based systems represent a fundamental advancement: they combine the language capabilities of LLMs with the ability to actually act within the development environment. Four core characteristics define what makes an ‘agent’:

Autonomy: Whereas earlier systems waited for direct instructions and made suggestions for manual implementation, agents can now plan and execute multi-step tasks independently. Instead of ‘Here is the code you could insert,’ they act according to the pattern ‘I have made the change, run the tests, and discovered an error, which I am now correcting.’

Tool usage: Agents have access to the same tools as human developers – compilers, test frameworks, debuggers, version control, file systems, terminals. They can not only generate code, but also execute it, analyse build errors, run tests and respond to their results.

Iterative action: A key feature is the ability to perform feedback loops. If an agent generates code that results in a compiler error, it can analyse the error, identify the cause and make a correction – often without human intervention. This enables a trial-and-error process similar to human problem solving.

Multimodal interaction: Agents can process and produce various types of information – read and write code, interpret error messages, analyse logs, consult documentation, and even analyse screenshots of UI issues.

Specific examples illustrate the range of approaches: Claude Code works entirely on a terminal basis and integrates seamlessly into command line workflows. Cursor, on the other hand, offers deep IDE integration and enables agent-based functions directly in the editor. Both approaches have their strengths depending on developer preference and the task at hand.
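The iterative feedback loop described above can be sketched schematically. In this sketch the model call is stubbed out – `propose_fix` is a placeholder, not a real API – because the point is the structure: generate code, run it, and feed any failure back into the next attempt:

```python
import pathlib
import subprocess
import sys
import tempfile

def propose_fix(task, error):
    """Placeholder for an LLM call. A real agent would send the task
    plus the error feedback to a model and get new code back."""
    if error is None:
        # First attempt: deliberately buggy, so its own test fails
        return "def add(a, b):\n    return a - b\n\nassert add(2, 3) == 5\n"
    # Second attempt: 'corrected' after seeing the AssertionError
    return "def add(a, b):\n    return a + b\n\nassert add(2, 3) == 5\n"

def agent_loop(task, max_iterations=3):
    error = None
    for _ in range(max_iterations):
        code = propose_fix(task, error)
        path = pathlib.Path(tempfile.mkdtemp()) / "attempt.py"
        path.write_text(code)
        result = subprocess.run([sys.executable, str(path)],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True           # tests pass: task done
        error = result.stderr     # feed the failure into the next round
    return False

print(agent_loop("implement add"))  # True after one self-correction round
```

Real agents such as Claude Code apply this pattern against actual compilers, test frameworks and terminals rather than a single temporary file, but the act–observe–retry structure is the same.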

3.2 Standardisation: The Model Context Protocol (MCP)

A key challenge of early agent-based systems was the lack of standardisation. Each system implemented tool integration in its own way, leading to fragmentation and limited interoperability.

The Model Context Protocol (MCP), developed by Anthropic and released as an open-source standard in November 2024, addresses this challenge. MCP defines a uniform interface through which agents such as Claude Code can communicate with a wide variety of development tools.

The way it works is simple: for each tool or system – such as Git, a database or a cloud API – a small MCP server is implemented that makes its functions available via a standardised interface. The agent (such as Claude Code) then no longer needs to know how each individual tool works in detail – it communicates with all of them using the same protocol.

A concrete example: Instead of Claude Code requiring separate, bespoke integration code for Git, Jira and the company database, each of these systems is exposed through an MCP server, and Claude Code talks to all three via the same standardised interface.
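At the wire level, MCP builds on JSON-RPC 2.0: every tool invocation is an ordinary request with a method such as `tools/call`, regardless of which server handles it. The following sketch shows the message shape (simplified; the tool names `git_log` and `create_issue` are hypothetical examples, and field details should be checked against the current MCP specification):

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP-style JSON-RPC 2.0 request for invoking a tool.
    The same envelope works for a Git server, a Jira server or a
    database server -- only the tool name and arguments differ."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# The agent speaks one protocol to every server:
git_call = make_tool_call(1, "git_log", {"max_count": 5})
jira_call = make_tool_call(2, "create_issue", {"summary": "Fix login bug"})
print(json.dumps(git_call, indent=2))
```

This uniformity is what makes the ecosystem composable: an agent that can send these requests can use any MCP server without knowing its internals.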

The advantages: Tool developers only need to implement one MCP server, which then works with all MCP-compatible agents. Companies can develop their own MCP servers for their internal systems and integrate them seamlessly. Compared to previous proprietary approaches, MCP offers greater flexibility and vendor independence.

4. Classification and outlook

4.1 Qualitative leaps in evolution

Looking at developments over recent decades, several fundamental transitions can be identified:

From deterministic to probabilistic: Early tools worked with fixed rules and guaranteed outputs. Modern AI-based systems are probabilistic – they generate probable solutions based on learned patterns. This brings new capabilities, but also new challenges in terms of reliability and predictability.

From reactive to proactive: While traditional tools only responded to explicit instructions, agents can independently identify problems and suggest solutions. For example, an agent might notice that an implementation does not meet performance requirements and proactively suggest optimisations.

From isolated to tool-integrated: The transition from pure text generation to integration with the entire development environment is perhaps the most significant change. Agents are no longer isolated assistants, but full-fledged participants in the development process.

From passive to active: The ability to create multi-step plans, implement them independently and respond to feedback transforms the role of AI in software development from a tool to a collaborator.

4.2 Open questions and challenges

Despite impressive progress, significant challenges remain:

Reliability and fault tolerance: Agents make mistakes. They can generate code that contains subtle bugs, make incorrect assumptions about the codebase, or fail in edge cases. The probabilistic nature of LLMs makes absolute guarantees impossible. This requires robust testing strategies and human oversight, especially in critical systems.

Security in autonomous system access: If an agent can execute arbitrary commands, security risks arise. A compromised or misinstructed agent could delete data, expose access credentials, or make unintended changes. Sandboxing, permission management, and audit logs are important security measures, but their consistent implementation in practice remains challenging.

Limits of autonomy in critical systems: In safety-critical areas – such as financial systems, medical software, or infrastructure controls – complete autonomy is problematic. Clear boundaries between autonomous actions and human-approved changes must be defined.

Explainability and traceability: When an agent makes a complex change, it must be possible to trace why it made that decision. This is important for debugging, code reviews, and compliance.

Costs and resource consumption: Agent-based systems with many iterations can incur significant API costs. This limits their applicability, especially in smaller organisations or with frequent use.

4.3 Practice and corporate perspective: The path to productive use

Systematic knowledge building in practice

We have been actively working with agent-based systems for some time now. We have launched a dedicated initiative to evaluate their specific benefits in software development in a structured manner. We are continuously gathering experience in internal projects and our product development – specifically, we use Claude Code for various development tasks. This practical application is essential: only by actually working with the tools can we develop a feel for their strengths and weaknesses and understand how to use them effectively.

At the same time, we are creating internal guidelines and best practices that systematise our team’s knowledge: they document successful patterns, identify common pitfalls and define quality criteria, and they evolve continuously as we gain more practical experience.

Insights from practical work

Several key findings have already emerged from this ongoing initiative:

Successful use requires experience: agent-based systems are highly sensitive to how tasks are formulated. Precise wording with a clear context achieves significantly better results than vague queries. This skill – often referred to as ‘prompt engineering’ – must be actively developed and cannot be taken for granted. To this end, we invest in internal training and knowledge transfer. Developers share their experiences in regular sessions, and new team members are systematically introduced. This continuous learning process is essential, as the technology is developing rapidly.

People remain central: People – and their experience, expertise and judgement – remain the primary and essential component of development work. Agents are powerful tools, but they do not replace developers. They shift the role: from writing every line of code to orchestrating, defining goals, evaluating solutions and ensuring quality. AI-generated code must be rigorously reviewed – code reviews are not optional, but critical.

Selective adoption pays off: Not every development task is equally suited to agent-based support. We have identified where its use is particularly productive – for example, in repetitive refactorings or the implementation of clearly specified features. At the same time, there are areas where we rely on traditional development: architectural decisions, complex algorithms or safety-critical components.

Legal aspects in customer project deployment: Deployment in customer projects brings additional complexity. Legal aspects must be clarified: Who is responsible for AI-generated code? How do we deal with licensing issues? What data protection regulations apply when code fragments are sent to external APIs? These questions require clear contractual arrangements and transparent communication with customers.

Conclusion: Active engagement as a competitive advantage

Agent-based systems will change software development forever. Companies that actively engage with the technology now, build up knowledge and develop best practices will gain a competitive advantage. It’s not about jumping on every bandwagon, but about learning pragmatically and systematically where and how these tools create real added value.

The combination of powerful technology and experienced developers who know how to use it effectively opens up new possibilities in software development – without calling into question the fundamental importance of human expertise.


Benefit from our experience

Our development teams already use agent-based systems productively – and bring this expertise to your projects. For you, this means more efficient implementation, modern development practices and partners who don’t just talk about these technologies, but use them every day.

Whether you’re looking to develop something new, modernise existing systems or undertake digitisation projects – get in touch with us.

Contact

Ready to begin?

If you have a query or would like to arrange an initial meeting to discuss how we can shape the future of your business, then get in touch and our team will get back to you shortly.

Get in touch