Coding with AI: The Good, The Bad, and The Ugly
My experience coding with Cline, Cursor and Windsurf
You know that feeling when you’re the head of engineering, leading a team so small it could fit in a phone booth? That was me a few weeks ago, diving headfirst into Pacify’s infrastructure challenges. Picture Sisyphus, but instead of a boulder, it’s a tangled mess of CI/CD pipelines, manual deployments, and AWS configurations.
Starting mid-replatforming, I had to juggle understanding the current architecture, preparing for the new one, and navigating product requirements — all while ensuring I didn’t burn out. On top of that, I gave myself an extra challenge: to see how much AI could help not just me, but also my team, work smarter. This is the first chapter of what I hope will be a series of reflections on that journey.
Setting the Stage
Before we get into the AI drama, let me paint the picture. Our current infrastructure comprises Serverless Framework-based APIs for a React web app and two native mobile apps (Kotlin and iOS). The APIs live in a monorepo, but we only had deploy workflows — no CI. Deployments required manual versioning, and infrastructure changes were made directly in AWS. While a lot of good practices were in place, the workflow screamed for automation and documentation.
Enter AI. Like many this past year, I’d dabbled with GitHub Copilot for code suggestions, but I wanted to push the envelope. What if I could turn AI into a true productivity multiplier? Could it not just assist with feature development, but also tackle process automation and infrastructure management, all while staying secure, scalable, and documented?
Meet the Cast: Cline, Windsurf, and Cursor
Over the past three weeks, I used three AI agents: Cline, Windsurf, and Cursor. All three are VS Code based, either as a plugin or as a forked IDE. Here’s a quick overview of what each brought to the table:
Cline: The meticulous architect. For some reason, Cline resonated with me the most in its approach, even though it is a VS Code plugin rather than a full-fledged IDE like the other two. The way it explained its reasoning, asked for permission to access specific files, edited them, and then showed the diffs (which all three do with varying degrees of success) gave me my first AI-induced “Ohhhh Sheeetttt…” moment! It shone in complex workflows like setting up CI/CD pipelines and database migrations. It was excellent at handling environment-specific configurations and ensuring separation of concerns. However, it sometimes stumbled on edge cases like branch pattern handling and missed opportunities to suggest better caching or rollback strategies. When the other two failed, I came back to Cline over and over as a more trusted partner. Man, seeing the API credits get eaten up in real time is quite the sight, but it was so good at what it did that I re-upped after my initial $25 test, and I feel like it’s money well spent!
Windsurf: The debugger. Also the baby of the pack. I discovered it thanks to this excellent Pragmatic Engineer article summarizing the latest AI-infused IDEs that engineers love. Windsurf excelled at iterative problem-solving for issues like Slack notifications or infrastructure as code. It was great at maintaining context and refining solutions, but sometimes over-engineered simple problems or missed the forest for the trees, requiring multiple iterations to get things right. I would say that I trusted it the least, based on my experience trying to get things right in Cascade, its Cline-like agent. I still have a few days left on my trial, so we’ll see how it goes.
Cursor: The educator. Cursor was phenomenal at providing in-depth explanations and targeted solutions, especially for tricky integrations like Twilio call management. Its interactive nature made it feel like pair programming with a senior engineer. But complexity overload and occasional miscommunications reminded me that AI still needs human oversight. It was only after using Cline for a while that I discovered Composer existed. I spent a lot of my initial time messing around in the Chat interface, which is cool, but I am here for the “do things for me and let me review” action that Cline got me used to. As Simon Willison, the co-creator of Django, mentions in another highly recommended Pragmatic Engineer piece, this time a podcast episode on AI tools for software engineers, “there is no manual for these things”.
While each had its quirks, they shared some strengths: clear documentation, iterative problem-solving, and a knack for turning vague questions into actionable insights. However, their weaknesses pointed to a common theme, perfectly captured in Addy Osmani’s latest article on the topic: AI gets you 70% there, but that last 30%? That’s where the real work (and sometimes frustration) begins.
The Good
Based on my very recent experience, then, here’s where I found AI to shine the brightest. I came to these conclusions by switching between the platforms on the same task whenever a particular tool got stuck. On all three, I relied on the Claude 3.5 Sonnet model, known for how good it is at handling code.
1. Infrastructure Overhaul
With Cline, we turned a chaotic, manual setup into polished infrastructure-as-code using CloudFormation. Multi-environment configurations that used to involve manual tweaks were now seamlessly managed, OIDC-based authentication became the new standard, and deployments no longer required a nerve-wracking pre-deploy ritual. It felt like having an expert devops engineer by my side — one who not only knew the blueprint but could also adapt it to my team’s unique quirks. For example, when setting up GitHub Actions for deployment, Cline helped craft environment-specific configurations that saved hours of debugging down the line.
Cursor, while less specialized in infrastructure, complemented this by ensuring our documentation was robust. It automated the creation of setup guides, troubleshooting documents, and IAM policy references — turning what used to be an afterthought into an integral part of our process. Cursor’s thoroughness ensured that even my future self would thank me.
Windsurf occasionally chimed in too, especially when debugging tricky environment variable mappings. Its iterative approach helped pinpoint subtle configuration mismatches that could have otherwise taken hours to unravel.
2. Debugging Simplified
When it came to debugging, Windsurf truly earned its stripes. Fixing Slack notifications required diving into YAML formatting and handling multiline strings — a task that felt like unraveling a ball of yarn tied by a hyperactive kitten. Windsurf’s iterative problem-solving and documentation-first approach turned this daunting task into something manageable. Its ability to maintain context across changes meant we could quickly revert or tweak configurations without losing our way.
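To make the multiline-string problem concrete, here is a minimal sketch of one way to keep a Slack message valid when it spans several lines: build the payload with a JSON serializer instead of pasting raw text into the workflow. This is not the fix we shipped. The function name and the sample message are hypothetical, and I am assuming a Slack incoming webhook that accepts a JSON body with a "text" field.

```python
import json


def build_slack_payload(title: str, body_lines: list[str]) -> str:
    """Return a JSON string suitable for a Slack incoming-webhook body.

    Hypothetical helper for illustration only.
    """
    # Join the lines with real newlines; Slack renders them as line breaks.
    text = "\n".join([f"*{title}*", *body_lines])
    # json.dumps escapes newlines and quotes, so the payload stays valid
    # JSON even when the message spans several lines.
    return json.dumps({"text": text})


payload = build_slack_payload(
    "Deploy finished",
    ["service: api", 'note: "v1.2.3" is live'],
)
```

The point of the design is that escaping is delegated to the serializer, so quotes and line breaks in the message cannot break the surrounding YAML or JSON.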
Cline also played a role, especially when troubleshooting deployment workflows. For instance, it flagged subtle inconsistencies in path filtering logic that were silently breaking builds. Meanwhile, Cursor contributed by breaking down complex errors into understandable explanations. Its knack for identifying patterns in code allowed us to isolate issues like version detection edge cases in our monorepo setup.
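For readers who have not hit this class of bug, here is a rough sketch of what path filtering in a monorepo boils down to: mapping changed files to the services that need a build. The services/<name>/ layout and the function name are assumptions made for this example, not our actual repository structure.

```python
from pathlib import PurePosixPath


def changed_services(changed_files, root="services"):
    """Return the sorted names of services touched by the changed files.

    Assumes a hypothetical layout of <root>/<service>/... for illustration.
    """
    services = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Compare whole path components rather than string prefixes, so
        # "services-old/api/x.py" is not mistaken for "services/api/x.py".
        # Requiring three parts skips files sitting directly under <root>.
        if len(parts) >= 3 and parts[0] == root:
            services.add(parts[1])
    return sorted(services)


affected = changed_services([
    "services/api/src/handler.py",
    "services-old/api/x.py",
    "services/README.md",
    "docs/setup.md",
    "services/billing/serverless.yml",
])
```

Matching on string prefixes instead of path components is the kind of subtle inconsistency that can silently skip or trigger builds.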
3. Clearer Documentation
We’re dealing with LLMs here, so all three tools are great at documentation, and you should not hold back at all. Cursor turned into a one-person documentation team. From AWS troubleshooting guides to semantic release workflows, it created resources that not only benefited the current team but also our future selves. Documentation isn’t just flossing anymore; it’s a dental hygienist doing the job for you. I’ve come to rely on LLMs now for all of my engineering documentation, git commits, and PR templates, and I have even started writing some tools to help with that workflow, but more to come on that.
The Bad
It wasn’t all smooth sailing, though. As I mentioned above, the reason I ended up switching between three different tools, beyond plain curiosity, is that they are not perfect. Each has its own quirks.
1. Over-Engineering
AI sometimes proposed overly complex solutions when simpler ones would do. Windsurf, for example, suggested intricate GitHub expressions instead of just reading the action’s documentation more carefully. As I mentioned here, this is also tied to your ability to prompt, which, as others smarter than me have noted, is really where previous experience helps you get the best out of the LLMs. If the code-writing part of our trade is going the way of the dodo, since LLMs will be doing most if not all of it now, most of our time should be spent on writing the correct prompt and giving the LLM the precise context it needs to do its job properly.
2. Tooling Gaps
Cline occasionally missed opportunities for optimization, like suggesting GitHub Actions caching or leveraging TypeScript for better type safety. This left me wondering, “Why didn’t I think of that earlier?” Cline also tended to run into Claude’s rate-limiting timeouts a lot, forcing me to switch to one of the other editors to continue my task. There is also the question of price. I ended up paying for the annual Cursor membership because I believe it offers the best balance of price and quality over time, though I have to admit that I don’t fully understand the pricing model. I just know that $20/month for “unlimited” access to an AI agent is something I will need from now on, and that Cline’s à la carte approach can prove too onerous. I am still a noob here though, so more to learn in the months ahead.
3. Contextual Missteps
Cursor’s assumptions sometimes led us astray, especially when it lacked a full understanding of the codebase. A Python vs. Node.js confusion during a webhook integration set us back before we course-corrected. Let’s jump back to our first point about fighting back the excitement and spending time up front carefully crafting at least your initial prompt. Believe me, it will save you a lot more time in the long run. I think it’s also a great thing because it emphasizes something we’ve always been told in engineering, which is to spend more time up front on design rather than jumping straight into coding, i.e. measure twice, cut once. It’s one of the lessons you will learn here once the LLM loses context and starts taking you down a rabbit hole.
The Ugly
And here’s where things got downright frustrating, and this constitutes the bulk of the remaining 30% I mentioned above.
1. Hallucinations
Sometimes, the AI just made things up. Cursor once generated a GraphQL query structure that looked great, except it wasn’t valid. Testing and verification became non-negotiable. I had shared the link to the GraphQL documentation, assuming it would pick up what it needed for the task at hand, but I realized I had not been specific enough. Of course, once I shared the explicit schema, it was able to generate the correct query flawlessly.
2. Incomplete Solutions
When I worked with Cline on automating a database migration process, its proposed workflows often worked in theory but failed to account for edge cases like database rollbacks or branch-specific configurations. At times, it felt like building IKEA furniture with one missing screw.
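To illustrate the rollback edge case, here is a minimal sketch of an all-or-nothing migration step. It uses an in-memory SQLite database purely as a stand-in, since it ships with Python. The table names and the apply_migration helper are hypothetical and are not the workflow Cline produced.

```python
import sqlite3


def apply_migration(conn, statements):
    """Run every statement in one transaction; undo all of them on failure.

    Hypothetical helper for illustration only.
    """
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        # Any failing statement rolls back the whole batch, including
        # schema changes made earlier in the same transaction.
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK above fully control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

ok = apply_migration(conn, [
    "CREATE TABLE users (id INTEGER PRIMARY KEY)",
    "ALTER TABLE users ADD COLUMN email TEXT",
])
bad = apply_migration(conn, [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY)",
    "ALTER TABLE missing ADD COLUMN x TEXT",  # fails: no such table
])
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
)]
```

Keep in mind that this pattern depends on the database treating schema changes as transactional. SQLite does, but some engines (MySQL, for example) implicitly commit schema changes, so a rollback plan there needs explicit down-migrations instead.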
3. Complexity Overload
Windsurf’s debugging guidance could be overwhelming, mixing shell scripting, YAML, and Node.js in ways that made my head spin (and of course confused GitHub itself). It was a stark reminder that while AI accelerates work, complexity doesn’t go away; it just changes form.
The 70% Problem
Reflecting on this journey, the key insights from Addy Osmani’s “The 70% Problem” really resonated with me. AI consistently got me 70% of the way there. It turned tedious tasks into manageable ones and surfaced solutions I might have overlooked. It actually brought back the passion and the fun in coding that I had lost a few years back and made my transition to engineering management easier. All of a sudden, I am not spending time in the tedium of boilerplate on my way to implementing a feature. With AI, I spend most of my time code reviewing, tweaking and, finally, committing, creating PRs and shipping. That 70% is the Wow piece, but the last 30% (resolving edge cases, verifying assumptions, and ensuring real-world viability) is harder than ever. Sometimes, I was fighting not just the problem but my own ignorance amplified by the AI’s hallucinations.
What’s Next
This experience got me thinking about what is next for this industry. I will try to capture these still-unformed thoughts in the following bullet points, each of which might become the topic of a future article:
Software engineering is going to look very different a year from now than it does right now, given the impact of AI coding agents. I expect smaller teams to be able to punch above their weight in delivery thanks to AI.
As my brother Ady Ngom said, engineers will transition from coders, i.e. people producing line after line of code, to code directors, à la Hollywood. We’ll set the scene (it’s already quiet on set) and get to the Action/Cut sequence with our agents.
People, we’re almost in 2025. I was very skeptical about the impact of AI at the beginning of the year, but now I am a convert. All this prompting/writing is quite tiring, though. Next (and I must already be late when I say this), since we already have multimodal LLMs out there, it would be a lot simpler for me to just say this stuff out loud to my IDE. We’ll truly be entering the era of code whisperers.
With pure code writing out of the way thanks to LLM-generated code, I found the tedium of coding brought back in another form: process. By that I mean creating relevant commit messages, opening PRs, filling out PR templates, and the like. This is where I expect the next burst in the industry: tooling. Committing your changes with the right message capturing what has actually been done, and explaining it in a PR template, should be a one-command thing. Of course I have started building such a tool. LOL.
AI’s impact on software development is soon to become undeniable, but what about interviewing? As segfaulte’s article highlights, traditional coding challenges are becoming obsolete in the age of AI, something I ran into in my latest foray into tech interviewing. Should interviews focus more on problem-solving frameworks, collaboration, and decision-making rather than brute-force coding? This might just be the topic for my next blog post.
For now, my key takeaway is this: AI isn’t replacing engineers. It’s amplifying us. It’s making it fun to code again and reducing that gap between idea and implementation. But like any amplifier, it amplifies the good, the bad, and, yes, the ugly. So, here’s to building smarter, debugging faster, and soon literally yelling at a machine when it tries to outsmart us.
Would love to hear your thoughts. Have you had your own Tony Stark moment with AI? Let me know in the comments!



