Published: Dec 11, 2025 | 22 min read
AI code assistants are versatile tools; however, for them to be truly helpful, developers need to set some ground rules. In this article, I have gathered my own rules and best practices that accomplish just that.
I've been coding with AI models since the first Copilot release, and I've watched them grow from simple autocomplete to something that can solve genuinely hard problems. They're faster, more capable, and more accurate every month. But despite all the hype around intelligence and benchmarks, there's one thing even the best models still struggle with: following rules, especially over long-running tasks rather than single-shot generation.
As a backend engineer who has spent the past six years in the JVM ecosystem, primarily with Kotlin, I see this even more clearly. Most of my experience comes from the insurance and e-commerce domains, where reliability and precision matter a lot. Kotlin isn't as common among AI-focused tools as languages like TypeScript, so the models often need a bit more guidance. Still, it's completely manageable once you understand how to work with them.
By rules, I mean small sets of explicit instructions that guide an AI coding model’s behavior during generation. Something between coding style guides, architectural constraints, and task-specific prompts. Rules act as a persistent contract that the model should follow across longer sessions, not just in a single prompt.
Most modern coding tools support them in various forms (e.g., “project rules”, “agent instructions”, or “system prompts”). Cursor describes them in its documentation; in Claude Code, the closest equivalents are agents or skills.
That's why I treat AI coding agents as a mix of pair-programming substitute, brainstorming partner, and, most of all, autocomplete on steroids. And like any autocomplete, the final code will only be as good as your own engineering skills. If you want consistent, predictable, production-grade output, you have to inject your best practices directly into the model through rules.
What I’m sharing here is my battle-tested approach to writing effective rules for AI coding tools. I use it across Cursor, Windsurf, Claude Code, and basically anything that depends on LLMs to generate code. These rules won’t magically make the output perfect, but they will get you to production quality much faster. Instead of rewriting half the file, you’ll be doing small, controlled refactors.
This is one of the most important lessons I’ve learned: rules only work when they are small, narrow, and scoped. LLMs have a hard time following too many instructions at once. There’s no magical number, but the pattern is obvious - the fewer rules you load into context, the better the result. This doesn’t mean compressing everything into cryptic one-liners. It means your rules should follow the same principle as good code: SRP (Single Responsibility Principle).
In my Kotlin + Cursor projects, I split rules by layers and by specific use cases, since each part of the codebase follows different conventions and has different expectations. I store all these rules in the .cursor/rules directory at the root of the project, using .mdc files. This is how I usually structure it:
General rules: project context, tech stack, and a handful of rules that apply everywhere.
Repository rules: access patterns, naming, and query practices.
Integration test rules.
Unit test rules.
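On disk, that might look like this (the file names are illustrative; only the .cursor/rules location and the .mdc extension come from Cursor's conventions):

.cursor/rules/
├── general.mdc
├── repositories.mdc
├── integration-tests.mdc
└── unit-tests.mdc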
This separation is not just “clean”; it prevents contradictions. If you’re writing unit tests, your agent shouldn’t even see integration-test rules because they contradict by design. Mocking is a great example. Integration tests generally avoid mocks because they focus on real interactions between components, while unit tests commonly use mocks to keep the tests isolated. Mixing those rules in one file guarantees the model will get confused.
Think of it as cognitive overload. While working on a specific task, humans don't keep unnecessary instructions in mind, and models shouldn't have to either. The more of the loaded context that is actually relevant to the task, the higher the probability the model will do exactly what you want.
One very useful technique here is using glob expressions in Cursor. For example, *.kt for Kotlin files, or **/*Test.kt for test rules. This lets your tool automatically load the right rule file depending on the file you’re editing.
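As a sketch, the glob lives in the rule file's frontmatter; the fields below reflect how I use Cursor's .mdc metadata (description, globs, alwaysApply), so double-check them against your tool's current docs:

---
description: Unit test conventions for Kotlin test files
globs: **/*Test.kt
alwaysApply: false
---

ALWAYS mock dependencies with MockK.
NEVER start real containers in unit tests.

With globs in place, editing any *Test.kt file pulls in only the test rules and nothing else.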
AI rules in Claude Code
In Claude Code, I follow the same principle, but instead of file-based rules, I use specialized agents, each with its own narrow set of instructions. CLAUDE.md acts as a general rule file, containing only the core project context and simple routing rules that tell Claude which agent to use for a given action – whether it's writing business logic, unit tests, integration tests, or working on endpoints. To begin, I recommend using the /agents command to set up your first agents.
In CLAUDE.md, I define straightforward routing rules like:
If the user requests unit tests, use UnitTestAgent.
For endpoint work, use EndpointAgent.
For business logic, use BusinessLogicAgent.
This setup works extremely well because each agent operates inside its own isolated context, and the main agent delegates tasks to the appropriate specialist. As a result, every agent processes only the rules that matter for its specific responsibility, avoiding contradictions and keeping responses clean and deterministic.
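For a concrete picture, each specialist is just a markdown file with a short frontmatter and its own narrow instructions – in my setup they live under .claude/agents/ (the name and rules below are illustrative, and the exact frontmatter fields may vary between Claude Code versions):

---
name: unit-test-agent
description: Writes and updates unit tests. Use for any unit testing task.
---

You write unit tests only.
ALWAYS use Kotest and MockK.
ALWAYS structure tests as Given-When-Then.
NEVER start real databases or containers.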
Use strong words for important rules
Not all rules are equal. Some matter much more than others, and the model needs to know that. When you say “prefer”, “try to”, or “maybe”, you’ve just left the final decision to the AI, and that’s not what you want. If a rule is critical, you have to mark it as critical.
One of my most important rules, applied across many projects, is simple: NEVER write comments in code. Without this rule, the AI happily litters the code with redundant comments.
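To illustrate, here is a made-up Kotlin sketch (the types and repositories are hypothetical, not from a real project) of the kind of noise this rule prevents:

// Function to create a booking
fun createBooking(request: BookingRequest): Booking {
    // Find the showing by its id
    val showing = showingRepository.getById(request.showingId)
    // Check that the seat is still available
    require(showing.isSeatAvailable(request.seatId)) { "Seat is already taken" }
    // Save and return the new booking
    return bookingRepository.save(Booking(showing.id, request.seatId))
}

Every comment simply restates the line below it; with the rule in place, the naming does the explaining.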
Models follow instructions better when the wording is clear and direct. Strong phrases like “always”, “never”, or “important” help because they leave less room for the model to guess what you mean.
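A quick, illustrative contrast:

Weak: Try to prefer constructor injection where it makes sense.
Strong: ALWAYS inject dependencies through the constructor. NEVER use field injection.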
Markdown is perfect for rule files. It’s simple, readable, and more importantly, LLMs parse it naturally. Use headings to create clear sections, and keep each section limited to one thing. A rule file that is clean and structured is much more likely to be followed than a wall of text.
How to write general rules for a new project
Whenever I start a new project, the first rule file I create is the General Rules. This file has to be curated carefully, as it applies to every part of the codebase. Only the essentials go here – nothing that could contradict layer-specific rules. Let’s use a simple example: a Cinema Reservation System. My general rules usually have four sections:
1. Project summary
Every General Rules file starts with a short introduction – three or four sentences describing what the project actually is. This gives the model a domain vocabulary before it generates anything. Even a simple phrase like “Cinema Reservation System” sets expectations.
If you tell AI:
Design a new endpoint for seat reservations
You haven’t specified what kind of seat or what the reservation is for, yet the summary gives the model enough context to infer: “Okay, it’s probably seats for a movie showing, so I’ll expect movieId, seatId, maybe auditoriumId…”
This avoids a massive amount of ambiguity. Words like "reservation", "seat", "ticket", or "session" carry completely different meanings in different domains, unless you define the domain upfront.
Example:
This is a Cinema Reservation System that allows customers to browse
movie showings, select seats in auditoriums, and complete bookings.
The system handles multiple cinemas, each with several auditoriums
showing different movies at scheduled times.

Customers can view available seats, make reservations,
and receive booking confirmations.
2. Tech stack
This section lists the most important technologies in the project. Without this, the model may happily generate code in whatever tech it feels like. If you expect Spring Boot and Kotlin, but the prompt doesn't say it, it might generate an Express.js endpoint or a Next.js API route. The Tech stack section removes that randomness.
Example:
- Backend: Kotlin + Spring Boot
- Database: PostgreSQL with JPA/Hibernate
- API: REST with OpenAPI/Swagger documentation
- Testing: Kotest + MockK for unit tests, Testcontainers for integration tests
- Build: Gradle with Kotlin DSL
3. Project structure
In this section, I describe the directory layout, layers, naming conventions, and how the project is organized. The idea is to let the model navigate the codebase correctly without needing frequent directory listings. If your project already has a defined structure, the model should follow it automatically.
Example:
src/main/kotlin/cinema/
├── api/           # REST controllers and DTOs
├── domain/        # Domain entities and value objects
├── application/   # Business logic layer
├── repositories/  # Data access layer
└── config/        # Configuration classes

Naming conventions:
- Controllers end with Controller (e.g., BookingController)
- Use cases end with UseCase (e.g., CreateBookingUseCase)
- Repositories end with Repository (e.g., BookingRepository)
4. General rules
Finally, this is where I put the few important rules that apply everywhere. But be very careful not to overload this section - too many global rules will conflict with the more specific ones.
Example:
NEVER write comments in code.
Code should be self-explanatory through clear naming.

NEVER put business logic in controllers or repositories.
Business logic belongs in the domain layer only.

ALWAYS use meaningful names that reflect the domain language:
Booking, Showing, Auditorium, Seat,
not generic names like Item, Record, or Data.
Notice how each section is focused and practical. The project summary establishes domain vocabulary. The tech stack prevents wrong technology choices. The structure shows where things go. And the general rules are limited in number but strong - only the ones that truly apply everywhere.
Rules aren't just about code style or architecture. You can also use them to shape how the agent behaves with its tools. For example, you can tell it:
If the user mentions a library, assume it is already installed and use it.
Check or install it only if the user explicitly asks.
This prevents the agent from asking the same thing over and over again - "Do you want to install this?" or wasting time attempting unnecessary installations. You can also tell it to always read certain files before generating output, or to confirm assumptions before making changes. I place these rules either in the general rule set when they apply broadly, or in specific rule files when they target a particular area. This is where coding rules start evolving into agentic rules.
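A couple of illustrative agentic rules of that kind (the spec path is hypothetical):

ALWAYS read api/openapi.yaml before adding or changing an endpoint.
ALWAYS state your assumptions and wait for confirmation before touching database migrations.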
No matter how good your rules are, at some point, the model will break them. This is normal. Treat it the same way I treat failures in distributed systems - not "if", but "when".
Usually, the failure comes from:
Too many rules loaded at once.
Contradictory rules.
Weak wording.
Missing domain context.
The model simply losing track in a long session.
When this happens, here are the approaches I recommend:
1. Start fresh
Sometimes the fastest fix is simply to start a new session or chat window. When the context becomes too polluted, or the model has drifted too far from your intentions, restarting the conversation with a clean slate gets you back on track. Long contexts accumulate noise, and a fresh start often fixes the problem immediately.
2. Fix the rules yourself
Often, the fix is as simple as:
Splitting a rule file into two smaller, more focused ones.
Rewriting the instruction with stronger, more direct language.
Removing rules that are no longer relevant or are too generic.
3. Commit to git regularly
This is also why I recommend committing your code changes to git often. When things go wrong, you don't want to rely on external versioning provided by tools like Cursor or other AI coding assistants. I've encountered bugs in those internal versioning systems before, and there's nothing worse than losing good work because you trusted a tool's built-in undo. Git gives you real, reliable version control that's independent of whatever AI tool you're using.
4. Ask the LLM itself
Another way to debug rule failures is to ask the model directly. You can prompt:
I noticed you're not following rule X.
Do you see any contradictory rules that might be confusing you?
or
What rules are currently loaded in your context? Are any of them conflicting?
LLMs are surprisingly good at introspection. Often, they'll point out exactly which rules are fighting each other, or which instruction is overriding your intent. This saves you from manually hunting through rule files.
This is another important point: don't try to write universal rules. Projects differ, even inside the same company, even within the same language. Different teams have different styles, different naming conventions, and different architectural decisions. Your rules should reflect your actual project, not a generic theoretical standard.
That's why I advise against using sites like cursor.directory – it's a repository of big rule files with whole tech stacks baked in. These monolithic rule files go against everything I've said about keeping rules small, scoped, and contextual.
Instead, start small. Build your rules incrementally based on what your project actually needs. Let them grow organically as your codebase evolves. That's how you get rules that actually work.
Also, I’m skeptical about AGENTS.md – it doesn’t address situations where your own rules can conflict with each other inside a single tool, or where rules are too long.
Here's the part that surprises most developers: I rarely write rules manually – I use LLMs to do it for me. I have a meta-prompt that contains rules for writing rules. That prompt was itself created with an LLM. It's long, structured, and designed to extract patterns directly from your codebase. This is much faster and far more accurate than writing everything by hand. You can find my meta-prompt here.
The process is simple. If I want to generate rules for unit tests, I provide two or three existing test files and ask the model to extract conventions. LLMs are extremely good at pattern recognition, even patterns you don’t realize you follow. And once the rules exist, I can iterate or regenerate them whenever the conventions evolve.
I strongly recommend versioning your rules; having the ability to revert to the version that worked best is incredibly helpful. Git is your friend.
For the best results, always use thinking models when you run complicated prompts like this one.
How I use my meta-prompt
There are two main ways to use it:
1. Writing rules from scratch
If I give the model 2–3 unit test files, it will detect naming patterns, mocking patterns, Given-When-Then structure, and turn those into rules.
Example prompt:
Use @self-improve.mdc to create unit test rules.
Use @unit-test-a.kt and @unit-test-b.kt as examples to infer patterns.
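The generated file usually comes back as a short, focused list. An illustrative excerpt (not actual output) might read:

Unit test rules:
- Test classes end with Test and mirror the class under test (e.g., CreateBookingUseCaseTest).
- Structure every test as Given-When-Then.
- Mock repositories and external clients with MockK; NEVER mock the class under test.
- Keep one behavior per test, named after the expected outcome.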
2. Adjusting existing rules
As your project evolves, rules should evolve too. I use the meta-prompt to update existing rule files based on adjustments discussed during development.
Example prompt:
According to adjustments in this conversation,
improve @unit-test-rules.mdc to follow the conventions mentioned.
Writing good rules is a cornerstone of efficient AI-assisted engineering. When you give AI tools small, scoped, well-designed rules, they become dramatically more useful, predictable, and consistent. The better the rules reflect your actual codebase, the more the AI feels like a teammate and less like a hallucinating intern.