Semantic Search, Repos, and Evals: The Three Primitives of AI Product Work
What is a semantic layer in AI applications?
A semantic layer turns a database into something you query by similarity rather than by exact match.
Instead of looking up a specific ID, you ask for what is closest in meaning to a sentence.
Query example: "the best clothing line for sports" returns the rows whose descriptions are most similar to that phrase.
It lets teams go beyond filtering on a field's value and actually interpret what the field says.
At Lemlist, this shift is reshaping GTM workflows. Filtering a company database by industry was fine in a pre-GenAI world, but operators now want to semantically understand each company's description so they can apply their own reasoning when deciding whom to contact.
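To make the idea concrete, here is a minimal Python sketch of a similarity query over company descriptions. It assumes a toy setup: the `embed` function is a bag-of-words stand-in for a real embedding model, so it only rewards shared words rather than shared meaning, and the company rows are hypothetical. The ranking logic (score every row against the query vector, sort, keep the top k) is the part that carries over.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a production semantic layer would call a learned embedding
    model here so that related phrases land close together even without
    shared words.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query: str, rows: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k row IDs whose descriptions are most similar to the query."""
    q = embed(query)
    scored = [(row_id, cosine(q, embed(desc))) for row_id, desc in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical company descriptions keyed by ID.
companies = {
    "acme": "Performance clothing line for running and other sports",
    "globex": "Accounting software for small businesses",
    "initech": "Outdoor sports equipment and athletic clothing",
}

for row_id, score in semantic_search("the best clothing line for sports", companies):
    print(row_id, round(score, 3))
```

In a real semantic layer you would swap `embed` for calls to an embedding model and store the vectors in an index, rather than re-embedding every row on each query.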
Why do clean data warehouses matter for AI products?
Two years ago prompting was a competitive advantage.
Today it is table stakes.
The advantage now belongs to teams with clean data and a strong semantic layer on top of it, and to teams that have learned to delegate.
Levels 0–2: basic chat, browser, and desktop apps for GTM work, such as ad hoc research and quick copy drafts.
Levels 3–4: projects and connectors wired into GTM systems such as the CRM, Marketo, HubSpot, and Salesforce.
The quality of your taste, your evaluation, and your repository now dictates the quality and quantity of work you generate.
Garbage in, garbage out still holds — fix the inputs before layering on sophisticated AI projects.
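A minimal sketch of "fixing the inputs" before anything reaches a semantic layer, assuming a hypothetical company table with `domain` and `description` columns. It normalizes whitespace and casing, drops rows with nothing to search over, and removes duplicate companies. Real pipelines would usually do this inside the warehouse itself; the point is only that these checks run before any AI layer sees the data.

```python
import re


def clean_company_rows(rows: list[dict]) -> list[dict]:
    """Normalize and filter company rows before they feed a semantic layer.

    Assumed (hypothetical) schema: each row has "domain" and "description".
    """
    seen = set()
    cleaned = []
    for row in rows:
        domain = (row.get("domain") or "").strip().lower()
        # Collapse runs of spaces, tabs, and newlines into single spaces.
        description = re.sub(r"\s+", " ", row.get("description") or "").strip()
        if not domain or not description:
            continue  # drop rows with no key or nothing to search over
        if domain in seen:
            continue  # keep only the first occurrence of a duplicated company
        seen.add(domain)
        cleaned.append({"domain": domain, "description": description})
    return cleaned


raw = [
    {"domain": "Acme.com ", "description": "Performance   clothing\nfor sports"},
    {"domain": "acme.com", "description": "Duplicate entry"},
    {"domain": "globex.com", "description": ""},
    {"domain": None, "description": "Orphan row with no domain"},
]
print(clean_company_rows(raw))
```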
“Two years ago, prompting was a competitive advantage. Today, it's become kind of table stakes.”
What are repos and how do AI PMs use them?
A repo in this context is any repository — it does not have to be GitHub — where teams store reusable files an agent can pull into context.
A common pattern: keep a tone-of-voice file in the repo, then tell the agent to look it up so a drafted email stays consistent with branding guidelines.
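A sketch of the tone-of-voice pattern, assuming a hypothetical repo layout where the guidelines live at `brand/tone_of_voice.md`. The function reads the file and places it ahead of the task, so whatever agent or model receives the prompt sees the branding rules first. How the prompt is sent to the agent is left out, because that depends on your tooling.

```python
import tempfile
from pathlib import Path


def build_email_prompt(repo_root: Path, brief: str) -> str:
    """Assemble an agent prompt that pulls the tone-of-voice file from the repo.

    The path "brand/tone_of_voice.md" is a hypothetical convention.
    """
    tone_file = repo_root / "brand" / "tone_of_voice.md"
    tone = tone_file.read_text(encoding="utf-8")
    return (
        "Follow these branding guidelines exactly:\n"
        f"{tone.strip()}\n\n"
        f"Task: draft an email. Brief: {brief}"
    )


# Usage: create a throwaway repo containing only the guidelines file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "brand").mkdir()
    (root / "brand" / "tone_of_voice.md").write_text(
        "Friendly, direct, no jargon.", encoding="utf-8"
    )
    prompt = build_email_prompt(root, "Invite a lead to a product demo")
    print(prompt)
```

Keeping the guidelines in a file rather than pasting them into each prompt means one edit to the repo updates every draft that references it.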
At Lemlist, every PM now has a dev setup identical to the tech team's and access to the entire codebase.
The most life-changing workflow the team's PM describes is "chat with codebase": simply asking the codebase questions to understand how ten years of software and legacy have compounded into the current product.
“The quality of your taste and evaluation, but also the quality of your repository, is now dictating the quality and the quantity of the work that you're generating.”
How do you set up an evals framework for AI features?
Evals answer one question: when the system produces an output, do we know how to judge how good it is?
Can we rank it 1–5?
Can everyone do it, or only one person?
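One way to make "can everyone do it, or only one person?" measurable: have two people score the same outputs on the 1-5 scale and compare. This Python sketch, with made-up scores, reports exact agreement, agreement within one point, and the mean absolute difference. What level of agreement is acceptable is a judgment call not specified here; the same comparison can also check an automated evaluator against a domain expert's scores.

```python
def agreement(scores_a: list[int], scores_b: list[int]) -> dict[str, float]:
    """Compare two raters' 1-5 scores on the same list of outputs."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    for s in scores_a + scores_b:
        if not 1 <= s <= 5:
            raise ValueError(f"score out of 1-5 range: {s}")
    n = len(scores_a)
    diffs = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    return {
        "exact": sum(d == 0 for d in diffs) / n,       # identical scores
        "within_one": sum(d <= 1 for d in diffs) / n,  # off by at most one point
        "mean_abs_diff": sum(diffs) / n,               # average disagreement
    }


# Hypothetical scores for the same five outputs.
expert = [5, 4, 2, 3, 1]
teammate = [5, 3, 2, 5, 1]
print(agreement(expert, teammate))
```

Low agreement is a signal that the rubric still lives in one person's head and needs to be written down before it can scale.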
The recommended starting point, from Google's Abraham Gomez: before building, define how you would evaluate the system at scale.
If evaluation today depends on one domain expert eyeballing answers, that does not scale — so the problem becomes building the AI system plus replicating that evaluator.
Document the domain understanding that lives in that expert's head.
Refine a CLAUDE.md-style file constantly to set the right guardrails on the development side.
Run tests on outputs continuously so the analysis is genuinely valuable and not AI slop.
Invest groundwork in prompting and model usage so you can actually deliver on the claim you are making to users.
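A minimal sketch of running tests on outputs continuously, assuming rule-based guardrail checks. The three checks (`not_empty`, `no_template_leftovers`, `under_word_limit`) and the 0.9 pass-rate threshold are hypothetical examples, not a prescribed set. In practice they would sit alongside rubric scoring by people or a model, and the harness would rerun on every prompt or model change.

```python
from typing import Callable


# Hypothetical guardrail checks; each returns True when the output passes.
def not_empty(text: str) -> bool:
    return bool(text.strip())


def no_template_leftovers(text: str) -> bool:
    # Catches unfilled placeholders such as "{first_name}" or "[INSERT ...]".
    return "{" not in text and "[INSERT" not in text.upper()


def under_word_limit(text: str, limit: int = 120) -> bool:
    return len(text.split()) <= limit


CHECKS: dict[str, Callable[[str], bool]] = {
    "not_empty": not_empty,
    "no_template_leftovers": no_template_leftovers,
    "under_word_limit": under_word_limit,
}


def run_output_tests(outputs: list[str], min_pass_rate: float = 0.9) -> tuple[bool, dict[str, float]]:
    """Run every check on every output; pass only if each check meets the threshold."""
    if not outputs:
        raise ValueError("no outputs to test")
    rates = {
        name: sum(check(o) for o in outputs) / len(outputs)
        for name, check in CHECKS.items()
    }
    return all(rate >= min_pass_rate for rate in rates.values()), rates


# Hypothetical batch of generated emails.
outputs = [
    "Hi Ana, saw your team is hiring SDRs. Worth a quick chat?",
    "Hi {first_name}, quick question about your outbound stack.",
    "",
]
ok, rates = run_output_tests(outputs)
print(ok, rates)
```

Reporting a pass rate per check, rather than a single overall score, makes it easier to see which guardrail a prompt change broke.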
“The skill set now is going from working on the clean output, meaning the code, to actually having a clean output because of the stochasticity of AI.”
Why has the PM skill set shifted toward evals and guardrails?
The skill set is moving from working on the clean output (the code) to producing a clean output despite the stochasticity of AI.
That means spending more time on evaluations, setting up guardrails, and making sure things do not go wrong.
Generative AI pushes the feature so close to the user experience that you can literally see when users are cursing at the agent.
That tight coupling to KPIs is why evaluation has to be designed in from day one — and why prototyping with AI is now part of the PM's daily job.
“Now at Lemlist, all the PMs have a dev setup exactly the same as if we were developers in the tech team.”
Frequently asked questions
- What is a semantic layer in plain English?
- A semantic layer lets you query a database by similarity instead of by an exact ID. You ask for something like "the best clothing line for sports" and the system returns the rows whose meaning is closest to that phrase. For GTM teams, this means moving beyond filtering a company list by industry to semantically understanding each company's description so you can apply your own reasoning to who to contact.
- Does a 'repo' have to be a GitHub repo?
- The pattern is about having a repository — any repository — where you store reusable context files an agent can read. A common example is keeping a tone-of-voice file there, then telling the agent to look it up when drafting an email so the output matches your branding guidelines. At Lemlist, PMs are set up with the same access as the dev team and use the codebase itself as the repo they chat with.
- What are evals?
- Evals are how you decide whether an AI output is good or bad. The core questions: can we rank it from one to five, and can everyone do that ranking, or only one person? The practical starting point is to define how you would evaluate the system at scale before you build. If today only one domain expert can judge the answer, your project is not just building the AI; it is also replicating that evaluator's judgment.
- Why do clean data warehouses matter so much for AI?
- Because prompting is no longer the moat — clean data with a strong semantic layer on top is. The teams moving fastest on GenAI implementation are the ones with clean data lakes and warehouses, and who have learned to delegate well to humans and now apply that muscle to machines. The quality of your repository and your evaluation directly dictate the quality and quantity of work the AI produces for you.
- How is the PM job changing because of these primitives?
- At Lemlist, the PM role has changed profoundly within two to three months, moving from the traditional discovery-to-QA flow with specs and design handoffs to spending most of the day in Claude Code chatting with the agent. The most accessible starting workflow is "chat with codebase": asking the codebase questions to understand how a decade of software and legacy has compounded into the current product.
- Why are guardrails and evals more important than the code itself now?
- Because the output is stochastic. The skill set is shifting from working on the clean output — the code — to producing a clean output despite that stochasticity, which means more time on evaluations and guardrails. Teams refine files like CLAUDE.md constantly and run tests on outputs to make sure what ships is genuinely valuable and not AI slop, especially when the user-facing claim depends on it.
