Building a Sports Field Management LLM Assistant
There’s excellent information out there about sports field management, but it’s scattered across university extension websites and lengthy PDFs. What if we could make it easier to find answers? I built an LLM assistant that answers questions about sports turf management, grounded in trusted university extension documents. Try it out right here.
It’s a RAG app
This app uses retrieval augmented generation (RAG) to improve LLM quality and reliability. RAG combines:
- A retrieval system that searches a curated set of sports field management documents for the passages most relevant to the user's query
- A generative LLM that composes a coherent answer from those retrieved passages
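In code, those two pieces fit together in a short loop. Here is a bird's-eye sketch where every function name is an illustrative placeholder, not the app's actual code; the real components are described in the walkthrough below.

```python
# High-level shape of the RAG loop. Each function is a placeholder for
# a real component (embedding model, vector search, prompt template, LLM call).
def answer(question: str) -> str:
    query_embedding = embed(question)            # numerical representation of the question
    chunks = search_index(query_embedding, k=5)  # most relevant document chunks
    prompt = build_prompt(question, chunks)      # question + retrieved context + tone guidance
    return generate(prompt)                      # LLM writes the grounded answer
```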
General LLMs like ChatGPT or Claude may provide incorrect information about specialized topics, like sports field management. The advantage of this RAG app is that it always starts with verified information from university extension sources, provides direct links to the source documents, and focuses on practical, actionable advice. There are tons of articles and tutorials about building RAG systems—see Pinecone or LangChain to learn more. Let’s look at how it works in this specific application.
How it works, with an example
Let’s walk through what happens when you ask the assistant about the best grass type for low-maintenance fields.
First, your question is converted into a numerical representation (embedding) that captures its semantic meaning.
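With the sentence-transformers library, that step looks roughly like this. The specific model name is an assumption for illustration, not necessarily the one the app uses.

```python
from sentence_transformers import SentenceTransformer

# Any sentence-transformers model works the same way; this one is a common default.
model = SentenceTransformer("all-MiniLM-L6-v2")

question = "What grass type is best for a low-maintenance field?"
query_embedding = model.encode(question)  # a 384-dimensional vector for this model
```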
The system uses this embedding to search the database of extension document chunks and find the most relevant pieces of text. For this query, it retrieves sections about grass species selection and low-input management.
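Continuing the sketch above, the search is a single query against the Pinecone index. The index name and metadata fields here are assumptions made for illustration.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("sports-turf")  # hypothetical index name

# Retrieve the five chunks whose embeddings are closest to the question's.
results = index.query(
    vector=query_embedding.tolist(),
    top_k=5,
    include_metadata=True,
)
for match in results.matches:
    print(match.score, match.metadata["title"])
```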
The retrieved chunks are formatted into a prompt for the LLM. The prompt also includes guidance on the desired tone and the original question about the best grass type for low-maintenance fields (not shown). This long prompt is sent to the LLM.
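Assembling that prompt is plain string formatting, continuing from the retrieval snippet. The wording below paraphrases the idea; it is not the app's actual prompt.

```python
# Stitch the retrieved chunks (with their sources) and the question into one prompt.
context = "\n\n".join(
    f"Source: {m.metadata['title']} ({m.metadata['url']})\n{m.metadata['text']}"
    for m in results.matches
)

prompt = f"""You are a helpful sports field management assistant.
Answer using only the sources below, in a practical, friendly tone.

Sources:
{context}

Question: {question}"""
```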
The LLM generates an answer based on the documents provided, and the app formats metadata like links to relevant source documents.
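The generation step is one call to the OpenAI API, and the source links come from the metadata attached to each retrieved chunk rather than from the LLM itself. A minimal sketch, continuing from the snippets above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

answer = response.choices[0].message.content
sources = [m.metadata["url"] for m in results.matches]  # links shown alongside the answer
```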
The tech stack
1. Knowledge base
I sourced documents from university extension websites and sports field management guides. Most of the material came from the Cornell Sports Field Management site and the Cornell Guide for Commercial Turfgrass Management (PDF). These are the ideal materials because they are up to date and contain practical advice you can’t find elsewhere.
2. Processing the documents
I used Firecrawl to scrape the Cornell Sports Field Management site and convert the content to markdown, and Marker to convert the PDFs to markdown as well. I split the documents into smaller chunks by section headers, so I ended up with a few hundred text chunks. A sentence transformers model created a numerical embedding for each chunk, and the embeddings were stored in a Pinecone vector database.
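Here is a rough sketch of that indexing pipeline. The header-splitting regex, model name, index name, and file names are all simplified assumptions rather than the app's exact code.

```python
import re
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone

def chunk_by_headers(markdown: str) -> list[str]:
    """Split a markdown document into chunks at section headers (simplified)."""
    chunks = re.split(r"\n(?=#{1,3} )", markdown)
    return [c.strip() for c in chunks if c.strip()]

model = SentenceTransformer("all-MiniLM-L6-v2")
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("sports-turf")

docs = {"cornell-sports-field-management.md": open("cornell-sports-field-management.md").read()}
for doc_id, text in docs.items():
    for i, chunk in enumerate(chunk_by_headers(text)):
        vector = model.encode(chunk).tolist()
        # Store the chunk text and its source alongside the embedding.
        index.upsert(vectors=[(f"{doc_id}-{i}", vector, {"text": chunk, "source": doc_id})])
```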
3. Creating the backend functionality
The backend provides a simple FastAPI endpoint that receives a question and returns an answer, following the steps shown in the diagram above. I used GPT-4o for the LLM. The same sentence transformers model is used for encoding incoming questions and searching for documents.
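Put together, the endpoint looks something like the sketch below. The route name, request shape, and prompt wording are assumptions, and it reuses the same (assumed) model and index names as the indexing sketch above.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone
from openai import OpenAI

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")                # same model used at indexing time
index = Pinecone(api_key="YOUR_API_KEY").Index("sports-turf")  # hypothetical index name
llm = OpenAI()

class Question(BaseModel):
    text: str

@app.post("/ask")  # hypothetical route name
def ask(question: Question) -> dict:
    # Encode the question, retrieve the closest chunks, and ask GPT-4o to answer from them.
    vector = model.encode(question.text).tolist()
    results = index.query(vector=vector, top_k=5, include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in results.matches)
    completion = llm.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Answer using only these sources:\n{context}\n\nQuestion: {question.text}",
        }],
    )
    return {
        "answer": completion.choices[0].message.content,
        "sources": [m.metadata.get("source") for m in results.matches],
    }
```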
4. Creating the frontend UI
The frontend is a React component with a couple example questions to help users get started. Responses include nicely formatted links to the original documents for further reading.
Given more time, I would add more documents to the knowledge base, like the SFMA's BMPs for the Sports Field Manager and comprehensive warm-season sports field guidance. The app could also be much more interactive, with the ability to give feedback and iterate on an answer; right now it handles questions one at a time. The source materials contain many useful tables and maintenance calendars, so it would be useful to have the app generate formatted calendars or schedules based on user inputs.
Scaling turfgrass knowledge
The knowledge base includes about 60 web pages and a couple hundred PDF pages. This is quite a small amount of information, and an interested person could simply read through it all; it wouldn't take that long. That said, the app is a nice way to get started and get pointed in the right direction.
RAG techniques really shine when there are thousands or millions of documents to sift through. Imagine being able to talk to the entire Turfgrass Information File or everything on the SFMA website. The tools are ready for a massive corpus of turfgrass knowledge. The biggest challenge is probably the time required to process all the text into a consistent format, especially the PDFs.
Thank you to Carl Schimenti and Frank Rossi for providing helpful feedback on early demos.