Despite the growing presence of AI and large language models (LLMs) within tax departments, spreadsheets continue to play a central role in the daily work of tax professionals. While tax departments are embracing digital transformation, many interim processes—like extracting data from ERP systems or reconciling values—still flow through spreadsheets. And despite the promise of intelligent tax tools, this reality isn’t going away anytime soon.
But with great reliance comes significant risk. High-profile cases have shown that spreadsheet errors can cost companies millions. A comprehensive academic study found that a staggering 94% of financial spreadsheets contain errors. These aren’t just innocent typos; such errors often lead to compliance breaches, miscalculations, and flawed reporting.
LLMs may offer real help in managing spreadsheets. To use them effectively and responsibly, tax professionals must understand both what these AI tools can do and where they fall short.
How LLMs Process Spreadsheets
While most LLMs can read common spreadsheet formats like CSV or Excel and answer questions about the data, they differ in two important ways: how much data they can handle at once (known as the context window) and how they process the data.
GPT‑4o has a context window of 128,000 tokens, which limits how much information it can process in a single interaction. When you upload a spreadsheet to ChatGPT powered by GPT‑4o, the model doesn’t read the file directly. Instead, the file is loaded into a secure, temporary environment that includes tools like Python and data science libraries. In this setup, GPT‑4o behaves like a Python programmer: it writes and runs code to explore your spreadsheet. It then turns the results of that code into clear, human-readable explanations. If you ask for a chart, GPT‑4o generates the code to create it and shows you the result.
Claude 3.5 Sonnet takes a different approach. It reads spreadsheet content directly as text, interpreting headers, rows, and columns without writing or running code. It currently doesn’t support chart generation or code execution, but it has a much larger context window—up to 200,000 tokens—which allows it to handle larger datasets in a single session and generate longer, more detailed responses without losing earlier information.
Given these differences, GPT‑4o may be the better choice for tasks that involve complex data manipulation, calculations, or visualizations. Claude, on the other hand, is excellent for exploring and interpreting large, text-based tables, identifying patterns, and summarizing structured data, especially when working with large volumes of content that don’t require advanced computation.
But What About Limitations?
Understanding Token Limits in LLMs
LLMs have some limitations when working with spreadsheets, and the most significant hurdle is context window constraints. Think of an LLM’s context window as its short-term memory or the amount of information that can be processed in a single interaction. This information is measured in tokens, which are not the same as words. A token typically represents a few characters or parts of words. For example, 1,000 tokens is roughly equivalent to 750 words of English text.
Each LLM has a different context window size. GPT‑4o, for instance, has a context window of 128,000 tokens. Now consider a large spreadsheet with 10 columns and 100,000 rows—that’s 1 million cells. If we estimate an average of 3 tokens per cell, the total token count would be around 3 million tokens, which far exceeds the capacity of any current model, including GPT‑4o.
Even uploading a portion of such a file can push the model beyond its limit. For example, 10 columns × 20,000 rows equals 200,000 cells. At 3 tokens per cell, that’s approximately 600,000 tokens, not even counting the extra tokens needed for headers, formatting, or file structure. Since GPT‑4o can only process 128,000 tokens at once, only a small fraction of that spreadsheet can be “seen” and processed at any given time.
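The back-of-the-envelope math above can be turned into a quick pre-flight check before uploading a file. This is only a sketch: the 3-tokens-per-cell figure is a rough working assumption, and real token counts depend on the model’s tokenizer and the file’s formatting.

```python
# Rough estimate of whether a spreadsheet fits in a model's context window.
# Assumes ~3 tokens per cell on average; actual counts vary by tokenizer
# and leave no headroom for headers, prompts, or conversation history.

def estimated_tokens(rows: int, cols: int, tokens_per_cell: int = 3) -> int:
    """Back-of-the-envelope token count for a rows x cols table."""
    return rows * cols * tokens_per_cell

def fits_in_context(rows: int, cols: int, context_window: int = 128_000) -> bool:
    """True if the rough estimate fits within the model's context window."""
    return estimated_tokens(rows, cols) <= context_window

print(estimated_tokens(100_000, 10))  # 3,000,000 tokens -- far over the limit
print(fits_in_context(20_000, 10))    # False: ~600,000 tokens vs. 128,000
```

Running this before an upload makes it obvious when chunking or summarization will be needed rather than discovering it mid-conversation.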
LLMs May “Forget” Data
When you upload a spreadsheet to GPT‑4o, the model can only interact with the data that fits within the active context window. It doesn’t see the entire file all at once but just the portion that fits within that token limit. For example, if you ask, “What is the deductible VAT amount listed in row 7,000?” but the model only received the first 5,000 rows, it won’t be able to answer because it never saw that row in the first place.
It’s also important to understand that the context window includes the entire conversation, not just your current question and the data. As the session continues and more prompts and responses are exchanged, the model may start dropping earlier parts of the conversation to stay within the 128,000-token limit. That means key data, such as the original file content, can be silently dropped as the conversation grows. This can lead to incomplete or incorrect answers, especially when your new question relies on information the model has already “forgotten.”
LLMs Don’t See Spreadsheets the Way You Do
Another limitation is that LLMs are sequence-based models. They read spreadsheets as a linear stream of text and not as a structured, two-dimensional grid. That means they can misinterpret structural relationships and cross-sheet references between cells. LLMs don’t automatically recognize that cell D20 contains a formula like =SUM(A20:C20). Similarly, they may not realize that a chart on “Sheet1” is pulling data from a table on “Sheet2,” unless this relationship is clearly described in the prompt.
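To see why the grid structure gets lost, consider how a small sheet is serialized before a model reads it. The minimal illustration below uses invented cell values; the key point is that the formula behind the “Total” column has already been evaluated away, so the model sees only the number, not the relationship between cells.

```python
# A tiny worksheet represented as rows of values, the way an LLM receives it.
# The formula =SUM(A2:C2) behind the "Total" column is gone -- only its
# computed result (600) survives the flattening.
rows = [
    ["Q1", "Q2", "Q3", "Total"],
    [100, 200, 300, 600],
]

# Serialized into the linear text stream the model actually reads:
flat = "\n".join(",".join(str(cell) for cell in row) for row in rows)
print(flat)
# Q1,Q2,Q3,Total
# 100,200,300,600
```

Nothing in that text stream tells the model that 600 was derived from the three cells to its left, which is why such relationships must be described explicitly in the prompt.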
LLMs and the Limits of Tax Law Understanding
Finally, LLMs don’t truly “understand” tax law. While they’ve been trained on large volumes of publicly available tax-related content, they lack the deeper legal reasoning and jurisdiction-specific knowledge that professionals rely on. They can easily make basic mistakes, such as failing to flag penalties or entertainment expenses as ineligible for input VAT deduction, because they aren’t aware of country-specific rules unless those rules are stated explicitly in the prompt. As a result, they can produce plausible but incorrect answers if relied on without expert review.
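The point about country-specific rules can be made concrete: deductibility logic has to be supplied explicitly, whether in the prompt or in code the model is asked to run. Below is a minimal sketch with invented expense categories and a simplified rule set; real deductibility rules are jurisdiction-specific and must come from a tax professional, not from the model’s general knowledge.

```python
# Hypothetical check: flag expense rows whose category is ineligible for
# input VAT deduction. The rule set is illustrative only -- real rules
# vary by country and must be supplied explicitly.
NON_DEDUCTIBLE = {"penalty", "entertainment"}

def flag_non_deductible(expenses: list) -> list:
    """Return the expense rows whose category is ineligible for deduction."""
    return [e for e in expenses if e["category"] in NON_DEDUCTIBLE]

expenses = [
    {"id": 1, "category": "office_supplies", "vat": 21.0},
    {"id": 2, "category": "entertainment", "vat": 42.0},
    {"id": 3, "category": "penalty", "vat": 0.0},
]
print([e["id"] for e in flag_non_deductible(expenses)])  # [2, 3]
```

Encoding the rules this way keeps the legal judgment with the professional and leaves the model to do only the mechanical filtering.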
How to Use LLMs Effectively with Spreadsheets
When using LLMs to work with spreadsheets, you’ll get the best results by running them within platforms designed for data tasks, such as Python notebooks, Excel plugins, or Copilot-style interfaces. These tools allow the LLM to interact with your spreadsheet by generating Excel formulas or Python code based on your instructions. For example, you might say: “Write a formula to pull client names from Sheet2 where the VAT IDs match those in Sheet1.” The tool then generates the appropriate formula, and the spreadsheet executes it just like any standard formula.
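The same lookup that formula performs can be expressed in a few lines of Python, which is roughly what a code-executing model writes behind the scenes. The sheet contents and VAT IDs below are invented for illustration.

```python
# Hypothetical lookup: map VAT IDs from "Sheet1" to client names on "Sheet2",
# the Python analogue of a VLOOKUP/XLOOKUP formula.
sheet2 = {
    "DE002": "Acme GmbH",
    "NL001": "Example BV",
    "BE004": "Autre SA",
}
sheet1_vat_ids = ["NL001", "DE002", "FR003"]

# Unmatched IDs yield None, like #N/A in a spreadsheet lookup.
client_names = [sheet2.get(vat_id) for vat_id in sheet1_vat_ids]
print(client_names)  # ['Example BV', 'Acme GmbH', None]
```

Whether the model emits a formula or code like this, the spreadsheet or notebook does the actual computation, so the result is reproducible and auditable in a way a free-text answer is not.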
When dealing with large spreadsheets, another effective strategy is to break the data into smaller, manageable sections and ask the model to analyze each part separately. This approach helps keep the information within the model’s memory limits. Once you’ve gathered insights from each section, you can combine them manually or with the help of a follow-up AI prompt.
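The chunking step itself is simple to mechanize. A minimal sketch, assuming a chunk size chosen from the rough 3-tokens-per-cell estimate; the right size depends on your data and the model’s context window.

```python
# Split a large table into chunks small enough to fit the model's context
# window, so each chunk can be analyzed in a separate prompt.
def chunk_rows(rows, chunk_size=5_000):
    """Yield successive slices of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

rows = list(range(12_000))          # stand-in for 12,000 spreadsheet rows
chunks = list(chunk_rows(rows))
print([len(c) for c in chunks])     # [5000, 5000, 2000]
```

Each chunk can then be sent with the same instructions, and the per-chunk answers merged afterward, manually or with a follow-up prompt.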
Another powerful method is to ask the LLM to write code to process your spreadsheet. You can then run that code in a separate environment (like a Jupyter notebook), and feed just the summarized results back into the model. This allows the LLM to focus on interpreting the findings, generating explanations, or drafting summaries without being overwhelmed by the raw data.
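The kind of pre-summarization that code would perform can be sketched as follows. The field names and figures are invented; the pattern is what matters: the raw rows stay in your environment, and only the compact summary goes back into the prompt.

```python
# Reduce raw transaction rows to a compact per-country summary before
# sending anything to the model. Only the summary enters the prompt,
# not the (potentially enormous) raw data.
from collections import defaultdict

transactions = [
    {"country": "NL", "vat": 21.0},
    {"country": "NL", "vat": 42.0},
    {"country": "DE", "vat": 19.0},
]

summary = defaultdict(lambda: {"count": 0, "total_vat": 0.0})
for t in transactions:
    summary[t["country"]]["count"] += 1
    summary[t["country"]]["total_vat"] += t["vat"]

print(dict(summary))  # {'NL': {'count': 2, 'total_vat': 63.0}, 'DE': {'count': 1, 'total_vat': 19.0}}
```

A summary like this, a few dozen tokens instead of hundreds of thousands, gives the model everything it needs to draft explanations or spot anomalies.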
Spreadsheets Are Here to Stay
Spreadsheets aren’t going anywhere. They are too flexible, too accessible, and too deeply ingrained in tax operations to disappear. AI and LLMs will continue to transform the way we work with them, but they won’t replace them.
Looking ahead, we can expect smarter tools that make spreadsheets more AI-friendly. Innovations like TableLLM and SheetCompressor are paving the way. Though still in the research phase and not yet integrated into mainstream commercial tools, they signal a promising future. TableLLM is a specialized language model trained specifically to understand and reason over tabular data. Unlike general-purpose LLMs that treat tables as plain text, TableLLM recognizes the two-dimensional structure of rows, columns, and cell relationships. SheetCompressor, developed as part of Microsoft’s SpreadsheetLLM project, uses AI-driven summarization techniques to drastically reduce spreadsheet size before passing the data to an LLM. It results in up to 90% fewer tokens, while preserving the original structure and key insights.
Beyond TableLLM and SheetCompressor, the field of spreadsheet-focused AI is expanding rapidly. Experimental tools like SheetMind, SheetAgent, and TableTalk explore everything from conversational spreadsheet editing to autonomous multi-step operations. As these technologies mature, AI-powered tax departments won’t move away from spreadsheets but will use them in smarter, faster, and more efficient ways.
The opinions expressed in this article are those of the author and do not necessarily reflect the views of any organizations with which the author is affiliated.