AIs vs the SQL Server Fill Factor - It is not easy to be a AI DBA

Do you also have the feeling AI advice often tricks you as much as it helps? Today I challenged a colleague who specified a fill factor of 80 of when creating a new table with a non-clustered index (in Azure SQL/SQL Server) and we asked an AI for help.

I argued that specifying a fill factor other than the default 0/100 when creating a new empty table is useless. Why? Because we do not normally do any index maintenance like index rebuilds, and hence Azure SQL will try to fill the pages 100% as best it can when inserting/updating data regardless of fill factor.

On top of that, even if you do index maintenance (index rebuild), unless you know exactly what the usage of the table will be you will want to tune what fill factor to use to reach a good compromise between wasted space / frequency of rebuilds / fragmentation. Meaning at time of index rebuild you will set it anyway overwriting the original value. More details in this SO answer.

The AI

Instead of looking at Stackoverflow, it's nowadays also possible ask our new AI masterminds. No more wading through sketchy google results, and a comprehensive easy to understand answer provided in a chat interface. What more could you ask for?

Of course, we went to the AI to resolve our questions first. But it turned out it was not so easy... We used GPTo, but it didnt agree and didnt have a clue about how the fill factor is only relevant if you rebuild the index (or table already have a lot of data). Given its not-great answers I got inspired for this blog post and did some further experiments later in the evening. Let's see what a bunch of different AIs can do to this topic. My hopefully clear and well-thought-out question for them:

What fill factor should I use when I create an empty table with some indexes in Azure SQL and why? The table is a regular table with a GUID primary key, with more selects than inserts, but still growing in size. Please summarize your answer!

Phind instant & Phind-70B

I sometimes use Phind as a help on various topics, it's supposedly optimized for the programmer type questions I often have. It's also easy to remember its URL and the base model is free. Let's see what its instant model can do:

[Two paragraphs and 3 bullet points cut out. No mention anywhere that this is only relevant when rebuilding...]

In summary, starting with a fill factor of 80% to 90% for indexes on GUID columns is a reasonable approach for a table expected to grow to a million rows. Regular monitoring and adjustments based on actual performance and fragmentation levels are essential to ensure optimal performance and efficient storage utilization.

Not a great answer. It does have an ending remark about monitoring and adjustments, but nowhere did it mention I must do rebuild for this setting to even have an effect. Let's try the more powerful Phind-70B model instead:

For a table in Azure SQL with a GUID primary key that expects constant new inserts and growth to a million rows, setting a fill factor between 70-90% is recommended. This recommendation is based on the nature of GUIDs causing seemingly random insertions in the clustering index, leading to frequent page splits. A fill factor within this range helps prevent these "bad" page splits by leaving enough free space on each leaf-level page for future growth, thereby improving overall performance and reducing the need for frequent index maintenance operations like rebuilds. Additionally, considering the table's expected growth and the constant insertion of new records, a higher fill factor ensures that the index remains efficient without needing to allocate excessive space upfront, balancing the need for space for new entries against the desire to minimize disk usage and maintain performance

Same answer, just less text.

Gemma-1.1-7b-it

To spice it up a bit I got this AI from the LMSYS Chatbot Arena. Apparently from Google with the same tech as behind Gemini.

Selective Insert/Select Ratio: Since you have more selects than inserts, a small fill factor minimizes the impact of inserts on index size and fragmentation.

[More useless or wrong advice removed for our sanity...]

This answer is a catastrophe. Moving on.

Claude 3 Opus

Flagship model from Anthropic, this one should surely have a well researched and thought out answer?

When creating an empty table with indexes in Azure SQL, and the table has a GUID primary key, more selects than inserts, and is still growing in size, a good starting point for the fill factor is 90%. Here's a summary of the reasoning:

[Some reasoning why 90 is good, and a boilerplate about monitor the table and adjust in the end skipped...]

Some ok reasoning behind the 90% value, but forgot to mention the key part that I have to do index rebuild for it to matter.

Gemini 1.5 Pro API

Maybe Google had bad luck with the Gemma model. Can Gemini beat their own Google Search?

For your scenario (GUID PK, more reads than writes, growing table):

Recommendation: Start with the default fill factor (0) or a value around 90%.

Default (0): Lets Azure SQL Server dynamically manage fill factor based on index characteristics and workload. This is generally a good starting point, especially for tables with varying row sizes (like those with GUIDs).

GUIDs: Can lead to page fragmentation due to their random nature. Dynamic fill factor or a high value helps mitigate this.

[Some more bullets with reasonable info skipped...]

No. Same problem as with Claude above, it never mentioned I have to actually rebuild the index to have some effect. In addition, it also gives false info(!). SQL does not dynamically manage fill factor when you set it to 0, it will treat 0 as 100%. And I don't understand how a higher fill factor might mitigate the increase in fragmentation caused by guids.

GPT 4 Turbo

Finally we come to the king(?) of AIs, GPT 4 turbo. This is I think the default go-to model many uses in their daily work. Maybe this one can answer well?

In summary, start with a fill factor between 70% and 90% for your GUID-indexed Azure SQL table, leaning towards the higher end if selects vastly outnumber inserts, and adjust based on performance metrics and growth patterns. Regular monitoring and maintenance are key to optimizing and maintaining performance.

[4 bullet points and summary skipped...]

A good answer? No, I do not believe so. It doesn't mention that the setting is useless unless I do a rebuild. Reading this answer I would easily believe I should put a fill factor of 80%. Only in the very end does it mention a boilerplate about monitoring and maintenance, but this is actually the most critical.

Summary

I'm surprised (or actually not, it mostly confirms my preconceptions). Seems like all the AI's could use more training on DBA stuff!

I also need to point to Gemini's (Google) answer that not only "forgot" to mention that you need to do an index rebuild, but also had hidden some misunderstandings around its made up dynamic fill factor.

I think this highlights that while AI for sure can be a great discusison partner, its also very important to remember that its not an all-knowing oracle. It might omit the key information, or make up some convincing but misleading/false imformation like Gemeni above. Its more like your skilled but overconfident co-worker that are always 100% sure what (s)he says is true even when its not.

Victor 2024-05-30