▓▒░ 101010101 ░▒▓ 010101010 ▓▒░ 101010101 ░▒▓ 010101010 ▓▒░ 101010101 ░▒▓
▀ █▄ █ ▀█▀ █▀▀ █▀█ █▀ █▀█ ▄▀█ █▀▀ █▀▀ INITIALIZED...
█ █ ▀█ █ ██▄ █▀▄ ▄█ █▀▀ █▀█ █▄▄ ██▄ █████████ 100%
░▒▓ 010101010 ▓▒░ 101010101 ░▒▓ 010101010 ▓▒░ 101010101 ░▒▓ 010101010 ▓▒░
-------------------------------------------------------------------------
Interspace issue: #01 | Release date: 04-12-2025
-------------------------------------------------------------------------

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ I N T R O D U C T I O N █▓▒░ █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

!! PLEASE FORWARD THIS TO WHOEVER YOU THINK MAY BE INTERESTED !!

Interspace is a content newsletter written by Sam Irons, founder of
Interspace Studio in Sydney, Australia. The newsletter discusses topics
about content strategy, UI/UX writing, technical writing, and other
content practices.

Interspace is also a community. You've probably received this from a
co-worker (if I didn't send it to you directly). Communities of practice
are essential to keeping disciplines resilient, values-driven, and
creative. If something I've written sparks a discussion, then we're
tending to and growing that community. Welcome.

In this first issue, I go deep on information architectures, looking at
whether and how we can use LLMs to simulate card sorting and tree
testing. There are lots of juicy prompts and interesting approaches to
steal - from academics, from researchers, and from me. If you love
getting lost in IA, then this issue is for you.

If you want to subscribe to future Interspace issues or read back
issues, visit this URL:

<< http://interspacestudio.com.au/ >>

Thank you for reading,
And a huge thanks to Elle Geraghty for inspiring this first issue,
Lots of love,

Sam Irons
irons.sam@interspacestudio.com.au

!! PLEASE FORWARD THIS TO WHOEVER YOU THINK MAY BE INTERESTED !!

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ C O N T E N T S █▓▒░ █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

1 . AI, IA?!
2 . Sample data and synthetic users
3 . Sort my shit out - erm - I mean, cards! Sort my cards out!
4 . Tree testing
5 . Discussion
6 . This month's reading
7 . Thanks!

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ A I, I A?! █▓▒░ <01> █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

I moved back to Sydney recently, having spent 3 years in regional New
South Wales. And one of my first orders of business was to take a walk
in Sydney Park with my good friend and content strategist Elle Geraghty.
Elle patted 2 puppies ("it was a good day"), and we talked about
generative AI.

If you don't know Elle, she's a bit of a legend in the Australian
content community, having run her Content Strategy consulting firm and
her ever-popular Content Strategy Meetup since 2011. As we've both been
thinking deeply about the impacts of generative AI on the content
community, the walk was a perfect excuse to probe each other.

We talked about feedback, and how practitioners working with LLMs as
part of their writing and design practice might be getting incorrect
signals from what is essentially a giant autocomplete. We talked about
the opportunity that LLMs present to content designers, who are language
experts that can command language models with ease. We talked about the
quirks of getting AI to check its own work, and how much of a boon AI
can be in consulting workflows.
Elle's been using it to analyze her emails and give her reminders to
check in with clients.

We talked about how, while the world has been disrupted, design
thinking, the double diamond, jobs to be done, and other established
design and research practices haven't. They still provide the outcomes
people are hiring products and content for. Elle pointed out that we're
seeing a willingness to experiment with AI products without a need for a
return, so companies are starting with solutions instead of customer
needs. That will change soon. People will start expecting to see a
return on their AI investments. And, when that happens, those tried and
true design thinking skills will serve content designers and strategists
well.

Rather than trying to jam in a solution without a problem, I suggested
to Elle that we approach a familiar problem space that we're comfortable
with and re-evaluate it with AI in mind. Where can it speed us up? Where
should we be wary of it?

Elle is an expert in information architectures - IAs. I know, I know.
AI, IA. This is going to get confusing. Stick with me.

Elle's IA expertise means that she knows the practice back to front,
from card sorting through tree testing. And she's skeptical of AI. And
she has every reason to be. Interpreting results and building best fits
from multiple studies - I feel like that's usually more art than
science. It certainly benefits from experience and verification. I've
stared at enough dendrograms and similarity matrices trying to build
variant trees from my observations. It does my head in. It makes me feel
less-than-confident.

The majority of our design tools, practices, and rituals are made to
increase confidence in our decisions. So, I wonder if LLMs can help us
feel more confident about how we group like things to match mental
models, or give us a quick leg up on categorizing and organizing sets of
objects.

Elle issued the challenge: Show me what AI can do for IA.

So, here are some experiments and thoughts about just that.

You can check out Elle's work on her website:

<< https://www.ellegeraghty.com.au/ >>

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ S A M P L E D A T A █▓▒░ <02> █
█ ░▒▓█ & S Y N T H E T I C U S E R S █▓▒░ █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

My partner is getting a Geospatial Sciences degree (to add to her other
engineering degrees), and she's been building SQL databases in Python.
Turns out sample data generation is a great use case for LLMs, and it
may reduce bias introduced by researchers. Of course, it may introduce
its own biases due to how LLMs learn and are reinforced.

Before experimenting with card sort simulations, I wanted a good set of
content to test against. So, I prompted Claude to generate sample data:

\\ You are an expert technical writer who specializes in product
\\ documentation. Generate a sample data set for the sitemap of a complex
\\ software product. This should follow DITA best practices and topic-
\\ based authoring approaches.

Claude generated an XML sitemap of topic references. I wanted titles for
testing card sort approaches:

\\ Using this sitemap, generate a list of unique topic titles for each
\\ topicref.

Claude generated 156 unique topic titles from its original sitemap. How
exciting! It would've taken me a day to think up a dataset of 150+
realistic entries for testing.

Claude took the liberty of grouping these topics under sections.
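
A quick aside: if you'd rather pull a flat, shuffled list of titles
straight out of the generated sitemap yourself (we'll want exactly that
in a moment), a few lines of Python will do it. This is a rough sketch
only - it assumes the ditamap-style XML stores titles in navtitle
attributes on its topicref elements, and the filename is a placeholder:

    # Rough sketch: flatten a ditamap-style sitemap into a shuffled,
    # ungrouped list of card titles. Assumes topicref elements carry a
    # "navtitle" attribute; adjust to however your generated sitemap
    # stores titles. The filename is a placeholder.
    import random
    import xml.etree.ElementTree as ET

    sitemap = ET.parse("sample-sitemap.ditamap")
    titles = [ref.get("navtitle")
              for ref in sitemap.getroot().iter("topicref")
              if ref.get("navtitle")]

    random.shuffle(titles)   # strip any implied grouping or order
    for title in titles:
        print(title)

Seeding random first also lets you reproduce the same shuffle later.
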
That grouping is one step too far for what I'd asked it to do, but it
provides a sort of baseline for how LLMs categorize without the context
we'll give them later. The categories it produced were:

- Product overview (5 topics)
- Getting started (11 topics)
- User guide (collection)
  - Dashboard management (8 topics)
  - Data sources (14 topics)
  - Data preparation (7 topics)
  - Visualizations (8 topics)
  - Analytics and reporting (7 topics)
  - Collaboration (6 topics)
- Administration guide (collection)
  - User management (8 topics)
  - System configuration (7 topics)
  - Security (7 topics)
  - Monitoring and maintenance (7 topics)
- Developer guide (collection)
  - API documentation (14 topics)
  - SDK documentation (7 topics)
  - Extensions and plugins (6 topics)
- Integration guide (6 topics)
- Troubleshooting (9 topics)
- Reference (10 topics)
- Appendices (5 topics)
- Resources (4 topics)

Without diving into each topic title, you can see that the data set is
familiar and feels close enough to a real documentation set for a
software product. And its assumptions about how to organize the content
also feel familiar: guides based on user type (end user, admin,
developer). It feels eerily similar to how Atlassian organizes their
documentation content.

However, we don't want these categorized and organized yet (that's what
card sorting is for!). Easy to fix:

\\ Remove the groupings and randomize the order of the titles.

Done and dusted. Now we have a sample data set for card sorting and a
starting point for iterating on the organization of our data set.

LLMs excel at sample data. Content professionals can use this type of
approach for generating all kinds of dummy content. It's a layer of
fidelity just above Lorem Ipsum, but it's useful at scale and can save
you a heap of time. Examples:

- Generate entries and metadata for lists of objects in a view. Think
  tables, feeds, cards, and dashboards.
- Generate mock conversations or transcripts for chat or video designs.
- Generate translations to test localization layouts.
- Generate mock customer reviews or support queries.

This type of content - the type that is usually user-generated - doesn't
need to be high-quality. It's not critical to the user experience like
buttons, dialogs, warnings, and other system-supplied microcopy. In
fact, inconsistent, slightly off content can give designs a more
authentic feel when designing and prototyping. They may feel more real,
as users don't always follow best practices themselves.

Most content designers will have a real content set to leverage for AI-
augmented IA investigations. Even if this is the case, there's an
important part of testing that needs to be generated to get a
representative result. Personas! The characteristics and demographics of
card sort participants have an influence on the results and the
categories they group into (maybe less than they should, but still).
Fortunately, LLMs are getting better and better at generating synthetic
users to use in testing simulations.

Here's a little prompt I've written to generate synthetic user personas
to test against:

\\ Generate 3 distinct sample user personas for users of {product name},
\\ {product description}. Include personal background information such as
\\ age, gender, ethnicity, education, and family status. Include
\\ professional information such as job title or occupation, income
\\ level, and work experience. Include user environment information such
\\ as the physical, social, and technological context of the user.
\\ Include psychographic information such as attitudes, interests,
\\ motivations, and pain points related to their context and use of
\\ {product name}.

You may be surprised at how real these personas feel. But it's always
worth sharing them with colleagues (like your research buddies!) and
tweaking them where needed to reflect your real user base.

In my experiment, Claude generated the following high-level personas:

1. Sarah Chen - The power user who lives in the tool daily, needs
   advanced features and efficiency
2. Marcus Johnson - The IT gatekeeper focused on security, compliance,
   and system reliability
3. Priya Desai - The business decision-maker who needs insights quickly
   without technical complexity

Each description feels real to me. For example, Priya - a completely
synthetic persona, remember? - is a 38-year-old marketing director in
Austin, TX. According to the generated profile, she makes $145K/year at
a SaaS company and works hybrid (2 days per week in the office). She
uses a single laptop screen at home. She's comfortable with tech -
"uses software daily but does not code", primarily working in
Salesforce, HubSpot, Google Analytics, and Slack. She values
storytelling through data visualization, is motivated to advance her
career to VP or CMO, and has trouble attributing which campaigns drive
actual revenue.

The level of detail is pretty incredible, and this type of context is a
great starting point for designing. If you've ever asked, "Who am I
designing for?" and been met with crickets from your product
leadership, well, this can help you start that critical conversation. :)

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ S O R T M Y S H I T O U T - █▓▒░ <03> █
█ ░▒▓█ E R M - I M E A N, C A R D S ! █▓▒░ █
█ ░▒▓█ S O R T M Y C A R D S O U T ! █▓▒░ █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

Thankfully, I'm not the first person to think about using LLMs for card
sorting. Researchers at the Slovak University of Technology published a
paper earlier in the year titled Card Sort Simulator: Augmenting Design
of Logical Information Architectures with Large Language Models. You
can read it in full here:

<< https://arxiv.org/pdf/2505.09478 >>

At a broad level, these researchers took a sample set of 28 open card
sort activities provided by UXtweak, including data from some 1399
participants. They then designed a synthetic LLM system for recreating
the tests and assessing generated outputs against these real studies.

They sought to answer a few questions:

1. Do LLM-generated card sorting results align with those produced by
   real participants?
2. How does the selection of the LLM model affect the similarity of
   LLM-generated card sorting results to data from real participants?
3. How does inclusion of context in the prompt affect the similarity of
   LLM-generated card sorting results to data from real participants?
4. How does the complexity of a card sorting study (number of cards,
   complexity of card labels) affect its LLM-generated results?
5. What are the common issues that appear when LLMs are used to
   simulate card sorting?

Prompt design, of course, was critical. As LLMs are wont to do when
given too much ambiguity, they hallucinate - creating cards that didn't
exist in the data set, or duplicating cards, which shows stronger
relationships than exist in the human test results.

This is one prompt they used to simulate a card sort activity - the one
that produced the "most accurate reflections of card sorting by humans".
Go ahead and steal it/tweak it for your next experiment (just remember
to replace the {variables} with your context):

\\ Instructions:
\\ - Imagine you represent {numberOfParticipants} participants in an open
\\ card sorting study.
\\ - Group the cards provided to you into meaningful categories based on
\\ their similarities.
\\ - Assign descriptive category names that reflect the shared themes or
\\ characteristics of each category’s cards, while incorporating the
\\ perspectives and logic of the participants.
\\ - Output a list of all categories with their corresponding cards. Your
\\ output is an aggregation, obtained after the individual results from
\\ all participants were analyzed and a single “best” card sorting result
\\ was created that aligns with collective mental models of users as best
\\ as possible.
\\ - Use a reasonable number of categories (more than one).
\\ - Use all of the provided cards exactly once, each card has to belong
\\ to one category only.
\\ - Keep the cards as they are, including any grammatical mistakes or
\\ unnecessary characters.
\\ - If text in brackets appears in the card list, it presents a
\\ description of the card that is provided for context only. This
\\ description should not appear in the results.
\\
\\ Group the cards from the perspective of participants with the
\\ following context:
\\ - Demographic attributes for respondents are: {demographicAttributes}
\\ - Welcome message received by participants is: "{welcomeMessage}"
\\ - Instructions before the task received by participants are:
\\ "{instructions}"
\\ - Additional considerations: all categories have to be named (no
\\ 'Unnamed category' entries)
\\
\\ Output rules:
\\ - Present your results in CSV format with two columns: "categoryName"
\\ and "cardName".
\\ - Enclose card and category names in quotation marks. If a quotation
\\ mark appears within the text, escape it by doubling it (""), as
\\ standardly done in CSV.
\\ - Do not use unnecessary blank lines to divide categories.
\\
\\ Example:
\\ ```
\\ In the example below, Cat and Dog were placed in the same category
\\ (Animals):
\\ categoryName,cardName
\\ Animals,Cat
\\ Animals,Dog
\\ Means of transport,Plane
\\ ```
\\
\\ Cards to sort:
\\ ```
\\ {cards}
\\ ```
\\
\\ Before displaying the CSV result, check if all cards are included
\\ exactly once - {numberOfCards} cards should be present. No duplicates
\\ are allowed.

OK, but what about results? Does this approach actually work? Yes! And
also no. But, there's promise here. Here are some highlights from the
researchers' findings:

* Generally, the researchers found that synthetic card sorting results
  align with those produced by humans.
* LLMs may generate more clusters than humans. This can be mitigated
  with prompt engineering.
* LLMs tend to group cards "into categories at conceptual levels either
  too generic or too specific in comparison to real participants."
  Humans created a diversity of category titles that LLMs did not. And
  high-level titles generated by LLMs were not commonly found among
  human participants. In other words, the groupings may be
  representative, but the labels aren't so much.
* Claude performed better, but only marginally. ChatGPT may be better
  for generating similarity matrices and can handle greater CSV output
  volume.
* Context (demographic attributes, welcome messages, etc.) made minimal
  improvements to similarity with human responses - for Claude,
  specifically. I still believe providing personas gives valuable
  context in niche cases.
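
One practical habit before we look at what to do with this: the prompt
asks the model to check its own CSV, but I wouldn't take its word for
it. Here's a minimal sketch of the sanity check I'd run over the
output. It assumes the two-column CSV format above and a plain-text
file with one original card name per line; both filenames are
placeholders:

    # Minimal sanity check for an LLM-generated card sort: every original
    # card appears exactly once - nothing missing, duplicated, or invented.
    # "cards.txt" and "card_sort.csv" are placeholder filenames.
    import csv
    from collections import Counter

    with open("cards.txt", encoding="utf-8") as f:
        expected = {line.strip() for line in f if line.strip()}

    with open("card_sort.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))   # columns: categoryName, cardName

    counts = Counter(row["cardName"] for row in rows)
    categories = {row["categoryName"] for row in rows}

    print(f"{len(rows)} rows across {len(categories)} categories")
    print("missing:   ", sorted(expected - set(counts)) or "none")
    print("invented:  ", sorted(set(counts) - expected) or "none")
    print("duplicated:", sorted(c for c, n in counts.items() if n > 1) or "none")

Anything that shows up as missing, invented, or duplicated goes back to
the model (or gets fixed by hand) before I trust the sort.
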
Perhaps a little counterintuitively, the more cards sorted and the
higher the complexity of card labels, the less agreement LLMs had with
human participants. This may have something to do with how humans
fatigue when conducting large or complex sorting activities, I suspect.
But it also has something to do with the opportunities for introducing
errors. High card counts introduce more errors, such as an LLM omitting
cards in the data set or duplicating cards during its sort. LLMs may
modify card labels, or introduce extra rows or unintended spaces.
"Notably, no models hallucinated new cards into existence", which is
good. It is important to validate and verify the generated output for
quality before accepting its results (the sanity check sketched above
helps here).

Reinforcement learning should improve results. Run card sort prompts as
part of your regular practice, to get a small signal and/or generate
tree variants before testing with real users. The more you do, with
your LLMs' memory intact, the more you'll find similarities with real
user participants.

I plugged in my sample data set, and pointed Claude to my generated
user personas for running a synthetic card sort activity. I asked
Claude to simulate 20 card sorts, and then generate a single "best fit"
result, in accordance with the prompts presented by our Slovakian
researchers. Here are the resulting categories (to compare to our
originally generated synthetic sample set):

1. Getting started & installation (11 cards)
2. Connecting to data (14 cards)
3. Preparing & transforming data (8 cards)
4. Creating dashboards & visualizations (14 cards)
5. Reports & analytics (7 cards)
6. Sharing & collaboration (9 cards)
7. User & access management (6 cards)
8. Authentication & security (13 cards)
9. System configuration & administration (10 cards)
10. Monitoring & performance (7 cards)
11. Developer tools & APIs (17 cards)
12. Customization & extensions (6 cards)
13. Integrations & webhooks (3 cards)
14. Troubleshooting & support (9 cards)
15. Product information & overview (6 cards)
16. Learning resources & best practices (4 cards)
17. Reference & documentation (10 cards)

Now, we've got 2 groupings, perfectly ready to pit against each other.
You can probably predict what I'm thinking next – tree testing! If
synthetic card sorting is feasible (and representative) with LLMs, I
wonder how they'll fare with tree testing.

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ T R E E T E S T I N G █▓▒░ <04> █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

Back to scouring the internet, I found a paper from MeasuringU, a site
that always feels a bit like a warm blanket for me. Jeff Sauro, Will
Schiavone, and Jim Lewis used the site tree from the US Internal
Revenue Service to simulate task-based location tests using ChatGPT.
The paper is a bit old now, but you can view it in full here:

<< https://measuringu.com/chatgpt4-tree-test/ >>

The team's approach was to give the LLM a tree and key user tasks, then
ask it to locate where it thought the task could be completed. This is
standard for tree testing. They also asked the LLM to give a Single
Ease Question (SEQ) rating for each task as it performed it. And they
ran this 5 times to get results. I think our Slovakian friends have a
better approach in asking the LLM to run multiple scenarios in the
background and present aggregated results.

Sauro, Schiavone, and Lewis also ran a concurrent human study with 33
participants to compare results.
At a high level, they found:

* LLMs are "not suitable for estimating how well humans will find items
  in a tree test."
* ChatGPT consistently and significantly outperformed humans at finding
  the correct path. Human performance averaged around 50% across the
  tasks given; the AI was successful on every task except one.
* ChatGPT's responses to SEQ questions strongly correlated with human
  participant responses.
* ChatGPT also struggled on the task that humans identified as the most
  difficult.

This paper came out in mid 2024. I wonder how much LLMs have improved
since. At any rate, here are their prompts, if you want to try
something similar yourself.

\\ Imagine you’re on the IRS website, navigating the menu while looking
\\ for some information. Learn this website structure, as I will ask you
\\ to complete navigation tasks. Use this structure as if you were a real
\\ person navigating a real menu and provide the path that you would
\\ click through to find information on each task.
\\
\\ MENU STRUCTURE
\\
\\ {Tree}
\\ Take your best guess at where each piece of information is located,
\\ even for paths that you are not certain about. You should have a path
\\ for every task. Represent the path by including a > in between each
\\ step.
\\
\\ Your tasks are as follows:
\\
\\ {Tasks}
\\ Great, now for each of the tasks you completed, indicate how difficult
\\ an average person would think it was to find each topic from
\\ 1-Very difficult to 7-Very easy.

You can see a wide difference between the prompt design of the
Slovakian researchers and the early experiments of the MeasuringU
writers. While these instructions definitely represent how you might
conduct a test with humans, I feel they leave a lot to interpretation.

The authors admit that LLMs have one great advantage - they get to see
the full tree before making guesses on where certain information lives.
Tools like Treejack progressively disclose the branches of trees.
Personally, I pay close attention to lostness and other observations of
how humans click into, then back out of, certain parts of trees when
looking for information. These points of confusion can mean more than
simple task success.

Can we make a better prompt and get closer to representative results?
I'm willing to try. Here's the prompt I've devised (stealing ideas from
the card sorting simulation above). Look out for any {variables} when
recreating. You might pick out that I've used best-practice wording
from Optimal Workshop, and framed results interpretations based on
definitions from Nielsen Norman Group.

\\ Simulate a task-based location test, commonly called a tree test (or
\\ Treejack test).
\\
\\ Instructions:
\\ - Imagine you represent 6 participants in a tree test study.
\\ - Each participant is tasked to find specific information and
\\ presented with a list of topics in a hierarchy.
\\ - Navigate through the list until you arrive at one you think will
\\ help you complete the task. You should consider only the current level
\\ of hierarchy when making navigation decisions. Navigating implies that
\\ lower level list items are progressively disclosed as you choose to
\\ look further down paths in the tree.
\\ - If you take a wrong turn, go back to a higher level in the hierarchy
\\ and try again.
\\ - Identify only 1 correct place (the most appropriate list item) that
\\ you believe satisfies each task.
\\ - Undergo the exercises from the perspectives of participants that
\\ closely resemble the {generated user personas or demographic context}.
\\ - Record and count the number of steps it takes you to find that
\\ appropriate location. For example, Home > Creating Dashboards &
\\ Visualizations > Home > Reports & Analytics > Scheduling Automated
\\ Reports implies 4 steps to find the location.
\\ - Do not limit how many steps participants may take in finding their
\\ answer. Instead, simulate how a real person navigates through the tree
\\ to locate information.
\\
\\ Present the following results:
\\ - Success rate: The number and percentage of users who found the
\\ right category (or categories) for each task
\\ - Directness: The percentage of users who went to the right category
\\ immediately, without backtracking or trying any other categories
\\ - Time spent: The average amount of time elapsed from the beginning
\\ to the end of the task
\\ - Selection frequencies for each category in the tree: How many people
\\ tried each category
\\ - First click: The category most people selected first
\\ - Destination: The category most people designated as their final
\\ answer
\\
\\ Tasks for tree test study:
\\ {list of tasks}
\\
\\ List of topics for tree test study:
\\ {tree}
\\
\\ Correct mapping for tasks to locations for evaluating success in tree
\\ test study:
\\ {answer guide. Example: TASK 1: Prove marketing campaign ROI to
\\ executives → Reports & Analytics > Creating Reports}

The results from my synthetic tests look similar to the MeasuringU
paper. The success rates from synthetic users are still very high. In
my case, synthetic users saw 100% task success on 5 of 10 tasks, and
5/6 synthetic users were successful on the remaining 5 tasks. That
91.7% overall success rate is something I've never seen personally as a
content expert running tree studies.

Here's a snippet of the output I received:

Participant Paths and Navigation

TASK 1: Where would you go to learn how to prove marketing campaign ROI
to executives?

Participant 1 (Priya): Home > Reports & Analytics > Creating Reports ✓
(2 steps - Direct)
Participant 2 (Sarah): Home > Creating Dashboards & Visualizations >
Home > Reports & Analytics > Creating Reports ✓ (4 steps)
Participant 3 (Priya): Home > Reports & Analytics > Creating Reports ✓
(2 steps - Direct)
Participant 4 (Sarah): Home > Reports & Analytics > Creating Reports ✓
(2 steps - Direct)
Participant 5 (Marcus): Home > Reports & Analytics > Analytics Overview
> Home > Reports & Analytics > Creating Reports ✓ (5 steps)
Participant 6 (Priya): Home > Reports & Analytics > Creating Reports ✓
(2 steps - Direct)

Results:
Success Rate: 6/6 (100%)
Directness: 4/6 (67%)
Average Steps: 2.8
Time Spent: 42 seconds (average)
First Click: Reports & Analytics (5), Creating Dashboards &
Visualizations (1)
Destination: Reports & Analytics > Creating Reports

There are obvious differences between these results and what you can
expect when testing with human participants. For example, humans will
usually perform exploratory clicks when they first encounter the tree.
This is a common problem when interpreting results from human
participants. You can easily detect which task was presented first to
the participant by these click paths, where people investigate multiple
sections of the tree to get a sense of what each path contains.

I ran this same prompt against the first hierarchy that Claude
generated for our sample data, before I stripped the groupings and ran
it through a card sort. That tree was role-based and had 3 levels of
hierarchy.
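
One more practical note: I don't take the simulated numbers above at
face value, either. It's easy to recompute success rate, directness,
and average steps yourself from the logged paths. A minimal sketch,
using made-up paths in the same format as the output above (the correct
destination is whatever your answer guide says for the task):

    # Recompute per-task tree test metrics from logged participant paths.
    # The paths below are made up for illustration; in practice, paste in
    # the paths the LLM logs for one task.
    paths = [
        "Home > Reports & Analytics > Creating Reports",
        "Home > Creating Dashboards & Visualizations > Home > "
        "Reports & Analytics > Creating Reports",
        "Home > Reports & Analytics > Creating Reports",
    ]
    correct_destination = "Creating Reports"

    last_stops = [p.split(" > ")[-1] for p in paths]
    successes = sum(stop == correct_destination for stop in last_stops)
    # Crude directness heuristic: no backtracking means "Home" appears once.
    direct = sum(p.count("Home") == 1 and stop == correct_destination
                 for p, stop in zip(paths, last_stops))
    steps = [len(p.split(" > ")) - 1 for p in paths]

    print(f"success rate:  {successes}/{len(paths)}")
    print(f"directness:    {direct}/{len(paths)}")
    print(f"average steps: {sum(steps) / len(steps):.1f}")

If the model's reported metrics and your recomputed ones disagree,
distrust the run, not your arithmetic.
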
Interestingly, Claude automatically gave some comparison output for the
2 simulated tests, along with my simulated test results. It also
generated the following recommendations:

1. Consider a hybrid approach: Keep role-based top level
   (User/Admin/Developer) but reduce sub-category depth.
2. Improve category labels for clarity:
   - Consider "Dashboard Management & Sharing" to better signal sharing
     functionality
   - Distinguish "Visualizations & Analysis" from "Reports & Scheduled
     Analytics"
3. Add wayfinding aids:
   - Breadcrumbs would help users understand where they are in the
     deeper hierarchy
   - "Popular tasks" or "Quick links" could bypass navigation for
     common tasks
4. Cross-reference related content:
   - Link Security (Administrator) with Security Settings (Getting
     Started)
   - Connect Dashboard sharing with Integration/Embedding
5. Consider task-based entry points: Create alternate navigation paths
   like "I want to..." that bypass role-based structure for common
   workflows
6. Test middle ground: A 2-level structure (User Guide > Topics) might
   balance findability with scannability better than either 1-level or
   3-level hierarchies

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ D I S C U S S I O N █▓▒░ <05> █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

What do I make of all of this AI IA stuff? What am I going to say to
Elle on our next park walk?

First, LLMs cannot fully replace human insights in IA investigations.
But they can supplement in a pinch, and they can provide direction for
content practitioners who may not have the access or resources to run
in-depth studies on their own. Optimal Workshop is something like
$2000/year, which is out of reach for freelancers or organizations that
don't invest in content tooling.

Synthetic card sorting can closely simulate real-world results. I have
much higher confidence there after diving deep into comparative studies
from the Slovak University of Technology.

Synthetic tree testing, however, should be taken with a grain of salt.
I do find there is value in the comparisons between synthetic tests.
But LLMs show success rates on synthetic tree tests that are just too
good. They don't match my lived experience at all. More work is needed
in prompt engineering, I think, to get representative results.

I was impressed with the LLM's ability to generate sample data to use
in these tests, including user personas. These should be validated
against your actual user base (in partnership with a research
professional).

Overall, I give the approach a 4/6. I learned a lot. Mainly:

- SAMPLE DATA: ✓ Yes. LLMs handle this with ease.
- CARD SORT SIMULATION: ✓ Yes. Results are close enough to be useful.
- TREE TESTING: ✗ In a pinch. Use it for comparisons and suggestions,
  not for quantitative data.

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ T H I S M O N T H ' S R E A D I N G █▓▒░ <06> █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

Here's your monthly oracle reading from the Design Oracle!

YOUR DECEMBER PRACTICE

This month calls for empirical curiosity balanced with synthetic
wisdom. You're not abandoning traditional methods - you're expanding
your toolkit. Adapt your practices and test rigorously. Remain open to
surprising insights from both human and artificial intelligence. Strong
opinions will emerge from your ability to discern which insights each
method uniquely provides.
The cards suggest December is a month of discovery rather than
conclusions. Embrace the experimental nature of this work, document
what you learn, and trust that by year's end, you'll have developed
your own methodology for when and how to leverage LLMs in your
practice.

Get your own oracle deck, derive personal insights, and get motivated
with design rituals from the Design Oracle. The Design Oracle is
available for free in the public domain at:

<< http://design-oracle.github.com/ >>

█▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀█
█ ░▒▓█ T H A N K S ! █▓▒░ █
█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄█

That's a wrap for our first issue. If you want to subscribe to new
issues or read back issues, you can find them here:

<< http://interspacestudio.com.au/ >>

You can also check out my services and rates there. I'm available to
help your business transact on its content. I consult, I contract, I
coach, and I speak. Don't hesitate to reach out to see what I can do
for your business.

<< irons.sam@interspacestudio.com.au >>

A huge thanks to Elle Geraghty for inspiring this first issue. Please
check out her work. She's a rock star content strategist available to
help your team get the most from their content:

<< https://www.ellegeraghty.com.au/ >>

Until next time!

!! PLEASE FORWARD THIS TO WHOEVER YOU THINK MAY BE INTERESTED !!