Thursday, May 19, 2011

5/20 Presentation


Ahead last week, behind this week.

Preliminary Results & Graphs:

(Most of my charts/tables will not upload on blogger--see presentation in class!)

[Hypotheses]
1. Mobile device users will use more category heuristics than large-screen users.

2. Mobile device users will be more satisfied with their selection because they had fewer choices.

3. Large-screen users will perform better, as rated by independent raters.
{still to be determined}

4. Mobile device users will be more likely than large-screen users to feel they need more time.

What I need to do:
-Run significance tests
-Finish designing format for expert reviewers
-Have my experts rate the selections
-Analyze results from the experts & other data in this new context (a rough sketch of the expert comparison is below)
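
Since I still need to run these tests, here is a minimal sketch of the expert-rating comparison I have in mind, assuming each selection ends up with a numeric expert rating. The ratings below are made-up placeholders, and the Mann-Whitney U test is just one reasonable option for ordinal rating data, not a settled analysis plan:

# Minimal sketch: compare expert ratings of selections between conditions.
# The ratings are made-up placeholders, not real data.
from scipy.stats import mannwhitneyu

mobile_ratings = [3, 4, 2, 3, 5, 3]        # expert ratings, mobile condition
large_screen_ratings = [4, 5, 4, 3, 5, 4]  # expert ratings, large-screen condition

stat, p_value = mannwhitneyu(mobile_ratings, large_screen_ratings, alternative="two-sided")
print("U = %.1f, p = %.3f" % (stat, p_value))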

Thursday, May 12, 2011

First Round Run

This week I ran 30 participants. I've done some preliminary analysis of the data and I am finding almost no difference between the two conditions in confidence or time. I won't have my 'performance' measure until I put the selections on Mechanical Turk, and I plan to run a handful more participants first.

This is exciting and interesting because, even though my hypotheses appear to be wrong, there are two outcomes that seem most probable at this point.

1. The large-screen users outperform the mobile device users, and the story is: "Mobile device users are just as confident as large-screen users, but they perform worse."

OR

2. There is no difference between the two and users can perform equally well on this type of task on a mobile device.


Checklist of 'What Needs to Be Done'

1. I'd like to run about 10 more participants, which I will do in the next handful of days.
2. Upload the selections to Mechanical Turk
3. Run data analysis

Thursday, May 5, 2011

Second Draft

[Abstract]
In a world where technology is readily available to us in all shapes and sizes, how does device size affect our ability to accomplish certain tasks? Processing speed, input method and screen size have all been studied as factors that affect the user. However, taken together, how will these factors affect a search and selection task, and why? This paper will specifically examine the differences between a mobile device and a computer with a large monitor. The differences in task performance on the different devices could have important implications for which tasks we choose to perform on which devices.

[Hypotheses & How they will be measured]
1. Mobile device users will use more category heuristics than large-screen users.
This is to be measured based on how many times the user refines his search on a given task. For example, if a user searches for "Summer Dresses > Floral" under Dresses, that is a value of '2' for the number of heuristics used to refine the basic search.
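
As a rough illustration of this count, here is a minimal Python sketch; the '>'-separated path string is an assumption about how I will log the refinements, purely for illustration:

# Minimal sketch: count category refinements from a logged search path.
# The ">"-separated path format is an assumption for illustration only.
def count_heuristics(search_path):
    """Number of refinements beyond the base category."""
    steps = [s.strip() for s in search_path.split(">") if s.strip()]
    return max(len(steps) - 1, 0)

# "Summer Dresses > Floral" under Dresses -> 2 refinements
print(count_heuristics("Dresses > Summer Dresses > Floral"))  # 2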

2. Mobile device users will be more satisfied with their selection because they had fewer choices.
The reasoning behind this hypothesis is that the mobile device users will be able to view fewer dresses, both because the processing/network speed on the mobile device is slower and because the screen is smaller. The large-screen user will be able to view many more options. Thus, based on the paradox of choice, the mobile device user will be more satisfied choosing from X options, whereas the large-screen user will experience decision paralysis choosing from Y options (where Y >> X). There is one confounding variable in examining the paradox of choice in practice with this particular experimental setup: time. Users on the mobile device may be frustrated that, though there are Y potential choices, they can only access X of them. Satisfaction is to be measured with pre- and post-task survey questions.

3. Large-screen users will perform better, as rated by independent raters.
This is to be measured by having Mechanical Turk workers rate the subjects' selections based on how well they think each selection fits the criteria presented in the prompt. I am also considering having 'fashion experts' rate all of the selections, which would give me multiple modes of measurement with which to confirm this hypothesis.

4. Mobile device users will be more likely than large-screen users to feel they need more time.
I have allotted 5 minutes per task, for a total of 10 minutes, for this experiment. This seems appropriate based on my pilot experiments, in which some participants ran out of time and some did not. I am keeping track of how long it takes each participant to complete each task. I plan on comparing on a binary scale (needed more time vs. did not need more time), but I will have the data to do more involved time analysis if this becomes desirable. A rough sketch of this coding is below.
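
For bookkeeping, here is a minimal sketch of how I might code the recorded completion times into that binary measure. The 5-minute-per-task limit comes from the design above; the data structure and numbers are made up purely for illustration:

# Minimal sketch: code recorded per-task completion times (in seconds) into
# the binary "needed more time" measure and tally it by condition.
# The participant records below are made up for illustration.
TIME_LIMIT = 5 * 60  # 5 minutes per task

participants = [
    {"condition": "mobile", "task_times": [300, 290]},  # hit the limit on task 1
    {"condition": "large",  "task_times": [180, 240]},
]

counts = {"mobile": {"needed_more": 0, "finished": 0},
          "large":  {"needed_more": 0, "finished": 0}}

for p in participants:
    needed_more = any(t >= TIME_LIMIT for t in p["task_times"])
    counts[p["condition"]]["needed_more" if needed_more else "finished"] += 1

print(counts)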

[Methods]
See Google doc for pre & post task survey as well as task instructions.

1. I have altered the participant instructions so that participants simply leave the tab open to their selection when done, and I record it afterwards. This allows me to have participants complete the survey on the computer while performing the task itself on the mobile/large device.

2. I also updated the instructions to include a more specific description of Jamie and Matt and what they are looking for:

Excerpt
[Task 1]
Read below for what kind of dress Jamie would like:
"I'm going to my friend's wedding in July and I want to find a fun, flirty dress. I am 5'8" with a pear-shaped build. I have brown hair and olive complexion. I want something that will look hot, but not distract from my friend's big day, nothing too flashy! Help me!"

[Task 2]
Read below for what kind of shirt Matt would like:
"I'm a little bit anxious about this pool party. I want to wear a shirt that will look casual, but make me seem in style. I don't usually get too hot in the sun, so I think I'd rather cover up than bare my arms. I'm 5'10" with dirty blond hair and freckles. I have the lean build of a distance runner. Make me look good!"

3. I am offering prizes as follows: "There will be a prize for the best, most appropriate dress selection and shirt selection. The winners will get their choice of a $5 Starbucks, Jamba Juice, iTunes, or Philz gift card. If the same person wins both dress and shirt, he/she will get a $15 card."
My reasoning for giving a larger prize to someone who wins both the shirt and the dress is to incentivize participants to try hard on both tasks and overcome task fatigue.

[Materials]
I am using an iPhone 4 as my mobile device and a 27" monitor as my large screen.
I am using the Safari web browser for both conditions.

[Further Questions] **If anyone is commenting on my blog, advice on this area would be particularly desired!
1) I'm still not sure how I can watch/observe my participants on their mobile devices. I would have to be a huge creeper!
2) I am not sure exactly how I want Mechanical Turk workers to rate the selections. Rank order? Rate each selection on one scale or multiple scales? A 'which is better' A/B comparison?


Thursday, April 28, 2011

Updated Experiment Info--Pilot Testing

I'm interested in testing how device size, specifically screen and input size, affects a search & selection task. This could have important implications for which tasks we choose to perform on which devices.

[Hypotheses]
1. Small-screen users will use more category heuristics than large-screen users.
2. Small-screen users will be more satisfied with their selection because they had fewer choices.
3. Large-screen users will perform better, as rated by independent raters.
4. Small-screen users will be more likely than large-screen users to feel they need more time.

[Methods]
See Google doc for pre & post task survey as well as task instructions.

[Materials]
I am using an iPhone 4 as my 'small screen' device and a 27" monitor as my large screen.
I am using the Safari web browser for both conditions.

[Measurements]

1. The number of heuristics that the subject used to refine the search. (HOW besides watching?)
2. How confident the user is about his/her selections
3. How well the selections perform, as rated by independent raters*
4. Whether the user feels like he needed more time

*I intend to ask Mechanical Turk workers to rate the dress and shirt selections. I will give them the same prompts that the subjects received. I will show them two selections at a time and ask which one seems more appropriate (or a better choice). A rough sketch of how I might tally these pairwise judgments is below.
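
If I go with the two-at-a-time comparison, here is a rough sketch of how I could tally the raters' choices into a win count per selection. The judgment tuples and selection names are made-up placeholders:

# Minimal sketch: tally pairwise "which seems more appropriate?" judgments
# into a win count per selection. The judgments below are made-up examples.
from collections import Counter

# Each judgment: (left selection shown, right selection shown, rater's pick)
judgments = [
    ("dress_A", "dress_B", "dress_A"),
    ("dress_B", "dress_C", "dress_C"),
    ("dress_A", "dress_C", "dress_A"),
]

wins = Counter(pick for _, _, pick in judgments)

# Rank selections by how often raters preferred them
for selection, win_count in wins.most_common():
    print(selection, win_count)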

[Calculations & Results]

I don't have meaningful results yet because I haven't run anything on Mechanical Turk. I intend to use chi-square tests once I have my independent raters.

I will also use chi-square tests to compare confidence, category heuristics, and time across conditions; a rough sketch of one such test is below.
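
Here is a minimal sketch of the kind of chi-square test I have in mind, on a 2x2 table of condition versus the binary needed-more-time outcome. The counts are made up for illustration, and scipy's chi2_contingency is just one way to run it:

# Minimal sketch: chi-square test on a 2x2 contingency table
# (condition x needed-more-time). The counts are made up for illustration.
from scipy.stats import chi2_contingency

#         needed more time, finished in time
table = [[9, 6],    # small screen
         [4, 11]]   # large screen

chi2, p_value, dof, expected = chi2_contingency(table)
print("chi2 = %.2f, p = %.3f" % (chi2, p_value))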

My pilot results:
Average small-screen confidence: 2.5
Average large-screen confidence: 3.6

[Further Questions]
How can I track the ways in which the subject refined his/her search besides watching?
How should I have Mechanical Turk workers rate the selections? (Ask ‘more appropriate’, ‘better’, ‘Jamie/Matt will like more’?)
Do I have them rate between two? (Strict ordering does not give me as much information)
Should I have both conditions fill out the pre & post task on the big monitor? Or do all of it on the small device?
Should I use a laptop/monitor instead of phone/monitor to help control for processing & network speed?



Thursday, April 21, 2011

Experimental Methodology

Hypothesis 1: Users will persist longer (perform more clicks) in a search if they feel as though they are on the right track.

Hypothesis 2: Even if users believe that they are on the right track, there is a somewhat universal limit to the number of clicks a user will perform before aborting search.

Hypothesis 3: Users will be more frustrated by non-scented searches than by scented searches requiring more clicks.

Control Search: ??

Scented Search: Require the user to perform X number of clicks, where the search path is very clear and the user is confident that he is on the 'right track.'

Non-scented Search: Require the user to perform 2-3 clicks, but the correct search path is very unclear and the user is most likely not confident.

PROCEDURE:

1) Pre-task survey: gather demographic information, gather emotional/mood/task confidence information
2) Place subject in one of 3? experimental groups
3) Experimental Task
4) Post-task survey: gather emotional/mood/task confidence information

Experimental Task: This still needs fleshing out, but I think I want to ask users to look up information about jobs posted on the CDC. I will provide them with the date that the job was posted and tell them that they cannot 'search' for the job.

I think the non-scented task will be a search on IMDb.

Since this is within-subjects, I don't have as many variables to worry about. I will need to ask about users' familiarity with both interfaces and also how attractive they perceive each interface to be (since perceived attractiveness of the interface affects search perseverance).

Background Information

In Peter Pirolli's book Exploring and Finding Information, he defines 'information scent' as "The user's use of environmental cues in judging information sources and navigating through information spaces."

In my experiment, for my 'scented search' condition I want to use what I will call "strong scent." I am defining this to include searching through lists organized by one of the five ways to organize information: category, time, location, alphabet and hierarchy (Universal Principles of Design or Information Anxiety). Of course, some of these are more strongly scented (alphabet or numeric) than others, such as category.

Additionally, I want my scented search interface to be what Jef Raskin defines as a "zooming interface paradigm (ZIP)" in The Humane Interface (pg 153). In fact, in chapter 6-2, he neatly describes the two types of interface I would like to test (I think).

The non-scented: He describes navigating through the interface like a maze--"I often find, deep in a submenu, a command or a check-box that solves a problem I am having...We are not good at remembering long sequences of turnings, which is why mazes make good puzzles and why present navigational schemes, used both within computers and on the web, often flummox the user." (152)

The scented: "The antithesis of a maze is a situation in which you can see your goal and the path to get there, one that preserves your sense of location while under way, making it equally easy to get back."

Thursday, April 14, 2011

Search Perseverance & Design Attractiveness

I found an interesting article that claims that "perceived attractiveness of web interface design" has a positive significant effect on search perseverance. See paper here

This is something that, even if I do not directly investigate it, I need to keep in mind, since design attractiveness could be a mediating factor in search perseverance.

Ground Zero

I'm still trying to home in on exactly what I want to research for the next 7+ weeks. I am interested in the claims that Jared Spool makes in The Scent of Information. He states, "Users expect each click to lead to information that is more specific. They do not mind clicking through large numbers of pages if they feel they are getting closer to their goal with each click" (pg 25). In fact, in his experiments, he measures users' confidence, claiming that users gain confidence at each step if they feel as though they are on the right track.

I want to unpack this claim, with the aim of investigating how feedback affects this confidence and search perseverance. I envision using different forms of feedback, varying the amount of information, in order to see how that affects search perseverance.

I am also interested in exploring users' emotions beyond 'confidence.' How is their frustration level affected by more clicks/steps? Does this depend on whether or not users have confidence in that click? For example, are users more frustrated in performing an extra click when they know it will take them closer to their goal or when they are unsure?