Tuesday, January 27, 2015

What Made HomeworkTutor So Good?

The Background

The Tilton's Algebra web site (TA) is a reincarnation of a desktop application sold for Macs and PCs back in the 80s and 90s.

That application was called Algebra I HomeworkTutor. I am a programmer, not a marketer.

It was reasonably successful, but I am no businessman, either. I tried to do it all without raising money, and this software is challenging (think years of development, not weeks). Periodically I went off to work on other great projects that lasted years before I could hunker down for another push.

The most recent push has lasted fifteen months. (This app is intense.)

The Question

While working on the new version, several times former users of HwT tracked me down (with no little effort) to find out if the software was still available, or in one case to ask me to give a talk on the product. The message was consistent: there was a lot of Algebra software out there but nothing like HwT.

I was asked recently to document as best I could why HwT was so powerful, even though it had none of the new features of TA such as embedded video, an on-line forum, and a "levelling up" process to ensure mastery as well as draw students in.

The Answer

Here is what I have learned, for what it is worth, from anecdotal feedback, occasional published research, and a small amount of first-hand experience observing students using my software and other tutoring systems. In order of increasing importance, here is my understanding of why HwT succeeded in the 90s to the extent it did and why there is still nothing like it in 2015.

Step-by-step error detection and forced correction of mistakes. 

With HwT, students entered each step of their solution, not just the answer, then checked their work for correctness. When a step was correct, strugglers who were in doubt got encouragement, greatly reducing anxiety.

If their work was wrong, it just said so, and students could not proceed to the next step without correcting the errant one. This forced correction was transformative and universally popular with teachers. Without it, struggling, discouraged students just plowed through worksheets making mistakes, interested only in producing something to turn in so they did not get a zero, leaving the ball in the teacher's court to correct all the papers and try to reteach the material.
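
To make the mechanism concrete, here is a minimal sketch in Python with SymPy of what step checking amounts to. This is not HwT's actual code (HwT predates SymPy by decades), it assumes single-variable equations over the reals, and the helper names are invented for illustration: a new line is accepted only if it has the same solution set as the previous correct line.

    # Illustration only: accept a student's new line only if it preserves
    # the solution set of the previous correct line.
    from sympy import Eq, S, solveset, symbols
    from sympy.parsing.sympy_parser import parse_expr

    x = symbols('x')

    def parse_equation(text):
        """Turn a line like '3*x + 2 = 11' into a SymPy equation."""
        lhs, rhs = text.split('=')
        return Eq(parse_expr(lhs), parse_expr(rhs))

    def step_is_valid(prev_line, new_line):
        """True if the new step has the same solutions as the previous one."""
        prev_solutions = solveset(parse_equation(prev_line), x, domain=S.Reals)
        new_solutions = solveset(parse_equation(new_line), x, domain=S.Reals)
        return prev_solutions == new_solutions

    print(step_is_valid("3*x + 2 = 11", "3*x = 9"))   # True:  "OK so far."
    print(step_is_valid("3*x + 2 = 11", "3*x = 13"))  # False: fix it before moving on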

Student engagement.

When mistakes were made, it was up to the student to fix them. Hints and solved examples were available, but students had to ask for them and understand them. Importantly, the hints were just that; with most software, so-called "hints" actually tell the student the next step.

Furthermore, HwT simply waited until they fixed their mistake, allowing unlimited retries. Most software stops the process after a certain number of failed efforts, presumably to avoid discouraging the student. HwT trusted the learner to decide for themselves when to ask the teacher for help, and anyway, research shows the struggle is important to the permanence of learning once the student has their "Aha!" moment.

Limited help--but not too little.

The software does offer subtle hints and solved examples to learn from, but these require the student to dig into their memory of the teacher's presentation for any benefit to be had. The student's struggle is guided, but in the end they have to assemble the solution themselves. I have recently seen research on the value of frequent quizzing: apparently the very act of pulling content from memory strengthens one's command of that content.

Student control.

I learned this one from the first student I observed using my software on-site at a school. She had the difficulty level set to "Easy" and was doing problem after problem successfully. I suggested she try an "Average" problem and she nicely let me know that she would do a few more easy ones before advancing.

I realized her comfort level was simply different from mine. For the many students traumatized by math, it seems better to let them practice "until they themselves knew they were proficient", as one teacher put it.

There is much talk these days of data mining, adaptive learning, and software customizing the learning experience automatically. I would be curious whether the implicit loss of student agency reduces engagement, effort, and ultimately results.

Quantity of practice. 

While "time on task" (ToT) is being challenged as necessarily correlated with greater learning (Kohn being a good example), even Kohn acknowledges that if the student engagement is there then the learning does follow.

I am speculating here, but I suspect students using HwT did more problems, achieving a greater quantity of practice as well as the quality discussed above. For one thing, the problems just pop up ready to be worked -- no copying them out of the book onto paper.

And now with TA and its so-called "missions" (summative tests) generated at random, students can do problem after problem trying to pass a mission, much as they play video games for days trying to get past a tough level. The old line about "I do and I understand" is as true as it is old, so I think increased time "on task" is vital.

Speaking of which, the DragonBox Algebra people reported some fascinating numbers from their "challenges" in which thousands of students did tens of thousands of problems. They said something like 93% of the students completed the game, but some took ten times as long and did six times as many problems as the fastest.

Of course, for that we need software to generate the problems and evaluate the student's work, or the load on teachers would be untenable.

Is the anxiety eliminated?

I was not a math major in college, so when I decided to move from elementary education to math I had to take a few courses at a local college. One course was differential equations, and I distinctly remember being half-way down a page on one problem, doing calculus as a shaky prerequisite skill, and feeling terribly uneasy about the whole thing.

It turned out in each case that I was doing fine, but there it was: without any feedback on each step, and with a lack of confidence in my calculus, I experienced math anxiety for myself.

So I am curious how students will react to step-by-step correction: is making a mistake OK, as long as they know about it right away? Or will they still report anxiety? Also, how much does it help to have the software say (in training mode) "OK so far"? As a private tutor, many a time I saw clients do a step correctly and then look at me in doubt. That was the anxiety I experienced doing differential equations.

I think some students will still be upset when they make mistakes, so the cure may not be perfect, but I will be curious to see whether the reports are of anxiety or of frustration. My hope is that getting the anxiety out of the way will draw out more perseverance and ameliorate even the frustration.

Two Sigma Problem?

Bloom (1984) identified a so-called Two Sigma Problem: how do we come up with an instructional method as effective as a combination of mastery-based progress and good individual tutoring, without hiring a tutor for every math student?

I have mentioned that I have done scant observation of my software in the field. This lack of field-testing was possible because I came at the design from the other direction and simply did my best to recreate in software the experience I provided as a private tutor.

One teacher reported that she put stragglers who had fallen off the pace on HwT and, after a number of weeks, they actually caught up with and rejoined the mainstream. Large-scale tests of Cognitive Tutor and Khan Academy, by contrast, have failed to demonstrate much benefit at all.

It would be interesting to see if students report any sense of being tutored privately by the software.

Good tutoring.

Van Lehn (2011) did not find the two-sigma effect Bloom et al. reported. Instead, he found effect sizes of 0.76 for software tutors and 0.79 for human tutors. But Bloom explicitly claimed to have used "good human tutors", while Van Lehn documented wide variation in the quality of the human tutors. For example, the one-on-one sessions often involved very little student contribution -- the tutoring was more of a one-on-one lecture.

My style even when lecturing is to constantly call on students for contributions to the work I am doing at the board, and my style as a tutor was to help students who got stuck by asking leading questions they actually had to answer, so that, if all went well, they would realize for themselves how to get unstuck.

While a good human tutor will always be better than a good software tutor, embodying quality tutoring pedagogy in software makes it more reliable and more widely available. Over time it can be enhanced in the light of experience, capturing even better tutoring in the software asset.

Gaming, good and bad.

On one rare on-site visit I saw students also using Where In the World is Carmen Sandiego? Students had learned that if they asked for enough hints the software would just tell them the answer, so they did that until they ran out of hints. Then they would call out to the teacher for the answer, so they could build their reservoir of hints back up.

One study I saw on Cognitive Tutor said students did not ask the software for help because CT deducted points for hints. Instead, students asked the teacher for help (who, again, provided it). CT also provides the answer after three mistakes, while HwT just lets students flail.

One factor we see is that a teacher can inadvertently undercut the software. That happened as well in the infamous "Benny Incident", where the teacher neglected to keep an eye on a student's fascinating internal dialogue with a tutoring system (Shaugnessy, 1973). The student was indeed bright, the questions were often multiple-choice, and the passing threshold was only 80%, so the student was able to sail through the automated instruction despite grievous deficits in understanding.

So that is the bad sense of gaming. On the good side, new for TA is a so-called "Mission Mode". Here there is no multiple-choice, the standard of success is higher than Benny encountered, the help is non-existent, and the tolerance for error is nil. But students can try a mission as often as they like, quit a mission that is going badly any time they like, and even when they fail they get a "Personal Best" certificate if the attempt was their best so far. So like video game missions, the standard is unbending, but the failures are simply stepping stones to eventual success, with progress drawing the student into more and more practice.
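
Purely as an illustration, the mission rules described above boil down to a very small loop. This is a hypothetical Python sketch, not TA's code; the run_mission helper and its parameters are invented for the example.

    # Hypothetical sketch of the mission rules: zero tolerance for error,
    # unlimited attempts, and a "Personal Best" when a failed attempt still
    # beats the student's prior record.
    def run_mission(problems, attempt_is_correct, personal_best=0):
        completed = 0
        for problem in problems:
            if not attempt_is_correct(problem):   # one mistake ends the mission
                break
            completed += 1
        passed = (completed == len(problems))
        if not passed and completed > personal_best:
            print(f"Personal Best: {completed} of {len(problems)} problems.")
        return passed, max(completed, personal_best)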

Mastery-based.

Last but (not as advertised) not least: a lot of the above boils down to students proceeding at their own pace in what is commonly referred to as the mastery-based model. Bloom felt the mastery model was more important than the private tutoring, perhaps precisely because of the variability of tutor quality documented by Van Lehn. As the DragonBox Algebra results show, over 90% of students could master the game given enough time.

This in turn aligns with what most math educators believe: Algebra is not that hard. I would be interested in whether students who struggled with Algebra before using the software report a change in attitude about how hard Algebra really is.

Summary

There is a lot of powerful Algebra software out there, but Algebra failure rates are as bad as ever. Two-year colleges are forced to offer multiple courses even in arithmetic, and the AMATYC has just recommended dropping the Algebra requirement for non-STEM majors, a step already taken by the California college system and elsewhere. So why is all that Algebra software not working, when HwT did back in the 90s?

One easy reason: the only other software I know of that checks intermediate steps is MathSpace. Without step-by-step feedback, most of the wins delineated above disappear.

Other than that, we have a good research question: which other elements of HwT were so instrumental in its success? Above are my guesses as to where the answer lies, but it is all anecdotal and seat-of-the-pants. More experience and data on exactly how students and teachers use the software are needed.

I suspect we will find the following:
  • Students report less anxiety.
  • Different students will use different kinds of help (trial and error, video, canned hints, solved examples, and the public forum).
  • Students will do many more problems.
  • Student performance will be better and more uniform, but with a normal distribution of how much practice is needed to achieve that performance.
  • Students will enjoy math in and of itself, as a puzzle.
  • Comparison with adaptive tools will show student agency is more important than precise, automatic throttling of content.