Tuesday, September 16, 2014

Bug Story

You want to know what a bug is like? Friends and family all marvel at how many years I have worked on my Algebra expert system, and I have heard the families of other developers marvel the same way about them. So here is the story of one bug report. Perhaps you will get a feel for what we do.

Notice how way leads on to way, the investigation starting with the obvious and then branching out as other things catch the eye. Apologies for the impressionistic stream-of-consciousness quality, but I could not both live this bug and eloquently record it. Free beers to whoever can name the two New England writers I just plagiarized.

The obvious start: Sarah, my ace QA and all-round general muse, reported Pivotal Tracker bug 78828414.

Her complaint is that the equation is obviously a contradiction, so the app is wrong when it says it is not (hence the red background and "Incorrect").


My first reaction was that the software should not have said "wrong"; it should have said "you can do more work." But then I realized this was a mastery "Mission", and during Missions, if one declares something to be the answer and it is mathematically consistent but not the final answer, it is marked wrong -- part of knowing math is knowing when one has reached the answer.

So Round #1 leads to Task #1: Instead of saying "Incorrect.", the app should say "That work is OK, but more work was needed to reach the final answer" or something like it. It can still mark the problem wrong, since it is a mission and we do not give second chances on missions, but it needs to be clearer that the work entered was not mathematically unsound.

I checked in the non-mission areas of the app and it did say "You can do more work."

Round #2 is between me and Sarah, with her assertion that -3t-60=-3t is obviously a contradiction. The problem I see is that -3(t+5)-45=-3t is also "obviously a contradiction" to some people. What we need (and Task #2 is to make explicit) is for the variable to be eliminated from the equation before pronouncing either contradiction or identity. Of course the variable remains if our result is "conditional".

Task #2, then, is to say not just "You can do more work" but to explain about eliminating the variable.
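
To make the rule concrete, here is a minimal sketch in Python with sympy -- my own illustration, not the engine's actual code -- of classifying only after the variable has been (or failed to be) eliminated:

    from sympy import simplify, symbols, sympify

    def classify(lhs, rhs, var):
        # Move everything to one side and simplify; classification is
        # only trustworthy once this elimination has been attempted.
        residue = simplify(sympify(lhs) - sympify(rhs))
        if var in residue.free_symbols:
            return "conditional"    # the variable survived elimination
        return "identity" if residue == 0 else "contradiction"

    t = symbols("t")
    print(classify("-3*t - 60", "-3*t", t))      # contradiction: -60 != 0
    print(classify("-3*(t+5) - 45", "-3*t", t))  # same equation, same verdict
    print(classify("3*t - 10", "3*t - 20", t))   # also a contradiction
    print(classify("2*t + 1", "7", t))           # conditional: t remains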

So far so good; I just need to communicate better. Well, it is not "just". It matters. Over on Dan Meyer's blog, software that (allegedly) gives bad feedback is (un-)justifiably taking a beating. Precise feedback is deadly important, and I always fix these ambiguities as users make clear the ones I have missed.

Then the wheels came off.

Rather than mess with a mission, I just went to the freestyle section and typed in my own problem: 3t-10=3t-20. The next step, 3t+10=3t, was marked wrong. 10=0 was marked wrong. My software just cannot do this easier problem! (I love it when the impossible happens -- it is actually a clue I use in debugging.) So...

Task #3: Fix 3t-10=3t-20. [After fixing Task #12 I thought this would just work, but I found another problem: the "Contradiction" averral button generates its answer OK but with a structural excess that throws off the engine. After fixing that (Task #13), it just works.]

Task #13: Have the "Contradiction" and other classification buttons generate the right structure.

Plugging that problem into my batch tester (used for debugging the math engine), I wrote in my text syntax that the problem answer should be "cntd:10=0", meaning "the contradiction 10=0". Looking at the debug output, I see "SLVD:cntd:10=0". SLVD is used for straight equations, by the way. So...

Task #4: What is with "SLVD:cntd:10=0"? Hopefully that is a feature, but even then it should be CNTD:SLVD:10=0. [Turns out SLVD was a bug in how I specified the test; ten minutes to find.]
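
For the curious, the answer-spec syntax is just a classification tag prefixed to an equation. Here is a hypothetical parser for it -- the tags cntd, idnt, cond, and SLVD come from this post, but the code itself is my guess at the shape of the thing, not the real tester:

    KNOWN_TAGS = {"cntd": "contradiction", "idnt": "identity",
                  "cond": "conditional", "slvd": "solved"}

    def parse_answer_spec(spec):
        # Peel leading classification tags off the equation text.
        tags, rest = [], spec
        while ":" in rest:
            head, tail = rest.split(":", 1)
            if head.lower() not in KNOWN_TAGS:
                break
            tags.append(head.lower())
            rest = tail
        return tags, rest

    print(parse_answer_spec("cntd:10=0"))       # (['cntd'], '10=0')
    print(parse_answer_spec("SLVD:cntd:10=0"))  # the suspect double tag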

Anyway, back in the freestyle section, when my engine rejected the correct steps I asked it to solve a similar problem. (The app could do students' homework for them, but it will not.) But given a contradiction to emulate... it created a conditional instead. And the instructions showed it did not even try to create a classification exercise, where sometimes a conditional is the result.

Task #5: If asked to solve a similar problem, do not vary the classification from the original; i.e., if asked for a problem similar to a contradiction, generate a contradiction. (Task #8 may fix this, but I doubt it.) [Right, new work was needed. Terribly hard-coded, but how many oddball problem types are there in the world? 20 minutes.]

I had it solve the conditional anyway. In doing so, it did not use the available screen space; it started scrolling.

Task #6: Use available screen space in solved examples. [It has been a full day and dozens of lines of code. I'll make a PT story for this and the next.]

As it solved the problem and started scrolling, it did not automatically scroll down to show each new step. Not sure why; I have solved that before (pretty easy, actually).

Task #7: Make sure solved examples autoscroll to show each new step. [PT story]

When it got to the end it just said "Solved"; it did not classify the solution. Perhaps this is because, looking back just now, I see it did not even create a problem with the instructions to "classify the result".

Task #8: When making a problem similar to an equation-classification problem, the new problem should have the same instructions. [Ah, it was not even trying; it was just trying to match the transformation at hand. That makes sense, but in this case it was too myopic. Big overhaul, but just 20 minutes to my surprise. What can I say? I write great code! Can I say that? NO! Sorry.]

Task #9: After Task #8, check that the tutor now classifies equations when solving similar examples. Looking at the solutions done by the engine in other sections, this should be OK. Task #8 may also fix Task #5, so we will do Task #8 first. [Yep, it Just Worked(tm).]

So finally I let the test harness run on the broken problem, and what do I see? The engine does not even come up with an answer. This can happen: if the problem is outside the engine's skill set, it will come to a point where (a) it cannot think of anything more to do but (b) it knows it has not reached an answer -- the variable to solve for has not been isolated, for example. So it does not offer an answer at all. But then why is it telling anyone they are wrong?

Task #10: Why is the app saying "wrong" if it cannot solve the problem? (We should fix this first, while it still cannot solve the problem, so the conditions will be realistic.) [This was only because of the mistake I made setting up the test. In the actual case it was solving the problem. Pretty sure that had it been unable to solve, it would confess (I just worked on that code last week pursuant to the Meyer blog brouhaha).]
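
The policy behind Task #10 is easy to state as code. A minimal sketch, with hypothetical names rather than the app's actual grader: an engine that cannot finish a problem has no business marking a student wrong.

    def grade_step(engine_answer, step_is_sound, step_is_final):
        # No engine answer means the problem beat the engine too;
        # confess rather than judge the student.
        if engine_answer is None:
            return "confess: this one is beyond me"
        if not step_is_sound:
            return "incorrect"
        return "correct" if step_is_final else "you can do more work"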

Are we done? I wish. I do not like testing (so thank God for Sarah), but I have enough experience to have some instinct for how things can go wrong.

On most problems there is just one way to say one is done: one avers that an expression is in simplest form, factored, or solved. So for anti-click convenience I allow the user to hit the "End" key, and I treat that as the one averral possible.

In this case the student must choose from conditional, contradiction, or identity. Working on the problem the engine could handle, when I got to the answer and was about to click "contradiction", I had a thought: what will happen if I hit the "End" key? Crash? Proper message?

Silence. So...

Task #11: Tell the user to pick an option (click or tab/enter) if there is more than one. Do *not* silently ignore them.
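
A sketch of the intended behavior, again with hypothetical names and not the app's actual key handler:

    def on_end_key(averral_options):
        # One possible averral: take it. Several: ask, never stay silent.
        if len(averral_options) == 1:
            return ("aver", averral_options[0])
        return ("prompt",
                "Pick one (click or tab/enter): " + ", ".join(averral_options))

    print(on_end_key(["solved"]))
    print(on_end_key(["conditional", "contradiction", "identity"]))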

OK, now let's see what else comes up, given the tendency of issues to explode exponentially. Everything that follows the line in the sand arose while wading through the above.

---------------------- line in the sand -----------------------------------

The test driver used the leading cntd to generate the contradiction but did not strip it off to generate the operand. Lucky that an infinite loop did not arise. Fixing that, we see a new problem: cntd:-10=-20 is not recognized as equivalent to cntd:10=0.

Task #12: Why is cntd:-10=-20 not recognized as equivalent to cntd:10=0? [It was just allowing for left=left and right=right, or left and right swapped. Made this smarter. 30 minutes, after a nap.]
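
One way to be "smarter", sketched with sympy (my formulation, not the engine's): move everything to one side of each equation and compare the residues up to a nonzero constant factor, so that -10=-20 and 10=0 land in the same bucket:

    from sympy import simplify, sympify

    def residue(eq_text):
        lhs, rhs = eq_text.split("=")
        return simplify(sympify(lhs) - sympify(rhs))

    def contradictions_equivalent(a, b):
        # Two contradictions match when both one-sided forms are nonzero
        # constants; they then differ only by a scale factor.
        ra, rb = residue(a), residue(b)
        return ra != 0 and rb != 0 and not (ra / rb).free_symbols

    print(contradictions_equivalent("-10=-20", "10=0"))  # True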

Task #14: After entering the statement, the answers cntd and idnt are available, but not cond. I suspect it is being too helpful by noting that the variable has not yet been isolated. ... [OK, kinda. It does not want them classifying the equation until it has been solved! Exactly what Sarah is trying to get away with, but with a contradiction!! So I will just always have the conditional choice enabled and then deal with premature classification -- the original report!! I love this game. 15 minutes.]

Task #15: Well, it says "do more work" if we are at 3x-5=3x-5, but it does not if we get to 3x=3x. That is odd. Investigating.

Summary

The above is as much as I was able to document. The hand-to-hand fighting continued into the night and the next day, and then, with three known minor (they can wait) issues remaining, I declared victory and deployed.

To the kids programming at home, here are your takeaways:
  • Never fix a bug. Understand how the situation arose and how things can be arranged such that it can never happen again.
  • When the user reports a "bug" that turns out to be a feature, you still get an RFE: do not confuse the user that way.
  • You know how programs fail. Outlier behavior. Hit the "End" key when three endings are possible. Afraid it might not work? I know. :)
  • Never shrug off a small misbehavior. Fix it. You might be surprised what you find.
  • Never leave the unexplained unexplained. You will almost certainly be surprised at the explanation.
  • Don't even tolerate misbehavior in diagnostic tools. Run them to ground. Run everything to ground.
If that sounds like work: (a) it is, and (b) go ahead, ignore me, then come back in five years and read this again, when you have lived the hell of software developed outside those rules.

As for ed tech: hey, the above is why you do not have very much good ed tech. Ironically, the blogger attacked one of the best in the game. At the same time he is writing new attacks, they are addressing all his concerns.

1 comment:

  1. We need more of these design scenarios. I'm sure Dan had lots of these as well with Dave (?) and Desmos. I often find open-ended tasks a 'cop out': because it's a lot of work to improve feedback, we keep it at yes/no or useless process feedback ("have another look at page 3" or "think again").

    This particular example with contradictions, 'no solutions', and 'all real numbers' is one of the harder things, as there often is some equivalence checking but no solutions to check ;-) It reminds me very much of the process with the DME, which I still hold to have some of the best feedback available (and authorable by the teacher).
