Tuesday, September 16, 2014

Bug Story

You want to know what a bug is like? Friends and family all marvel at how many years I have worked on my Algebra expert system, and I have heard the families of other developers marvel the same about them. So here is the story of one bug report. Perhaps you will get a feel for what we do.

Notice how way leads onto way, the investigation starting with the obvious and then branching out as other things catch the eye. Apologies for the impressionistic stream-of-consciousness quality, but I could not both live this bug and eloquently record it. Free beers to whoever can name the two New England writers I just plagiarized.

The obvious start: Sarah, my ace QA and all-round general muse, reported Pivotal Tracker bug 78828414.

Her complaint is that the equation is obviously a contradiction, so the app is wrong when it says it is not (the red background and "Incorrect").


My first reaction was that the software should not have said "wrong", it should have said "you can do more work." But then I realized this was a mastery "Mission", and during Missions if one declares something to be the answer and it is mathematically consistent but not the final answer, it is marked wrong -- part of knowing math is knowing when one has reached the answer.

So Round #1 leads to Task #1: Instead of saying "Incorrect.", the app should say, "That work is Ok but more work was needed to reach the final answer" or something. It can still mark the problem wrong since it is a mission and we do not give second chances on missions, but it needs to be clearer that the work entered was not mathematically unsound.

I checked in the non-mission areas of the app and it did say "You can do more work."

Round #2 is between me and Sarah, with her assertion that -3t-60=-3t is obviously a contradiction. The problem I see is that -3(t+5)-45=-3t is "obviously a contradiction" to some people. What we need (and task #2 is to make explicit) is for the variable to be eliminated from the equation before pronouncing either contradiction or identity. Of course the variable remains if our result is "conditional".

Task #2 is to say more than just "You can do more work": we have to explain that the variable must be eliminated first.
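For the programmers in the audience, here is a minimal sketch of the rule in Python. This is not my engine's code; the names `classify_linear` and `variable_eliminated` are made up for illustration, and it only handles equations of the form a*t + b = c*t + d.

```python
from fractions import Fraction


def classify_linear(a, b, c, d):
    """Classify a*t + b = c*t + d (illustrative helper, not the engine)."""
    # Move everything to the left side: (a - c)*t + (b - d) = 0
    coeff = Fraction(a) - Fraction(c)
    const = Fraction(b) - Fraction(d)
    if coeff != 0:
        # The variable survives, so the equation holds for one value of t.
        return "conditional"
    # The variable cancels; what is left is either true or false.
    return "identity" if const == 0 else "contradiction"


def variable_eliminated(a, c):
    """True only when t no longer appears on either side as written."""
    return a == 0 and c == 0


# Sarah's step, -3t - 60 = -3t: a contradiction mathematically...
print(classify_linear(-3, -60, -3, 0))
# ...but the variable is still on the page, so more work is needed.
print(variable_eliminated(-3, -3))
```

The point of keeping the two checks separate is that the first answers "what is this equation?" and the second answers "has the student shown it yet?"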

So far so good, I just need to communicate better. Well, it is not "just". It matters. Over on Dan Meyer's blog, software that (allegedly) gives bad feedback is (un-)justifiably taking a beating. Precise feedback is deadly important, and I always fix these ambiguities as users point out the ones I have missed.

Then the wheels came off.

Rather than mess with a mission, I just went to the freestyle section and typed in my own problem: 3t-10=3t-20. Next step 3t+10=3t was marked wrong. 10=0 was marked wrong. My software just cannot do this easier problem! (I love it when the impossible happens--it is actually a clue I use in the debugging.) So...

Task #3: Fix 3t-10=3t-20 [After fixing Task 12 I thought this would just work, but I found another problem: the "Contradiction" averral button generates its answer OK but with a structural excess that throws off the engine. After fixing that (task 13), it just works.]

Task #13: have the contra and other classification buttons generate the right structure.

Plugging that problem into my batch tester used for debugging the maths engine, I used my text syntax to tell the engine that the problem answer should be "cntd:10=0" which means "the contradiction 10=0". Looking at the debug output, I see "SLVD:cntd:10=0". SLVD is used for straight equations, btw. So...

Task #4: What is with "SLVD:cntd:10=0"? Hopefully that is a feature, but even then it should be CNTD:SLVD:10=0. [Turns out: SLVD was a bug in how I specified the test. ); 10m to find.]

Anyway, back in the freestyle section when my engine rejected the correct steps I asked it to solve a similar problem. (The app could do their homework for them but will not.) But given a contradiction to emulate...it created a conditional instead. And the instructions showed it did not even try to create an exercise in conditionals, where sometimes a conditional is the result.

Task #5: If asked to solve a similar problem, do not vary the classification from the original, i.e., if asked for a problem similar to a contradiction, generate a contradiction. (Task 8 may fix this, but I doubt it.) [Right, new work was needed. Terribly hard-coded, but how many oddball problem types are there in the world? 20 minutes]
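To give a feel for what "terribly hard-coded" can look like, here is a hypothetical Python sketch, not the actual fix. It builds coefficients for a new equation a*t + b = c*t + d that is guaranteed to land in the requested classification; the function name and the number ranges are my own inventions for this example.

```python
import random


def similar_problem(kind, rng):
    """Return (a, b, c, d) for a*t + b = c*t + d of the requested kind."""
    a = rng.randint(2, 9)
    b = rng.randint(-20, 20)
    if kind == "identity":
        # Same coefficient, same constant: true for every t.
        return (a, b, a, b)
    if kind == "contradiction":
        # Same coefficient, different constant: true for no t.
        d = b + rng.choice([-1, 1]) * rng.randint(1, 20)
        return (a, b, a, d)
    if kind == "conditional":
        # Different coefficients: true for exactly one t.
        c = a + rng.randint(1, 5)
        d = rng.randint(-20, 20)
        return (a, b, c, d)
    raise ValueError("unknown classification: " + kind)


a, b, c, d = similar_problem("contradiction", random.Random(2014))
print(f"{a}t + {b} = {c}t + {d}")
```

One branch per oddball problem type is ugly, but it makes it impossible to ask for a contradiction and get a conditional back.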

I had it solve the conditional anyway. In doing so, it did not use the available screen space; it started scrolling.

Task #6: Use available screen space in solved examples. [It has been a full day and dozens of lines of code. I'll make a PT story for this and the next.]

As it solved the problem and started scrolling, it did not automatically scroll down to show each new step. Not sure why, I have solved that before (pretty easy, actually).

Task #7: Make sure solved examples autoscroll to show each new step. [PT story]

When it got to the end it just said "Solved", it did not classify the solution. Perhaps this is because, looking back just now, I see it did not even create a problem with the instructions to "classify the result".

Task #8: when making a problem similar to an equation-classification problem, the new problem should have the same instructions. [Ah, it was not even trying, it was just trying to match the transformation at hand. That makes sense, but in this case was too myopic. Big overhaul, but just 20m to my surprise. What can I say? I write great code! Can I say that? NO! Sorry.]

Task #9: After Task #8, check that the tutor now classifies equations when solving similar examples. Looking at the solutions done by the engine in other sections, this should be OK. Task #8 may also fix Task #5, so we will do Task 8 first. [yep, it Just Worked(tm).]

So finally I let the test harness run on the broken problem and what do I see? The engine does not even come up with an answer. This can happen. If the problem is outside the engine's skill set it will come to a point where (a) it cannot think of anything more to do but (b) it knows it has not reached an answer -- the variable to solve for has not been isolated, for example. So it does not offer an answer at all. But then why is it telling anyone they are wrong?

Task #10: Why is the app saying wrong if it cannot solve the problem? (We should fix this first while it still cannot solve the problem so the conditions will be realistic.) [This was only because of the mistake I made setting up the test. In the actual case it was solving the problem. Pretty sure had it been unable to solve it would confess (just worked on that code last week pursuant to the Meyer blog brouhaha).]
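The feedback policy that Tasks #1 and #10 are circling can be sketched in a few lines of Python. Treat this as an illustration under my own assumptions: the function name, its parameters, and the wording of the "confession" message are hypothetical, while the other messages are the ones quoted above.

```python
def feedback(engine_answer, step_is_consistent, step_is_final, in_mission):
    """Pick the message shown when a student declares an answer."""
    if engine_answer is None:
        # The engine could not finish the problem, so it has no business
        # calling anyone wrong. (Hypothetical wording.)
        return "I could not solve this one myself, so I cannot grade it."
    if not step_is_consistent:
        return "Incorrect."
    if step_is_final:
        return "Correct."
    if in_mission:
        # Still marked wrong on a mission, but the math was not unsound.
        return ("That work is OK but more work was needed "
                "to reach the final answer.")
    return "You can do more work."


# Sarah's case: sound work, declared too early, during a mission.
print(feedback("cntd:10=0", True, False, True))
```

The order of the checks is the design decision: "I cannot judge" comes before any verdict, and "unsound" is kept distinct from "unfinished".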

Are we done? I wish. I do not like testing (so thank God for Sarah) but I have enough experience to have some instinct for how things can go wrong.

On most problems there is just one way one can say one is done with a problem. One avers that an expression is in simplest form, factored, or solved. So for anti-click convenience I allow the user to hit the "End" key and then I treat that as the one averral possible.

In this case the student must choose from conditional, contradiction, or unconditional. Working on the problem the engine could handle, when I got to the answer and was about to click "contradiction" I had a thought: What will happen if I hit the "End" key? Crash? Proper message?

Silence. So...

Task #11: Tell user to pick an option (click or tab/enter) if there is more than one. Do *not* silently ignore them.
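A minimal Python sketch of the behavior Task #11 asks for, assuming the UI can hand the key handler a list of the averrals currently allowed. The handler name and the returned (action, payload) pairs are hypothetical.

```python
def on_end_key(averrals):
    """Decide what the "End" key should do given the allowed averrals."""
    if len(averrals) == 1:
        # Only one way to declare victory: take it, save the user a click.
        return ("aver", averrals[0])
    if not averrals:
        return ("prompt", "There is nothing to declare at this point.")
    # More than one ending is possible: say so instead of staying silent.
    return ("prompt",
            "Please pick one (click or tab/enter): " + ", ".join(averrals))


print(on_end_key(["solved"]))
print(on_end_key(["conditional", "contradiction", "identity"]))
```

Every branch returns something the UI has to act on, so silence is no longer a possible outcome.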

Ok, now let's see what else comes up, given the tendency of issues to exponentially explode. Everything that follows the line in the sand arose while wading through the above.

---------------------- line in the sand -----------------------------------

Test driver used leading cntd to gen the contradiction but did not strip it off to generate the operand. Lucky an infinite loop did not arise. Fixing that, we see a new problem: cntd:-10=-20 is not recognized as equivalent to cntd:10=0.

Task #12: Why is cntd:-10=-20 not recognized as equivalent to cntd:10=0? [It was just allowing for left=left and right=right or left/right and right/left. Made this smarter. 30m after a nap]
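Here is a rough Python sketch of both fixes in this stretch: strip the leading tag before using the operand, and compare two variable-free equations by what they say rather than side by side. The helper names are hypothetical, and the equivalence rule (same tag, and both equations true or both false) is my simplification for this post, not a description of the engine.

```python
from fractions import Fraction


def parse_spec(spec):
    """Split 'cntd:10=0' into ('cntd', Fraction(10), Fraction(0))."""
    tag, colon, body = spec.partition(":")
    lhs, equals, rhs = body.partition("=")
    if not colon or not equals:
        raise ValueError("expected tag:lhs=rhs, got " + repr(spec))
    # The tag is stripped here so it never leaks into the operand.
    return tag, Fraction(lhs), Fraction(rhs)


def same_answer(spec_a, spec_b):
    """Assumed rule: same tag, and both equations true or both false."""
    tag_a, lhs_a, rhs_a = parse_spec(spec_a)
    tag_b, lhs_b, rhs_b = parse_spec(spec_b)
    if tag_a != tag_b:
        return False
    return (lhs_a == rhs_a) == (lhs_b == rhs_b)


print(same_answer("cntd:-10=-20", "cntd:10=0"))
```

Comparing left to left and right to right fails here because -10 is not 10 and -20 is not 0, even though both statements are the same flavor of false.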

Task #14: after entering the statement, answers cntd and idnt are available, but not cond. I suspect it is being too helpful by noting that the variable has not yet been isolated. ... [OK, Kinda. It does not want them classifying the equation until it has been solved! Exactly what Sarah is trying to get away with, but with a contradiction!! So I will just always have the conditional choice enabled and then deal with premature classification -- the original report!! I love this game. 15 minutes.]

Task #15: Well it says do more work if we are at 3x-5=3x-5, but it does not if we get to 3x=3x. That is odd. Investigating.

Summary

The above is as much as I was able to document. The hand-to-hand fighting continued into the night and the next day and then with three known minor (they can wait) issues remaining I declared victory and deployed.

To the kids programming at home, here are your takeaways:
  • Never fix a bug. Understand how the situation arose and how things can be arranged such that they never happen again.
  • When the user reports a "bug" that turns out to be a feature you still get an RFE: do not confuse the user that way.
  • You know how programs fail. Outlier behavior. Hit the "End" key when three endings are possible. Afraid it might not work? I know. :)
  • Never shrug off a small misbehavior. Fix it. You might be surprised what you find.
  • Never leave the unexplained unexplained. You will almost certainly be surprised at the explanation.
  • Don't even tolerate misbehavior in diagnostic tools. Run them to ground. Run everything to ground.
If that sounds like work (a) it is and (b) go ahead, ignore me, then come back in five years and read this again when you have lived the hell of software developed outside those rules.

As for ed tech, hey, the above is why you do not have very much good ed tech. Ironically, the blogger attacked one of the best in the game. At the same time he is writing new attacks they are addressing all his concerns.









Saturday, September 6, 2014

"I get in trouble for the B-plusses"

"I get in trouble for those," said Mr. Visco, pointing to the scant B-pluses sprinkled here and there in a sea of A's.

It was somewhere around 1980 and, now a teacher myself, I was back to visit my favorite teachers at my '69 alma mater high school in Tenafly, New Jersey, an upper middle class bedroom community for NYC feeding students into the Ivy League.

Not sure how grade inflation came up, but his remark came after I doubted some assertion of his that he, one of the toughest teachers in the school, could no longer grade honestly. So he had pulled out his gradebook and showed me the sea of A's.

"But you have tenure," I floundered. "How can they control you?"

"There are things they can do," he said. I remember only one of several. "They can take away the honors class."

This is a very long story. It begins with the US high school class of 1971, the first year not subjected to the draft to go fight in Vietnam, the war we were supposed to learn from before...I digress.

Once the draft was no longer a threat to the middle class, we looked around and what did we see?

First, traditional authorities had lost all credibility. This mostly had to do with a federal government leading us into the hell of Vietnam, but along the way also the rejection of police and college authority. Folks dictating rigid sexual mores and drug abstinence were now scorned. Not only Vietnam was involved. A little thing known as the Civil Rights movement also knocked governments on their heels. We the People were no longer impressed by City Hall in any form, and we were indeed ready to fight.

Second, whoa. Look at us! We are prosperous! A population that had lived for twelve years with the Depression starting in 1929 first had the economy stimulated by World War II and then, when that nightmare ended, saw a wonderful period of growth, first forming a solid middle class and by the late sixties a solid upper middle class.

What do newly empowered prosperous people do with institutions they no longer revere, such as the school boards of upper middle class communities? School boards for which our affluent populations vote? Our affluent parents start dictating new standards guaranteed to get their children 4.0 averages and a shot at the Ivy League.

Back to my high school. Mr. Visco was tough but he was also a delightful flake (and future principal of the high school!!!). I still did not buy it, his gradebook evidence notwithstanding. My next stop was Mrs. Willens, my French teacher. I have heard about nuns in parochial school and I am pretty sure Mrs. Willens taught them tyranny. I was a great student and a favorite of hers and she absolutely terrorified me.

I asked her about the grading silliness Visco had described. Mrs. Willens without hesitation confirmed every word. If they had had video cameras everywhere back then you would have seen my jaw literally drop.

Need something more than anecdotal evidence from two great, tough teachers in a feeder high school for top colleges? Easy. Look at the surge from the mid-sixties on, especially leading up to the mid-70s. Notice also that it is much worse in private schools, where the tuition paid and unvarying Ivy goal brings even more pressure on schools and teachers to churn out 4.0 averages.

Google prize to the person who can find the NY Times 90s-ish story on the New England admissions officer who said they did not even look at grades any more, just SATs.

So what happened when parents started getting teachers in trouble for B-plusses? You might think kids would work less and learn less. You might be right. It was described beautifully in A Nation At Risk.

The funny thing is that the CCSS crowd says the problem is accountability. Ha! You want to talk about accountability? In small communities with local school boards and separate school budget votes we have nothing but accountability and what the parents have demanded and gotten is grade inflation. The parents have afflicted the teachers. Accountabilize that.

Sadly, history has been rewritten. Naughty, lazy, incompetent schools and teachers have been misleading the parents! By giving them the grades they demanded! Here is an otherwise solid story where the author is wholly oblivious to parental strong-arming of schools and teachers.

Losing this bit of history does more than cause us to misplace blame. It has also produced the motive force of CCSS: accountability of schools and teachers, complete with the threat of closings and terminations. The message of CCSS is, "Here are some great new standards. We are not going to tell you how to teach them, but we are going to fire you or close your school if you do poorly on the questionable tests we also did not specify."

Don't you just love it when government swings into action on something they know nothing about?

CCSS did not understand that parents had provoked the mess, so they took a threatening stance facing teachers using ominous terms like "high stakes testing" thinking they had the parents at their back. Yes, the parents at your back. With arrows strung and bows drawn. It is now late 2014 and after five long years the scores are coming in and CCSS -- surprise, surprise -- is under attack.

CCSS needs some Pogo: "We have met the enemy and he is us."

The parents are getting what they demanded, and no one who does not understand that impetus has a prayer of turning things around.

I know what Arne Duncan means when he mocks upper middle-class white moms discovering their little darlings are not geniuses, but he is confusing two distinct upper middle-class phenomena. The first was not moms being deceived by schools who said their children were gifted, it was those moms and dads leaning on school boards shaking them down for the A's that would get their kids into Ivy League schools.

The second, unrelated phenomenon was the mistake of motivating folks with "My kid is an honor student" bumper stickers. One would rather face the mom of a starving tiger cub than tell a parent of a kid with a B-minus average they cannot have a bumper sticker.

It's a paper chase, right? We all want the best for our kids, perfectly natural. They do not need to know anything to get into Harvard, they need a 4.0 average! Grade by grade, paper by paper: "Why is this not an A? Show me what is wrong with this." Of course A means "exceptional" and C means "nothing really wrong with it", but the standard had shifted. Everyone was now "A until proven unsatisfactory".

I saw this first-hand in 1973, years earlier, in one of my last undergraduate classes. A student-friendly, socially-correct, left-wing professor was spending class time on a defense. Of some new research result? No, of the scores he had given on our most recent test. I had not done very well, but I was from the outset appalled at the idea of him defending his scoring.

His process involved reading a question from the essay exam and then reading an answer to which he had given full credit. There was one question worth seven points, and when he read the answer I thought I was listening to a speech by Bill Clinton. Finally, it came to an end.

"That's an awful lot for a seven," said one of the leaders of the scoring rebellion.

"Isn't that what a seven is supposed to mean?" I asked from the back of the room, surprising even myself.

The rebel blustered, the professor retreated under cover of the fire I had unintentionally provided.

"I have nothing to add to your dialogue," he said.

But there it was. 1973 and the new standard for "full marks" was "I covered the minimum."

By the way, if upper middle class Ivy-seeking parents are not being misled, perhaps the inner city parents are? That was where I did the bulk of my teaching, and there the story was different. One of the greatest teachers I had ever known (an African-American, FWIW) filled me in: no one failed. If they put in their time, they got their high school degree. Why?

Think Wizard of Oz. He gave Scarecrow a diploma, and explained rather accurately that that was the only thing he needed. This teacher was explaining that, no matter what, the kids would get their high school diplomas, so they would have a shot in the job market.

The CCSS crowd is about to discover that standards and tests alone will not change the world, because tougher standards and tests will just fail lots of students.

Their premise is that schools and teachers will be able to get students up to the standards as long as they are threatened with closings and termination, that it is just a matter of will.

Nonsense, but that's another blog entry.