Here's how I handled my scores.
Polish is an overall measure of how complete and how professional the game is. If I find bugs, the game looks less polished; the same goes for bad maps, low-quality music, badly cut sprites, poor dialogue, poor game balance, et cetera. The more fragile your game appears, the fewer polish points I give it.
Playability is an overall measure of how the game flows. If a game cannot be continued because of game-breaking bugs, poor balance, lack of direction, or poor direction, it will lose playability points. The smoother your game plays and the less its progression hiccups, the more playability points I give it.
Entertainment is self-explanatory. If I had fun, you score points; if I hated it, you score low.
Ingenuity is the one that seems to be tripping people up, so I'll give my take. Ingenuity is an overall measure of how fresh the game felt and how clever it is. It doesn't matter if the core concepts are tried and true, as long as they are applied in a fresh and interesting way. See, basically every game and every story has been done before. We can't grade this based on "I've never seen it before" or "I've seen it before." Clever use of eventing systems will net you ingenuity points. Clever applications of gameplay as a storytelling method will net you ingenuity points. Clever storylines will net you ingenuity points. Clever use of artwork and music will net you ingenuity points. This category represents breaking away from the RPG Maker RTP standard and doing something that takes effort, thought, and...well, ingenuity. It is NOT simply originality, but originality does factor in.
---
The entire purpose behind this four-category judgement system was to reward people who put in a lot of effort even if the judge didn't necessarily have fun. In early GIAW events, entertainment bias riddled the scores, and otherwise well-made games were given terrible scores because some of the judges didn't enjoy them. This system also allows for flexibility, as with "the beta tester," where, because of the game's concept, polish cannot be judged in black and white.
I hope this clears up my perspective at least.