taw's blog: 2007

Wednesday, December 26, 2007

Looking for a new computer

Because my hatred towards my Macbook has reached an unprecedented level, and I didn't take my old desktop with me to London, I'm looking for a new computer. I probably won't be buying it for another few weeks, but here's a rough spec/wishlist (which reads a lot like a "why Macbook sucks" list):

The options are - one good laptop for work+home, or shitty laptop at work + good desktop at home. Right now I think a good all-purpose laptop would be a better solution.
At least 2GB RAM, 4GB even better. That's really non-negotiable, computers with less than that are good for little more than web browsing.
Probably dual boot Ubuntu (for 90% of normal use)/XP (for occasional gaming/compatibility testing). I've seen Vista and I don't like it. Most laptops these days come with Vista by default, but I can always wipe it out.
At least 1440x900 or even better 1680x1050. Macbook resolution of 1280x800 is painful.
I don't really care about screen size that much, resolution is much more important. 17-inch screen usually means bigger keyboard and that's a nice thing to have, even if 15-inch screen with good resolution is all right.
I would like to play something more recent than Quake 3, so integrated GPU is completely out of the question. 8600M GT with 256MB memory would certainly be nice. Something somewhat less powerful is probably going to be fine too.
160/200 GB disk would be nice for dual boot system, 5400 rpm is good enough on Linux as it has really awesome I/O caching unlike OSX. I'm not sure about XP.
I don't remember the last time I've seen a CPU-bound program, vast majority being limited by memory or I/O or GPU. So Intel Core 2 Duo 2.0 GHz or its moral equivalent by AMD should be just fine.
At least 3 USB ports, with 4 or more scoring extra points.
DVD burner and real DVD drive (not annoying slit like in Macbook) would be a big plus.
I wouldn't care if it was a desktop, but for a laptop a 3-year guarantee is pretty much necessary.

Monday, December 24, 2007

Javascript version of jrpg

I have just written a very small Javascript version of jrpg, which can be accessed here:

Javascript version of jrpg

If you want to try it out, you will need a decent browser like Firefox 2, and Japanese fonts. It might work with other browsers, but I haven't done any testing yet. Just like in the pygame version you can move around using the arrow keys or, if they don't work, the numeric keypad with numlock on. Battles are fought with the keyboard and the enter key.

The game is a very small demo, allowing you to only go around a very small map, collecting coins and fighting hiragana demons. Most of the cool features from the full pygame version are not implemented, so if you haven't seen it yet go ahead and download it. I did it because I wanted to see what's possible with pure Javascript+CSS, and relearn modern Javascript.

Things I learned from the experience:

I really like jQuery. Mixing DOM and jQuery operations could be made somewhat easier, but even without that jQuery really greatly improves the code.
Declarative specifications of presentation and semi-declarative specifications of behaviour are an extremely convenient way of creating user interfaces. They definitely beat pygame in convenience.
Having a fully powerful layout engine and font engine is a great improvement over pygame.
Javascript's OO system is quite nice to use in spite of being so different from other OO systems. Definitely beats Java's.
Functional programming in Javascript is possible, but due to its strictly C-like syntax it is much more painful than in any other language I know, except for languages which do not support functional programming at all like Python, Java, or Prolog. It would be nice if a future version of Javascript at least introduced implicit return, and renamed function to fun. Compare Javascript foo.map(function(x){return x+1}) with hypothetical improved Javascript foo.map(fun(x){x+1}) and Ruby foo.map{|x| x+1}. Bad syntax discourages using higher-order functions a lot, and creating your own higher-order functions even more.
Destructuring assignment is an awesome feature that is not normally noticed when it's there, but it hurts a lot when it's missing.
Lack of string interpolation makes string building code look like crap.
Firebug is even more awesome than I thought.
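For comparison, here is a minimal Ruby sketch of the three language features the list above misses in Javascript: terse blocks for higher-order functions, destructuring assignment, and string interpolation. The method names (increment_all, twice, describe_position) and the sample data are made up for illustration and have nothing to do with the actual jrpg code.

```ruby
# Using a higher-order function with a block: the Ruby side of the
# foo.map comparison.
def increment_all(numbers)
  numbers.map { |x| x + 1 }
end

# Writing your own higher-order function is just as cheap:
# yield hands control to whatever block the caller passed.
def twice
  yield
  yield
end

# Destructuring assignment unpacks the array into two variables,
# and string interpolation builds the message without any + chains.
def describe_position(name, position)
  x, y = position
  "#{name} is at (#{x}, #{y})"
end

counter = 0
twice { counter += 1 }

puts increment_all([1, 2, 3]).inspect   # [2, 3, 4]
puts counter                            # 2
puts describe_position("hero", [3, 7])  # hero is at (3, 7)
```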

I'm not sure how far I'd like to go with the experiment. I don't see any reasons why jrpg couldn't be reimplemented in Javascript+CSS, and it would definitely expand the possibilities, especially with displaying messages. On the other hand performance might be much worse than pygame version (or not), and standalone application is in many ways more convenient than a web-based one (fonts can be bundled, no browser upgrade needed).

Wednesday, December 19, 2007

My Macbook died

Hard drive in my Macbook died. One of the most annoying things about laptops is that there's no way of fixing them on your own, even if it's something trivial like replacing a hard drive with a new one. I'm going to see how good Apple customer support really is.

Frankly I have been disappointed by pretty much everything Apple. Their hardware is underpowered relative to price, comes in weird configurations, and doesn't even look that pretty after some use because it's so easily scratchable. For example Macbooks don't have a real GPU, but they have a totally useless FireWire port instead of a third USB. Even iPods which seem to be the most popular Apple product all come with crappy earphones and no USB port.

Apple software also ranges from very bad like Safari (why couldn't they simply bundle reskinned Firefox ?) through really horrible like OS X to iTunes which is able to define its own category of suckiness way beyond any other program I know. I haven't ever touched iPhone and I do not intend to, so I'm not going to say anything about it. The only piece of Mac software that I actually liked was TextMate, which obviously wasn't created by Apple. If it was it would be an AppleScript Editor.

Basically Apple sucks and no amount of TextMate bundles and "cool" marketing are going to change that.

Wednesday, December 05, 2007

Civilization II

Because the damn Macbook comes with integrated GPU instead of even a very cheap real one I can only play very old games on it. So I'm kinda going through the old titles, at least until I can get a decent box.

I remember playing 4X games like Civilization, Civilization II, Colonization (about which I wrote some time ago), Master of Magic, and Freeciv a lot when I was younger. Now that I look at them again the original Civilization seems rather unplayable due to VGA graphics, and as the gameplay isn't much different I tried Civilization II.

The first thing I noticed is what a bunch of lies the unit statistics are. You see - in the original Civilization units had attack and defense points. In Civilization II they wanted to keep these points essentially unchanged so it would look familiar to the player base, but to rebalance the game by changing their meanings. So they introduced hit points and firepower. Some quick math shows that the effective attack and defense ratings follow these rules:

effective attack = attack × firepower × hitpoints
effective defense = defense × firepower × hitpoints

Here's a full table of units, ordered by more or less their battle power.

Name	Effective attack	Effective defense	Nominal attack	Nominal defense	Firepower	Hitpoints
Battleship	96	96	12	12	2	4
Nuclear Missile	99	0	99	0	1	1
AEGIS Cruiser	48	48	8	8	2	3
Howitzer	72	12	12	2	2	3
Carrier	8	72	1	9	2	4
Submarine	60	12	10	2	2	3
Cruiser	36	36	6	6	2	3
Stealth Bomber	56	12	14	3	2	2
Cruise Missile	60	0	20	0	3	1
Bomber	48	4	12	1	2	2
Helicopter	40	12	10	3	2	2
Armor	30	15	10	5	1	3
Artillery	40	4	10	1	2	2
Stealth Fighter	32	12	8	3	2	2
Mechanized Infantry	18	18	6	6	1	3
Marines	16	10	8	5	1	2
Fighter	16	8	4	2	2	2
Ironclad	12	12	4	4	1	3
Destroyer	12	12	4	4	1	3
Cavalry	16	6	8	3	1	2
Paratroopers	12	8	6	4	1	2
Alpine Troops	10	10	5	5	1	2
Cannon	16	2	8	1	1	2
Riflemen	10	8	5	4	1	2
Partisans	8	8	4	4	1	2
Fanatics	8	8	4	4	1	2
Dragoons	10	4	5	2	1	2
Frigate	8	4	4	2	1	2
Musketeers	6	6	3	3	1	2
Transport	0	9	0	3	1	3
Catapult	6	1	6	1	1	1
Crusaders	5	1	5	1	1	1
Legion	4	2	4	2	1	1
Knights	4	2	4	2	1	1
Elephant	4	1	4	1	1	1
Archers	3	2	3	2	1	1
Chariot	3	1	3	1	1	1
Galleon	0	4	0	2	1	2
Engineers	0	4	0	2	1	2
Horsemen	2	1	2	1	1	1
Caravel	2	1	2	1	1	1
Pikemen	1	2	1	2	1	1
Phalanx	1	2	1	2	1	1
Warriors	1	1	1	1	1	1
Trireme	1	1	1	1	1	1
Settlers	0	2	0	1	1	2
Freight	0	1	0	1	1	1
Explorer	0	1	0	1	1	1
Caravan	0	1	0	1	1	1
Spy	0	0	0	0	1	1
Diplomat	0	0	0	0	1	1

So a cruise missile with nominal attack of 20 (effective 60) usually won't be able to take a Carrier of nominal defense 9 (effective 72), let alone a Battleship of nominal defense 12 (effective 96), even though by the original rules it should sink both without any trouble. On the other hand Ironclads at 12/12 (nominal 4/4) will easily take down every single land unit from their era, which every Civilization II player probably knows - but it was definitely not so in the original Civilization.
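The arithmetic is easy to check in a few lines of Ruby. This is just a sketch of the two formulas above applied to numbers copied from the table; the Unit struct and its method names are my own, not anything from the game.

```ruby
# Nominal attack, nominal defense, firepower and hitpoints come straight
# from the table; the two methods implement the formulas given earlier.
Unit = Struct.new(:name, :attack, :defense, :firepower, :hitpoints) do
  def effective_attack
    attack * firepower * hitpoints
  end

  def effective_defense
    defense * firepower * hitpoints
  end
end

cruise_missile = Unit.new("Cruise Missile", 20, 0, 3, 1)
carrier        = Unit.new("Carrier", 1, 9, 2, 4)
battleship     = Unit.new("Battleship", 12, 12, 2, 4)
ironclad       = Unit.new("Ironclad", 4, 4, 1, 3)

puts cruise_missile.effective_attack  # 60
puts carrier.effective_defense        # 72
puts battleship.effective_defense     # 96
puts ironclad.effective_attack        # 12
```

So the missile's 60 loses out to both 72 and 96, even though the nominal 20 looks far bigger than 9 or 12.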

This isn't that bad now that I know it, but I feel so cheated. I remember playing Civilization II so many times while believing in the official attack and defense strengths, and wondering why I'm so lucky or unlucky (depending on circumstances).

When it comes to cheating, Civilization II does plenty of it. First, computer players totally ignore the fog of war. That's especially annoying when playing WW2 scenario, because their submarines are invisible and can sink your ships easily, but your submarines have big bullseyes on them. In the real game AI would be totally exterminated by the time submarines are invented of course.

Also computer players pay less for production and city growth (depending on difficulty level), don't suffer from the annoying 50% production switch penalty, and cheat in many other ways. Of course all that is understandable - the AI is so horrible compared to recent games it had to massively cheat to be any challenge.

But let's leave the question of cheating, and concentrate on gameplay. Unlike Freeciv which requires a very specific strategy (smallpoxing + We Love the Consul) to have any chance, Civilization II is so simple that a wide range of strategies lead to victory. Unless you're very unlucky early in game it's difficult not to win. The simplest strategy is plenty of settlers for expansion (and later roads/irrigation), Republic, and as much spent on research as possible. Easy victory can be achieved by building Marco Polo's Embassy, which costs only 200 and lets you exchange technology with all computer players. This lets you get ten technologies by giving five players only two technologies each. This is normally a game winning event. Wonders which reduce unhappiness are also very useful and you should get them all.

It's often easier to simply buy other civilizations' cities and units than try to conquer them. An unfortunate thing about conquest is that it kills a large part of the population and destroys many city improvements, and bribery lets you avoid it. You can also steal technologies with diplomats/spies. Normally there's no point bothering with sabotage.

Armies are expensive, so it's usually cheapest to connect all your cities with roads and railroads and have only a few units which you can move wherever they're needed quickly. AI is totally incapable of performing any serious offensive so you can usually defend yourself without too much trouble. Conquest is somewhat more difficult because AI cities tend to have plenty of armies inside (which just sit there doing nothing). Diplomats/Spies are one easy way of conquering cities. Ironclads/Cruisers are another very effective way - just kill everybody inside with naval bombardment and then transport some land unit to take the city, or Paratroop one if you have Paratroopers. Ironclad's effective attack strength of 12 can easily handle most fortified units of the era, and naval attacks apparently ignore City Walls. Cruiser's effective attack of 36 does the same trick in the later era. One more thing - naval units defending cities have firepower reduced to one. This halves effective defense of Cruisers, AEGIS Cruisers, Carriers, Battleships, and Submarines, but does not affect other ships like Ironclads, Destroyers, and Transports which already have firepower one. So it's not such a big deal.
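A quick sketch of that last rule, assuming the in-city penalty simply replaces firepower with 1 in the effective defense formula from earlier in this post (that reading matches the "halves" claim for firepower-2 ships). The method name and the in_city flag are made up for illustration.

```ruby
# Effective defense = defense x firepower x hitpoints, with firepower
# forced down to 1 when a ship defends from inside a city.
def effective_naval_defense(defense, firepower, hitpoints, in_city: false)
  firepower = 1 if in_city
  defense * firepower * hitpoints
end

puts effective_naval_defense(6, 2, 3)                 # Cruiser at sea: 36
puts effective_naval_defense(6, 2, 3, in_city: true)  # Cruiser in a city: 18
puts effective_naval_defense(4, 1, 3, in_city: true)  # Ironclad in a city: still 12
```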

Consul's Day apparently works in Civilization II, but it makes sense to refrain from using it because it becomes just so damn easy. For people who don't know it - in Republic/Democracy if over 50% of citizens are happy and nobody is unhappy, a city starts growing 1 population a turn. Just max luxury rate for a couple of turns and reassign some people to Elvis roles until you get that in every city. It's just way too easy.

I could go on talking about strategy but the game is basically so easy that you should have absolutely no trouble winning it. Overall if you feel nostalgic and don't want too much of a challenge, go for it !

Wednesday, November 28, 2007

Installing Oracle 10g Enterprise Edition on Ubuntu 7.10 gutsy gibbon

As Trampoline Systems is doing "Enterprise" software we don't have the luxury of choosing whatever database we want (most likely mysql or postgresql, at least as far as I'm concerned), but need to support whichever database our customers use. I don't think there's any objective reason for customers being interested in the database used by an application any more than they should be interested in its programming language or ORM library, but for some reason the current tradition says that in "Enterprise" software applications can use any programming language and ORM they want, but the customer determines the database.

Now some people who never tried it claim that moving Rails application from one database to another is just a matter of a single line configuration change. How do I know they never tried it ? Because we did. At the very least it requires all tests to be run twice, once with mysql and the second time with Oracle. And number of things which work differently is just enormous. To make things worse most people here use Intel Macs, and there are not even Oracle clients for them, let alone servers, so when any problems happen they need to be fixed on a Linux server over ssh or on a borrowed Windows machine. It all works, but it's far from trivial.

Now since I wiped out OSX on my machine and installed Ubuntu, I thought I might be able to install Oracle 10g Enterprise Edition on it. Unfortunately Ubuntu is not an officially supported platform for Oracle, and Express Edition is apparently different enough that we don't want to use it. In the end it all more or less worked with just a little bit of hacking.

First you need to get Linux installer (10201_database_linux32.zip) and unpack it. The installer relies on some binaries being in /bin instead of /usr/bin so you need to make a few links.

$ sudo ln -s /usr/bin/basename /bin/basename
$ sudo ln -s /usr/bin/awk /bin/awk

Now you need to add a "nobody" group. Debian-based systems use nobody.nogroup instead of nobody.nobody, but we aren't losing any security by creating an extra group.

$ sudo addgroup nobody

Now enter installer directory (database), and run the installer. There are many questions, but they are straightforward to answer even if you know little about Oracle. You do not need to do it as root.

$ ./runInstaller -ignoresysprereqs

After you do that the installer will ask you to run two scripts as root. Paths to these scripts (shown in installer dialog) depend on your system, for me they were:

$ sudo /home/taw/oraInventory/orainstRoot.sh
$ sudo /home/taw/oracle/product/10.2.0/db_1/root.sh

Now test if it works with your Rails. If it doesn't due to being unable to load oci8lib.so either recompile OCI8, or more easily add the following line to your .bashrc:

export LD_LIBRARY_PATH=/home/taw/oracle/product/10.2.0/db_1/lib/

By this point you should have Oracle working. The only thing missing is getting it working again after reboot. Just create a /etc/init.d/oracle like this one (adjust the path and the Oracle account to match your system):

#!/bin/bash

ORACLE_HOME=/home/taw/oracle/product/10.2.0/db_1
ORACLE_OWNR=taw

# if the executables do not exist -- display error

if [ ! -f "$ORACLE_HOME/bin/dbstart" ] || [ ! -d "$ORACLE_HOME" ]
then
       echo "Oracle startup: cannot start"
       exit 1
fi

# depending on parameter -- startup, shutdown, restart
# of the instance and listener or usage display

case "$1" in
   start)
       # Oracle listener and instance startup
       echo -n "Starting Oracle: "
       su - $ORACLE_OWNR -c "$ORACLE_HOME/bin/lsnrctl start"
       su - $ORACLE_OWNR -c $ORACLE_HOME/bin/dbstart
       touch /var/lock/subsys_oracle
       echo "OK"
       ;;
   stop)
       # Oracle listener and instance shutdown
       echo -n "Shutdown Oracle: "
       su - $ORACLE_OWNR -c "$ORACLE_HOME/bin/lsnrctl stop"
       su - $ORACLE_OWNR -c $ORACLE_HOME/bin/dbshut
       rm -f /var/lock/subsys_oracle
       echo "OK"
       ;;
   reload|restart)
       $0 stop
       $0 start
       ;;
   *)
       echo "Usage: $0 start|stop|restart|reload"
       exit 1
esac
exit 0

and create symlinks to it:

$ sudo ln -s /etc/init.d/oracle /etc/rc0.d/K01oracle
$ sudo ln -s /etc/init.d/oracle /etc/rc2.d/S99oracle
$ sudo ln -s /etc/init.d/oracle /etc/rc6.d/K01oracle

Congratulations to the first person to finish jrpg

Congratulations to Stephen Christenson, who is the first person to finish jrpg, or at least the first to inform me of doing so. According to the savefile he sent me it didn't take that much kanji demon bashing to complete all quests (at least the crystal ball thinks so), so I should probably add a few more maps and quests.

In the last few months I got multiple feature requests for jrpg. Some of them like kana keyboard support seem relatively straightforward (if the operating system, SDL and PyGame support them). Fixing wrapping of long examples was requested often, and it shouldn't be that hard. The most popular request that simply cannot be done in a reasonable amount of work is providing translations for all words in the game.

It's really nice hearing from people who enjoy jrpg. Have fun and good learning.

Saturday, November 17, 2007

Mac vs Ubuntu

I wiped out OS X on my Macbook and installed kUbuntu. I just couldn't stand it any more.

The good:

Packaging system is so massively better. Installing mysql is sudo apt-get install mysql-server-5.0, not spending a few hours on it. Upgrading actually works. Packages don't fail in the middle of installation, possibly fucking the entire system.
No need to register at random websites to have gcc. Did you know that they will spam you if you register, and "Unsubscribe" link in their spam won't work ? Now you do know. Oh and the spam is not even about Macs, it's about some crappy phones.
Window manager is way more functional. It manages focus of windows not applications. Managing focus of applications is simply braindead when application is something like Terminal or Textmate or Finder, windows of which live completely independent lives. As far as my experience goes, almost every single application with multiple windows should have them managed independently.
Middle click for the win ! Command-C, Command-V is so annoying. Unfortunately crappy Mac trackpad has only one button so I cannot middle-click it even with the standard "press both buttons" trick. Well, it's not like I planned to buy any more Apple hardware ever.
Sane terminal emulator, with all keys working properly.
The system feels much faster, almost as if I upgraded the hardware.
Many applications are massively better. Amarok instead of some horrible piece of crap that I didn't use anyway because even command line music playing with mplayer was more convenient. Xchat instead of Colloquy. Nobody uses Safari even on OS X so Firefox doesn't really count as a change.
Polish Dvorak keyboard works.
All packages are reasonably up to date. I don't have to choose between ancient Ruby and working Java (Tiger), or recent Ruby and broken Java (Leopard), or spending way too much time to get it all working properly.

The bad. Most of it is Apple proprietary hardware's fault, not Ubuntu's fault:

Font rendering in kUbuntu is very ugly compared to OS X. They should really fix it.
No TextMate any more. I will need to find a decent replacement.
Interface looks somewhat less pretty. It doesn't have to look like Mac, but some nicer themes would be good.
I couldn't get secondary monitor working other than in shadowing mode. If I try to put secondary 1680x1050 monitor side by side to 1280x800 laptop screen I get error that total screen size would be bigger than the allowed maximum of 1680x1680. GPU driver issue ?
Wireless doesn't work out of the box, needed downloading some driver. Not very difficult, but downloading things when internet is down is an unnecessary complication.
Hibernate-on-lid-closure doesn't work out of the box.
Ubuntu doesn't autodetect FireWire-connected external hard drive (500GB LaCie). The same hard drive works when connected by USB 2.0. I blame Apple for including only 2 USB and 1 FireWire instead of at least 4 USBs. Mouse and keyboard alone take two slots for fuck's sake.

The random:

KDE used to have all configuration in one place - Control Center and it was good. Now it's randomly divided into Menu/System Settings, Menu/Settings, Menu/System.
It will take some time to get used to keyboard shortcuts being on Alt/Control instead of Command/Control/Alt system.

Wednesday, October 17, 2007

Dinosaurs

I'm not talking about Fortran, I'm talking about the actual reptiles here. Or protobirds. For as long as I can remember I believed that fossilized bones are just a poor substitute for real genetic information and people use them only because dinosaurs happened to die out without leaving too many descendant lines. But having seen this BBC documentary I was shocked how much behavioral information they could get from just a bunch of bones plus some experimentation. In addition to scientific value it's simply awesome. Links: part 1, part 2.

Tuesday, October 09, 2007

Two months with Ruby on Rails

I've been coding Ruby on Rails for the last two months and this rant is long overdue. There are just so many things that are wrong with Ruby on Rails. Being better than PHP or J2EE is not enough to get away from a quick bashing on my blog.

Views

I don't hate HAML any more. Total hatred was my first reaction to it but I more or less got used to it. The main problem with HAML, RHTML and probably all other solutions is not providing any sort of XSS protection. Hand-escaping all strings in views is almost PHP SQL injection hell all over again. The few times one needs to insert raw HTML in the output are far outweighed by the huge security problems caused by the "insecure by default" model. And it wouldn't be hard to implement secure templating - just make a subclass of String meant for raw HTML, and make plain Strings and anything else that gets to_s'd get HTML-escaped.
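A minimal sketch of that idea in plain Ruby, outside of any real templating engine. RawHtml and render_value are hypothetical names, and I'm assuming the template engine would call something like render_value on every value it interpolates; the escaping itself is done by ERB::Util.html_escape from the standard library.

```ruby
require "erb"

# Marker subclass: a string the programmer has explicitly declared
# to be trusted, ready-to-output HTML.
class RawHtml < String
end

# What a secure-by-default template engine could run on every
# interpolated value: RawHtml passes through untouched, anything
# else is converted to a string and HTML-escaped.
def render_value(value)
  return value if value.is_a?(RawHtml)
  ERB::Util.html_escape(value)
end

puts render_value("<script>alert(1)</script>")  # &lt;script&gt;alert(1)&lt;/script&gt;
puts render_value(RawHtml.new("<b>bold</b>"))   # <b>bold</b>
puts render_value(42)                           # 42
```

With this default, forgetting to escape costs you an ugly page instead of an XSS hole, and the rare raw HTML case is one explicit RawHtml.new away.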

Both HAML and RHTML are very powerful as templating languages. Ruby is simply very well suited for the job. Completely unlike Python and Java which need hundreds of lame templating languages. With a few partials, helpers and RJS snippets it's usually hard to imagine a shorter and more natural way of writing it all.

One more nice thing about ERB - it can be used pretty much everywhere, like in database.yml for switching database adapter depending on whether it's JRuby or matz's Ruby, or in SQL snippets meant for initializing database. Maybe that's not a huge deal but there's no other language where it's so natural to do.
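Roughly what the database.yml trick looks like, as a standalone sketch using only the erb and yaml standard libraries. The adapter names and the database name are placeholders, not our real configuration; the only real part is checking whether JRUBY_VERSION is defined to tell JRuby from matz's Ruby.

```ruby
require "erb"
require "yaml"

# Hypothetical database.yml contents; adapter and database names
# are placeholders for illustration.
DATABASE_YML = <<~YAML
  development:
    adapter: <%= defined?(JRUBY_VERSION) ? "jdbc" : "oracle" %>
    database: example_development
YAML

# Run the ERB pass first, then parse whatever comes out as plain YAML.
def load_database_config(template)
  YAML.load(ERB.new(template).result)
end

config = load_database_config(DATABASE_YML)
puts config["development"]["adapter"]
puts config["development"]["database"]
```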

Controllers

Controllers are basically small bags of actions which actually do stuff. Separation between controllers and views has one huge wart - flash. It doesn't clearly belong in either. And if you want things like markup and links inside flash messages - a perfectly reasonable thing to do - it gets uglier than Perl on a bad day. I'm not sure what's the right way to do it, flash partials maybe ? Or full set of link helpers available inside controller.

Functional testing reuses controller objects without cleaning out their instance variables between requests. That's just wrong. It also reuses request and response objects for no particular reason. Oh, and it silently ignores 403s, flash[:error], doesn't follow redirects, and relies on cross-site request forgery to get any testing done - if forms include a security token in hidden field you cannot test by directly posting, you must use the actual form ! This is probably the most broken part of Ruby on Rails.

A good functional test would look something like this:

def test_that_edit_respects_item_ownership
  login
  # Editing your own item goes through the real form
  get :edit, :id => your_item
  form.fill_and_submit :x => "Foo", :y => "Bar"
  # Editing somebody else's item must fail with a 403
  assert_raises(Error403) {
    get :edit, :id => somebody_elses_item
    form.fill_and_submit :x => "Foo", :y => "Bar"
  }
  assert_equal "This item belongs to someone else.", flash[:error]
end

but it's a long way to get anywhere near that point. form-test-helper is probably a good start.

Models

Models in Ruby on Rails are based on an idea that you can either have a very high-level view or go raw SQL but nothing in between.

Raw SQL wouldn't be that bad if they at least handled security somehow (execute accepting "%" would be a good start), strings returned from SQL were converted to Ruby objects (surprisingly timestamps get returned as Ruby objects in some databases), and results of execute supported map method. Or wait - it would be bad, it's SQL after all. Why are we still using RDBMSes in 2007, weren't they supposed to die together with Fortran or something ?

Good thing about models is how easy it is to move code between controllers and models. This barrier is much more permeable than controller-view barrier, resulting in easier refactoring and code looking better. Controller-view refactoring is usually much harder.

There's a lot of stuff that doesn't clearly fit in MVC like extensions to core classes, objects that are not backed by database, helpers and so on. It would be nice if there was a place for putting tests of it.

Routes

There are two ways of doing routes, both bad. One is the old way as seen on screencasts. It ends with paths like /posts/123/destroy which are then fetched by web spiders deleting all your database. The new way is trying to make every controller fit REST model, so you end with DELETE /post_sharabilities/456 or something as stupid. If there is a good way of routing stuff I haven't seen it yet.

A good thing is that you can pretty much ignore it, use simple routes and filter out GETs in the controller. Controller filters are simply awesome, model filters are pretty good too. You can use them to handle things like authorization. One thing they unfortunately cannot handle is data integrity. Unfortunately Active Record hooks are too weak to handle things like ensuring that each person has exactly one primary email address. Why cannot RDBMSes just die ?

Testing

The first annoying thing about Ruby on Rails testing is fixtures. Each test runs inside a transaction so why are they wiped out and reapplied once for each test class ? And they really do not scale. There must be a better solution but I'm not sure what it is. One thing is certain - while mocha is great, mocks aren't it, as often hundreds of objects must exist at the same time for testing to be useful.

Permeability of model-controller barrier also means that many things are only tested in controller tests (called "functional" tests in Ruby on Rails but I'm not sure if I like this abuse of terminology). The result - 90%+ in rcov report while half of the model methods are not tested in isolation.

Rake and Capistrano

Rake is simply awesome. It is to other build tools what Rails is to J2EE. Capistrano on the other hand, I have no idea why it wasn't simply implemented on top of rake. Maybe it's time to take a look at Vlad the Deployer.

Plugins

Another nice thing about Rails is great hackability. Most behaviors can be easily hacked and most hacks can be easily extracted to plugins. A few things like schema dumper weren't that easily extensible but overall most of the stuff I wanted to hack was very simple to hack. It's also a great thing how 30 independently developed plugins each monkeypatching some Ruby or Rails behavior can work together with almost no conflicts.

Documentation and console

Code grepping is usually the best documentation. api.rubyonrails.org was sometimes helpful but not always. Trying things out in script/console was usually enough to explore and debug model. Unfortunately it doesn't work with controllers as path helpers and controller action runners are simply not defined there, so I cannot jump from a failing test to console to find out what's going on.

Other stuff

TextMate is great. Usually I hate every program I spend more than half an hour with. In this case I only somewhat dislike some parts of it, which probably means less fussy programmers will just love it ;-)

Monday, August 27, 2007

Resurrection of libgmp-ruby

I've just republished tarball of libgmp-ruby (Ruby bindings to GNU Multiple Precision Arithmetic Library). It's a very old package (literally Ruby 1.6 old), but the server hosting it died and I never quite got to republishing it before.

It is available for download in tarball format. To compile the package use:

$ ./extconf.rb
$ make
$ make install

It used to build Debian packages. I don't know if they still build or if some tweaking is necessary. GEMs are not provided, as the package is older than Ruby Gems. Some day I'll get to updating it and providing DEBs and GEMs.

Monday, August 20, 2007

Dynamically typed road traffic

I moved to London a few weeks ago. I live at 82 Mildmay Road, Islington, London, N1 4NG, I code Ruby on Rails at Trampoline Systems, I have a beautiful kitten girl Cloud (the blue-eyed white furry creature above), and I use a MacBook.

Let's start from the culture shock part. The British start working at saner hours than the Polish, somewhere between 9:30 and 12:00 instead of 7:00 to 8:00. They start their day by eating "English breakfast", which consists of fried eggs, oversalted bacon, sausage made of 50% recycled plastic bottles and 50% soy protein isolate (and definitely no meat), baked beans, half-cooked mushrooms, black pudding (no idea what it is made of), tomatoes prepared in a way that makes them lose the tomato taste, semi-sweet toast, and a few other weird things. The whole thing is huge, hard to digest, and completely unsuitable for a breakfast. At least that was what I thought at first - now I kinda like eggs, baked beans and semi-sweet toast.

The next culture shock was in moving around. Pedestrians don't care about traffic lights. On the continent it's expected for people to wait till the light is green before crossing the street. In London nobody does so - people just check if the road is free and if it is they go. As most streets in the city center seem to have pedestrian islands in the middle, it is enough if just a single lane is free from traffic. At first I thought it would certainly lead to a huge increase in traffic accidents, but it seems the British roads are actually safer than most of the continent. That's a lot like static versus dynamic typing - instead of statically checking the "type" of the road (RedRoad or GreenRoad) you check if it responds correctly to the :pedestrian message and cross if it does. Much more efficient after some getting used to.
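In Ruby terms the two crossing styles look roughly like this. It's a toy sketch of the analogy only; the class and method names (EmptyLane, BusyLane, cross_by_light?, cross_by_looking?) are invented for the occasion.

```ruby
# Continental style: the decision depends only on the declared
# state of the light, whatever the road is actually doing.
def cross_by_light?(light_color)
  light_color == :green
end

# A lane with no traffic can handle a pedestrian...
class EmptyLane
  def pedestrian
    :crossed
  end
end

# ...and a lane full of buses cannot, so it has no such method.
class BusyLane
end

# London style: ignore the light and ask the lane itself
# whether it responds to :pedestrian right now.
def cross_by_looking?(lane)
  lane.respond_to?(:pedestrian)
end

puts cross_by_light?(:red)              # false, even if the road is empty
puts cross_by_looking?(EmptyLane.new)   # true
puts cross_by_looking?(BusyLane.new)    # false
```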

Switching from Ubuntu Linux to Mac was weird. Macs have one big advantage over Linuxes - TextMate. As far as I can tell it's the only advantage. Other than that:

They lack single package management system like apt-get. One needs to use a mix of fink, port, gems, binary packages, hand-compiled packages, and I still couldn't install Amarok.
No copy & paste by select and middle-click is annoying.
Safari sucks almost as much as IE4, Macs are pretty much unusable without Firefox.
Macs are not a very good Unix. Packages are outdated and unupgradable (Ruby 1.8.2 from 2004 on a laptop sold in 2007 - wtf?). Basic utilities like find and cp don't accept standard GNU flags. Locale is very annoyingly not UTF-8 without some work. There's no good terminal (neither the builtin one nor iTerm are anywhere near konsole). Filesystem is case-insensitive (yuck). There's no strace and debugging options are limited compared to Linux.
There's no good music player. iTunes is a stinky pile of donkey shit compared to the most awesome Amarok.
There's no good iPod client. iTunes sucks compared to even gtkpod. iTunes sucks compared to everything.
MacBook screen is very small. MacBook trackpad is horrible (not unlike trackpads in all other laptops). Control vs Command distinction is annoying even after a few weeks (Control-D but Command-C, huh ?).

Did I mention TextMate ? The good things are TextMate, TextMate, magsafe connector for power supply, and TextMate. I've tried pretty much every Linux text editor out there and TextMate is far better than any of them. Maybe even good enough to make me stay on Mac.

Monday, July 16, 2007

Who reads my blog - Redditers and Googlers

More than a year ago, when I started this blog, I had no idea that anybody would actually read it, but it seems to be doing quite well. According to Google Analytics over the year there were over 90 thousand page views by over 50 thousand visitors. Recently there are about 420 page views daily, or one every three and a half minutes. I don't think I have that many friends and relatives, so who reads my blog ?

There seem to be two distinct populations - Redditers, and Googlers. Excluding "direct traffic", which simply means that for whatever reason the referrer was not recorded, 35% of visitors come from Google, and 32% from Reddit. The next three sources DZone, Daring Fireball and del.icio.us provide only 6.6%, 3.4% and 1.7% of visits, respectively.

The full story of an article's readership looks something like this:

Article is published. I submit it to del.icio.us and usually also to reddit
If Redditers like the article it gets to the main page. I have absolutely no idea which articles Redditers will like and which they won't. Actually I have less than no idea - things I consider very interesting almost invariably get downvoted, while random rants I wrote when angry or bored get tens of points. So I submit pretty much everything programming-related and let them decide. My karma from doing so is highly positive, so it's probably not considered a very abusive practice
In the next day or two it gets a lot of views from Redditers
People submit it to other reddit-like websites, or write answers to it, and it stays popular for a few more days
There's a sudden drop in popularity, as people move on to other things
Google indexes the article, and a steady flow of Google visits starts. The flow is not wide, but it seems to last pretty much indefinitely

To get some numbers I scraped Google Analytics reports - Google Analytics has no real API, and it became even more difficult to use programmatically after the update, but I somehow managed to extract the information I wanted (Google Analytics cookie extracted using Firebug).

require 'time'

$cookie = File.read("/home/taw/ga_cookie").chomp

def wget(url, fn)
  # Download only once; later runs reuse the cached file
  system 'wget', '--header', $cookie, url, '-O', fn unless File.exist?(fn)
  File.read(fn)
end

def each_day(first_day)
  day = Time.now.gmtime
  day_number = 0
  while true
    day_s = day.strftime('%Y%m%d')
    break if day_s < first_day
    yield day_s, day_number
    day_number += 1
    day -= 24*60*60
  end
end

def get_data_for(day)
  url = "https://www.google.com/analytics/reporting/export?fmt=3&id=1222880&pdr=#{day}-#{day}&cmp=average&rpt=TopContentReport&trows=500"
  fn = "results-#{day}"
  res = wget(url, fn)
  header_finished = false
  res.each_line{|line|
    unless header_finished
      header_finished = true if line =~ /\AURL\tPage Views\tUnique Page Views\t/
      next
    end
    url, page_views, unique_page_views, = line.split(/\t/)
    next unless page_views # Skip the final line
    next unless url =~ %r[\A/\d{4}/\d{2}/]
    next if url =~ /\?/
    yield(url, page_views.to_i)
  }
end

$stats = {}

each_day('20060923') {|date, day_number|
  get_data_for(date){|url, page_views|
    $stats[url] ||= []
    $stats[url][day_number] = page_views
  }
}

$stats_by_post_age = []

$stats.each{|url, stats|
  stats.reverse.each_with_index{|page_views, age|
    page_views ||= 0
    $stats_by_post_age[age] ||= 0
    $stats_by_post_age[age] += page_views
  }
}

total_page_views = $stats_by_post_age.inject{|a,b| a+b}
p $stats_by_post_age.map{|x| 0.01 * (10000 * x.to_f/total_page_views).to_i}

And the not very surprising results:

22.26% of page views happen on the day the article is published. As the article could have been published at any time of the day (just after midnight to just before midnight), on average that's the article's first 12 hours.
It falls rapidly to 11.47% and 4.28% over the next two days
In the following ten days the numbers are 2.03%, 1.82%, 1.46%, 1.49%, 1.25%, 0.99%, 0.86%, 0.72%, 0.95%, 0.81%. By that time more than half visits occurred.
In the following weeks the number gradually decreases, but I think it's more due to many posts not being online long enough than due to actual popularity loss. Maybe I'll run some statistics to test this hypothesis some day.
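
The "more than half" figure can be checked by adding up the daily shares listed above. This is only a small sketch using the percentages quoted in this post; the variable names are made up for the illustration and are not part of the scraping script.

```ruby
# Daily shares of page views by post age, copied from the list above
# (index 0 = the day the article is published).
shares = [22.26, 11.47, 4.28, 2.03, 1.82, 1.46, 1.49,
          1.25, 0.99, 0.86, 0.72, 0.95, 0.81]

# Running total of the shares, one entry per day.
running = 0.0
cumulative = shares.map { |share| running += share }

# First post age (0-based, in days) at which at least half of all views happened.
half_day = cumulative.index { |total| total >= 50.0 }

puts "Total after #{shares.size} days: #{'%.2f' % cumulative.last}%"
puts "Half of all views reached at age #{half_day} days"
```

So the halfway point falls on the last of the thirteen days listed, with roughly 50.39% of all views by then.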

You should be able to adapt this script to your blog if you want to know how the numbers look for your blog.

Sunday, July 15, 2007

Short rant on video game usability and 3D acceleration

There's one thing that pretty much every PC game does, and that I really hate. It's using the "constant rendering quality" paradigm instead of the "constant FPS" paradigm.

PC hardware differs a lot, with some people using older hardware and wanting to play games even if the rendering is only so-so, while others who have just bought shiny new graphics cards demand really awesome effects from them, more to impress their friends and stimulate graphics card manufacturing than to actually improve gameplay. What pretty much everybody wants is the highest rendering quality that still gives them a reasonable FPS rate.

That's what game engines should do - monitor FPS and increase or decrease rendering quality if FPS is not in some predefined range. But not a single game I know does so. Instead they all opt for providing "constant rendering quality" - maintaining some level of rendering quality whether the game gets unusably slow, or has a lot of free GPU cycles. Often both situations happen as player moves from one location to another. Changing graphics setup every few minutes would distract too much from playing, so most old hardware owners either set the quality low enough that they always have good FPS, even if for 90% of the game GPU is half idle, or accept occasional low FPS in exchange for better rendering quality. Or they solve this software problem in hardware and buy a better graphics card.

Oh, and the graphics setup. Instead of having one big "I want that many FPS" slider and then the game filling in details, there are usually dozens of confusing options - some of them affecting rendering speed considerably, others barely at all.
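
To make the "constant FPS" idea concrete, here is a rough sketch of the feedback loop I have in mind. It is not taken from any real engine: the QualityController class, the 0..10 quality scale and the tolerance band are all assumptions made up for the example, and a real game would map the quality level to actual settings (shadows, draw distance and so on).

```ruby
# Hypothetical controller: keeps FPS near a target by nudging one quality level.
class QualityController
  attr_reader :quality

  # target_fps - what the "I want that many FPS" slider is set to
  # tolerance  - how far FPS may drift before quality is changed (assumed value)
  # levels     - range of available quality levels (assumed scale)
  def initialize(target_fps, tolerance = 5, levels = (0..10))
    @min_fps = target_fps - tolerance
    @max_fps = target_fps + tolerance
    @levels  = levels
    @quality = levels.max # start optimistic, back off if needed
  end

  # Call once per measurement window with the FPS observed in that window.
  def update(measured_fps)
    if measured_fps < @min_fps && @quality > @levels.min
      @quality -= 1 # too slow - render less
    elsif measured_fps > @max_fps && @quality < @levels.max
      @quality += 1 # GPU has spare cycles - render more
    end
    @quality
  end
end

controller = QualityController.new(60)
[30, 35, 48, 58, 62, 90].each do |fps|
  puts "measured #{fps} FPS -> quality #{controller.update(fps)}"
end
```

Changing quality by one step per window is a deliberate choice in this sketch: it avoids visible jumps, at the cost of reacting slowly when the player walks into a much heavier scene.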

Time to get a new card, the one bought a year ago isn't good enough any more.

Saturday, July 14, 2007

Truth, falsehood and voidness in dynamic languages

One of the things which different dynamic languages do differently is how truth, falsehood, and voidness are handled. I checked how it's done in the 9 most popular dynamic languages - Common Lisp, JavaScript, Lua, Perl, PHP, Python, Ruby, Scheme, and Smalltalk.

The first question - does the language have dedicated booleans ? That is - do questions like 2 > 1 return special booleans or something else ?

Ruby, Lua, Smalltalk, JavaScript - Yes (true and false)
Python - Yes (True and False)
Scheme - Yes (#t and #f)
Common Lisp - No, it returns symbol t for true and empty list (nil) for false.
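Perl - No, it returns 1 for true, and a special empty value for false (it reads as "" in string context and 0 in numeric context, and unlike undef it is defined).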
PHP - Kinda. Since PHP4 there are booleans true and false, but their behavior is full of hacks - print true prints 1, print false prints nothing, false == 0, false == NULL, true == 1, even true == 42.

If booleans are used in boolean context their interpretation is obvious. If most objects are used in boolean context they usually are treated the same way as true. There are a few common exceptions. How are empty list, integer 0, floating point 0.0, and empty string treated in boolean context ?

Ruby, Scheme, Lua - all are true
Perl, PHP, Python - all are false
JavaScript - empty list is true, others are false
Common Lisp - empty list is false, others are true
Smalltalk - NonBooleanReceiver exception is raised if anything but booleans is used in boolean context.

Is string "0" false ?

PHP, Perl - unfortunately "0" is false, and this is a huge source of nasty bugs
Ruby, Scheme, Lua, JavaScript, Python, Common Lisp - "0" is true
Smalltalk - NonBooleanReceiver exception is raised

Is there a special value denoting absence of value ? What does accessing a nonexistent array element return ?

Ruby, Lua - nil, accessing nonexistent elements returns it
JavaScript - undefined, accessing nonexistent elements returns it
Perl - undef, accessing nonexistent elements returns it
PHP - NULL, accessing nonexistent elements returns it
Python - None, accessing nonexistent elements throws an exception
Smalltalk - nil, accessing nonexistent elements throws an exception
Scheme - there isn't one, accessing nonexistent values is an error
Common Lisp - there isn't one, but empty list acts as one in most contexts, it is also returned when accessing nonexistent elements

Is the nonexistent value false in boolean context ?

Ruby, Lua, JavaScript, Perl, PHP, Python, Common Lisp - it is false
Scheme - there is no nonexistent value marker
Smalltalk - NonBooleanReceiver exception is raised

The most common answers are: there are dedicated booleans, and dedicated absence marker; it is possible to use normal objects in boolean context, most of which (including string "0") are treated as true, while absence marker is treated as false.

There is no clear consensus whether 0, 0.0, "", and empty list should be treated as true or false. Personally I think it's better to make them all true. Otherwise either libraries can define other false objects (like decimal 0.00, various empty containers, and so on), which complicates the language, or they cannot, which makes it feel inconsistent.

In most languages accessing nonexistent elements of an array returns an absence marker instead of throwing an exception, and in my opinion that's the right way and it makes the code look much more natural.
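
For Ruby, the answers above are easy to check directly. The snippet below is just an illustration; truthy? is a helper invented for it, not a built-in method.

```ruby
# Convert any value to a real boolean the way `if` would see it.
def truthy?(value)
  value ? true : false
end

samples = {
  "0"            => 0,
  "0.0"          => 0.0,
  "empty string" => "",
  "empty array"  => [],
  'string "0"'   => "0",
  "nil"          => nil,
  "false"        => false,
}

samples.each do |label, value|
  puts "#{label}: #{truthy?(value)}"
end

# Reading past the end of an array gives the absence marker instead of raising.
missing = [1, 2, 3][10]
puts "missing element: #{missing.inspect}"
```

Only nil and false come out as false; everything else, including 0 and "0", is true.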

Wednesday, July 11, 2007

Using home directory as GTD inbox - version 2

The GTD software I described a few weeks ago has evolved quite significantly since then.

Fortunately my inbox is still empty:

$ inbox_size
Your inbox is empty.

It can be used in two modes - either single-shot report of inbox contents with inbox_size, or continuous screening mode plus UI notification with inbox_size_notify. inbox_size.rb is a library (symlinked from /home/taw/local/bin/inbox_size) which finds all items in all my inboxes. It also handles special items:

Unread emails in Gmail inbox
Uncommitted changes to one of the repositories
Music log not committed to last.fm
Passwords file changed since last encrypted copy
Last backup older than 3 days
Any things I wanted to be informed about

The code

The main code is in inbox_size.rb:

require 'time'
require 'magic_xml'

$offline = false

def inbox_ls
  items_whitelist = %w[
    /home/taw/Desktop
    /home/taw/ebooks
    /home/taw/everything
    /home/taw/img
    /home/taw/ipoddb
    /home/taw/local
    /home/taw/movies
    /home/taw/music
    /home/taw/ref
    /home/taw/website
    /home/taw/website_snapshot
  ]

  files = (Dir["/home/taw/*"] +
           Dir["/home/taw/Desktop/*"] +
           Dir["/home/taw/movies/complete/*"] -
           items_whitelist)
  items = files.map{|x|x.sub(%r[\A/home/taw/],"")}

  # Code for handling special inbox items goes here
  # ...

  return items.sort.map{|item| "* #{item}"}
end

if $0 == __FILE__
  if ARGV[0] == '--offline'
    ARGV.shift
    $offline = true
  end
  items = inbox_ls
  if items.empty?
    puts "Your inbox is empty."
  else
    puts "#{items.size} items in your inbox:", *items
  end
end

inbox_size_notify, which scans the inbox continuously and displays UI notifications if it's not empty, is:

require 'inbox_size'

max_displayed = 30

big_timer = 5
old_items = []

while true
  items = inbox_ls
  if items == []
    sleep 60 # Nothing to report - wait before rescanning instead of busy-looping
    next
  end

  if items == old_items
    big_timer -= 1
    sleep 60
    next unless big_timer == 0
  end
  big_timer = 5

  if items.size > max_displayed
    displayed_items = items.sort_by{rand}[0, max_displayed].sort + ["* ..."]
  else
    displayed_items = items
  end
  system "notify", "Inbox is not processed", "#{items.size} items in your inbox:", *displayed_items

  sleep 60
  old_items = items
end

Script which displays KDE notifications is:

header = "Notification"
msg = ARGV.join("\n") # "All your base\nAre belong to us"

system 'dcop', 'knotify', 'Notify', 'notify', 'notify', header, msg, 'nosound', 'nofile', '16', '0'

Backup reminder

Since my disk died I became more serious about backups. I intend to have at least a regular rsync of my SVK repository and some important files. Here's a script which rsyncs these files from shanti (my main box) to ishida (an old laptop).

t0 = Time.now

rv = system 'rsync -rL ~/.mirrorme/ taw@ishida:/home/taw/shanti_mirror/'

unless rv
  STDERR.puts "Error trying to rsync"
  exit 1
end

t1 = Time.now

File.open('/home/taw/.last_backup', 'w') {|fh|
  fh.puts t1
}

puts "Started: #{t0}"
puts "Finished: #{t1}"
puts "Time: #{t1-t0}s"

If the backup was successful a time stamp is saved to /home/taw/.last_backup. inbox_size.rb reminds me if I haven't backed up for more than 3 days:

  # Time since last rsync
  time_since_last_rsync = Time.now - Time.parse(File.read("/home/taw/.last_backup").chomp)
  if time_since_last_rsync > 3 * 24 * 60 * 60
    items << "Over 3 days since the last backup"
  end

Tickler file

The "tickler file" (/home/taw/.tickler) contains all things I want to be reminded about. Appointments, deadlines, new episodes of The Colbert Report, whatever. Of course usually I want to be reminded before the deadline, not on the deadline, so the date must be some time before the event of interest. Entries in the tickler file look something like that:

Sat Jul 21 05:49:14 +0200 2007
15 days to Wikimedia Foundation validation deadline

It can be edited as a text file, but it's more convenient to add new entries with add_tickler script:

$ add_tickler 24h "New TCR episode will be available"

unless ARGV.size == 2
  STDERR.puts "Usage: #{$0} 'due' 'msg'"
  exit 1
end

due = ARGV.shift
msg = ARGV.shift

due_sec = case due
when /\A(\d+)s\Z/
  $1.to_i
when /\A(\d+)m\Z/
  $1.to_i * 60
when /\A(\d+)h\Z/
  $1.to_i * 60 * 60
when /\A(\d+)d\Z/
  $1.to_i * 60 * 60 * 24
else
  STDERR.puts <<EOF
Usage: #{$0} 'due' 'msg'
Due can be:
* 15s
* 15m
* 15h
* 15d
EOF
  exit 1
end

due_time = Time.now + due_sec

File.open("/home/taw/.tickler", "a") {|fh|
  fh.puts due_time
  fh.puts msg
}

The tickler file is checked by the following code in inbox_size.rb:

  # Tickler items
  tickler = File.readlines("/home/taw/.tickler")
  while not tickler.empty?
    deadline = Time.parse(tickler.shift.chomp)
    msg = tickler.shift
    if Time.now > deadline
      items << msg
    end
  end

The passwords file

Pretty much every website requires an account nowadays. I don't want to reuse passwords on multiple websites, so I generate them randomly (cat /dev/urandom | perl -ple 's/[^a-zA-Z0-9]//g' | head) and keep them in an unencrypted file /home/taw/.passwords which I simply grep if I want to log in to some weird website again (normally Firefox remembers these passwords anyway, but sometimes it's necessary).
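
For comparison, roughly the same thing can be done without the shell pipeline. This is only a sketch, not what I actually run: it assumes a Ruby new enough to have SecureRandom.alphanumeric (2.5 or later), and the length of 20 is an arbitrary choice.

```ruby
require 'securerandom'

# 20 characters drawn from a-z, A-Z and 0-9 using a cryptographically secure source.
password = SecureRandom.alphanumeric(20)
puts password
```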

As it would suck to lose all accounts, I AES-256-CBC encrypt this file and keep encrypted copies in /home/taw/ref/skrt/, which is mirrored to multiple servers. As I need to enter my password to encrypt the file, it cannot be done automatically. The most inbox_size.rb can do is remind me if there's no up-to-date skrt file:

  # skrt up to date ?
  pwtm = File.mtime("/home/taw/.passwords")
  last_skrt_tm = Dir["/home/taw/ref/skrt/*"].map{|fn| File.mtime(fn)}.max
  if pwtm > last_skrt_tm
    items << "No up-to-date skrt available"
  end

In which case I run the following skrt_new script:

t = Time.now
fn = sprintf "skrt-%04d-%02d-%02d", t.year, t.month, t.day
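# Encrypt the passwords file into a dated copy; openssl asks for the passphrase
system "openssl aes-256-cbc -in /home/taw/.passwords -out /home/taw/ref/skrt/#{fn}"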

Music log

The iPod-last.fm bridge consists of two parts - one which extracts the log from an iPod, and one which submits the data to last.fm. They communicate using a very simple format, with lines like this (time is local):

Sumptuastic ; Cisza (Radio Edit) ; Cisza (Single) ; 185 ; 2007-07-11 17:51:27

Nothing in the format is iPod-specific, so I wrote a wrapper around mplayer which logs music it plays to /home/taw/.music_log. It can also randomize songs and search for them recursively in directories. It uses a few extra programs - id3v2 to get song title, artist and album (from either ID3v2 or ID3v1 tags), and mp3info to get playing time.

def mp3_get_metadata(file_name)
  song_info = `id3v2 -l "#{file_name}"`
  artist    = nil
  title     = nil
  album     = nil

  if song_info =~ /^TPE1 \(Lead performer\(s\)\/Soloist\(s\)\): (.*)$/
    artist = $1
  elsif song_info =~ /^Title  : .{31} Artist: (.*?)\s*$/
    artist = $1
  end

  if song_info =~ /^TIT2 \(Title\/songname\/content description\): (.*)$/
    title = $1
  elsif song_info =~ /^Title  : (.{0,31}?)\s+ Artist: .*$/
    title = $1
  end

  if song_info =~ /^TALB \(Album\/Movie\/Show title\): (.*)$/
    album = $1
  elsif song_info =~ /^Album  : (.{0,31}?)\s+ Year:/
    album = $1
  end

  return [artist, title, album]
end

def mp3_get_length(file_name)
  `mp3info -F -p "%S" "#{file_name}"`.to_i
end

def with_timer
  time_start = Time.now
  yield
  return [time_start, Time.now - time_start]
end

randomize = true
if ARGV[0] == "-s" # --sequential
  randomize = false
  ARGV.shift
end

songs = ARGV.map{|fn| if File.directory?(fn) then Dir["#{fn}/**/*.mp3"] else fn end}.flatten
songs = songs.sort_by{rand} if randomize

songs.each{|song|
  time_start, time_elapsed = with_timer do
    rv = system "mplayer", song
    exit unless rv
  end
  artist, title, album = *mp3_get_metadata(song)
  length = mp3_get_length(song)

  next unless length >= 90 and (time_elapsed >= 240 or time_elapsed >= 0.5 * length)

  date = time_start.strftime("%Y-%m-%d %H:%M:%S")

  File.open("/home/taw/.music_log", "a") {|fh|
    fh.puts "#{artist} ; #{title} ; #{album} ; #{length} ; #{date}"
  }
}
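
The filtering rule buried in the loop above is easier to see when pulled out into a predicate. worth_logging? is a name made up for this sketch; the thresholds are exactly the ones from the script (songs shorter than 90 seconds are never logged, longer ones are logged after 240 seconds or half of their length, whichever comes first).

```ruby
# Hypothetical helper mirroring the `next unless ...` condition in the player loop.
# length       - song length in seconds (from mp3info)
# time_elapsed - how long mplayer actually ran, in seconds
def worth_logging?(length, time_elapsed)
  length >= 90 && (time_elapsed >= 240 || time_elapsed >= 0.5 * length)
end

[[185, 100], [185, 60], [60, 60], [600, 240], [600, 200]].each do |length, elapsed|
  puts "length=#{length}s played=#{elapsed}s -> #{worth_logging?(length, elapsed)}"
end
```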

It's a good idea to commit the log to last.fm often, but I'm not doing it automatically yet, as network problems with last.fm are too frequent. Instead inbox_size.rb reminds me if there are old uncommitted entries in the log:

  # .music_log not empty and older than one hour
  if File.size("/home/taw/.music_log") > 0 and File.mtime("/home/taw/.music_log") < Time.now - 60*60
    items << "Music log not clean"
  end

Uncommitted stuff in repositories

I sometimes get distracted by some interruption and forget to commit things to repositories. I wrote an uncommitted_changes script which checks the local checkouts of all repositories I use (currently 1 SVK and 2 SVN repositories) for uncommitted changes. I use svn/svk diff instead of svn/svk status as the latter finds all kinds of temporary files, and I always svn/svk add all new files when I start coding anyway.

Dir.chdir("/home/taw/everything/") { system "svk diff" }
Dir.chdir("/home/taw/everything/rf-rlisp/") { system "svn diff" }
Dir.chdir("/home/taw/everything/gna_tawbot/") { system "svn diff" }

inbox_size.rb simply checks that output of this script is empty:

  # Uncommitted changes
  uc = `uncommitted_changes`
  unless uc == ""
    items << "There are uncommitted changes in the repository"
  end

Unread Gmail emails

The last kind of inbox items tracked by inbox_size.rb are email inbox items. Google APIs are almost invariably ugly Java-centric blobs of suckiness, so instead of using the Gmail API I simply get the list from the Atom feed, parsed using magic/xml.

  # Unread Gmail messages
  unless $offline
    gmail_passwd = File.read("/home/taw/.gmail_passwd").chomp
    url = "https://Tomasz.Wegrzanowski:#{gmail_passwd}@mail.google.com/mail/feed/atom"
    XML.load(url).children(:entry, :title).each{|title|
      items << "Email: #{title.text}"
    }
  end

Wednesday, June 27, 2007

Regular expressions and strings with embedded objects

Regular expressions are among the most powerful programming techniques ever invented. Real world "regular expressions" are only loosely related to Computer Science "regular expressions". Computer Science "regular expressions" can only provide yes/no answers to "does this string match that regular expression" type of questions. We are usually interested in much more than that - we want to extract data from strings, convert strings to other strings and so on. We call expressions used for this purpose "regular" too for historical reasons.

Regular expressions are extremely concise, but sometimes they don't suffice and we need to write a "real parser". Unfortunately even with the best parser generating tools parsers tend to be many times more complex and error prone than equivalent regular expressions. And if the problem is too complex for regular expressions, very often it is also too complex for whichever parser generator you're using, and needs a lot of nontrivial hacking around its limitations, or even writing a parser by hand.

Fortunately for many problems there's an alternative to parsers and parser generating tools - regular expressions plus a few tricks. This blog post is about one such trick.

For my bot which verifies links in Wikipedia I needed to extract data from SQL dumps. SQL dumps look something like that:

INSERT INTO `page` VALUES (1,0,'Astronomia','',1800,0,0,0.600461925007833,'20070601091320',8076762,8584,0), (2,0,'AWK','',329,0,0,0.487812640599732,'20070530195555',8058046,4265,0), (4,0,'Alergologia','',108,0,0,0.580574716050713,'20070520093413',7912844,292,0), ...
INSERT INTO `page` VALUES (14880,0,'Dźwignica_linotorowa','',26,0,0,0.597327036408081,'20060814072401',4282357,727,0), (14881,0,'Urządzenia_transportowe','',91,0,0,0.176666489966834,'20070527090143',2976610,1041,0), ...

Basically a bunch of INSERT statements with multiple tuples each. I wanted to iterate over the tuples. Extracting tuple data from the SQL dump is a simple next unless line =~ /\AINSERT INTO `page` VALUES (.*)\Z/. Splitting this blob into tuples and tuple fields is almost trivial - /\),\(/ separates tuples and , separates tuple fields. Unfortunately there are many strings inside, and some of them might contain commas.

Sure, it's possible to write a single monster regular expression to do just that, but it would be quite error-prone. Wouldn't it be easier to simply treat whole SQL strings as single "objects" inside the string ? That's exactly what we're going to do. First we need to get rid of \-escapes. That's not really necessary, as a regular expression for matching strings with \-escapes inside wouldn't really be that complicated, but we can make it even simpler this way. So every /\\(.)/ becomes "\x00" + esc[$1], where values of esc[...] are "safe" characters which won't interfere with further parsing, like A, B, C, and so on.

At this point every ' marks either a beginning or an end of some string. So we can replace all strings by object ids like "\x01<1234>", where 1234 is a suitable number. After we do that we can split on /\),\(/ and , just like we wanted. Afterwards we need to convert embedded objects (like \x01<1234> and \x00A) back to their original form.
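
The steps are easier to follow on a tiny input. The snippet below is a stripped-down sketch of the same idea, not the program I actually run: the input is made up, only two escapes are handled, and unlike the full code it puts the backslash back when restoring escapes so that the fields come out exactly as they were written.

```ruby
# Toy input: two tuples; the first string contains a comma and an escaped quote.
data = "(1,'a,b\\'c',2),(3,'plain',4)"

# Step 1: hide backslash escapes behind "\x00" plus a safe letter.
esc     = { "\\" => "A", "'" => "D" }
rev_esc = esc.invert
data = data.gsub(/\\(.)/) { "\x00" + esc.fetch($1) }

# Step 2: every remaining ' delimits a string, so replace whole strings with object ids.
strs = []
data = data.gsub(/'[^']*'/) { |s| strs << s; "\x01<#{strs.size - 1}>" }

# Step 3: parentheses and commas are now unambiguous, so plain splitting works.
tuples = data.sub(/\A\(/, "").sub(/\)\z/, "").split(/\),\(/)

# Step 4: split fields and convert embedded objects back to their original form.
records = tuples.map { |tuple|
  tuple.split(",").map { |field|
    field.gsub(/\x01<(\d+)>/) { strs[$1.to_i] }
         .gsub(/\x00(.)/) { "\\" + rev_esc.fetch($1) }
  }
}

records.each { |record| p record }
```

The comma inside the first string never gets a chance to confuse the splitting, because by step 3 the whole string is just \x01<0>.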

The complete code is here:

def hash_or_die(kw)
  Hash.new{|ht,k| raise "Unknown key: #{k}"}.merge(kw)
end

def parse(data)
  esc = hash_or_die "\\" => "A", "\"" => "B", "n" => "C", "'" => "D"
  rev_esc = hash_or_die "A" => "\\", 'B' => "\"", "C" => "n", "D" => "'"
  data = data.gsub(/\\(.)/) {"\x00" + esc[$1]}
  strs = []
  data = data.gsub(/('[^']*')/) { # '
    strs << $1
    "\x01<#{strs.size-1}>"
  }
  records = []
  data.scan(/\((.*?)\)/) {
    records << $1.split(/,/).map{|field|
      field.gsub(/\x01<(\d+)>/) { strs[$1.to_i] }.
            gsub(/\x00(.)/) { rev_esc[$1] }
    }
  }
  records
end

def sql_str_unquote(str)
  str =~ /\A'(.*)'\Z/ or raise "SQL string format is wrong: #{str}"
  $1.gsub(/\\(.)/) {$1}
end

page_fn = Dir["plwiki-*-page.sql"].sort[-1]
externallinks_fn = Dir["plwiki-*-externallinks.sql"].sort[-1]

pages = {}

File.open(page_fn).each{|line|
  next unless line =~ /\AINSERT INTO `page` VALUES (.*)\Z/
  parse($1).each{|id,ns,title,*stuff|
    next unless ns == "0"
    title = sql_str_unquote(title)
    pages[id] = title
  }
}

File.open(externallinks_fn).each{|line|
  next unless line =~ /\AINSERT INTO `externallinks` VALUES (.*)\Z/
  parse($1).each{|from,to,index|
    title = pages[from]
    next unless title
    to = sql_str_unquote(to)
    next unless to =~ /\Ahttp:\/\//
    puts "#{title}\t#{to}"
  }
}

The same technique can be used to parse many other things, such as Lisp code:

require 'pp'

lisp_code = '(a (b c) (d (e) f g) (((h))))'
nodes = []

lisp_code.gsub!(/([a-z]+)/) {
  nodes << [:atom, $1]
  "<#{nodes.size-1}>"
}
lisp_code.gsub!(/\s/,"")
true while lisp_code.gsub!(/\(((?:<\d+>)*)\)/) {
  nodes << [:app, *$1.scan(/<(\d+)>/).map{|x,| nodes[x.to_i]}]
  "<#{nodes.size-1}>"
}
lisp_code =~ /<(\d+)>/
pp nodes[$1.to_i]
# Output:
# [:app,
#  [:atom, "a"],
#  [:app, [:atom, "b"], [:atom, "c"]],
#  [:app, [:atom, "d"], [:app, [:atom, "e"]], [:atom, "f"], [:atom, "g"]],
#  [:app, [:app, [:app, [:atom, "h"]]]]]

and math expressions:

require 'pp'

math_code = '(2 + 2 * 2) / ((2 + 2) * 2)'
nodes = []

math_code.gsub!(/(\d+)/) {
  nodes << $1.to_i
  "<#{nodes.size-1}>"
}
math_code.gsub!(/\s/,"")

until math_code =~ /\A<(\d+)>\Z/
  next if math_code.gsub!(/\((<\d+>)\)/) { $1 }
  next if math_code.gsub!(/<(\d+)>([\*\/])<(\d+)>/) {
    nodes << [$2, nodes[$1.to_i], nodes[$3.to_i]]
    "<#{nodes.size-1}>"
  }
  next if math_code.gsub!(/<(\d+)>([\+\-])<(\d+)>/) {
    nodes << [$2, nodes[$1.to_i], nodes[$3.to_i]]
    "<#{nodes.size-1}>"
  }
end
pp nodes[$1.to_i]
# Output:
# ["/", ["+", 2, ["*", 2, 2]], ["*", ["+", 2, 2], 2]]

The technique of embedding objects in strings and matching such strings with regular expressions is simple and very powerful. Objects can be represented in many ways. If numbers and some sort of brackets are not relevant for parsing, and "\x00" doesn't occur in the input (it almost never does), "\x00<ID>" is a good idea. Fancy Unicode private characters can be used too if the regular expression engine can handle them. If you want to treat different objects in different ways you can represent them in some convenient form like \x00<CLASS;PROPERTY_A;PROPERTY_B;ID>. The technique works best in languages with integrated regular expression engines like Ruby, Perl, and RLisp. In others like Python and Java it's going to be somewhat uglier, but still better than full-blown "parsers".

Tuesday, June 26, 2007

Healthcare in Poland

Currently in Poland there's a lot of discussion about the healthcare system. Poland has a mostly state-run healthcare. Not only the health insurance is public, even the hospitals are operated by the government. The former is probably a good idea, considering how horrible the healthcare system is in the only developed country where it is privately run. The latter is a relic of the Communist era which refuses to die.

In the last few months doctors and other healthcare workers were constantly protesting, demanding increased government spending on health care - primarily demanding better salaries. The media completely failed to do their job and didn't provide any real data on health care or health care financing whatsoever. We bloggers need to fill in the gap.

The data wasn't particularly hard to find. It's available from WHO European health for all database (HFA-DB). The comparison is for all 12 "New EU" countries in group and individually, plus "Old EU" and "EU total". I didn't include old EU countries here, as their situation is significantly different, so it wouldn't be particularly meaningful.

Country	Life expectancy (2005)	Infant mortality per 1000 births (2005)	Healthcare spending as % of GDP (2004)	Healthcare spending in USD PPP (2004)
Bulgaria	72.6*	11.65*	8	671
Cyprus	79.54*	3.01*	5.8	1128
Czech Republic	76.19	3.39	7.3	1412
Estonia	72.89	5.44	5.3	752
Hungary	73.02	6.23	7.9	1308
Latvia	71.06	7.8	7.1	852
Lithuania	71.33	6.84	6.5	843
Malta	79.44	5.96	9.2	1733
Poland	75.12	6.42	6.2	814
Romania	71.88*	16.84*	5.1	433
Slovakia	74.3	7.2	7.2	1061
Slovenia	77.58	4.15	8.7	1815
Whole EU	78.44	5.17	8.7	2334.32
Old EU	79.63*	4.34*	9.29	2729.1
New EU	73.96	8.71	6.51	869.61

* - data for 2004

The spending figures are total spending - that is public + private. As you can see healthcare in Poland is somewhat better and somewhat cheaper than the "new EU" average. It's still far behind the old EU, but it's not really a crisis situation, especially since the results are improving and the spending is increasing - between 2000 and 2005 life expectancy increased from 73.95 to 75.12, and infant mortality fell from 8.11 to 6.42. Between 2000 and 2004 spending increased as percent of GDP from 5.7 to 6.2, and in USD PPP from 587 to 814.
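
To put rough numbers on "somewhat cheaper" and "improving", here are the relative changes worked out from the figures above. Nothing new is measured here; it is just arithmetic on the values already quoted, and percent_change is a helper invented for the snippet.

```ruby
# Figures for Poland and the "New EU" average, copied from the table and text above.
spending_2000  = 587.0   # USD PPP per capita
spending_2004  = 814.0   # USD PPP per capita
new_eu_2004    = 869.61  # USD PPP per capita, "New EU" average
mortality_2000 = 8.11    # infant deaths per 1000 births
mortality_2005 = 6.42

# Relative change from one value to another, in percent.
def percent_change(from, to)
  (to - from) / from * 100
end

spending_growth  = percent_change(spending_2000, spending_2004).round(1)
mortality_change = percent_change(mortality_2000, mortality_2005).round(1)
vs_new_eu        = percent_change(new_eu_2004, spending_2004).round(1)

puts "Spending 2000-2004: #{spending_growth}%"
puts "Infant mortality 2000-2005: #{mortality_change}%"
puts "Poland vs New EU average spending: #{vs_new_eu}%"
```

That is about 39% more spending per capita in four years, about 21% lower infant mortality in five, and spending about 6% below the "new EU" average.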

So the protests seem to be primarily politically motivated. The situation is fairly good compared to other countries in similar situation and is steadily improving. Another thing strongly implying political motivation is the sudden support for the protesting doctors from the normally vehemently anti-union opposition party Platforma Obywatelska.

Making it easy for users to write quality bug reports

One of the coolest things about making your programs public is the user feedback you will get. Some is going to be "Awsum thx, your program just saved my life" and "I tried to run it on Amiga 500 and it crashed, you suck", but in my experience most of the feedback consists of very helpful suggestions and bug reports.

This is very valuable, and as a developer you have a great influence on quality of the feedback. For most Unix console programs and Ruby libraries nothing needed to be done, as normal stack traces get printed to the terminal, and Unix users tend to know how to write good bug reports, but for jrpg it wasn't that simple - many bugs were highly nondeterministic, and as most users ran it on Windows there was no console to print stack traces to. At first I kept telling users to retry with some sort of console turned on, but after two or three such cases I wrote the following code. In util.py:

import time
import traceback

def save_errormsg(trace_back):
    (tp,v,tb) = trace_back
    tbf = traceback.format_exception(tp,v,tb)
    f = open("errormsg.txt", "a")
    f.write("== ")
    f.write(time.asctime(time.localtime()))
    f.write(" ==\n")
    for line in tbf:
        f.write(line)
    f.write("\n")
    f.close()

And in jrpg.py:

try:
    mistakes = Mistakes()
    book = demonsoul.Book_of_demons()
    mhc  = Main_Hero_Controller()
    wm   = World_model()
    wv   = World_view()
    ui   = UI()

    main_hero = Chara("female-blue", position=(0, 0))
    main_hero.is_main = True

    wm.switch_map("world", (14,25))

    ui.change_text([U"Welcome to the 日本語RPG", U"", U"Press F1 for quick help"])

    ui.main_loop()
except SystemExit:
    pass
except Exception:
    util.save_errormsg(sys.exc_info())
    raise

All exceptions get saved to errormsg.txt. When user reports a bug, I can ask them to attach this file, and thanks to stack backtraces bugs are much easier to fix. For boring technical reasons we don't really want to capture SystemExit, so we let it through.

Another thing that increases feedback quality (and also user count) is good packaging and really good documentation on how to get started - that's where most of the problems seem to be. Listening to users also helps. As you wrote the program, it's too easy for you to convince them that they're wrong, and very few users are going to argue with that. Often you're right and their ideas wouldn't work for some reason, but not always. What I found very helpful was explaining in detail the rationale behind the way things are every time a user suggests a change. A couple of times it let me find out that the user was actually right. For example, backspace in jrpg repeated significantly too fast. I had done some calculations which showed the speed to be just right, and I got used to the way it worked so I didn't notice, but when I tried to explain that to a user I found out that there was a mistake in my calculations and it should really be slowed down a bit.

Being nice to users and replying quickly also improves feedback.

Toy C backend for RLisp compiler

I wrote a toy RLisp C "backend". It's not yet connected to the real compiler, and the only thing it compiles is the Ackermann function. Of course you can make it compile something else by issuing the right opcodes by hand.

First some support code. Symbol#<=> is already in rlisp_support.rb, and String#ord is supposed to be a wrapper around the different behaviour of Strings in 1.8 and 1.9. Symbol#mangle_c and Symbol#stringify_c convert Ruby symbols (which may contain funny characters like :"==\nblah!@#") to C variable names and strings. For brevity I'm not pasting the tests here.

class Symbol
  include Comparable
  def <=>(other)
    to_s <=> other.to_s
  end
  protected :<=>
end

class String
  # FIXME: 1.8 specific, add support for 1.9
  def ord
    self[0]
  end
end

class Symbol
  def mangle_c
    to_s.gsub(/([^a-zA-Z0-9])/) { sprintf "_%02x", $1.ord }
  end
  def stringify_c
    '"' + to_s.gsub(/([\\"])/) { "\\#{$1}" } + '"'
  end
end
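To get a feel for what the two helpers produce, here is a minimal standalone sketch. It assumes a Ruby where String#ord is built in (1.9 or later), so it doesn't need the 1.8 shim, and the sample symbols are made up for illustration.

```ruby
# Standalone variant of the helpers above; assumes String#ord is built in (Ruby 1.9+).
class Symbol
  # Replace every character outside [a-zA-Z0-9] with "_" plus its hex code.
  def mangle_c
    to_s.gsub(/[^a-zA-Z0-9]/) { |ch| format("_%02x", ch.ord) }
  end

  # Wrap in double quotes, escaping backslashes and double quotes.
  def stringify_c
    '"' + to_s.gsub(/[\\"]/) { |ch| "\\#{ch}" } + '"'
  end
end

puts :ack.mangle_c              # ack
puts :"==".mangle_c             # _3d_3d
puts :"foo-bar?".mangle_c       # foo_2dbar_3f
puts :'say "hi"'.stringify_c    # "say \"hi\""
```

Note that "_" itself gets mangled (to _5f), which keeps the mapping unambiguous. stringify_c only escapes backslashes and double quotes, so a symbol containing a newline would need extra handling before it could go into a C string literal.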

Here's the code which calls the code generator, and then the ack function. It uses a Ruby->C trampoline to support closure variables. The interface used by the code generator is similar to the one used by the normal Ruby-backed RLisp code generator, but not identical. I'm going to refactor them both to match, so the RLisp frontend can talk with either backend. I think C-compiling at least the RLisp stdlib is a good idea. C-compiling the REPL less so. Code generators are so simple that I hope to be able to maintain both without too much work.

class Stuff
  cg = RLispCodegenC.new(:ack, [:m, :n])
  t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15 = cg.tmps(14)

  cg.funcall(t2, :m, :"==", 0)

  cg.x_if(t2) {
    cg.funcall(t4, :n, :+, 1);
    cg.asg(t3, t4)
  }
  cg.x_else {
    cg.funcall(t5, :n, :==, 0);

    cg.x_if(t5) {
      cg.global_get(t7, :ack)
      cg.funcall(t8, :m, :-, 1);
      cg.funcall(t9, t7, :call, t8, 1);
      cg.asg(t6, t9)
    }
    cg.x_else {
      cg.global_get(t10, :ack)
      cg.funcall(t11, :m, :-, 1)
      cg.global_get(t12, :ack)
      cg.funcall(t13, :n, :-, 1)
      cg.funcall(t14, t12, :call, :m, t13)
      cg.funcall(t15, t10, :call, t11, t14)
      cg.asg(t6, t15)
    }
    cg.asg(t3, t6)
  }
  cg.ret(t3)

  inline do |builder|
    result = builder.c cg.build
  end
end

ack_m = Stuff.new.method(:ack)
globals = {}
closure = { :globals => globals }
globals[:ack] = lambda{|*args| ack_m.call(closure, args) }

print "retval = ", globals[:ack].call(3, 3), "\n"
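The trampoline part can be tried without any C at all. The sketch below is only an illustration: ack_compiled is a hypothetical pure-Ruby stand-in for the compiled function, with the same (closure, args) calling convention, and the lambda stored in globals is the trampoline that packs the arguments and passes the closure along.

```ruby
# Pure-Ruby stand-in for the compiled C function (hypothetical name).
# Like the generated code, it takes the closure hash and an argument array.
def ack_compiled(closure, args)
  m, n = args
  ack = closure[:globals][:ack]   # recursive calls go through the global table
  if m == 0
    n + 1
  elsif n == 0
    ack.call(m - 1, 1)
  else
    ack.call(m - 1, ack.call(m, n - 1))
  end
end

globals = {}
closure = { :globals => globals }
# The trampoline: an ordinary lambda that collects its arguments into an array
# and hands them to the two-argument function together with the closure.
globals[:ack] = lambda { |*args| ack_compiled(closure, args) }

puts globals[:ack].call(2, 3)   # 9
```

Callers only ever see a normal callable taking m and n, while the function body gets at its environment through the closure hash.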

RLispCodegenC builds one function at a time. To use it you need to create a new RLispCodegenC instance, call some opcode methods, call build on the RLispCodegenC object, and compile the result using inline. inline caches .so objects, so the C compiler is called only when something changed.

require 'inline'

class RLispCodegenC
  def initialize(name, args)
    @name  = name
    @args  = args
    @syms  = {:globals => true}
    @ids   = {}
    @temps = []
    @code  = ""
  end

  def build
    temps = [:globals] + @args + @temps
    src = <<EOF
VALUE #{@name}(VALUE closure, VALUE args) {
  VALUE #{temps.map{|t| t.mangle_c}.join(", ")};
  #{static_init}
  #{arg_init}
  #{@code}
}
EOF
    src
  end

  def arg_init
    res = "globals = rb_hash_aref(closure, SYM_globals);\n"
    @args.each_with_index{|arg, i|
      res << "#{arg.mangle_c} = rb_ary_entry(args, #{i});\n"
    }
    res
  end

  # This code should be executed just once in Init_*, not every time the method is called
  def static_init
    ids = @ids.keys.sort
    syms = @syms.keys.sort
    res = ""
    res << %Q[ID #{ids.map{|i| "ID_#{i.mangle_c}"}.join(", ")};\n] unless ids.empty?
    res << %Q[VALUE #{syms.map{|s| "SYM_#{s.mangle_c}"}.join(", ")};\n] unless syms.empty?
    ids.each{|i| res << "ID_#{i.mangle_c} = rb_intern(#{i.stringify_c});\n" }
    syms.each{|s| res << "SYM_#{s.mangle_c} = ID2SYM(rb_intern(#{s.stringify_c}));\n" }
    res
  end

  def tmps(sz)
    res = []
    sz.times{
      t = :"t#{@temps.size}"
      res << t
      @temps << t
    }
    res
  end

  def to_c(x)
    if x.is_a? Fixnum
      "INT2FIX(#{x})"
    # It means a C temporary, ***NOT*** Ruby symbol
    # FIXME: Mangling ?
    elsif x.is_a? Symbol
      x.mangle_c
    else
      raise "Don't know how to convert #{x.class}: #{x.inspect}"
    end
  end

  def funcall(asg_to, recv, mid, *args)
    @ids[mid] = true
    mid_s = "ID_#{mid.mangle_c}"
    args_m = args.map{|a| ", " + to_c(a)}.join
    @code << "#{to_c(asg_to)} = rb_funcall(#{to_c(recv)}, #{mid_s}, #{args.size}#{args_m});\n"
  end
  def x_if(tmp)
    @code << "if(RTEST(#{tmp})) {\n"
    yield
  end
  def x_else
    @code << "} else {\n"
    yield
    @code << "}\n"
  end
  def asg(to, from)
    @code << "#{to_c(to)} = #{to_c(from)};\n"
  end
  def global_get(to, sym)
    @syms[sym] = true
    sym_s = "SYM_#{sym.mangle_c}"
    @code << "#{to} = rb_hash_aref(globals, #{sym_s});\n"
  end
  def ret(val)
    @code << "return #{to_c(val)};\n"
  end
  def debug(msg, val=nil)
    @code << %Q[rb_funcall(self, rb_intern("print"), 1, rb_str_new2(#{msg.inspect})); /* DEBUG */\n]
    @code << %Q[rb_funcall(self, rb_intern("p"), 1, #{val}); /* DEBUG */\n] unless val.nil?
  end
end
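To see what kind of C text the opcode methods emit without having a C compiler or inline around, here is a stripped-down sketch. TinyCodegen and its body method are hypothetical names, not part of RLisp, and as a simplification it calls rb_intern inline instead of caching ID_ variables the way static_init does.

```ruby
# Hypothetical, stripped-down generator: same opcode style as RLispCodegenC,
# but it only collects C statements and never calls a compiler.
class TinyCodegen
  def initialize
    @code = +""
  end

  # Integers become Ruby immediates, symbols name C temporaries.
  def to_c(x)
    case x
    when Integer then "INT2FIX(#{x})"
    when Symbol  then x.to_s
    else raise ArgumentError, "Don't know how to convert #{x.inspect}"
    end
  end

  # Emit: asg_to = recv.mid(*args)
  def funcall(asg_to, recv, mid, *args)
    args_c = args.map { |a| ", " + to_c(a) }.join
    @code << "#{to_c(asg_to)} = rb_funcall(#{to_c(recv)}, rb_intern(#{mid.to_s.inspect}), #{args.size}#{args_c});\n"
  end

  def ret(val)
    @code << "return #{to_c(val)};\n"
  end

  # The C statements collected so far.
  def body
    @code
  end
end

cg = TinyCodegen.new
cg.funcall(:t0, :n, :+, 1)
cg.ret(:t0)
puts cg.body
# t0 = rb_funcall(n, rb_intern("+"), 1, INT2FIX(1));
# return t0;
```

Each opcode just appends one C statement, which is why the generators stay so small.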