a monochromatic oeuvre

June 27, 2009

Testing Google’s PHP optimization tips

Recently, Google posted a series of articles titled “Let’s make the web faster“, including one by Google webmaster Eric Higgins about optimizing PHP.

Some of Google’s tips are obvious, like using caching or avoiding unnecessary SQL queries. A couple, however, seemed pretty weird to me, so I ran some benchmarks to check them out for myself. (See bottom of post for specs, and other info on how I ran the benchmarks)

  • Use echo instead of print: Higgins doesn’t explain why he thinks echo is faster than print. The PHP documentation links to an faq that explains the difference, and mentions that echo is marginally faster because it doesn’t set a return value, but there is no real difference in speed between the two.

    To test this, I ran two tests, using echo and then print to print a short string (5 characters) and then a long string (5000 characters) 1,000 times each.

    The Benchmark

    Using echo to print a short string 1,000 times: 1170 microseconds
    Using print to print a short string 1,000 times: 1180 microseconds

    Using echo to print a long string 1,000 times: 1163 microseconds
    Using print to print a long string 1,000 times: 1182 microseconds

    Verdict: For short strings, echo was slightly faster, and for long strings, it looks like print was slightly faster. For both cases, this was a difference of only about a couple hundred-millionths of a second per call, and likely well within normal variation between calls. I’d say use whichever one you feel like using, personally.

  • Use switch/case instead of if/else for loosely-typed comparisons:Apparently this has better performance, readability, and maintainability. I’ll bit on the readability and maintainability, but this article is about performance.

    I’m testing with code very similar to the example that Higgins provided for this one. There are four branches, each of which calls a function that does nothing but return, and each branch is equally likely.

    The Benchmark

    Using if/else, repeated 1,000 times: 422 microseconds
    Using switch/case, repeated 1,000 times: 425 microseconds

    Verdict: Once again, the difference is barely noticeable, and it looks like if/else was actually slightly faster on this test. I would say not to worry too much about it, though the switch/case definitely looks better from a code readability point of view.

  • Provide echo with many strings as arguments instead of using concatenation: In the video, Eric replaced each of the periods in an echo call with commas, so that the strings were passed as many arguments to echo instead of concatenating them before passing them in. This seems definitely weird to me, but it might work.

    For this one, I’m testing echo "Hello "."World!"."\n"; vs. echo "Hello ","World!","\n";

    The Benchmark

    Using concatenation, repeated 1,000 times: 1225 microseconds
    Using arguments, repeated 1,000 times: 3255 microseconds

    Ouch! No way is it faster to pass in multiple arguments to echo than concatenation them first! I notice that I have three arguments to echo and that took almost 3 times as long to run… let’s try it with 6 arguments. This time I’m testing echo "Hello "."World!"."\n"."Hello "."World!"."\n"; vs. echo "Hello ","World!","\n"." ","World!","\n";

    The Benchmark (2)

    Using concatenation, repeated 1,000 times: 1445 microseconds
    Using arguments, repeated 1,000 times: 5354 microseconds

    It’s not quite linear, but it’s definitely correlated.

    Verdict: There’s no question about it. Use concatenation! It’s better! Google doesn’t know everything!

The moral of this story… Google is smart, but they don’t know everything. Test things for yourself to make sure their results are the same as yours. It’s possible that Google is using a different version of PHP, running on different hardware, or something like that. Each computer is different and if these small optimizations matter to you, you have to try it for yourself to make sure. Edit: The PHP devs and at least one other blogger have reached about the same conclusions. Note: I ran these benchmarks on my Linode VPS, with 360MB RAM and 4×2.50GHz processors. I was using PHP 5.2.6-3ubuntu4.1 with Suhosin-Patch 0.9.6.2 from the command line, and the PHP Benchmark library. Each reported time was an average over 100 trials. Source code for my benchmarks is available here. You have to pipe stdout to /dev/null, since I’m using print and echo. Results are in stderr.

April 1, 2009

Writing a keygen: Hacker challenge

I stumbled upon a hacker challenge posed by blogger Andy Sloane last night.  He explained how, as a hobby, likes to try to crack or create key generators for commercial programs.  Naturally, he never distributes these cracks, but is only doing it for the intellectual challenge.  Writing a key generator is usually more useful than cracking, since a good keygen typically allows you to receive future updates.  Often, it is also more interesting from an intellectual point of view, as writing a keygen usually requires some knowledge in reverse engineering and number theory.  Andy tries to demonstrate what he means by providing a simple key checker of his own for us to break.  Since I had a lighter workload last night, I figured I’d try it.

For his challenge to the world, he offers the source code of the key checker and three working keys.  The source is available on his blog.  (While a normal keygen writer would not have access to the source code of the key checker, it is usually simple, though tedious, to decompile and interpret the compiled binaries that would be released.)

After downloading and skimming the source, I spent a while trying to understand the custom written (and minimally commented) library he uses for large (125 bit, in this case) integers.  Rather than try to interpret the code itself, I used gdb to inspect the way the library worked, and had soon figured out the inner workings of the key checker:

Each key that it received, it would convert into an integer.  Each key had a unique integer k (with some exceptions: O, Q, and 0; I and 1; and Z and 2 were considered to be interchangeable to lower the change of an misread key) to which it was decoded.

Two magic numbers n = 21967608700894359473408781171922272451 and d=4444901153489723031681183441752517327 were hard-coded into the program.  A new number mkd mod n was created.  The key was accepted if m ≡ 12345678 mod c, where c = n/232⌋ (rounded down to the nearest integer).

This was clearly a problem for number theory! This algorithm looked suspiciously similar to the RSA encryption algorithm, which similarly employs two large numbers with special properties.

A quick summary of the steps behind RSA encryption:

  1. Pick two large primes p and q.
  2. Compute n = pq.
  3. Compute the totient of n, φ(n) = (p-1)(q-1).
  4. Choose an integer e such that e is relatively prime to and less than φ(n).
  5. Compute the integer d such that ed ≡ 1 mod φ(n).
  6. For your message m, encrypt it by generating ciphertext c ≡ me mod n. e is known as the public key and is used to encrypt the message.
  7. The person who receives the ciphertext by using d and applying the formula m = cd mod n.

This works because of Euler’s theorem, which states that aφ(n) ≡ a mod n for any integers a and n.

From that, cd =(me)d=med= m1+kφ(n) m mod n. (Check out the explanation of RSA on wikipedia for a more in depth explanation.

I quickly used Mathematica to factor n to see if it was the product of two primes. It was!  I checked that d was relatively prime to φ(n) to be sure.  It also was. The only part where this differed from RSA was the final step where it checked to see if m ≡ 12345678 mod c before accepting the key.

I was a little thrown off by c being related to n, but I quickly realized that it didn’t matter what c was.  It was just there to make sure that more than one m (and therefore more than one key) was accepted.  Effectively, what this key checker did was to take RSA generated ciphertext as a key and check to see if it decryped to a message that left a remainder of 12345678 when divided by c.

Therefore, to generate a key, I needed to create a correct message and encrypt it, then convert the ciphertext back to the “key” format.  Creating a message is easy. m is an accepted message if m ≡ 12345678 mod c, so m = ac + 12345678, where a is any natural number.  To encrypt m, I need to find the private key e.  Since n is small enough to factor and ed ≡ 1 mod φ(n), it is easy to find e through modular arithmetic.

Voila! Now I have everything I need to write a keygen for Andy’s example key checker. In fact, here it is.  Now, clearly, I should go do my homework, having wasted several hours on this.

March 5, 2009

Good password habits?

Cardinary Sins of Passwords

  • Using short passwords with few numbers or special characters.
  • Using the same password for many things (or everything).
  • Rarely (or never) changing passwords.
  • Using a name or common phrase in a password.

If you’re guilty of one or more of these bad password habits, raise your hand!

*Raises hand*

Despite being (I think) very tech savvy, I have some very bad password habits, and I would not be at all surprised if many of my similarly computer-smart friends do too. Mostly, I use the same password for many things, and I rarely change my passwords.  The password I use most often, for almost all of my accounts on various websites, I first used with my first email account so long ago in the mid 90s. It is a relatively short and simple password, and someone could definitely wreak some havoc if they got their hands on it, though they wouldn’t have access to anything important. Of my other passwords, the newest one is about 3 years old and I came up with the oldest one almost 7 years ago. That’s plenty of time for someone to get their hands on the keys to my personal information, bank accounts, etc.

One of my new years resolutions this year was to improve on my password habits, and I definitely haven’t gotten anywhere with that. (Another one was to blog more, heh, fail there too.) So I guess the question is, why is it so hard to keep passwords up to date and secure?

For me, the number one barrier to changing passwords regularly is the hassle of having to memorize a new password, especially a complex one with numbers and special characters. Similarly, I reuse the 3 or 4 passwords I use most often to minimize the risk of forgetting a password.  With so many different services which require passwords, and more every week as I sign up for a new web site, it is sometimes difficult to remember whether I have an account to a certain site, much less remember a unique password for each one. Furthermore, I’ve never had any problems with security, so I don’t have any real motivation to change them.

Possible solutions

Using a password manager: I got a free license for 1password during MacHeist’s Giving Tree this year, which I’m going to start using. This allows me to use more passwords of greater complexity, and not have to worry about forgetting them. Of course, this is only as secure as the master password I use for 1password, and I have to figure something out for when I am on a public computer. If I don’t actually remember my passwords, I won’t be able to get to my accounts away from my computer.
Using phrase passwords: Instead of the traditional 8-12 character random passwords, use a 20-30 character phrase that is easy to memorize but difficult to guess. To help mitigate the risk of getting locked out of my accounts if I lose access to my password manager somehow, I think I will start using phrases from songs or quotes or something as passwords, instead. Unfortunately, the longer a phrase is, the greater a chance for a typo while typing it in.
Using a password generation system: Another option which might be safer than phrase passwords is using a system to generate secure passwords that are unique for every site. One such system is this one suggested by Lifehacker.

I’m starting to migrate towards these practices right now. Hopefully, I’ll be able to keep it up and maintain my good luck with regard to password security for a while longer.

January 7, 2009

Replacing TwitterIM with my own bot.

TwitterIM:  TwitterIM is under maintenance at the moment. Please check back later.

Twitter’s AIM bot, TwitterIM, has been “under maintenance” for as long as I can remember.  The last time it worked for me was in January 2008, and I have been “checking back later” periodically since.  Personally, I do not have high hopes for it working any time soon.  I decided to just take advantage of Twitter and AIM’s APIs and write my own bot to update my twitter through AIM.

I wrote this in about 10 minutes, so its a really simple program.  Once it connects to the AIM server, it waits for messages from me, ignoring all other messages.  A message from me is immediately posted to Twitter through curl.  In this implementation, there is no check for a successful post, though that would not be particularly difficult to implement.

The source code is available here.  You need Perl, the Net::OSCAR module, and AIM and Twitter accounts, naturally.  Just fill in your credentials at the top of program and run it with Perl.

January 6, 2009

Ruby Blackjack

“Hm, I should probably learn Ruby.”

I probably first said this about a year and a half ago when Ruby and Ruby on Rails were “in”, the cool new (relatively, anyways, Ruby was written in 1995) language and the easy to use web application framework that went along with it.  In the year and a half since then, I’ve glanced at Ruby and Rails, not looking at it close enough to discover more than that it resembled Perl and possibly Python.

Well, as part of an internship interview, I’ve finally gotten a reason to actually sit down, read the Ruby documentation, and understand how this language works.  I am almost done writing a command line blackjack implementation in Ruby.  Here are my thoughts:

Object-oriented. I really like the way Ruby strongly object oriented – everything is an object, including integers and booleans, which most other languages designate as primitives.  This preemptively takes care of the limitations in designating these types as primitives, which often lead to wrapper classes such as Java’s Boolean and Integer.

Open classes. Ruby implements open classes, which allows you to define or redefine the behavior of any class at any time.  This way, I can easily implement: a predicate in the String class that checks if a string is empty or nil, a foldr method in the Array class, or anything else that I need.  In contrast, in Java, implementing these functions would require creating another StringUtils or ArrayUtils class.

Powerful. I don’t know if powerful is the right word.  Overall, I just feel like I would be much more comfortable writing complicated/long programs in Ruby, over Python or Perl.  Normally, I would stick to Java or C++ for projects with a larger scope.  Ruby is really nice, except for…

End? Syntax complaint.  I really prefer using brackets {} to denote a block of code, such as the interior of a loop or a function.  Maybe I’m just not used to the language yet, but I feel that it is laid out so that it is difficult to understand and read code by simply scanning through it.  The problem is made worse by the fact that with brackets, I can easily use vim or TextMate to find the corresponding bracket, whereas “if … end” doesn’t lend itself quite as well to that kind of searching.

I hate blackjack. What a terrible game.  Blackjack is a mishmash of rules and variants that make an elegant implementation of the game difficult.  It is a simple, boring game that casinos keep around because it is considered a beginners game, and so is played by novices who pay little to no attention to strategy.  Provided a player plays correctly, the game is relatively even, with a house advantage of less than 0.5% for more common variants.  Generally, blackjack is just not worth playing. Stick with poker.

In summary, blackjack sucks, Ruby is pretty cool.

December 4, 2008

All your info are belong to Facebook (and Google)?

“But what about interesting things that you do on other sites? Maybe you commented on a blog or dugg an article on Digg. Maybe you put up something for auction on eBay, or found a video you love and wanted to share it. Wouldn’t it be nice if when you were on another site, you could easily find your friends on that site and see what they are doing there, just like you can on Facebook?” — Facebook Connect

In what almost seems like a synchronized release, Facebook Connect and Google Friend Connect have been unveiled today by the largest social network and the largest search engine in the country.  At the very least, these two new technologies will be another general Web sign-on protocol, most likely replacing OpenID, which had a lackluster reception at best and will likely effectively disappear with these two giants as competition.  From a most pessimistic point of view, these two web giants who, between them, already have a ridiculous amount of information about the average user, has yet another means of data collection.

It does seem like Facebook has learned from Beacon, and has made Facebook Connect an opt-in service rather than an opt-out service, hopefully avoiding the privacy scandals that led Facebook to quickly pull Beacon off the site.  Still, I expect to hear stories of people accidentally publishing news about things that they don’t want known, though hopefully not to the same degree as the story about the man who bought an engagement ring on Overstock.com and got it published to his wall.

Perhaps these two tools can fulfill their promise of easy inter-website connectivity without bringing on the anti-privacy apocalypse.  I’ll keep my fingers crossed… while installing Facebook Connect onto my website.

November 8, 2008

Halloween!

‣ ()

I don’t update very much, do I?

Blogs are hard. Oh well. Halloween happened! Here is a picture of me and Dana.

Stick figure ♥.

Stick figure ♥.

September 9, 2008

My dorm room…

‣ ()

So I arrived at school expecting to set up my room in about the same way as I had the year before.  My roommate and I would have our beds set up in a Captain’s bed style on opposite sides of the room along the shorter wall, with our desks along the longer wall and the refridgerator and microwave in the middle of the longer wall between ours desks.

As I walked into my room on the first day, my first thought was that it felt somewhat smaller than it should have been.  I soon discovered that this was because it actually was smaller, as the school had, without informing us, shortened the shorter wall by about 6 inches, supposedly to earthquake-proof the dormitory.

These are pictures of my suitemates’ room, which is set up like I was going to.

The bed is set up in a Captain's style, with the drawers and bookshelf underneath.

The bed is set up in a Captain's style, with the drawers and bookshelf underneath.

The two desks are along the longer wall, with the fridge between the desks.

The two desks are along the longer wall, with the fridge between the desks.

As a result of these changes, one of the beds no longer fit along the shorter wall (the side with the door), and nothing worked anymore! All my plans had been ruined!  After a quick scramble to find an acceptable room design, we decided upon this design (I would try to describe it but pictures would work better I think):

The two beds are lofted and closer together. Mine is along the far side of the room, and Patrick's is in the middle of the room.

The desks and drawers are under the beds, with our work areas in the room between the beds.

A lounge area is set up with a sofa, bean bag sofa, fridge, TV, and video games.

A lounge area is set up with a sofa, bean bag sofa, fridge, TV, and video games.

To achieve this setup, I just bought the sofa, and had the bean bag sofa donated to me from my friend.  The leather sofa chair I used to have is now in the courtyard outside my room.  The only drawback is that our beds are now lofted and relatively close to each other, instead of on opposite sides of the room.  This hangout area is pretty sweet though.  Fair trade.

July 28, 2008

On favicons

‣ ()

I’ve spent several hours this weekend working on a favicon for my website and blog. I’ve finally settled on one I think I can live with – the simple little blue button with the letter M engraved in it.  I’m still not entirely happy with it, though, so it may change again in the near future.

I considered several different candidates, some of which I’ve shown below, for a favicon before settling on this one.

Gallery of rejected favicons

So what is the point of the favicon, and why am I spending hours working on this tiny 16×16 pixel image?  The favicon has become an important part of a website’s identity, representing it in the address bar, on browser tabs, and on bookmarks.  As all modern web browsers now have favicon support, this seemingly unimportant little image has become more and more important as a way to make s website stand out in tabs or in bookmarks.  A well made favicon makes a website look more professional and complete.  It adds a little personality to the site, making it more visible around other websites.

It’s not enough to have just any favicon, though.  The favicon is part of a site’s identity, creating a visual representation of it for its visitors.  Most web hosts or blogging engines come with a default favicon nowadays.  A blog or website that simply uses this default favicon is actually weakening its brand, as a visitor, when remembering the favicon will think of the host from which it originated instead.

At first glance, all these blogs look the same, as they all use the default blogger favicon.

A good favicon should fit the website to which it belongs.  It should match the logo or, if there is no logo, match the color scheme.  This is, for example, part of why I did not choose to use the rainbow colored ball should above as the favicon for this blog.  With the simple blue and white color scheme, a bright, colorful favicon would have looked out of place.  The simple blue button I finally chose was much more appropriate.

Favicons should not be taken lightly.  They are an essential piece to a website’s professionalism, and should be chosen with care.

July 25, 2008

About me

‣ ()

Who are you?

I’m Marquis Wang.  I go to school at Harvey Mudd College in Claremont, California, where I am planning on majoring in Computer Science.  I grew up in Champaign, IL, attending the University of Illinois Laboratory High School, which I graduated from in 2007.

I like to keep busy with various programming projects, which range from designing and maintaining my personal web sites to writing small programs or games.  I will keep some of the more complete ones in my sandbox.

Otherwise, my interests vary from moment to moment.  I enjoy playing chess, going hiking, reading books, and going on adventures.

As a student, my employment status changes from year to year, but in the past 3 years, I have been a sysadmin for the University of Illinois’s Theoretical and Computational Biophysics Group, a student webmaster for Harvey Mudd College, and I am currently working as a programmer for the Cognitive Computation Group, once again at the University of Illinois.

Why do you blog?

I’m starting this blog as a record of the things that I’m interested in – things that I am working on or thinking about.  I can research things that I find interesting, then document my findings on this blog, so that I can refer back to it as necessary.  At the same time, other people can benefit from the same information, and hopefully give some feedback so I’m not working in a black hole.  Some people say that isolation is the best environment for creativity, but I find that it is far too easy to find yourself in a rut unless you have some outside input.

What exactly is a monochromatic oeuvre?

Monochromatic: (adj) Containing or using only one color; monotonous or lacking in variety
Oeuvre: (noun) The works of a painter, composer, or author regarded collectively; a work of art, music, or literature

Therefore, a monochromatic oeuvre is a collection of art of all one color or type.  Hopefully, the contents of this blog will be at least slightly polychromatic.  The title is a tribute to one of the best comic strips of the last two decades, Bill Watterson’s Calvin and Hobbes.  I’ll leave it as an exercise for the reader to find the exact strip in question.

How can I contact you?

marquis@marquiswang.com

§

Resume - Sandbox - Music

© 2008-09 Marquis Wang