Archive for the ‘Programming’ Category

Adium Sort By Log Size

Sunday, January 22nd, 2012

Back when I was using Pidgin, I loved its “sort by log size” feature. It organized your contact list for you, floating your most-talked-to contacts to the top. It was also fun to see which of your friends “wins” by being on top.

Adium had a plugin that did this, unsurprisingly called the Sort By Log Size Plugin. Unfortunately, it was partly incompatible with Adium 1.4, occasionally preventing the contact list from loading at all. Even worse, it appeared that the plugin had been abandoned, with no updates since June 2009.

For the last several years I have been relying on manually sorting my contacts into ordered groups on Adium, but I finally got my lazy ass off my couch yesterday and fixed the bug myself. I’m trying to get into contact with the developer, but until then it can be downloaded from my Github (direct download).

Update: The developer has accepted the patch and is providing an update that will be available pending approval! Hooray!

Calvin and Hobbes + NASA Astronomy Picture of the Day

Monday, September 6th, 2010

ihaveideas on reddit had an excellent idea.

I have been playing with ImageMagick recently so I figured implementing this would be good practice.

I think it works so far :-) . Here are some results:

Pretty sweet, huh?

Code is available here. It’s kinda hacky right now, but it seems to work.
The bash script takes one argument, the URL of the APOD post. If no argument is given it defaults to the most recent post.

Possible future improvements: Write a program that runs this daily and sets it as your desktop background. That can wait until at least tomorrow.

Auto-Uploading Screenshots in OS X

Sunday, August 8th, 2010

99% of time I take a screenshot, its to show someone. The workflow there is:

  1. Take a screenshot of a selected area with Cmd+Shift+4. OS X saves it to the Desktop.
  2. Open a terminal.
  3. SCP the file to my web directory.
  4. Open the web directory in a browser to make sure the upload worked and so I can copy the URL.
  5. Paste the URL to whoever I want to share it with.

This is slow and boring. The other day my fellow intern Larry showed me his way of auto-uploading these screenshots, so I shamelessly stole it, modified it a little and am now claiming it as my own:

First, I made an Automator application that runs the following shell script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
USER="user"
HOST="host.com"
UPLOAD_DIR="/var/www/screenshots"
WEB_DIR="screenshots"
LOCAL_DIR="$HOME/Desktop"
 
FILE="Screenshot.`date +%m%d%H%M%S`.png"
URL="http://$HOST/$WEB_DIR/$FILE"
 
screencapture -xi $LOCAL_DIR/$FILE
ping -t 1 -c 1 $HOST
if [ $? -eq 0 ]; then
        scp $LOCAL_DIR/$FILE $USER@$HOST:$UPLOAD_DIR/
        rm $LOCAL_DIR/$FILE
        echo $URL | pbcopy
        growlnotify -m "Screenshot uploaded to $URL"
else
        echo $LOCAL_DIR/$FILE | pbcopy
        growlnotify -m "Screenshot copied to Desktop"
fi

Then, I used Quicksilver to bind Cmd+Shift+4 to call this Application. I have an SSH key set up so the SCP works without any interaction from me. The script takes the screenshot, checks to see if the server is up, uploads the image, and copies the URL to my clipboard. If the server is down, it reverts to the default functionality of leaving the image on the Desktop.

The workflow is now:

  1. Take a screenshot of a selected area with Cmd+Shift+4.
  2. Magic!
  3. Paste the URL to whoever I want to share it with.

Larry just has Quicksilver run the code as a shell script but I like the automator application since it adds a little busy spinner in the taskbar while the script runs since uploading can be a bit slow if the connection sucks.

Dependencies: Quicksilver, Automator, growlnotify, I suppose. Automator, growlnotify, and possibly Quicksilver are all optional.

Fibonacci functions and Dynamic Programming

Saturday, February 20th, 2010

Bhargav and I had a short discussion regarding Fibonacci number generating functions over winter break and I recently answered an interview question along the same vein, so I guess its a question worth looking at:

The recursive formula for the generation of the nth Fibonacci number Fn is Fn = Fn-1 + Fn-2, where F1 = F2 = 1. How then, do we write a function that generates the nth Fibonacci number? The simplest approach is to convert this formula directly into code (all sample code given in python):

def F(n):
    if n == 1 or n == 2:
         return 1
    else:
         return F(n-1) + F(n-2)

This works, but there’s a minor problem. When we run it for small numbers, it seems fine, but it slows down rapidly for large values of n. On my computer (which, admittedly, isn’t very fast, but it’s still a reasonably nice computer), F30 takes about half a second to run, F35 takes about 6 seconds, and I don’t even want to know when F50 will finish (according to my rough calculations, in about 6 days). This is clearly unacceptable. I could calculate F50 by hand in less than 6 days (or 20 minutes). So what’s wrong?

If we look at the function, we’ll see that when you call F(n), it makes two recursive calls to F(n-1) and F(n-2), each of those makes two more recursive calls, which each make two more, and so on. Furthermore, each additional call only reduces the problem size by a constant factor, one or two. This leads to exponential growth, where a call to F(n) takes almost twice as long as a call to F(n-1). This is bad! The culprit here is we’re doing extra work. Here is the tree of recursive calls for F(6):
Recursive tree of function calls for F(6)

As you can see, F(4) is being calculated in both subtrees of F(6), and F(3) is calculated in three different subtrees. As n gets larger, more and more work is being redone in multiple subtrees.

Bhargav’s solution was to use a hash table to save a solution once it’s been calculated, so that future calls can check if the solution has already been found for some n before calculating it again:

fib_results = {1:1, 2:1}
def F(n):
    if n in fib_results:
         return fib_results[n]
    else:
         fib_results[n] = F(n-1) + F(n-2)
         return fib_results[n]

This is in fact much faster and runs in linear time instead of exponential time. Unfortunately, it takes linear space in the form of the hash table, and it is unclear how or whether we can avoid using that space, though it seems like it might be possible since we didn’t use linear space in the naive solution. This approach is called memoization. Memoization is an example of a top-down alternative to dynamic programming. It’s advantage over the bottom-up approach, called dynamic programming, is that it is much easier to convert a naive recursive algorithm to a memoized algorithm. The disadvantage is that dynamic programming is generally somewhat faster in implementation and more powerful, making it clearer how to make optimizations such as getting rid of that linear space hash table.

The idea behind dynamic programming is, once we’ve defined a problem in terms of its subproblems, to solve it by starting from the base cases and solving subsequently larger problems, usually in tabular form. To find F(n), we create an array of n cells, filling cells 1 and 2 with the value 1 (those are the base cases). To fill each cell, we look at the two previous cells and fill it with their sum. Conveniently, this means that the ith cell contains the value of F(i). Hence, by the time we fill the nth cell we will have found the value of F(n).

Sequence for filling DP table to solve F(n).

This algorithm would look like this:

def F(n):
    # Create an array of n cells
    fib_table = [None]*n

    # Initialize base cases
    fib_table[0] = 1
    fib_table[1] = 1

    # Fill table
    for i in range(2, n):
        fib_table[i] = fib_table[i-1] + fib_table[i-2]
    return fib_table[n-1]

Unfortunately, this still takes linear space! It will probably be a little faster than the memoized algorithm, since arrays are a little faster than dictionaries, but we still haven’t got rid of the need for the lookup table. However, if we look at this algorithm, it becomes clear that to fill each cell, we only need to look at the two cells before it. In fact, we can forget the values in every cell before those two, since we’ll never need them again. To save using that space, we can just reuse the same three cells over and over again. In fact, since we only have three cells, we can just use variables instead of creating a table:

def F(n):
    a = 1
    b = 1
    c = 1
    for i in range(2, n):
        c = a + b
        a = b
        b = c
    return c

Mission accomplished. We have a Fibonacci number function that runs in linear time and constant space. Dynamic programming is clearly awesome. A Fibonacci number function is one of the simplest applications of dynamic programming. One could easily imagine coming up with this solution without going through this thought process. However, there are many more complex problems that dynamic programming can solve, without which the best solutions are incredibly difficult to find. Some applications of dynamic programming include RNA folding and seam carving (crazy content aware image resizing)

Testing Google’s PHP optimization tips

Saturday, June 27th, 2009

Recently, Google posted a series of articles titled “Let’s make the web faster“, including one by Google webmaster Eric Higgins about optimizing PHP.

Some of Google’s tips are obvious, like using caching or avoiding unnecessary SQL queries. A couple, however, seemed pretty weird to me, so I ran some benchmarks to check them out for myself. (See bottom of post for specs, and other info on how I ran the benchmarks)

  • Use echo instead of print: Higgins doesn’t explain why he thinks echo is faster than print. The PHP documentation links to an faq that explains the difference, and mentions that echo is marginally faster because it doesn’t set a return value, but there is no real difference in speed between the two.

    To test this, I ran two tests, using echo and then print to print a short string (5 characters) and then a long string (5000 characters) 1,000 times each.

    The Benchmark

    Using echo to print a short string 1,000 times: 1170 microseconds
    Using print to print a short string 1,000 times: 1180 microseconds

    Using echo to print a long string 1,000 times: 1163 microseconds
    Using print to print a long string 1,000 times: 1182 microseconds

    Verdict: For short strings, echo was slightly faster, and for long strings, it looks like print was slightly faster. For both cases, this was a difference of only about a couple hundred-millionths of a second per call, and likely well within normal variation between calls. I’d say use whichever one you feel like using, personally.

  • Use switch/case instead of if/else for loosely-typed comparisons:Apparently this has better performance, readability, and maintainability. I’ll bit on the readability and maintainability, but this article is about performance.

    I’m testing with code very similar to the example that Higgins provided for this one. There are four branches, each of which calls a function that does nothing but return, and each branch is equally likely.

    The Benchmark

    Using if/else, repeated 1,000 times: 422 microseconds
    Using switch/case, repeated 1,000 times: 425 microseconds

    Verdict: Once again, the difference is barely noticeable, and it looks like if/else was actually slightly faster on this test. I would say not to worry too much about it, though the switch/case definitely looks better from a code readability point of view.

  • Provide echo with many strings as arguments instead of using concatenation: In the video, Eric replaced each of the periods in an echo call with commas, so that the strings were passed as many arguments to echo instead of concatenating them before passing them in. This seems definitely weird to me, but it might work.

    For this one, I’m testing echo "Hello "."World!"."\n"; vs. echo "Hello ","World!","\n";

    The Benchmark

    Using concatenation, repeated 1,000 times: 1225 microseconds
    Using arguments, repeated 1,000 times: 3255 microseconds

    Ouch! No way is it faster to pass in multiple arguments to echo than concatenation them first! I notice that I have three arguments to echo and that took almost 3 times as long to run… let’s try it with 6 arguments. This time I’m testing echo "Hello "."World!"."\n"."Hello "."World!"."\n"; vs. echo "Hello ","World!","\n"." ","World!","\n";

    The Benchmark (2)

    Using concatenation, repeated 1,000 times: 1445 microseconds
    Using arguments, repeated 1,000 times: 5354 microseconds

    It’s not quite linear, but it’s definitely correlated.

    Verdict: There’s no question about it. Use concatenation! It’s better! Google doesn’t know everything!

The moral of this story… Google is smart, but they don’t know everything. Test things for yourself to make sure their results are the same as yours. It’s possible that Google is using a different version of PHP, running on different hardware, or something like that. Each computer is different and if these small optimizations matter to you, you have to try it for yourself to make sure.

Edit: The PHP devs and at least one other blogger have reached about the same conclusions.

Note: I ran these benchmarks on my Linode VPS, with 360MB RAM and 4×2.50GHz processors. I was using PHP 5.2.6-3ubuntu4.1 with Suhosin-Patch 0.9.6.2 from the command line, and the PHP Benchmark library. Each reported time was an average over 100 trials. Source code for my benchmarks is available here. You have to pipe stdout to /dev/null, since I’m using print and echo. Results are in stderr.

Writing a keygen: Hacker challenge

Wednesday, April 1st, 2009

I stumbled upon a hacker challenge posed by blogger Andy Sloane last night.  He explained how, as a hobby, likes to try to crack or create key generators for commercial programs.  Naturally, he never distributes these cracks, but is only doing it for the intellectual challenge.  Writing a key generator is usually more useful than cracking, since a good keygen typically allows you to receive future updates.  Often, it is also more interesting from an intellectual point of view, as writing a keygen usually requires some knowledge in reverse engineering and number theory.  Andy tries to demonstrate what he means by providing a simple key checker of his own for us to break.  Since I had a lighter workload last night, I figured I’d try it.

For his challenge to the world, he offers the source code of the key checker and three working keys.  The source is available on his blog.  (While a normal keygen writer would not have access to the source code of the key checker, it is usually simple, though tedious, to decompile and interpret the compiled binaries that would be released.)

After downloading and skimming the source, I spent a while trying to understand the custom written (and minimally commented) library he uses for large (125 bit, in this case) integers.  Rather than try to interpret the code itself, I used gdb to inspect the way the library worked, and had soon figured out the inner workings of the key checker:

Each key that it received, it would convert into an integer.  Each key had a unique integer k (with some exceptions: O, Q, and 0; I and 1; and Z and 2 were considered to be interchangeable to lower the change of an misread key) to which it was decoded.

Two magic numbers n = 21967608700894359473408781171922272451 and d=4444901153489723031681183441752517327 were hard-coded into the program.  A new number mkd mod n was created.  The key was accepted if m ≡ 12345678 mod c, where c = n/232⌋ (rounded down to the nearest integer).

This was clearly a problem for number theory! This algorithm looked suspiciously similar to the RSA encryption algorithm, which similarly employs two large numbers with special properties.

A quick summary of the steps behind RSA encryption:

  1. Pick two large primes p and q.
  2. Compute n = pq.
  3. Compute the totient of n, φ(n) = (p-1)(q-1).
  4. Choose an integer e such that e is relatively prime to and less than φ(n).
  5. Compute the integer d such that ed ≡ 1 mod φ(n).
  6. For your message m, encrypt it by generating ciphertext c ≡ me mod n. e is known as the public key and is used to encrypt the message.
  7. The person who receives the ciphertext by using d and applying the formula m = cd mod n.

This works because of Euler’s theorem, which states that aφ(n) ≡ a mod n for any integers a and n.

From that, cd =(me)d=med= m1+kφ(n) m mod n. (Check out the explanation of RSA on wikipedia for a more in depth explanation.

I quickly used Mathematica to factor n to see if it was the product of two primes. It was!  I checked that d was relatively prime to φ(n) to be sure.  It also was. The only part where this differed from RSA was the final step where it checked to see if m ≡ 12345678 mod c before accepting the key.

I was a little thrown off by c being related to n, but I quickly realized that it didn’t matter what c was.  It was just there to make sure that more than one m (and therefore more than one key) was accepted.  Effectively, what this key checker did was to take RSA generated ciphertext as a key and check to see if it decryped to a message that left a remainder of 12345678 when divided by c.

Therefore, to generate a key, I needed to create a correct message and encrypt it, then convert the ciphertext back to the “key” format.  Creating a message is easy. m is an accepted message if m ≡ 12345678 mod c, so m = ac + 12345678, where a is any natural number.  To encrypt m, I need to find the private key e.  Since n is small enough to factor and ed ≡ 1 mod φ(n), it is easy to find e through modular arithmetic.

Voila! Now I have everything I need to write a keygen for Andy’s example key checker. In fact, here it is.  Now, clearly, I should go do my homework, having wasted several hours on this.

Replacing TwitterIM with my own bot.

Wednesday, January 7th, 2009
TwitterIM:  TwitterIM is under maintenance at the moment. Please check back later.

Twitter‘s AIM bot, TwitterIM, has been “under maintenance” for as long as I can remember.  The last time it worked for me was in January 2008, and I have been “checking back later” periodically since.  Personally, I do not have high hopes for it working any time soon.  I decided to just take advantage of Twitter and AIM’s APIs and write my own bot to update my twitter through AIM.

I wrote this in about 10 minutes, so its a really simple program.  Once it connects to the AIM server, it waits for messages from me, ignoring all other messages.  A message from me is immediately posted to Twitter through curl.  In this implementation, there is no check for a successful post, though that would not be particularly difficult to implement.

The source code is available here.  You need Perl, the Net::OSCAR module, and AIM and Twitter accounts, naturally.  Just fill in your credentials at the top of program and run it with Perl.

Ruby Blackjack

Tuesday, January 6th, 2009

“Hm, I should probably learn Ruby.”

I probably first said this about a year and a half ago when Ruby and Ruby on Rails were “in”, the cool new (relatively, anyways, Ruby was written in 1995) language and the easy to use web application framework that went along with it.  In the year and a half since then, I’ve glanced at Ruby and Rails, not looking at it close enough to discover more than that it resembled Perl and possibly Python.

Well, as part of an internship interview, I’ve finally gotten a reason to actually sit down, read the Ruby documentation, and understand how this language works.  I am almost done writing a command line blackjack implementation in Ruby.  Here are my thoughts:

Object-oriented. I really like the way Ruby strongly object oriented – everything is an object, including integers and booleans, which most other languages designate as primitives.  This preemptively takes care of the limitations in designating these types as primitives, which often lead to wrapper classes such as Java’s Boolean and Integer.

Open classes. Ruby implements open classes, which allows you to define or redefine the behavior of any class at any time.  This way, I can easily implement: a predicate in the String class that checks if a string is empty or nil, a foldr method in the Array class, or anything else that I need.  In contrast, in Java, implementing these functions would require creating another StringUtils or ArrayUtils class.

Powerful. I don’t know if powerful is the right word.  Overall, I just feel like I would be much more comfortable writing complicated/long programs in Ruby, over Python or Perl.  Normally, I would stick to Java or C++ for projects with a larger scope.  Ruby is really nice, except for…

End? Syntax complaint.  I really prefer using brackets {} to denote a block of code, such as the interior of a loop or a function.  Maybe I’m just not used to the language yet, but I feel that it is laid out so that it is difficult to understand and read code by simply scanning through it.  The problem is made worse by the fact that with brackets, I can easily use vim or TextMate to find the corresponding bracket, whereas “if … end” doesn’t lend itself quite as well to that kind of searching.

I hate blackjack. What a terrible game.  Blackjack is a mishmash of rules and variants that make an elegant implementation of the game difficult.  It is a simple, boring game that casinos keep around because it is considered a beginners game, and so is played by novices who pay little to no attention to strategy.  Provided a player plays correctly, the game is relatively even, with a house advantage of less than 0.5% for more common variants.  Generally, blackjack is just not worth playing. Stick with poker.

In summary, blackjack sucks, Ruby is pretty cool.

All your info are belong to Facebook (and Google)?

Thursday, December 4th, 2008

“But what about interesting things that you do on other sites? Maybe you commented on a blog or dugg an article on Digg. Maybe you put up something for auction on eBay, or found a video you love and wanted to share it. Wouldn’t it be nice if when you were on another site, you could easily find your friends on that site and see what they are doing there, just like you can on Facebook?” — Facebook Connect

In what almost seems like a synchronized release, Facebook Connect and Google Friend Connect have been unveiled today by the largest social network and the largest search engine in the country.  At the very least, these two new technologies will be another general Web sign-on protocol, most likely replacing OpenID, which had a lackluster reception at best and will likely effectively disappear with these two giants as competition.  From a most pessimistic point of view, these two web giants who, between them, already have a ridiculous amount of information about the average user, has yet another means of data collection.

It does seem like Facebook has learned from Beacon, and has made Facebook Connect an opt-in service rather than an opt-out service, hopefully avoiding the privacy scandals that led Facebook to quickly pull Beacon off the site.  Still, I expect to hear stories of people accidentally publishing news about things that they don’t want known, though hopefully not to the same degree as the story about the man who bought an engagement ring on Overstock.com and got it published to his wall.

Perhaps these two tools can fulfill their promise of easy inter-website connectivity without bringing on the anti-privacy apocalypse.  I’ll keep my fingers crossed… while installing Facebook Connect onto my website.