Navigate Relative URLs in Firefox

For a long time I’ve been wondering how I can navigate around the web using relative URLs (similar to using cd in Linux). Using various content management systems, it’s a real pain to have to select the relevant portion of the URL, and type in wp-admin, admin, admincp, etc. Even on some sites, being able to go up a level is very handy when the navigation is lacking. All without a mouse.

Ambitious, right? After searching for a while, I couldn’t find anything like this out there. Just today I decided to take another stab at it, and actually came up with an elegant solution. No plugins or hacky measures required.

Create a new bookmark and set its keyword to something short. I used cd. The way smart bookmarks (keyword bookmarks) work is that you type the keyword followed by anything else into the address bar, and everything after the keyword is substituted for the %s placeholder in the bookmark URL.

To navigate to a relative path, we’d use the following bookmark URL: javascript:window.location="%s"

Now, whenever you type something like cd .. or cd /admin into your address bar, you’ll navigate to the correct relative path. If you’re super lazy like me, you can use Ctrl+L to bring the address bar into focus.
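To make the "cd" analogy concrete, here is a minimal PHP sketch of how a relative reference resolves against the current path. The function name resolveRelativePath is hypothetical, and this is a simplification rather than a full RFC 3986 resolver: query strings, fragments, and host changes are ignored.

```php
<?php
// Simplified illustration of resolving a relative reference against the
// path of the current URL. Assumes $currentPath starts with "/".
function resolveRelativePath(string $currentPath, string $target): string
{
    if ($target !== '' && $target[0] === '/') {
        // Absolute path: start over from the site root.
        $segments = explode('/', $target);
    } else {
        // Relative path: drop the last segment of the current path, then append.
        $base     = substr($currentPath, 0, (int) strrpos($currentPath, '/') + 1);
        $segments = explode('/', $base . $target);
    }

    $resolved = array();
    $lastIndex = count($segments) - 1;
    foreach ($segments as $i => $segment) {
        if ($segment === '..') {
            array_pop($resolved);      // go up one level
        } elseif ($segment !== '.' && $segment !== '') {
            $resolved[] = $segment;    // ordinary path segment
        }
        // A trailing "..", "." or empty segment leaves a trailing slash.
        if ($i === $lastIndex && in_array($segment, array('..', '.', ''), true)) {
            $resolved[] = '';
        }
    }

    return '/' . implode('/', $resolved);
}

echo resolveRelativePath('/blog/2011/post', '..'), "\n";       // /blog/
echo resolveRelativePath('/blog/2011/post', '/admin'), "\n";   // /admin
echo resolveRelativePath('/blog/2011/post', 'wp-admin'), "\n"; // /blog/2011/wp-admin
```

The browser does all of this for you when the bookmark assigns to window.location; the sketch only shows why cd .. and cd /admin land where they do.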

It’s all about the small victories.

vBulletin Large Inline Images “Exploit”

Disclaimer: this does not affect newer versions of vBulletin! Only old versions are affected, and only if you have no maximum character limit in posts.

Our forum brand, ForumOps, works on a lot of vBulletin forums, primarily running on 3.8.x. We ran into quite an annoying “exploit” (perhaps not technically, but it’s caused us problems) that older communities should be interested in.

Whenever you copy/paste an image into the editor, it gets converted into an inline image. This makes people say “wow!” because it’s so much more convenient than using the attachment system or than using a 3rd party image uploading service. vBulletin’s old WYSIWYG editor uses inline images, which get inserted directly into the post with base64 encoding. There are a few major concerns here…

However, in order for this to cause problems, you need to have the post character limit disabled. This is quite common, since without a technical explanation, why would you want to limit the post length? Some people post long articles and are frustrated by it. Usually, the inline image is harmless… people who copy/paste rich content typically only paste in small images or icons, which just work.

If you post a large image, and have the character limit disabled, it can cause a myriad of problems.

Database – Let’s say an average post is 1000 characters. That is approximately 1000 bytes, or roughly 1 kilobyte (assuming they are standard English characters). Once you paste an image in, that post could now be 1MB, or even 10MB (depending on multiple PHP settings). 1MB is a fairly reasonable image size that we see quite often. With a single image embedded in the post, that post now takes up 1000x the space. If you had a few images, or a larger image, it could be up to 100,000x larger! People tend to see performance problems once they get a few hundred thousand posts, and especially once they get into the millions. If you have a few of these posts, you will be hitting those limits a lot faster than you’d expect. This can create some obvious database issues with performance and maintainability. This may even cause your post table to crash and give you a ton of hard-to-debug problems. A DBA will notice the table is huge, but will just assume it’s because of all the post content.
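The arithmetic above can be sketched in a few lines of PHP. One detail worth adding: base64 encodes every 3 bytes as 4 characters, so an inline image takes up about a third more space in the post than the file does on disk. The helper name base64Length is hypothetical; the 1000-character post and 1MB image are the figures from the paragraph above.

```php
<?php
// Base64 turns every 3 bytes of input into 4 output characters (padded).
function base64Length(int $rawBytes): int
{
    return 4 * (int) ceil($rawBytes / 3);
}

$averagePost = 1000;                      // characters in a typical text-only post
$imageBytes  = 1024 * 1024;               // a 1MB image
$inlineChars = base64Length($imageBytes); // characters the image adds to the post

$growth = (int) round(($averagePost + $inlineChars) / $averagePost);

printf("Inline image adds %d characters\n", $inlineChars);
printf("The post is roughly %dx larger\n", $growth);
```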

vBulletin Code – The biggest problem we ran into was with the BB code parser. Instead of it having to parse 1kb of text, it now has ~1000x more text to process, which is brutally slow. This makes viewing that broken thread impossible since it will likely time out, or just hang forever (long enough for people to give up, anyway). If it’s buried in a popular thread, it may even be near impossible to remove, since it’s hard to delete posts without actually going into the specific thread.

Now, our client was lucky, since we ran into the problem by accident. A user had posted a 5MB image, and we received complaints about certain parts of the site being broken (BB Code parser showing new posts, and in a custom moderator section). A malicious user could easily use this to bring the site down rapidly, giving the moderators little to no chance of fighting back.

The Solution – If you insist on having no practical post limit, please consider setting the limit very high (say 500,000) rather than disabling it, since with no limit at all users can still post a huge post and cause issues. Images are a little trickier since it’s unintentional. However, people are set in their ways, so let’s at least try to mitigate the root problem…

If you create a new plugin using the “newpost_process” hook, and use the following code, it should prevent attacks.

// Only run the regex when the post could contain an inline image.
if (strpos($post['message'], 'data:image') !== false && strpos($post['message'], 'base64') !== false) {
    $maxSize   = 1024 * 500; // limit for the combined base64 payload, in characters
    $totalSize = 0;
    $matches   = array();

    // Capture the base64 payload of every inline image, up to its closing [/img] tag.
    preg_match_all(
        '#data:image/(gif|png|jpg|jpeg);base64,([0-9a-zA-Z+/=]+)\[/(img|IMG)\]#',
        $post['message'],
        $matches,
        PREG_SET_ORDER
    );

    foreach ($matches as $match) {
        $totalSize += strlen($match[2]);
    }

    if ($totalSize > $maxSize) {
        // Report how far over the limit the inline images are.
        $dataman->errors[] = sprintf(
            trim("
                You've copy-pasted some images in your post which are %s too large.
                Please remove them and add them as attachments instead.
            "),
            vb_number_format($totalSize - $maxSize, 0, true)
        );
    }
}

This checks for the existence of inline images within a post, and can limit them to a specific file size. In the above code, I have it set at 500kb, which is probably a little high. Feel free to tweak the error message the user will receive.
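If you want to try the detection logic outside of vBulletin before installing the plugin, a standalone sketch like the following may help. The function name inlineImagePayloadSize and the sample messages are made up for illustration; the pattern is a case-insensitive variant of the one above.

```php
<?php
// Hypothetical helper: combined length (in characters) of the base64
// payloads of all inline images in a BB code message.
function inlineImagePayloadSize(string $message): int
{
    $total   = 0;
    $pattern = '#data:image/(?:gif|png|jpe?g);base64,([0-9a-zA-Z+/=]+)\[/img\]#i';

    if (preg_match_all($pattern, $message, $matches)) {
        foreach ($matches[1] as $payload) {
            $total += strlen($payload);
        }
    }

    return $total;
}

$maxSize = 1024 * 500;

// A small icon and a large photo, simulated with filler bytes.
$small = '[IMG]data:image/png;base64,' . base64_encode(str_repeat('x', 300)) . '[/IMG]';
$large = '[IMG]data:image/jpeg;base64,' . base64_encode(str_repeat('x', 600000)) . '[/IMG]';

var_dump(inlineImagePayloadSize("Hello $small") > $maxSize); // bool(false): allowed
var_dump(inlineImagePayloadSize("Hello $large") > $maxSize); // bool(true): rejected
```

Note that the limit applies to the encoded length, which is about a third larger than the original image file.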

I have no idea which versions this affects. I know it works on 3.x, and perhaps some of the earlier 4.x versions before the editor was swapped out. If anyone can verify it for specific versions, I’ll be sure to amend my post.

Hope this helps someone out there!

Write Code for Humans

After working on high-performance projects for a while (or aspiring to), it gets easy to build things in a way that favors performance over readability. It’s good to be reminded every once in a while that while we develop code for computers to execute, ultimately it’ll be humans who will work with it.

Mobile web applications are an interesting example. A large portion of the code is downloaded and executed by remote clients, so we tend to pay more attention to total file-size, number of HTTP requests, and ultimately how much code gets executed on the other end. If we are graded on our work, surely this is what will be looked at. There are benchmarking tools such as Yahoo YSlow and Google Page Speed which help you analyze your site and suggest techniques to apply. Naturally, everyone wants an A or a high rating.

About a week ago, I was looking over one of my projects, and wondered what the hell happened to the code; it was awful. White-space was missing in a lot of areas, CSS was condensed (in most places), and even some of the PHP caching code was pretty hackish. Since I was having to rebuild a large component of the project for other reasons, I made an effort to clean up these concerning areas.

If you can optimize your architecture or design as you go, then by all means do it. What you shouldn’t do, however, is sacrifice readability in your source in favor of performance. I think there are two acceptable ways to do this kind of optimization… as part of a build/deployment script, or as a filter in production. Your development copy should always be for humans. Developer time is often significantly more expensive than server resources. Let’s look at some of the various resources we need to deal with…

CSS / JavaScript

Tools like minify are awesome for this. Simply pass your external CSS/JS files as a query parameter, and it generates a combined and minified version of the resources so the client only downloads one file. You can also use this or similar tools for inline CSS/JavaScript, though those can usually be avoided unless you’re dealing with truly dynamically generated code.

If you’re using Zend Framework, check out these view helpers to use Minify with Zend_View_Helper_HeadScript and Zend_View_Helper_HeadLink. Similarly, if you’re using Symfony2, check out the built-in component, Assetic.

Whitespace / HTML File Size

Make your templates as readable as possible. There are two ideal choices to deal with the resulting file size: gzip/deflate compression or a simple filter that removes extraneous white-space. Here’s an example of how you could set up compression. Add any of the following lines to your Apache configuration:

<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
</IfModule>

Each particular line would enable deflate compression for the associated mime type. You can usually expect around 2/3 to 3/4 of your filesize to disappear when compressed. The other approach, which was a filter, would look like this:

$response = trim(preg_replace('/>\s+</', '><', $response));

This is blatantly taken from Twig’s spaceless filter. Note, there isn’t much point in using both of these techniques. It’s best to benchmark both (resulting file-size, server compression time, client decompression time) and decide for yourself.
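As a starting point for that benchmark, here is a rough sketch that compares the resulting sizes only (not CPU time). The sample markup is invented, and gzencode() assumes PHP's zlib extension is available.

```php
<?php
// Sample markup with the kind of indentation a readable template produces.
$item     = "    <li>\n        <a href=\"#\">Item</a>\n    </li>\n";
$response = "<ul>\n" . str_repeat($item, 200) . "</ul>\n";

// Option 1: strip whitespace between tags (same expression as the filter above).
$stripped = trim(preg_replace('/>\s+</', '><', $response));

// Option 2: gzip the untouched response (requires the zlib extension).
$gzipped = gzencode($response);

printf("original: %d bytes\n", strlen($response));
printf("stripped: %d bytes\n", strlen($stripped));
printf("gzipped:  %d bytes\n", strlen($gzipped));
```

Keep in mind that the whitespace filter changes the markup itself, so whitespace that matters for rendering (between inline elements, or inside pre tags) is removed as well; compression leaves the content untouched.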

Images

Although not in scope of this post, you can do similar processing to images. There are command-line tools available to compress images, or remove meta data from them to help eliminate a few percent of file-size. This is definitely not the low hanging fruit, but interesting nonetheless. Don’t spend too much time optimizing specific images when it’s possible to run batch processing across all of your images at deploy or update time.

Do yourself and your peers a favor: write code with them in mind.

Thinking Asynchronously in PHP

PHP is an excellent language for developing on the web. It’s portable, versatile and has matured a lot over the past few years. Web applications are becoming more complex each day: they are processing massive amounts of data, talking to other services, and are expected to do so quickly. They get to a point where processing is either too server intensive, or it may be making your site less responsive to users if it’s doing too much. If you take a step back and think about PHP applications asynchronously, you can build faster, more scalable applications.

Web Application Flow

A typical web application’s job is to turn a request into a response. In order for this to appear responsive, applications should generate that response as quickly as possible – I usually find anything under half a second acceptable. If we were to build the application asynchronously, we’d delegate any complex logic, service calls, or other expensive tasks. These tasks can usually be broken down into two categories…

Preparing Data For This Page

If a page requires us to perform complex queries, pull data from web services, etc. then that will likely add considerable delays to page generation. The usual response is caching, but people still tend to use it in a synchronous fashion. Let’s take a Twitter feed as an example (showing the latest 10 tweets).

You’ll see code like this quite often:

public function getTweets($username) {
    if ($tweets = $this->_cache->get("tweets_$username")) {
        return $tweets;
    }
 
    $tweets = $this->_twitter->get($username);
    $this->_cache->set("tweets_$username", $tweets);
 
    return $tweets;
}

It checks to see if our cache entry exists, and returns it if it does. If not, it calls the routine that fetches the tweets from Twitter, and caches that result (assuming our default cache expiration). Let’s say this is 30 minutes. The problem with this approach is every 30 minutes, it needs to download the information again, making the user wait while the page loads. The other major problem is that when Twitter is down (or for whatever reason, we are unable to connect), the page is broken, because the cache has expired.

A much better approach is to never even attempt to do this work in the main application flow. Instead, let’s have a cron job run every 30 minutes that downloads this information and caches it with no expiration. This way, even if Twitter is down, the cached copy is still there to serve. You could still keep the above code as a fallback, but with this approach it should rarely, if ever, be hit. The cron job should only update the cache if it successfully retrieves the information.
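A minimal sketch of what that cron script could look like follows. ArrayCache and refreshTweets are hypothetical names; the array-backed cache stands in for whatever persistent, non-expiring cache you use, and $fetch stands in for the Twitter client.

```php
<?php
// Hypothetical stand-in for a persistent cache with no expiration.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        return isset($this->items[$key]) ? $this->items[$key] : null;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

// Meant to be run from cron. $fetch is any callable that returns the
// tweets, or throws when the remote service cannot be reached.
function refreshTweets(ArrayCache $cache, $username, callable $fetch)
{
    try {
        $tweets = $fetch($username);
    } catch (Exception $e) {
        return false; // service is down: keep serving the previous copy
    }

    if (empty($tweets)) {
        return false; // nothing usable came back: do not overwrite
    }

    $cache->set("tweets_$username", $tweets);
    return true;
}

$cache = new ArrayCache();

// First run succeeds, second run simulates an outage.
refreshTweets($cache, 'example', function ($user) {
    return array('first tweet');
});
refreshTweets($cache, 'example', function ($user) {
    throw new RuntimeException('Twitter is down');
});

print_r($cache->get('tweets_example')); // still the copy from the first run
```

The page itself then only ever reads from the cache, so a slow or failing remote service never shows up in the request.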

This model works with almost any data you can cache. You’ll need to decide if you’d rather only perform these tasks on demand (when a user visits a page), or if you want to do a little extra work to ensure the data is always available. Depending on the traffic of the site, or the amount of unique cacheable items you are working with, you’ll have to weigh the benefits.

Processing Needed Later

The other category usually comes from requests that actually need some sort of action performed. One concept that really helps scale your applications is a work queue, or job queue. With a queue in place, we can defer execution of actions so we can quickly complete our request-to-response cycle. Just because something is deferred doesn’t mean our data will be out of date for long. A job delayed by 1, or even 0, seconds can run almost immediately, just from another process.

The way this works is we need to have a few different processes in place:

  • Job Queue: storage of tasks to be executed. Example: a print spooler.
  • Job Worker: a process that reads tasks from the queue, and executes them. Workers can be spread across multiple physical machines, if necessary.
  • Client: our application, which adds items to the job queue. Similar to workers, we could have multiple applications using the same queue.
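The three roles can be sketched in a single PHP script. This is only an illustration: a real queue lives outside the web process (in a database or a queue server), while this one uses an in-memory SplQueue, and the function and handler names are hypothetical.

```php
<?php
// Job queue: storage for tasks waiting to be executed.
$queue = new SplQueue();

// Client: the request only records what needs to happen, then moves on.
function enqueueJob(SplQueue $queue, $type, array $payload)
{
    $queue->enqueue(array('type' => $type, 'payload' => $payload));
}

// Worker: in practice a separate process runs this loop.
function processJobs(SplQueue $queue, array $handlers)
{
    $processed = 0;
    while (!$queue->isEmpty()) {
        $job = $queue->dequeue();
        call_user_func($handlers[$job['type']], $job['payload']);
        $processed++;
    }
    return $processed;
}

// One handler per job type; this one just records who would get mail.
$sent     = array();
$handlers = array(
    'mail' => function (array $payload) use (&$sent) {
        $sent[] = $payload['to'];
    },
);

enqueueJob($queue, 'mail', array('to' => 'user@example.com'));
enqueueJob($queue, 'mail', array('to' => 'admin@example.com'));

$processed = processJobs($queue, $handlers);
echo "Processed $processed job(s)\n";
```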

Let’s look at some examples…

Sending Mail

Using PHP to send mail is a relatively simple task. If your server can directly send mail, then it is probably very quick, too. A lot of servers will use some sort of mail queue internally to ensure mail gets sent at a steady pace and doesn’t overload the server. However, if you are sending mail via remote servers (for example, using Gmail), then it will take a lot longer, especially if you are sending lots of mail.

Instead of making the user wait a few extra seconds, we can instead toss the mail into a queue (which should be very quick) and not worry about it. Then, one of our mail queue workers can process it as soon as they are available. Depending on your set up, this could be instant, or it could be very delayed. It’s up to you to decide how important the timing of your mail delivery is.

Rebuilding Stats, Caches, etc.

Another common scenario is performing clean-up processing after making changes to your data. Delaying execution until another process can pick it up is usually harmless to the user, and gets them to their next task that much faster.

Different Types of Queues

There are two very common implementations or approaches you can take.

Running as a Daemon

If you need to minimize time between queuing and execution, or more importantly, between scheduled execution and actual execution, you should look at something like Gearman or beanstalkd. These applications run as daemons and can immediately process queued tasks as they are ready to be executed. However, since this requires software to be running, it may not be available or possible in some hosting environments, and takes a little more work to set up. I’d argue that, besides being more accurate, it will be faster than the alternatives since it does not have to continually poll for new data.

Running Scheduled Tasks

If you are limited, a simple option is to set up a scheduled task for your worker. It should be a simple PHP script that connects to your queue (you could store it in the database), and processes items as they are available. If you want things to run as quickly as possible, you can make your cron job run more frequently; however, this may also burden your system depending on how frequently you poll, and how intensive your scripts are.

I’m going to follow this post up with some examples of implementing this with both a database-based queue, running as a scheduled task, and also with either Gearman or beanstalkd.
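Until then, here is a rough sketch of the scheduled-task variant. It assumes a JSON file as queue storage purely to stay self-contained (a database table with the same two fields would work the same way), all function names are hypothetical, and there is no locking between reading and writing, so it is not safe for concurrent workers as written.

```php
<?php
// Hypothetical file-backed queue storage.
function loadJobs($file)
{
    $jobs = is_file($file) ? json_decode(file_get_contents($file), true) : null;
    return is_array($jobs) ? $jobs : array();
}

function saveJobs($file, array $jobs)
{
    file_put_contents($file, json_encode(array_values($jobs)), LOCK_EX);
}

// Client side: schedule a job to run $delay seconds after $now.
function scheduleJob($file, $name, $delay, $now)
{
    $jobs   = loadJobs($file);
    $jobs[] = array('name' => $name, 'run_at' => $now + $delay);
    saveJobs($file, $jobs);
}

// Worker side: invoked by cron; runs what is due, leaves the rest queued.
function runDueJobs($file, $now, callable $handler)
{
    $pending = array();
    $ran     = 0;
    foreach (loadJobs($file) as $job) {
        if ($job['run_at'] <= $now) {
            $handler($job['name']);
            $ran++;
        } else {
            $pending[] = $job;
        }
    }
    saveJobs($file, $pending);
    return $ran;
}

$file = tempnam(sys_get_temp_dir(), 'queue');
$now  = time();

scheduleJob($file, 'send-welcome-mail', 0, $now);  // due immediately
scheduleJob($file, 'rebuild-stats', 3600, $now);   // due in an hour

$ran = runDueJobs($file, $now, function ($name) {
    echo "Running $name\n";
});
$remaining = count(loadJobs($file));

echo "$ran job(s) ran, $remaining still queued\n";
unlink($file);
```

How often cron invokes the worker sets the worst-case delay between a job becoming due and actually running.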

Community in Software

I’m of the opinion that software without community fails.

When I’m looking at software to use, whether it’s development tools, content management systems, or other applications, there is almost always a direct correlation between quality of product and size of community. When I talk about “product”, I don’t necessarily mean the actual downloaded package or even the code.

I’ll admit, I’m usually quick to point out poorly designed software. As a developer, I hate WordPress with a passion, but because it has such a strong community, the ecosystem around it is incredible. I can get by simply by installing plugins and themes, and hopefully not ever have to touch any code. If I get stuck, with anything, there is so much community, support and documentation on the web that it’s a very positive and usually fruitful experience.

This past week I’ve been working with another seemingly popular CMS. I thought it would be easy to work through whatever issues I had, because I’ve heard of it before, and you know, if it’s anything like WordPress I’ll have no problem… do a quick search, find a plugin, configure, profit. I very quickly found a screencast by one of the few (only?) companies supporting the product. It was outdated, and on the version I was working with, did not work correctly. I immediately asked on Twitter and my extended network. Nothing.

For example’s sake, let’s say this product was technically far superior to WordPress. It doesn’t matter. No amount of technical superiority will ever amount to anything if it has no strong community to back it.

Since the rise of social networks, forums, GitHub || SourceForge (they need some love, too), the global software community has become a lot more united. It’s not just the geeks on mailing lists and IRC anymore. If you’re developing software for the public, embrace community. Community wins.

Facebook Page Like Redirect

We encountered some pretty nasty Facebook behavior late last night during a launch. Usually when you “Like” a page, the iframe application refreshes normally. However, if your application is the page’s default tab, the redirect after a “Like” goes back to the wall. Apparently this is by design. There is a bug report filed, which states:

“Once a user Likes a page, by design, the landing page for future visits will be the Wall tab. Thanks for the report.”

This is not what you’d expect – since it only appears to happen when the application is the default tab. I’d expect it would always redirect back to the tab you had open. They can still satisfy their above quote this way.

We couldn’t come up with any fixes, but found a few workarounds:

  • Don’t use it as the default tab
  • Point your marketing material to the actual tab and not the generic page URL

I also came up with a potential solution the next morning, which should work, though not tested:

  • Add some JS code (pseudocode): if window.top.location.href == 'generic page URL', set window.top.location.href = 'page URL with specified application tab'

Hope this helps someone!

Local Facebook Development

I’ve started doing a little Facebook application development again lately, so I’ve been on the prowl for ways to do things more efficiently. Working locally is efficient (we all know that by now, I hope)…

Facebook has always been a bit of a pain with things like FBML, but now that page tabs support iframes, I thought I’d revisit it a little. It turns out that you can simply use your localhost (or any private) address, and the iframe will load. However, there are some major problems in the authentication process with this method. I moved on, and continued to do things the old fashioned way with a remote development server, and set up all my application settings accordingly.

However, since Facebook simply needs to check the URL, and expects valid responses once in a while, you can edit your local hosts file and map the remote hostname to 127.0.0.1. This way, when Facebook performs its checks, the remote server responds properly, but you can still rapidly develop locally with the iframe pointing to your local development server.

For example, assuming our development server is set up at dev.somefbapp.mycompany.com, we’d configure the app with that (public) address. Then, when we want to test things more rapidly, we can enable this in our hosts file:

127.0.0.1    dev.somefbapp.mycompany.com

This seems to be working flawlessly for me as of today (July 28th, 2011) and on PHP SDK 3.2. Hopefully this brings a little light to the day of someone who gets as frustrated as I do with Facebook’s documentation.

Using Zend_Db With Silex

I’ve been messing around with Silex (and of course Symfony2) a lot lately, and I was excited to see a Doctrine extension. However, it was only for the DBAL and not the ORM. In my Symfony2 apps, I’ve finally adjusted to using Doctrine’s ORM, but for some reason, I’ve really grown to like the DBAL that Zend_Db offers over the pseudo SQL (DQL) that Doctrine has. Here is a quick Extension to get you started with Zend_Db.

<?php

namespace Synd\Extension;

use Silex\Application;
use Silex\ExtensionInterface;

class ZendDb implements ExtensionInterface
{
    public function register(Application $app)
    {
        // Let the autoloader find ZF1's underscore-prefixed classes.
        $app['autoloader']->registerPrefix('Zend_', $app['zend.class_path']);

        // Shared service: the adapter is created once, on first use.
        $app['db'] = $app->share(function() use ($app) {
            // Leading backslash: Zend_Db lives in the global namespace.
            return \Zend_Db::factory($app['db.adapter'], array(
                'host'     => $app['db.host'],
                'dbname'   => $app['db.dbname'],
                'username' => $app['db.username'],
                'password' => $app['db.password']
            ));
        });
    }
}

Create that under ./src/Synd/Extension/ZendDb.php. To use it, you’ll need to add Synd to your autoloader, and then register the extension in your application:

$app['autoloader']->registerNamespace('Synd', 'src');

$app->register(new Synd\Extension\ZendDb(), array(
    'db.adapter'      => 'mysqli',
    'db.dbname'       => 'your_db',
    'db.host'         => 'localhost',
    'db.username'     => 'root',
    'db.password'     => '',
    'zend.class_path' => '/path/to/zf/library'
));

Please note that this is a quick and dirty bridge, and is for ZF1 not ZF2. If I’m breaking any best practices with naming and combining libraries, be sure to let me know. I’m hoping to explore the Zend_Db and Zend_Form combination since I use them heavily in my apps. This provides a very lightweight infrastructure in order to throw up quick proof of concept applications.

Hopefully this helps someone… or at least gives enough familiarity for my ZF buddies to start looking at Silex / Symfony2.

Reflection on Separation of Concerns

Working with larger projects a lot, I sometimes forget some of the earlier decisions I made to get here. Biting the bullet with Zend Framework 3 years ago made some of those decisions subconscious (or seem obvious), but after a little reflection, I am thankful. This post really has nothing to do with ZF, but it does relate to the idea of frameworks a lot. I’ve decided I need to start looking at what other developers are doing more – outside of the few dozen that I actively follow on Twitter. I need to go back, re-evaluate some of those decisions, and hopefully share some knowledge along the way.

After the last #LeanCoffeeKL event, I was brought into a discussion with a fellow developer who could not get mail set up locally. Another guy chipped in with some useful advice. I instantly thought, “who cares? don’t send mail locally”. After thinking about how obvious that was for a few minutes, I remembered setting up my local dev server ages ago, and being pissed off that I couldn’t get mail working properly. All of the blog posts would explain how to set it up through your ISP. Some, smarter, would show you how to send it through SMTP (or through Gmail, etc.).

A big mindset change for me was when I discovered that I could code for different environments. Having distinct development, staging and production environments made configuration a lot easier. It also made dealing with external tools a lot easier. So, how did that help me with that pesky mail problem?

Mail is a service. My application code does not care how mail works, it just cares that it can call the mail service. If I’m testing things, I don’t want to actually send mail. That can have some very scary unintended consequences. I do, however, want to make sure that the generated emails are correct. By abstracting what the mail service does, I can create a basic interface for that service that my application can call. My configuration (per environment) can determine which mail implementation I need.

For example, on development, I capture email to the database. On staging and production, I capture email to the email queue (in the database). There is no cron job set up on staging to actually send the mail, though I can send it when I need to.
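As a rough sketch of that idea in PHP: the interface, class, and function names below are hypothetical, the development implementation captures to an array instead of a database to keep the example self-contained, and the production implementation simply calls mail() rather than the queue described above.

```php
<?php
// The application only ever depends on this interface.
interface MailService
{
    public function send($to, $subject, $body);
}

// Development/testing: capture the message so it can be inspected.
class CapturingMailService implements MailService
{
    public $captured = array();

    public function send($to, $subject, $body)
    {
        $this->captured[] = compact('to', 'subject', 'body');
        return true;
    }
}

// Production (simplified assumption): hand the message to PHP's mail().
class NativeMailService implements MailService
{
    public function send($to, $subject, $body)
    {
        return mail($to, $subject, $body);
    }
}

// Per-environment configuration decides which implementation is used.
function createMailService($environment)
{
    return $environment === 'production'
        ? new NativeMailService()
        : new CapturingMailService();
}

$mailer = createMailService('development');
$mailer->send('user@example.com', 'Welcome', 'Thanks for signing up.');

print_r($mailer->captured); // the generated email, without sending anything
```

The calling code is identical in every environment; only the configuration value passed to the factory changes.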

A lot of these problems come up all the time, but unless you know the right questions, it’s pretty hard to navigate forward. I’m hoping to start exploring these topics a lot more in the next few weeks on here.

Fixing a Broken Eclipse PDT Index

On my secondary machine, I noticed that Eclipse’s Open Type / Open Method dialog stopped working (not populating). I tried several things, such as rebuilding the index / projects, restarting Eclipse, and several solutions from online. I actually couldn’t find anyone with this problem for Eclipse PDT (PHP edition), only some solutions for Java.

Here are the usual listed solutions:
-Restart Eclipse
-Clean or build broken projects
-Delete files from workspace (Java)

I finally got this fixed today after comparing both of my Eclipse directories (working and non-working). Shut down Eclipse, and navigate to your Eclipse workspace directory. Delete the workspace/.metadata/.plugins/org.eclipse.dltk.core.index.sql.h2 directory, and start Eclipse again. I noticed mine had grown to 6GB, which is probably why it crapped out. My other installation’s directory was about 250MB by comparison.

Hopefully this helps someone else! Auto-complete, open type, etc. is a god-send when working with large projects.