Free as in money, not as in pain - InstantSpot moves to ColdFusion 8

ColdFusion

As I mentioned in several previous blog entries, on January 12 we re-launched InstantSpot after a complete bottom-to-top rewrite. In addition to writing a completely new code base, we made the unlikely choice of using Railo as our CFML processing engine.

"Why?" you ask? (You aren't the first)

The reasons were several, and I will detail a few of the key points that went into our decision.

  • It's free - InstantSpot is basically a small project of big ideas by two developers doing this out of our own pockets and - how can I put this delicately? - we are poor and cheap! Unfortunately, despite how much we will it to be so, InstantSpot has not made us bazillions of dollars (at least as of the time of this posting). From the beginning we have made an effort not to make InstantSpot a financial burden on our families, as they are already paying dearly in the time we spend tied up in code till all hours of the night, and we like to cut financial corners everywhere we can. A free CFML processing engine? That is an obvious avenue to at least explore.
  • It's fast - No lie... Railo is fast. From our very first development and tests, it just seemed to blow other engines away in how quickly it processed code. This was backed up test after test. Not only does it shine in processing speed, but it also has a tiny footprint on the server. In our environment, running it as a Tomcat application, you almost wouldn't even guess it was there.
  • It's CFMX 7 compatible - To us this meant that we didn't have to code anything differently simply because we chose to use Railo over ColdFusion. We had no issues whatsoever using our normal data model patterns we use in any other application, and we used BER releases of ColdSpring and Mach-II without the slightest hiccup. Eventually we found a couple of small places where we had to make a workaround (3 I can think of), but they were without question edge cases, and there were easy workarounds that didn't feel as though we were compromising the application.
  • It's the underdog - If you were to poll the ColdFusion community at large, you would find that many people don't even know there are any other choices besides Adobe ColdFusion, and many of those who do may have only heard of New Atlanta's BlueDragon. Railo hardly gets a mention in most circles. We thought it might be fun to be an advocate by example and help promote what we felt is a great alternative choice. Additionally, Aaron and I have a tendency to choose the road less traveled, so this certainly fit our m.o.


Sounds reasonable, right? From the time we made that choice around June of '07 and started moving forward with the rewrite, we had felt overwhelmingly positive about our decision.

All of that began to change about 1:00am January 13.

After making the DNS changes, as traffic began redirecting to the new server, we started seeing absolutely inexplicable errors. The closer we examined them, the more obvious it became that we had some *serious* threading issues in our application. We are extremely careful in this regard when it comes to our code, so this was very surprising. However, this *was* brand new code, and of course there could have been a hole somewhere, right?

As more traffic came in, the errors escalated. We were seeing errors at least every minute, each of which generated a painful new email to both Aaron and me. It quickly became clear that the errors actually had nothing to do with our code. We started seeing errors from both Mach-II and ColdSpring that simply couldn't happen. For instance, here is one we started seeing from ColdSpring:

Message	variable [beandefinition] doesnt exist
Tag Context	/www/instantspot/www/coldspring/beans/AbstractBeanFactory.cfc (211)

Really? That is pretty interesting since line 210 is:

<cfset var beanDefinition = getBeanDefinition(arguments.beanName) />

And how about this one from Mach-II?

Message	variable [nextevent] doesnt exist
Tag Context	/www/instantspot/www/MachII/framework/RequestHandler.cfc (115)

Oh yeah? Well... this is line 114:

<cfset nextEvent = appManager.getEventManager().createEvent(result.moduleName, result.eventName, eventArgs, result.eventName, result.moduleName) />

Clearly even our worst var-scoping misstep couldn't have created those errors, and furthermore, these are well-tested frameworks used in hundreds if not thousands of applications. If these threading errors existed in them, Aaron and I would not be the ones discovering them in January 2008. We were also seeing some of our own objects attempting to call methods of other objects, an obvious sign of serious threading issues. In two instances, a person's RSS feed actually contained someone else's content.

We began to wonder whether Railo even recognized var-scoping at all. I pulled up an old blog entry in which I had written some code showing a simple var-scoping error, set up a Railo scribble pad, and ran the test against it. Railo did pass that test, which tells me that it at least manages var scoping on a cursory level. However, under the load of our application it appeared that we were looking at something bigger than just var-scoping a few object methods.
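For anyone unfamiliar with the problem, a var-scoping test of this kind might look roughly like the sketch below. This is not the code from that old blog entry; the component name (ScopeTest.cfc) and the method names are hypothetical and only illustrate the general idea.

```cfm
<!--- ScopeTest.cfc: hypothetical component used only for illustration --->
<cfcomponent output="false">

    <!--- Unsafe: "result" and "i" are not var-scoped, so they are written to
          the component's variables scope and shared by every caller. --->
    <cffunction name="sumUnsafe" access="public" returntype="numeric" output="false">
        <cfargument name="limit" type="numeric" required="true" />
        <cfset result = 0 />
        <cfloop from="1" to="#arguments.limit#" index="i">
            <cfset result = result + i />
        </cfloop>
        <cfreturn result />
    </cffunction>

    <!--- Safe: the var declarations keep both variables local to this call. --->
    <cffunction name="sumSafe" access="public" returntype="numeric" output="false">
        <cfargument name="limit" type="numeric" required="true" />
        <cfset var result = 0 />
        <cfset var i = 0 />
        <cfloop from="1" to="#arguments.limit#" index="i">
            <cfset result = result + i />
        </cfloop>
        <cfreturn result />
    </cffunction>

</cfcomponent>
```

If a single instance of a component like this is cached in the application scope and hit by many concurrent requests, sumUnsafe(100) can occasionally return something other than 5050, because two requests overwrite each other's result and i. On an engine that honors var-scoping, sumSafe(100) should not show that behavior.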

At this point, there was no longer any question that we needed to do something drastic to get out of this tailspin. We quickly decided that the most logical step was to switch to Adobe ColdFusion 8. The cheap gene that is so deeply embedded in our DNA had to be thrown out the window, and we had to get a ColdFusion 8 license and get it implemented ASAP. One immediate concern was how much we had modified the Railo WEB-INF to support our URL handling, since we not only use mod_rewrite in Apache but also have another Java application in the mix. After installing ColdFusion 8 Standard and digging into /wwwroot/WEB-INF, we found that we could painlessly apply the same pieces to our ColdFusion application, and with some very small changes we had InstantSpot running in our development environments.

After doing some heavy but rapid testing throughout all of our application, we felt that we could make the switch. Even if an error or two was discovered later, the benefits would strongly outweigh the utter nonsense we were dealing with at that time. So around midnight last Wednesday night, we pushed up the ColdFusion implementation of our application, crossed our fingers, held our breath, flipped the switch, and......

..... silence.

After the application initialized, suddenly there was peace... no errors... no emails... just an application purring along as it was intended. In fact, we had to push up a test template with broken code to ensure that our error notification was still working. Since the move we have not seen a single error occur in our application, across well over 100K page requests.

I want to be clear that this post is not meant to be an attack on Railo. I am sure that Gert and crew work extremely hard, and I tend to believe that Railo will mature into a nice alternative if they keep up the effort they have shown to date. However, I do hope that this post serves as a warning, as we found that there are huge implications to using it as it stands today.

alynch said:
 
cat = out of the bag
 
posted 648 days ago
 
Michael Sharman said:
 
Very interesting Dave, I'd love to see Railo get a copy of the codebase (or a subset perhaps) and run some diagnostics to determine the actual cause.

I must admit I was waiting patiently to hear how you and Aaron went with this, and I agree that Railo is (will?) become a great alternative competitor to Adobe's ColdFusion Server.
 
posted 648 days ago
 
alynch said:
 
Michael, I'm with you...I was eagerly awaiting our chance to proclaim victory with an alternative CFML processor...(dangit)

One of the "great" things about this issue, is that these errors were being generated within the Mach II framework, which is obviously freely available for testing.

If the Railo guys ask, we'd be glad to share with them how we configured the stack Apache/Tomcat/Railo.

IMO those things alone should allow for a really good test bed for them.
 
posted 648 days ago
 
Michael Sharman said:
 
Did you guys do any load/stress testing on the server (not specifically checking code)?

I wonder if that would have picked anything up.
 
posted 648 days ago
 
alynch said:
 
We did some cursory load testing with grinder.

Nothing notable was found other than that InstantSpot version 2 was going to handle traffic much more gracefully than the first version. Definitely nothing to indicate we would experience this type of threading issue.
 
posted 648 days ago
 
 
I am a little late replying, but I second what Aaron said. First, we really wanted it to be a success, and were truly disappointed in the results. Second, we did set The Grinder up against it for some load testing, but our primary goal at the time was watching processors and memory. In fact, at the time I believe we had the app in development mode, which was outputting errors rather than saving/notifying. In hindsight, that was an obvious area we could (should) have focused more attention on.

Regarding the code, some errors were happening at very early stages of the request in the Mach-II app before our code even came into play, and were happening pretty much everywhere throughout the application. I don't believe specific pieces of code were a factor. Perhaps it is related to the way that we implemented Railo itself as a Tomcat app using mod_jk to connect to Apache. As Aaron said, we would be more than happy to share that information.
 
posted 648 days ago
 
Jeff said:
 
I too would like to see this passed to the Railo staff to see if they can fix the problem. I am using Railo all the time for my home development, and other projects, and I hope we get an answer on this soon.

This seems surprising since they already have several large, high traffic sites running Railo in Europe, and I hope this can be addressed.

 
posted 646 days ago
 
Gert Franz said:
 
Hi guys,

First of all, thanks, Dave, for using Railo for your application, even though you have experienced this disappointment in the last couple of weeks.
I do not want to look for excuses. Everybody always tries to excuse their own errors. But we need to find the errors that seem to be generated by the local scoping of variables. That's the most important thing. Could you provide a part of your codebase which generates those inexplicable errors? We definitely want to fix them and help other servers not to run into the same issues.
It is really hard for us to find errors that are not reported.
And Dave, you know we are always there to help if there are problems with compatibility or (in this case) stability.
I am pretty sure we can find the error if there is some way to reproduce it using a certain piece of code.

Thanks a lot for your help

Gert

Railo Technologies GmbH
gert.franz@railo.ch
www.railo.ch

 
posted 646 days ago
 
Michael Streit said:
 
We have run some stress tests on the local scope, but we can't reproduce that bug.
We are relatively sure that the problem is not the local scope; the problem is (we think) that the following call returns null:

Because "createEvent" returns null, nextEvent is populated with null (when you set a variable in CF to null, the variable does not exist).
Which version of Railo do you use?
In version 2.0.0.018 we fixed a bug that could be the key to this problem (- fixed bug in script return "in a UDF that contain more than 200 statements, script return are ignored")
 
posted 646 days ago
 
 
Michael, is there any easier way to find the version other than re-setting it up in development and going into the Railo admin? Is that available in a file somewhere?
 
posted 645 days ago
 
Gert said:
 
Dave, you have a version file inside the WEB-INF/railo folder. It should contain the version number, like 2.0.0.030-final.
Besides this one, you can check the server installation dir. Inside it you have the following directory:
railo-server/patches. Inside the patches directory you should see files named: 2.0.0.0x.railo. The latest one should let you know which version you have (or I must say had :-).

Gert
 
posted 645 days ago
 
 
Gert/Michael -

The version file has the following: 2.0.0.026-final
 
posted 645 days ago
 
Gert said:
 
Dave,

thanks for the info. Any chance for us to get a piece of code? We might have found something and are working hard on it, but we would like to test it with maybe your app as well.

Gert
 
posted 645 days ago
 
Michael Streit said:
 
Hi Dave

After doing a lot of testing, we realized that local scope was not the cause of the problem. By stress testing the return value handling of UDFs we were able to find and fix the issues.


We've got an alpha version of the fix available, but a final, official fix will be available at the beginning of next week.

Feel free to contact us on any future issues - we'd be happy to help you work through any problems you might have with Railo.
 
posted 645 days ago
 
alynch said:
 
Wow, great news...I hope that truly was the fix!

I've got to hand it to you guys (Gert, Michael)...what a classy response to this problem. If you can keep up this level of response to issues, I'd wager Railo is destined to become a real contender in the CFML arena.
 
posted 645 days ago
 
 
My thoughts exactly Aaron. I can't imagine a more appropriate and earnest response to this problem, and I sincerely hope that this does indeed fix that issue. Nice work guys!
 
posted 645 days ago
 
Gert said:
 
Hi Dave,

I never got an answer to my questions regarding the fix. Could you recheck your mailbox? Or am I part of the spam folder :-)

Gert
 
posted 633 days ago
 
