Posts

Interesting article about unikernels

I came across this interesting article about unikernels:

http://queue.acm.org/detail.cfm?id=2566628

It will be interesting to see how all of this change in the cloud shapes up in the future and how it will affect the traditional operating system Jobbing Software Engineer!

Send event to intercom.io – example python code

So as mentioned in a previous post I have been doing some Python software work to integrate a client’s cloud application with intercom.io. The Intercom REST interface is nicely designed and quite straightforward to use.

Applications send events to their Intercom account (with optional event data) when their users do interesting things. Intercom can be configured with rules so that messages are sent automatically in response to a particular combination of events.

It is a nice idea that shows great potential. However, the Intercom rules engine is quite limited and probably needs quite a bit of work before it is really useful. For example, at the time of writing, it is not possible to include any of the event data as part of the decision process of a rule.

Anyway, to post an event to intercom.io you do an HTTP POST to /events, passing some data including the event name, the user’s email address and a timestamp. You can also send optional data that will be stored with the event (but which can’t be used as part of a messaging rule).

The following code defines a function called raise_event() which can be used like this:

This raises an event named ‘project_created’ with some extra data: project id and project name.
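
As a rough sketch of what a raise_event() function along these lines could look like, here is a version written for Python 3 using only the standard library. Treat the details as assumptions rather than the definitive Intercom interface: the host api.intercom.io, HTTP Basic authentication with an app ID and API key, and the JSON field names (event_name, email, created_at, metadata) reflect my understanding of the API of that era, and APP_ID and API_KEY are placeholders. Check the current Intercom documentation before relying on any of it.

```python
import base64
import json
import time
import urllib.request

# Assumed endpoint and placeholder credentials; replace with your own values.
INTERCOM_EVENTS_URL = "https://api.intercom.io/events"
APP_ID = "your-app-id"
API_KEY = "your-api-key"


def build_event_request(event_name, email, metadata=None, created_at=None):
    """Build (but do not send) the HTTP POST request for a single event."""
    payload = {
        "event_name": event_name,
        "email": email,
        # Unix timestamp in seconds; defaults to "now" when not supplied.
        "created_at": int(created_at if created_at is not None else time.time()),
    }
    if metadata:
        # Optional extra data stored alongside the event.
        payload["metadata"] = metadata

    credentials = base64.b64encode(
        f"{APP_ID}:{API_KEY}".encode("utf-8")
    ).decode("ascii")

    return urllib.request.Request(
        INTERCOM_EVENTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + credentials,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def raise_event(event_name, email, metadata=None):
    """Send the event to Intercom and return the HTTP status code."""
    request = build_event_request(event_name, email, metadata)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With this sketch, the ‘project_created’ example would be a call such as raise_event("project_created", "user@example.com", {"project_id": 42, "project_name": "Demo"}), where the email address and values are made up. Building the request in a separate function keeps the payload easy to inspect without touching the network.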

Of course this is all very interesting, but for a real Python API wrapper you should probably check out python-intercom here:

https://pypi.python.org/pypi/python-intercom

Integrating Intercom to cloud App keeps me out of trouble… (ARM NEON)

I have been busy over the last few days integrating Intercom into a client’s cloud app. I have integrated quite a few SaaS systems of late (and written not a few REST interfaces myself) and I am quite enjoying this integration as the data model is good and the REST interface is reasonably easy to use!

So far so good, and at least it keeps me out of trouble. The specific trouble I have in mind is an attempt to implement a visual programming front-end for the ARM NEON SIMD instructions using the very exciting NoFlo JavaScript library!!

The ARM NEON instructions are very powerful SIMD instructions that are very useful for optimising image processing and computer vision tasks on ARM devices. They are, however, quite difficult to use, as each instruction has so many variants for the different input and output element types. I reckon it could be a problem well suited to visual/graph flow programming, and as far as I can see the most capable people at those races are those in the NoFlo team. I am getting seriously excited about their work!

Anyway back to Intercom for the moment!

get_value_for_datastore() example

When using a db.ReferenceProperty in Google App Engine, the function get_value_for_datastore() can be used to get the key of a referenced datastore model without causing GAE to automatically fetch the model. This can be very handy if you are trying to optimise the performance of your cloud code.

 

Suppose a model A has a db.ReferenceProperty named b that references a model B. Given an instance a of type A, b’s key can be retrieved with A.b.get_value_for_datastore(a).

 

This will get b’s key without fetching its model!

 

Google App Engine – Access ReferenceProperty without fetching referenced object

In Google App Engine’s cloud-based Datastore, db.ReferenceProperty can be used in a Model to reference another Model, for example a model A that declares b = db.ReferenceProperty(B).

 

In the above example A references B. If you have an object a of type A, you can access its referenced B directly as a.b.

When you access a.b, GAE will automatically fetch b from the datastore without you having to do anything special. This is handy, but sometimes you may not want the automatic datastore fetch to happen. Instead you may just want to know the referenced object’s key.

 

If you were dealing with thousands of objects, the extra datastore fetch for each one could add a huge, unnecessary time overhead to your processing, for example if you were using memcache for caching or already had all the relevant B objects in memory.

 

So how do we retrieve the key of the referenced object without triggering a datastore fetch? To achieve this we have to invoke a rather obscure function called (wait for it): get_value_for_datastore()

 

For example, we can find out what b’s key value is with A.b.get_value_for_datastore(a).

 

The above won’t cause GAE to fetch the referenced B from the datastore. It’s all a bit long-winded and confusing but it does seem to work!
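
To make the difference between the two access paths concrete, here is a small self-contained Python 3 sketch. It does not use the App Engine SDK: ReferenceProperty, fetch() and FAKE_DATASTORE below are hypothetical toy stand-ins written purely for illustration, and only the call shape A.b.get_value_for_datastore(a) is meant to mirror the real db API.

```python
# Toy stand-in for google.appengine.ext.db, for illustration only.
# It mimics two behaviours of db.ReferenceProperty: plain attribute access
# triggers a fetch, while get_value_for_datastore() returns only the key.

FAKE_DATASTORE = {}   # key -> stored object (hypothetical in-memory "datastore")
FETCH_COUNT = 0       # counts simulated datastore round trips


def fetch(key):
    """Simulate a datastore get and record that it happened."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    return FAKE_DATASTORE[key]


class ReferenceProperty:
    """Descriptor that stores a key and dereferences it on attribute access."""

    def __set_name__(self, owner, name):
        self._attr = "_" + name + "_key"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access (A.b) returns the property itself
        return fetch(getattr(instance, self._attr))

    def __set__(self, instance, key):
        setattr(instance, self._attr, key)

    def get_value_for_datastore(self, instance):
        """Return the stored key without fetching the referenced object."""
        return getattr(instance, self._attr)


class B:
    def __init__(self, name):
        self.name = name


class A:
    b = ReferenceProperty()

    def __init__(self, b_key):
        self.b = b_key


FAKE_DATASTORE["b-key-1"] = B("example")
a = A("b-key-1")

key_only = A.b.get_value_for_datastore(a)  # key only, no fetch
fetches_after_key = FETCH_COUNT
referenced = a.b                           # triggers one simulated fetch
fetches_after_access = FETCH_COUNT
```

In this toy model the fetch counter only moves when a.b is read, which is the behaviour described above. With thousands of A objects you could collect the keys this way and look them up in a cache or an in-memory list instead of paying for a fetch per object.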

 

Google App Engine, DeadlineExceededError snow storm in _warmup and module load

A Google App Engine cloud application that we have been involved in developing suddenly started getting snowed under by a storm of GAE’s most excellent DeadlineExceededError exceptions.

 

The app never really suffered from this before (unless a rogue task actually took too long, that is), but suddenly these errors started happening on a huge number of requests during the module load or _ah/warmup phases. The requests didn’t even get to run any of our code before they were killed with this most beautiful of error messages. It literally brought the app to its knees…

 

For example, we kept finding stuff like this in the logs:

 


<class 'google.appengine.runtime.DeadlineExceededError'>:
Traceback (most recent call last):
  File "/base/data/home/apps/appname/9993.355980156233423494/warmup.py", line 1, in <module>
    import appengine_config # Cache by import
  File "/base/data/home/apps/appname/9993.355980156233423494/appengine_config.py", line 27, in <module>
....

 

After a little research it seems that this has been happening to loads of GAE users since December 2011. Have a look at this thread:

 

http://groups.google.com/group/google-appengine/browse_thread/thread/369a9a76c394c99e/976722e37ad07d0c

 

Loads of annoyed people out there are having the same type of problem. The general recommendation was to move to a high replication datastore (rather than master/slave). This move is not always simple, however, and I believe it will incur higher costs!

 

To cut a long story short, our app was dead in the water, so we had no choice other than to migrate it to a high replication datastore and hope that this fixed the problem (and hope that the migration didn’t create too many new problems). Once we carried out the migration the DeadlineExceededError exceptions went away, so that was good, but we were a bit peeved that our hand was forced in such a way.

 

So I don’t know what Google changed, but it seems that they want people on high replication or else! Once again we find ourselves seriously questioning whether Google App Engine is a suitably stable platform for deploying real-world apps. The jury is still out…

 

Anyway, if you find yourselves with the same problem then you may be headed the HRD way!

 

Google Cloud – App Engine SDK v1.6.1 Released

Just catching up with things after Christmas. One thing that escaped my notice in December was that Google released version 1.6.1 of their Cloud App Engine SDK on the 13th of December. There’s not a huge amount of interest really in this release, but two items did catch my eye:

 

You can now select how much CPU power and memory is available for your frontend instances from a small set of presets in the app dashboard. However, Google warns that selecting a higher preset will incur extra cost! We sometimes run into soft memory limit problems on our frontends so this may be of great use to us, but I do worry about the $$!

 

Google have also released a new experimental document conversion API which they say contains OCR functionality. I look forward to testing this and will be very interested to see how well the OCR performs, as it can be a very tricky thing to get right! I suppose that this is an API to the same OCR functionality that they have in Google Docs…

 

Oh, and they have added an API for programmatically reading the application logs. This may come in handy for automated testing!

 

More details about the release can be found here:

 

http://googleappengine.blogspot.com/2011/12/app-engine-161-released.html

 

Google Cloud Development SDK v1.5.5 Released

 

Yesterday Google released v1.5.5 of their App Engine SDK. There are loads of nice changes in this release, including the easing of some previously annoying limits. The one I am especially happy about is the doubling of the frontend request deadline from 30s to 60s, which will make our life a lot easier!

 

More details can be found here:

 

http://googleappengine.blogspot.com/2011/10/app-engine-155-sdk-release.html

Google App Engine error 500 on Upload to Cloud

Q: I keep getting ‘Error 500’ when I try to upload / deploy my code to Google App Engine via ‘appcfg.py update’ or similar, what’s going on?

 

A: Most often this error is caused by problems with the Google cloud infrastructure, and as such there is nothing you can do to fix the problem except wait and retry every now and again. The problem normally goes away in an hour or two. Do, however, check that your various quotas haven’t run out.

 

Don’t be surprised if, when you view the System Status table, you see that it is all green. In our experience this status grid rarely represents the status of the cloud as experienced by developers and users. You could report the problem to Google as suggested in the error message; however, if you were to report every service interruption you wouldn’t be left with very much time to do much else… ;-)

 

Google App Engine – TransientError in Cloud

We were engulfed by a blizzard of Google task queue ‘TransientErrors’ over the weekend. The ‘TransientError’ is one of the most elusive of Google App Engine errors. This is what the official documentation has to say about it:

 

exception TransientError(Error)
There was a transient error while accessing the queue. Please try again later.

This very detailed description has left many scratching their heads wondering (a) why they are getting the errors and (b) what they should do about them!

 

So far, the best description I could find of the error is here:
http://osdir.com/ml/GoogleAppEngine/2009-06/msg01337.html

 

A TransientError is an unexpected but transient failure. Typically it
is a deadlined or otherwise missed connection between two backends in
our system. We distinguish TransientError from InternalError to say
that transient errors failed but were expected to succeed (and
retrying will probably work), whereas InternalError is quite
unexpected and retries will have no effect.

 

This seems to suggest that you shouldn’t worry too much about them as they happen rarely and the automatic retry should work – no harm done…

 

In our experience, however, the error is far from transient: it often occurs in batches, with the retries experiencing the error too. It causes our app to grind to a halt and the whole thing tends to snowball. It is as if something fairly serious goes wrong with the Google cloud and things just stop working.

 

Does anybody know if there is any reliable way of handling these errors when they occur?
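
One common general-purpose pattern, offered here as a sketch and not as a known fix for this particular problem, is to retry with exponential backoff and a bounded number of attempts, so that a batch of failures does not turn into a tight retry loop. The example below is plain Python 3 and does not use the App Engine SDK: the TransientError class is a local stand-in for the task queue exception, and call_with_backoff() and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


class TransientError(Exception):
    """Local stand-in for the task queue's TransientError (illustration only)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff.

    The delay doubles after each failed attempt (capped at max_delay) and is
    scaled by a random factor so that many callers do not retry in lockstep.
    The last error is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

The operation argument would be whatever call enqueues the task. Because the errors described above tend to arrive in batches, the bounded attempt count matters as much as the backoff: when every retry fails, the caller gets the exception back and can park the work somewhere for later.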

 

And more seriously, can an application be made reliable on the Google cloud with nasty undocumented errors like this cropping up in batches every few months? I suppose software development for cloud environments is still really only in its infancy and developers and platform suppliers have a lot to learn, but unexplained stealth errors like the TransientError really don’t help the situation!