Speed up Encoding of JPEG images on Embedded System

Q: How can I speed up encoding & decoding of JPEG images on my embedded system?

A: Use libjpeg-turbo!

If you are encoding JPEG images from code on your embedded system, then chances are you are using libjpeg, either directly or indirectly. libjpeg is the de facto standard JPEG implementation, and since encoding JPEG properly is VERY hard, nearly everybody just uses this library. For example, if you save a JPEG image from OpenCV, it will typically use libjpeg under the hood.

libjpeg concentrates on getting the encoding and decoding correct; it doesn’t worry too much about speed or optimisation, so it can appear slow, especially on embedded systems.

libjpeg-turbo is a drop-in replacement for libjpeg. It supports the same API and uses SIMD instructions (on Intel/x86 and on ARM via NEON) to greatly increase the speed of encoding and decoding.

So next time you are struggling with JPEG speed, don’t listen to those who may mutter insane things about ‘using the GPU’ or some such rubbish. Check out libjpeg-turbo instead; it may well do the trick, especially if you can arrange to encode your images on more than one core simultaneously!

libjpeg example – encode JPEG to memory buffer instead of file

Q: How can I use libjpeg to encode directly to memory on my embedded system without using a file?

A: If you’re encoding images into JPEG on your ARM/Linux embedded platform then you’re most likely using libjpeg, and if you’re encoding quickly then you’re most likely using its drop-in replacement libjpeg-turbo!

Most of the examples of how to use the libjpeg API show how to encode directly to a file, but it’s also possible to use the API to encode to a memory buffer instead. This is handy if you want to transmit the JPEG via MQTT or something and don’t actually need a JPEG file – you can avoid the overhead of writing to your flash filesystem and having to read the data from it before sending.

Of course, if your embedded system has a RAM disk (as a lot of ARM/Linux based systems do), one easy way of achieving this is to get libjpeg to encode to a file on the RAM disk. In this way you can avoid writing to flash; however, you will still have to open the file and read all of the data into a buffer afterwards.

The trick to getting libjpeg to encode directly to memory is to call jpeg_mem_dest() (instead of jpeg_stdio_dest() for a file). This allows the caller to specify an output buffer for the JPEG data, as well as its size.

You can either supply a pointer to a buffer, or if you pass NULL, the library will allocate a buffer for you – either way, you must free() the output buffer once you are finished with it!

Some things to note are:

  • You can pass in a pointer to a buffer that you have already malloc()’ed
  • If you pass in NULL, the lib will allocate a buffer for you.
  • If the buffer that you specify (or that the lib automatically malloc()’ed) turns out to be too small to hold the JPEG data, then the lib will free() the buffer and malloc() a new one (this will probably also involve a memory copy).
  • After encoding, the size variable will contain the number of JPEG bytes in the output buffer (i.e. it no longer contains the size of the buffer used!).
  • If you want to re-use the same buffer for each encode, then just pass in the same buffer pointer each time with its original size and don’t free() it after encoding.

Here is an example of a C++ function that encodes to memory; it uses a buffer that’s allocated by the library:

Then we can use this function to encode an 8-bit image like this:

Get the Subversion (SVN) URL for a file using Tortoise SVN?

How can I get the Subversion (SVN) URL for a single file using Tortoise SVN?

If you’re still rocking a venerable Subversion (SVN) repository to keep all of your software safe, then Tortoise SVN is a great user interface to all of the SVN delights. But how can you easily retrieve or copy the SVN URL of a single file in the repo? I find that the easiest way to get the SVN URL of a folder is to right-click on the local folder and choose the ‘Repo-browser’ menu option. When the browser opens, the URL can be easily copied from the URL text box at the top of the browser window.

 

Now, to get the URL for an individual file, just click on the file within the Repo-browser and its URL will be displayed in the URL box!

Get URL for single file in SVN

From here you can copy and paste to your heart’s content!

Fire-And-Forget wrapper for sending simple UDP data using boost::asio libraries

So I had a problem: as part of an embedded software system I was working on, I needed to periodically send some GPS information via UDP datagrams to other devices on the network. Really simple stuff: transmit a string to an IP address on a given port, Fire And Forget, send a string from here to there, end of story!

Now, my normal port of call these days for networking in C++ is to reach for the boost::asio libs, as mentioned here. This library is very powerful and flexible and can handle a huge number of different networking scenarios.

All well and good, but sometimes I wish that I could just call a simple function and not have to remember or worry about all of the boost stuff, as it can be a proper head-wreck and there’s a lot of typing because the namespaces are so long! Python has me really spoiled; it makes so many things simple and easy to use!

So I wrote a little C++ wrapper class that has just one function, send(), which can send a string or the contents of a binary buffer. The class is called boost_udp_send_faf (faf stands for Fire-and-Forget!).

It is a very basic wrapper around the boost libs and only handles very simple transmission use-cases, but these use-cases probably cover about 80% of my UDP transmission needs!

To use the class for a single Fire-and-Forget send:

This sends the message to 192.168.1.44 on port 8861.

If your program needs to send multiple datagrams, then we can keep the socket open and reuse the end-point like this:

The socket will remain open until the ‘sender’ object goes out of scope.

It’s probably best to add exception handling to your calls as they can fail for lots of reasons and as ever networking is very flaky & unpredictable.

This will also ensure that the socket is closed once the message is sent as the object will go out of scope when execution leaves the try block.

The code for boost_udp_send_faf can be found in this repo, but I have in-lined it below as well:

Use PIC Timer2 not Timer0 for accurate Interval Timing on an Embedded system

Recently I was struggling to achieve very accurately synchronised camera and light triggers for a real-time computer vision project that I was working on. My original embedded system for triggering everything, built around a PIC micro-controller, had a fairly reliable accuracy of about 250 µs, which was sufficient for a few years, but for various reasons this needed to be improved by a factor of 10 to about 25 µs. The problem was that as I reduced the timer’s overflow period, it started to become quite inaccurate and erratic.

It is possible to achieve quite accurate timing on your embedded software system using a PIC, however it can take a bit of work to tune things. Most articles that introduce the use of timers on the PIC use the Timer0 module as an example, maybe because it has a ‘0’ in it?

Now, when using Timer0 to produce a repeated time interval, its High & Low overflow registers must be manually reloaded in the interrupt handler to set up the next time period, all while the timer is still counting away.

This makes it difficult to know what values to put into the registers to get an accurate overflow period. You end up trying to account for the interrupt latency and counting assembly instructions in the interrupt handler to estimate times etc.

The upshot is that it’s hard to get a reliable overflow period with a desired accuracy when using Timer0.

Example, initial Timer0 setup code:

The best thing to do is to ditch Timer0 and switch to Timer2 instead. Timer2 automatically handles this kind of pattern (and it’s not much harder to use even though it’s got a ‘2’ in its name!).

When using Timer2 you don’t need to reload the overflow registers, so it is much easier to get an accurate and reliable overflow period! Have a look here if you need help in figuring out how to set up Timer2 for your desired timing parameters; it’s very handy!

Once Timer2 is initialised in this manner, it will cycle automatically without the need to reload any values, yielding a stable (and hopefully more accurate) time period.
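As a rough illustration of the arithmetic behind a Timer2 setup, the sketch below runs on a PC and searches for a prescaler, postscaler and PR2 value that give a requested interrupt period. It assumes the classic PIC16/PIC18 style Timer2, which is clocked from Fosc/4 and interrupts every (PR2 + 1) × prescaler × postscaler instruction cycles, with prescalers of 1, 4 or 16 and postscalers of 1 to 16. The struct and function names are made up for this example, so do check the datasheet for your own device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch; not part of any PIC toolchain.
struct Timer2Config {
    bool valid;           // false if no exact setting was found
    unsigned prescaler;   // assumed options: 1, 4 or 16
    unsigned postscaler;  // assumed options: 1..16
    std::uint8_t pr2;     // value to load into the PR2 period register
};

// Convert a period in microseconds into instruction cycles (Fosc / 4 ticks).
// Integer arithmetic, so fractional ticks are truncated.
inline std::uint32_t microsecondsToTicks(std::uint32_t foscHz, std::uint32_t periodUs) {
    assert(foscHz >= 4);
    const std::uint64_t ticksPerSecond = foscHz / 4;
    return static_cast<std::uint32_t>(ticksPerSecond * periodUs / 1000000ULL);
}

// Search for settings where (PR2 + 1) * prescaler * postscaler == periodTicks.
inline Timer2Config findTimer2Config(std::uint32_t periodTicks) {
    const unsigned prescalers[] = {1, 4, 16};
    for (unsigned pre : prescalers) {
        for (unsigned post = 1; post <= 16; ++post) {
            const std::uint32_t divider = pre * post;
            if (periodTicks % divider != 0) {
                continue;  // this combination cannot hit the period exactly
            }
            const std::uint32_t count = periodTicks / divider;  // equals PR2 + 1
            if (count >= 1 && count <= 256) {
                return Timer2Config{true, pre, post,
                                    static_cast<std::uint8_t>(count - 1)};
            }
        }
    }
    return Timer2Config{false, 0, 0, 0};
}
```

For example, with an assumed 8 MHz oscillator a 25 µs period is 50 instruction cycles, which this search satisfies with a 1:1 prescaler, a 1:1 postscaler and PR2 = 49.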

Using ‘Mod’ on (small) Embedded Systems while Avoiding Time Penalties

The Mod/Modulo (% in C) operator is incredibly useful in many situations. It calculates the remainder from an integer division, for example 10 Mod 3 = 1 (10 Div 3 = 3, 3 * 3 = 9, 10 – 9 = 1).

Although you can happily use the % operator in your embedded C program, it is important to be aware that it is typically implemented using integer division and multiplication. That is fine if your embedded system supports these in hardware; if not, they will be implemented in software, sometimes resulting in huge time penalties (for example, on many PIC chips).

Often this won’t be a problem, but if you’re programming a real-time system these delays can be devastating! A big sting-in-the-tail from the innocent looking % operator.

So what can be done?

a.) Avoid using % in your time-sensitive code, perhaps use running sums and remainders instead.

b.) Use a power-of-2 in your Mod call instead.

If the value of the right-hand operand isn’t crucial and both operands are non-negative, then switching to a power of 2 will make the result much easier and faster to calculate: a power-of-2 Mod can be calculated with a single AND instruction.

Your compiler should figure out that you are using a power-of-2 and automatically carry out the optimisation (this is most likely when the left-hand operand is unsigned and the right-hand operand is a constant), but to be safe, check the output assembly code. If it hasn’t optimised it, then you can replace a % b with a & (b - 1).
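To make both options concrete, here is a minimal sketch. The function names are invented for this example. modPow2() is only equivalent to % for unsigned values and a size that really is a power of 2, and nextIndex() shows one way of following option a.) by wrapping a running counter with a compare instead of a division.

```cpp
#include <cassert>
#include <cstdint>

// Option b.): wrap value into [0, size) with a single AND.
// Only valid when size is a power of 2 (1, 2, 4, 8, ...).
inline std::uint32_t modPow2(std::uint32_t value, std::uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);  // power-of-2 check
    return value & (size - 1);
}

// Option a.): advance a running counter and wrap it at limit using a
// compare-and-reset, so no division or multiplication is needed.
// Works for any limit, not just powers of 2.
inline std::uint32_t nextIndex(std::uint32_t index, std::uint32_t limit) {
    assert(index < limit);
    ++index;
    return (index == limit) ? 0 : index;
}
```

A ring buffer of 16 entries could then step its write position with either modPow2(pos + 1, 16) or nextIndex(pos, 16), and only the second form still works if the buffer size later changes to something like 10.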

Python Monty Hall Problem Simulation

The Monty Hall Problem is a very (to me at least) counter-intuitive probability thought experiment which contorts my brain and fascinates me at the same time. I have been mulling it over for the last few weeks and wanted to write a little simulator to see if the numbers come out as predicted (if not expected), and indeed they do! I can just about understand the probabilistic arguments, but I still find it very confusing: as soon as I think that I grok it, my ‘understanding’ disappears into the night! I am used to being this bamboozled when reading about quantum mechanics or something, but I find it fascinating that such an apparently simple problem can be so deceptively deep!

Summary

There are 3 doors; behind one lies a car, while behind the other two are goats. A player chooses a door at random. Monty, who knows where the car is, opens one of the other doors to show that there is a goat behind it. Monty then asks the player if they would like to stick with their original choice of door or switch to the other un-opened door.

If the player sticks with their door, then their chance of winning the car should be 1/3. If the player switches door, then their chance of winning the car increases to 2/3!!!

If you are having problems understanding the outcome, I find it helps to imagine that there are a million doors rather than 3. After you choose your door (1/1,000,000 chance of hiding the car), Monty opens up 999,998 doors that hide goats to leave one door still closed. Now which door do you think is most likely to hide the car? The one you chose, or the one that Monty avoided opening while he opened all 999,998 other doors?! It seems obvious to me that the other door that Monty left un-opened has a massively higher chance of hiding the car than your original choice! As N reduces to 3, however, this ‘obviousness’ reduces greatly! This simulator allows you to experiment with more than 3 doors for this reason.

For more info: https://en.wikipedia.org/wiki/Monty_Hall_problem

This program is an experimental simulator to see what numbers we get when the player decides to stick or switch. It can be found on GitHub.
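The program on GitHub is the one to use; purely to illustrate the idea, here is a separate minimal sketch of such a simulation in C++ (it is not the code from the repo). It assumes the N-door variant described above, where Monty opens every other door except one and never reveals the car. The names simulateMontyHall and MontyResult are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Fraction of games won by each strategy.
struct MontyResult {
    double stickWinRate;
    double switchWinRate;
};

inline MontyResult simulateMontyHall(int doors, int games, std::uint32_t seed) {
    assert(doors >= 3 && games > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> anyDoor(0, doors - 1);
    std::uniform_int_distribution<int> otherDoor(0, doors - 2);

    int stickWins = 0;
    int switchWins = 0;
    for (int g = 0; g < games; ++g) {
        const int car = anyDoor(rng);   // where the car is hidden
        const int pick = anyDoor(rng);  // the player's first choice

        // Monty opens every other door except one, never revealing the car.
        int leftClosed;
        if (pick != car) {
            leftClosed = car;  // he has to leave the car's door closed
        } else {
            leftClosed = otherDoor(rng);  // any of the other (goat) doors
            if (leftClosed >= pick) {
                ++leftClosed;  // skip over the player's own door
            }
        }

        if (pick == car) ++stickWins;         // sticking keeps the first pick
        if (leftClosed == car) ++switchWins;  // switching takes the other door
    }
    return MontyResult{static_cast<double>(stickWins) / games,
                       static_cast<double>(switchWins) / games};
}
```

Calling simulateMontyHall(3, 200000, 42) should give rates close to 1/3 for sticking and 2/3 for switching, and raising the door count pushes the switching rate towards 1.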

boost::split – warning C4996: ‘std::copy::_Unchecked_iterators::_Deprecate’: Call to ‘std::copy’ with parameters that may be unsafe

If you’re using the very handy boost::split() on Visual Studio, then you may run into the following annoying warning:

warning C4996: ‘std::copy::_Unchecked_iterators::_Deprecate’: Call to ‘std::copy’ with parameters that may be unsafe – this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ ‘Checked Iterators’

It’s a big warning that rightly litters the compiler’s output!!

Everything’s OK, it’s just VS whinging about std::copy() being called with pointers. As the warning mentions, you can disable the warnings by defining _SCL_SECURE_NO_WARNINGS. However, it’s probably best to define it only for the affected files and not for your whole project!

Another way is to explicitly instruct that warning 4996 be ignored in the code like this:
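A minimal sketch of that approach, using MSVC’s #pragma warning directives with push/pop so that the warning is only silenced for a limited region. copyBytes() is a made-up stand-in for whatever triggers the warning in your code (for example the #include of the Boost header or the call to boost::split()); depending on where the compiler reports the warning, you may need to wrap the #include, the call site, or both.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

#ifdef _MSC_VER
#pragma warning(push)            // remember the current warning settings
#pragma warning(disable : 4996)  // ignore C4996 from here on
#endif

// Hypothetical stand-in for the offending code: std::copy() called with raw
// pointers is the kind of call that the warning complains about.
inline void copyBytes(const char* src, std::size_t count, char* dst) {
    assert(src != nullptr && dst != nullptr);
    std::copy(src, src + count, dst);
}

#ifdef _MSC_VER
#pragma warning(pop)  // restore the previous warning settings
#endif
```

The #ifdef guards keep other compilers from complaining about a pragma that they don’t understand.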

More Info

Remote Desktop Connection Closes / Disappears silently soon after Connecting

I had a strange problem with my Windows 10 Remote Desktop connections over the last few days. Soon after connecting, the connection would simply disappear without any message or warning, typically less than a minute after connecting to the remote machine. Amusing, but quite frustrating if you wanted to get anything done!

I searched around and couldn’t find any mention of similar things happening to other folk, but there was some mention of cleaning out stale connection data, so I managed to make the problem go away by going to the Remote Desktop connection manager and deleting the ‘Saved Credentials’ from the connection. When I reconnected everything was fine and has remained so ever since!

Connections from multiple Browsers to Mosquitto MQTT via Web Sockets not Working

If you are using the Paho JavaScript MQTT client library to connect to MQTT via web sockets, you might notice that you can’t connect from multiple browser sessions at the same time. This may be because you are using the same client ID for each connection: the broker only allows one connection per client ID, so each connection needs its own.

To get around this you could generate a random client Id like this:

This generates a thoroughly meaningless/ugly client Id, but it works…

There is some worry that an approach like this can lead to a build-up of stale sessions on the server side, as the browser will pass a completely different client Id on each connection/refresh. I am not sure about this; I think it may only affect persistent sessions. I will keep an eye on it…