A new design, moving to 32-bit

April 21, 2016, 5:00 pm

With one of the two main energy metering nodes out of order, it’s becoming more urgent now to implement a replacement. It would be a shame to break the cycle, after some 8 years of data collection, all logged and waiting for future visualisation.

There are quite a few signal sources, more than a simple JeeNode w/ ATmega328 can easily handle. Given the recent focus on STM32F103-based boards, it should come as no surprise that this has also been picked as the base for the new node. The Olimexino-STM32, to be exact:

The Olimexino-STM32 has an STM32F103 µC with 128 KB flash and 20 KB RAM, as well as a very nice complement of features on-board:

plenty of I/O, since this is based on a 64-pin µC
12-bit ADCs, which can read up to 1 million samples per second
a LiPo battery connector for backup, including charger
very low power 3.3V regulator (the same as on the JeeNode)
on-board switching regulator, which can run off 9..30 V (DC, but see below)
µSD card slot (on the bottom), for long-term data storage
8 MHz and 32 kHz crystals, allowing the µC to keep track of real time
USB mini jack, for power and as USB device
CAN bus terminal header (note that it’s either USB or CAN on this µC)
Arduino-shield compatible headers, including the offset fix
mini 2x5 header for JTAG (also compatible with a Black Magic Probe)
10-pin UEXT header, for off-board extensibility
and a 16-pin extension header with all the otherwise-unused pins

In short - lots to like, and many convenient features for an “energy reporting node”.

Here is a first build, with RFM69, OLED (optional), P1 port with NPN transistor to invert the signal, and a “pulse port” for the 3 pulse counters - the most urgent task for this whole setup:

There’s a LiPo on the back, attached with dual-sided tape, and there’s a small DC jack adapter (newer models of the Olimexino-STM32 have it on-board, instead of the screws on this one):

The board is well-documented, even though in this case it’s all obscured by the battery:

One nice property of the Olimexino-STM32 is that it includes an efficient switching regulator with a wide voltage range. Since the board also includes a diode in series, it can in fact be operated from an AC power source. Which is perfect here for two reasons:

feeding AC into the unit means we now have a way to detect phase and measure AC mains variations, as well as fluctuations in 50 Hz for estimating the load on the power grid
there’s already a bell transformer nearby, used for … the door bell!

Here’s the meter cupboard - nice and dry now - and the new 8 VAC power feed, ready for use:

From the logo in the first picture, you can see that Mecrisp Forth is already running, and so are the OLED & RF69 drivers. The next task is to read the pulse counters and report them over RF.

Note that this RFM69 is being operated in native mode, i.e. it’s not sending out packets in the original RF12 format. Since this is the first such node added to the JeeLabs Home Monitoring Network, the central hub also has to be extended with a receiver node to pick up these packets.

Lots of software tasks ahead. And also some electrical puzzles, apparently - it looks like the pulse counters are not yet properly being sensed by this new board.

↧

Going 32-bit, at last

April 26, 2016, 5:00 pm

Given last week’s mishap, and the resulting damage to the main metering JeeNode, it’s time to start creating a new setup. While I’m at it, I’ll throw a few new measurements into the mix and will also try to improve on the robustness and autonomy of it all:

This is just a start: there are lots of different angles, each of them probably a project in itself. Here’s a mounting plate, as a first step:

The first priority is to replace that broken JeeNode setup, so that the readout works again and can be saved at the central “hub”.

↧

Energy monitor requirements

April 26, 2016, 5:00 pm

The meter cupboard is - as one would expect - the place where lots of energy-related matters come together. Here is the situation as of early 2016 at JeeLabs:

Note: that “JeeLabs Energy Monitor” is wishful thinking and vapourware at the moment.

There is also an FTTH internet modem, feeding an ethernet cable going upstairs to the FritzBox wireless router. And lots of wiring, no longer used: a splitter feeding TV and radio signals into four different cables, with sockets at various places in the house (replaced by a single DVB-T receiver in the living room), as well as a maze of old telephone wires.

At one point in time - in the era of modems and faxes - there were four POTS phone lines coming into the house (the last ones in the neighbourhood, as it turned out), later replaced by ISDN, and now all routed over the internet link. And at a fraction of the cost…

Listed in the above diagram are all the potential sources of information. Several of these haven’t been implemented yet, such as the magnetometer-based readouts of the water and gas meters, and the sensing of temperature and humidity. Also not yet hooked up, but in-place and ready to go, are the three Current Transformers.

The advantage of pulse counters is that their signal is very easy to connect, and fairly accurate - especially at low power levels, when pulses are far apart. The benefit of current transformers is that they are more effective for high current loads and inductive loads, where true and apparent power can differ substantially. With the above setup, we can have the best of both worlds - even though the different sensors are not currently hooked up to the exact same circuits.

This is a relatively small setup. The house is from the 1970’s, when current requirements as well as regulations were fairly basic. It’s all 1-phase 230 VAC mains (many houses in the Netherlands now have 3-phase distribution). There is not even a DIN rail in the meter cabinet at the moment!

↧

Powering from an AC source

April 27, 2016, 5:00 pm

Ok, we have a name: “JeeLabs Energy Monitor” - and we have a board to use: the Olimexino-STM32. As mentioned before, that board has a number of convenient features for this project.

Then again, any other board can be used - most differences are minor, and there is a huge variety to choose from. Even the “F103” µC family is not really important. Since the F103 is mature, and this is a low-end 128K flash / 20K RAM unit, most other µC’s should be fairly easy to port to.

The Olimexino can be powered from a 9-30 V supply, from the USB port, or from a LiPo battery. But there are a couple of advantages to powering from an AC power source, i.e. directly off a step-down transformer of 9..12 VAC:

the waveform can be used as approximation of the 230V AC mains voltage
its zero-crossings will let us calculate reactive power for the Current Transformers
the mains 50 Hz frequency fluctuations can be measured to estimate power grid load
there’s already an always-on bell transformer in the meter cupboard - might as well use it!
the Olimexino has a diode in series with the power jack, so it can be powered directly by AC

That last point requires some clarification: the Olimexino specs mention a 9..30V DC power source. Powering it with AC means the on-board switching power supply will have to work a bit harder, since it will run with a widely varying input voltage: zero 50% of the time, and a half sinewave the other 50%.

The input capacitor is 100 µF, and will end up seeing fairly large currents, as it gets topped up every 20 ms. As will be shown below, the resulting voltage level always stays above 9V, allowing the regulator to do its work.

The voltage on an unloaded 9 VAC transformer used for testing looks like this:

It’s not such a great sine wave, probably because it’s not such a great transformer. But it’ll just have to do - a high-quality bell transformer should produce a more accurate representation of the AC mains waveform.

When we hook this up to power the Olimexino, something unfortunate happens:

The asymmetric loading of one half of the sine wave to power the board, plus the fact that the on-board 100 µF capacitor only draws current while being topped up, causes this distortion.

That’s not great if we also want to use the waveform later on for reactive power calculations. Fortunately, the original waveform is symmetric. So the solution is to use one half of this 50 Hz wave for drawing power and the other half for measurements. Here’s the circuit we will use:

As described in a recent article, a voltage divider reduces the signal to levels acceptable for ADC readout. The divider is tied to +3.3V rather than to ground, so that this - negative! - half of the waveform ends up within the 0..3.3V range of the ADC.

By matching up the scales of the two signals we can see the result on the oscilloscope:

The blue trace is the transformer’s AC voltage (about 34 Vpp). The top half is used to power the board, and the bottom half gets converted to a “3.3V signal with downward blips”. There is enough information to reconstruct an accurate representation of the full signal. Thanks to the symmetry, we can place the zero crossings at the level where the upward and downward slopes have 50% duty cycle - halfway on the above screen shot, i.e. roughly 3.0V on the ADC input.
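The 50% duty-cycle idea can be tried out on synthetic data. The sketch below is a host-side Python illustration, not code running on the node: it assumes a waveform centred on 3.0 V with a made-up 1.5 V amplitude, clipped at the 3.3 V rail, and it uses the median as the level which the signal spends half its time above and half its time below.

```python
import math
import statistics

SAMPLES_PER_CYCLE = 1000   # assumed: samples spanning exactly one 50 Hz cycle
CENTRE_V = 3.0             # assumed zero-crossing level at the ADC input
AMPLITUDE_V = 1.5          # assumed amplitude, for illustration only
CLIP_V = 3.3               # the ADC input never exceeds the 3.3 V rail


def clipped_wave(n):
    """One cycle of a sine around CENTRE_V, with everything above CLIP_V cut off."""
    return [
        min(CLIP_V, CENTRE_V + AMPLITUDE_V * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


samples = clipped_wave(SAMPLES_PER_CYCLE)

# The median splits the samples 50/50, i.e. it is the level with 50% duty cycle.
level = statistics.median(samples)
print(f"estimated zero-crossing level: {level:.3f} V")
```

Clipping only affects values above the median, so it does not shift the estimate. The approach does rely on the capture spanning whole cycles and on the waveform being symmetric.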

One last screenshot, using a bit of scope trickery to bring it all together:

There is a lot of useful information in here:

persistence has been turned on, and the scope is triggering on both slopes
because of this, the signal is shown twice, once shifted 180° in phase
the yellow and blue traces are the same signal, but yellow is shown inverted
this creates a nice overlap and shows the symmetry, which is nearly 100%
the top half shows the blue line “capacitor top-up” voltage dips
the magenta line is the voltage after the diode, i.e. the switcher’s input voltage
as you can see, that switcher input voltage fluctuates between 12V and 17V
what is also very clear is how the capacitor pulls the AC feed as it charges
there are small spikes on the supply voltage - this is normal for switching regulators

When running at 72 MHz, the Olimexino draws about 45 mA @ 5V - with the switcher this translates to around 20..30 mA drawn from the AC power supply. This should not be a problem, even if it were to double later on, when a bit more circuitry gets added.

So there you have it: we’ve re-used an existing power source, and we have a 0..3.3V signal which can be fed to an ADC pin, to let us determine AC mains voltage and frequency fluctuations as well as obtain a decent representation of the wave shape of AC mains power, all in real time.

↧

Reading out the pulse counters

April 28, 2016, 5:00 pm

Unfortunately, progress on pulse input recognition has been a bit slow - partly due to unrelated “time sinks” - so this will be an overview of some other details of the JeeLabs Energy Monitor.

The mounting bracket was 3D-printed from a design on Thingiverse, with these parameters:

pcb_width = 53.5;
pcb_length = 53.5;
pcb_height = 2;
slisse = 1;

pcb_bottom_margin = 10;
wall_width_thick = 2.5;
wall_width = wall_width_thick-slisse;
box_height = 16;

This custom-sized “clip” holds the Arduino-shaped board nicely, with room for the LiPo battery. It was shortened a bit to stay clear of the reset button. A quick click-and-slide now does the trick:

A DCF77 receiver module from Pollin hangs just below the unit, fixed to a nail with a tie-wrap.

There’s an RFM69 on the shield, and three CT inputs have been created, with a 1 kΩ + 1 kΩ voltage divider to create a 1.65V reference. There are some capacitors to decouple and “stiffen” this reference voltage (further details to follow later, when the current transformers are added).

The shield is a hodge-podge of experimental circuits and connections, as can be expected in this very early prototyping phase:

And here’s the back side, all wired up with prototyping-friendly isolated Kynar wire:

No Arduinos were harmed in the construction of this board, despite their 0.06” header offsets, because the Olimexino has room for extra headers with a normal tinkerer-friendly 0.1” pin grid.

If you look very closely though, you will see that at position (R,18) an extra pin was added to the board - this mates with a soldered-on pin on the Olimexino to tie into the 5V power connection, so that the board can be powered from just this shield via an FTDI header during development.

↧

Upstairs, downstairs

May 3, 2016, 5:00 pm

It turns out that tinkering and development are hard when the signals to be measured are located far from my desk and electronics workbench. The meter cabinet is located downstairs next to the front door, while my lab / playground is three half flights up (we live in a split-level house).

While bringing a laptop down for software development and upload would be easy, taking the scope down there and leaving it all hooked up for days on end… not fun.

Fortunately, there are ways to improve on this. One is to create a path for accessing the console during development with Thorsten von Eicken’s ESP-Link. The other trick is to use the JEM prototype board interactively, to peek and poke at its input pins and try and debug as much as possible that way.

Which is exactly what I’ve been doing lately:

A remote console w/ ESP-Link - Wed
Measuring AC supply voltage - Thu

It doesn’t remove the need to move the unit occasionally - to make hardware changes to it for example - but it should go a long way to making remote development possible!

↧

A remote console w/ ESP-Link

May 3, 2016, 5:00 pm

The ESP-Link is a clever project, which turns an ESP8266 WiFi module into a transparent serial link - sort of a wireless FTDI interface. Thorsten not only created a really powerful software package for this, he also built a few prototypes to turn this into a small “BUB-like” package:

Here’s the schematic of what’s on this board:

An ESP8266 and a switching regulator to bring the 5V down to 3.3V, basically. And some very finicky choices of pins and jumpers to be able to use this in various ways:

ESP-Link software has to be uploaded to the ESP8266 using another serial FTDI board
note that this only needs to be done once: after that, updates can be installed over the air
when done, a few jumpers and pins are changed to make it match the normal FTDI pinout

This is all documented on the ESP-Link project site. It can be a slightly tricky business to get going initially, but after that it’s a delight to use.

The ESP-Link presents an impressive home page for configuration and use from a browser:

If you want, you can even use its web-based console page for everything:

Note the reset button, which makes recovery of runaway code in Mecrisp a breeze.

But that’s all icing on the cake. The main use of ESP-Link is as a socket-based telnet connection. For this reason, the Forth Line evaluator tool has now been extended to also support telnet:

$ folie -p jemesp:23
Connected to: jemesp:23
ok.
11 22 + . 33  ok.
^D
$

A telnet connection will be used when the path to the “serial port” has the format shown above, i.e. ending in a colon and network port number - else folie falls back to normal serial port mode.
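The rule described above can be sketched in a few lines. This is a hypothetical Python illustration of the decision only, not folie's actual code: the function name `parse_target` and the returned tuples are made up for this example.

```python
import re


def parse_target(path):
    """Classify a port argument as a network or a serial target.

    Returns ("telnet", host, port) when the path ends in a colon followed
    by a port number, else ("serial", path).
    """
    match = re.fullmatch(r"(.+):(\d+)", path)
    if match:
        return ("telnet", match.group(1), int(match.group(2)))
    return ("serial", path)


print(parse_target("jemesp:23"))
print(parse_target("/dev/ttyUSB0"))
```

Anything without a trailing port number, including names which merely contain a colon, is treated as a serial port in this sketch.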

Best of all, (remote!) uploading via include <filename> in folie still works as expected:

$ folie -p jemesp:23
Connected to: jemesp:23
ok.
Mecrisp-Stellaris 2.2.4 for STM32F103 by Matthias Koch
\       >>> include h
\       >>> include ../mlib/hexdump.fs
\       <<<<<<<<<<< ../mlib/hexdump.fs (73 lines)
\       >>> include ../flib/io-stm32f1.fs
\       <<<<<<<<<<< ../flib/io-stm32f1.fs (69 lines)
\       >>> include ../flib/hal-stm32f1.fs
\       <<<<<<<<<<< ../flib/hal-stm32f1.fs (134 lines)
\       >>> include ../flib/timer-stm32f1.fs
\       <<<<<<<<<<< ../flib/timer-stm32f1.fs (47 lines)
\       >>> include ../flib/pwm-stm32f1.fs
\       <<<<<<<<<<< ../flib/pwm-stm32f1.fs (52 lines)
\       >>> include ../flib/adc-stm32f1.fs
\       <<<<<<<<<<< ../flib/adc-stm32f1.fs (54 lines)
\       >>> include ../flib/rtc-stm32f1.fs
\       <<<<<<<<<<< ../flib/rtc-stm32f1.fs (43 lines)
\       >>> include ../flib/ring.fs
\       <<<<<<<<<<< ../flib/ring.fs (32 lines)
\       >>> include ../flib/uart2-stm32f1.fs
\       <<<<<<<<<<< ../flib/uart2-stm32f1.fs (31 lines)
\       >>> include ../flib/uart2-irq-stm32f1.fs
\       <<<<<<<<<<< ../flib/uart2-irq-stm32f1.fs (24 lines)
\       >>> include ../flib/spi-stm32f1.fs
\       <<<<<<<<<<< ../flib/spi-stm32f1.fs (68 lines)
\       >>> include ../flib/i2c-bb.fs
\       <<<<<<<<<<< ../flib/i2c-bb.fs (49 lines)
\       <<<<<<<<<<< h (57 lines)
\ done.

There’s still a buglet in this setup: the “reset” word leads to a runaway loop of “Unhandled Interrupt 00000003” - but this can be recovered through the reset button on the web page. The same happens with eraseflash and any other word indirectly calling reset.

Perhaps it’s related to timing or the junk character generated by Mecrisp after such a s/w reset?

Anyway… thanks to ESP-Link, it’s now possible to tinker with the JeeLabs Energy Monitor prototype from anywhere in the house. A huge convenience!

↧

Measuring AC supply voltage

May 4, 2016, 5:00 pm

With the JeeLabs Energy Monitor prototype hooked up in the meter cabinet, it’s now very easy to explore things a bit. One interesting test is to look at the analog signal from the AC power supply + divider, which is connected to pin PA0.

Here is some code which samples that signal and prints it out until a key is pressed:

: a1 +adc begin pa0 adc . key? until ;
a1 4029 4031 4030 4030 [...] 1268 1343 1763 2330  ok.

The ADC is initialised, and then we acquire a value, print it, and loop until some key is pressed. The readings are easily copied to a spreadsheet and graphed:

Some observations:

as expected, we’re only reading the negative excursions of the supply voltage
the scale is working out nicely, over 70% of the ADC’s 12-bit resolution is used
it looks like we’re capturing about 25 samples per 50 Hz cycle
this loop timing is partly limited by the 115,200 baud rate of the (polled!) serial output
i.e. 10 bits per char, 5 chars per ADC reading (4 digits and a space) @ 115,200 baud takes 434 µs
plus some time for the ADC - hmmm… not sure where the remaining 300+ µs went!
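The timing estimate in this list is easy to reproduce. A small host-side Python sketch, using only the figures quoted above (the variable names are made up for the example):

```python
BAUD = 115_200            # serial speed, bits per second
BITS_PER_CHAR = 10        # start bit + 8 data bits + stop bit
CHARS_PER_READING = 5     # 4 digits and a space
MAINS_HZ = 50
READINGS_PER_CYCLE = 25   # as estimated from the graph

# Time spent sending one reading over the polled serial port.
serial_us = CHARS_PER_READING * BITS_PER_CHAR / BAUD * 1e6

# Time between successive readings, derived from the readings-per-cycle count.
loop_us = 1e6 / MAINS_HZ / READINGS_PER_CYCLE

unaccounted_us = loop_us - serial_us

print(f"serial output per reading: {serial_us:.0f} us")
print(f"loop time per reading:     {loop_us:.0f} us")
print(f"not explained by serial:   {unaccounted_us:.0f} us")
```

The last figure still includes the ADC conversion itself, so the truly unexplained part is somewhat smaller.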

The current polled ADC acquisition code itself is very fast. It’s easy to measure this:

: a2 micros pa0 adc micros nip swap - . ;  ok.
a2 41  ok.

So it takes 41 µs to acquire one ADC sample (less actually, the micros code has some overhead). To improve on this, we can first acquire samples in a quick loop, and then print them out:

4000 buffer: adata  ok.
: a3 1000 0 do pa0 adc i cells adata + ! loop ;  ok.
: a4 +adc micros a3 micros swap - . ;  ok.
: a5 1000 0 do i cells adata + @ . loop ;  ok.
a4 32915  ok.
a5 1333 1334 1334 [...] 4026 4028 4027  ok.

Which means that we’ve captured 33 ms of data, i.e. more than one full 50 Hz cycle:

Both half-wave captures consist of 338 readings, with 269 readings in between, when the ADC returns 4030 or 4031. Which means that the zero crossings are indeed included in our data, and that there is enough information to reconstruct a full sine wave for CT power calculations later.
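As a sanity check, these counts can be turned back into a mains frequency. The Python sketch below is a host-side calculation using only the numbers reported above, assuming that one dip plus one flat segment make up exactly one cycle:

```python
CAPTURE_US = 32915     # duration reported by a4 for the whole capture
SAMPLES = 1000         # number of readings in that capture
DIP_READINGS = 338     # readings in one half-wave dip
FLAT_READINGS = 269    # readings in the flat segment in between

sample_us = CAPTURE_US / SAMPLES
period_us = (DIP_READINGS + FLAT_READINGS) * sample_us
mains_hz = 1e6 / period_us

print(f"time per reading: {sample_us:.3f} us")
print(f"one full cycle:   {period_us / 1000:.2f} ms")
print(f"mains frequency:  {mains_hz:.2f} Hz")
```

The result lands very close to the nominal 50 Hz, which suggests that the sample timing of the polled loop is reasonably steady.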

Note that the sine wave has somewhat flattened peaks and there also appears to be some noise.

These readings were taken with a µC running at 8 MHz, which gives enough resolution for now. At 72 MHz, a2 reports 6 µs and a4 reports 4949 µs - this translates to a polled ADC acquisition rate of over 200,000 samples per second. The flip side is that at this higher rate we can only capture some 5 ms with a 1000-entry buffer, considerably less than a full 50 Hz AC cycle:

For tests, the polled approach is fine - and very convenient, given the interactive command-line access provided by the ESP-Link - but the acquisition rate will not be constant due to occasional interrupts, and the µC would be tied up most of the time, which is clearly not very practical.

The solution is to use DMA-based ADC acquisition. Then, we only need to deal with acquired data in large chunks. The STM32F103 µC’s DMA hardware has two nice interrupts for this: when the buffer is half full, and when it reaches the end and acquisition starts over at the beginning.

Let’s plan ahead a bit and consider the rates and time + memory resources needed:

there are 4 ADC channels: 1 for supply voltage and 3 for the current transformers
we could collect 1600 samples per channel, that’s 800 samples per interrupt
this requires 4 x 1600 x 2 bytes = 12.8 KB, which is ok - the Olimexino has 20 KB RAM
if we sample at 25,000 Hz, each 800-sample block will cover 32 ms of time
this is enough to always capture more than one full wave, i.e. 3 .. 4 zero crossings
we’ll get one interrupt every 32 ms, with a 800-sample block ready for each channel
so during this interrupt, the code will need to process 4 x 800 = 3200 data points
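The planning figures above all follow from a handful of inputs. Here they are recomputed in a short Python sketch (host-side arithmetic only, with made-up variable names), which makes it easy to try other sample rates or buffer sizes:

```python
CHANNELS = 4                 # 1 supply voltage + 3 current transformers
SAMPLES_PER_CHANNEL = 1600   # whole circular buffer, per channel
BYTES_PER_SAMPLE = 2         # 12-bit readings stored as 16-bit values
SAMPLE_RATE_HZ = 25_000
MAINS_HZ = 50

buffer_bytes = CHANNELS * SAMPLES_PER_CHANNEL * BYTES_PER_SAMPLE
block_samples = SAMPLES_PER_CHANNEL // 2          # one interrupt per half buffer
block_ms = block_samples * 1000 / SAMPLE_RATE_HZ
interrupts_per_s = 1000 / block_ms
points_per_interrupt = CHANNELS * block_samples

# Zero crossings are one half cycle apart, i.e. 10 ms at 50 Hz.
half_cycle_ms = 1000 / MAINS_HZ / 2
half_cycles_per_block = block_ms / half_cycle_ms

print(f"buffer size:           {buffer_bytes} bytes")
print(f"block duration:        {block_ms:.0f} ms")
print(f"interrupts per second: {interrupts_per_s:.2f}")
print(f"points per interrupt:  {points_per_interrupt}")
print(f"half cycles per block: {half_cycles_per_block:.1f}")
```

A block spanning just over three half cycles contains either 3 or 4 zero crossings, depending on where in the cycle it happens to start.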

Even if our calculations take more than 32 ms, this would be ok: we could simply ignore some acquisition cycles. The only constraint is that the buffered data needs to be processed (or moved out of the way) before the DMA engine overwrites it with new data, i.e. within 32 ms.

The beauty of DMA is that it completely frees the CPU. It “only” has to deal with ≈ 30 interrupts per second, with all ADC values acquired exactly 40 µs apart and stored in the DMA buffer.

There is still plenty of work to do… such as dealing with phase and finding those zero crossings.

↧

When an input pin isn't one

May 10, 2016, 5:00 pm

The article about the ESP-Link ended with what turned out to be an ominous note, in hindsight:

There’s still a buglet in this setup: the “reset” word leads to a runaway loop of “Unhandled Interrupt 00000003” - but this can be recovered through […]

Something really strange was happening:

toggling the DTR pin from the ESP-Link did correctly reset the Olimexino
but doing a software reset sent the Olimexino into a tail spin

And here were the crazy bits:

the software reset worked fine when DTR from the ESP-Link was not connected
once in this endless fault loop, even pressing the RESET button did not work (!)

To summarise: the ESP-Link itself works brilliantly. It gets data across between the Olimexino serial FTDI port and WiFi, in both directions and at full 115,200 baud speed, without dropping a single byte. The Reset button on its “Console” web page allows getting control back from any runaway code loop or crash, and does this again without any hiccups whatsoever.

But enter the word “reset” followed by a newline, and you get this:

The yellow trace is the RESET (DTR) pin, blue is serial data from ESP to STM, and green is serial output data, i.e. from STM to ESP. Note that the three traces overlap to fit on the screen.

First the characters “reset” are sent and echoed, then a brief delay, then a CR is sent (and echoed as a space), and then… total havoc!

With the RESET / DTR pin disconnected, it all works exactly as expected:

The Mecrisp welcome greeting, and a little later a custom message from the “init” code.

The crucial observation: how on earth can a software reset behave differently when a hardware pin connection is changed? (thx to Matthias K for pointing this out) - a power glitch? noise?

After many, many hours of head-scratching, it turns out that the answer is hidden in this sneaky little diagram, mentioned in STM’s reference manual (RM0008, p.90):

The RESET pin is not just an input pin, it’s also driven low when a reset is generated internally! - with the pin tied to the “stiff” output pin on the ESP-Link, it was fighting against a high logic level and ended up causing a hardware fault interrupt. Note that the µC continued to work as it was sending out messages over serial at the proper baud rate. It just never reached a reset state…

Once identified, the solution was trivial: just add an extra diode, so that the ESP can only pull the µC’s RESET pin down. Now, internal resets no longer face the “1” output level when pulled down, and instead perform a clean reset, shaped by the RC circuit on the Olimexino board:

And indeed, everything works with the diode - here is the RESET pin during a software reset:

Spot-on: 10 kΩ × 100 nF = 1 ms RC time constant, at which point the RESET pin is ≈ 63% of Vcc (ignore the scope’s 1.80 ms value for tr: it’s calculated from the 10%/90% points).
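For reference, here is what an ideal RC charging curve predicts for these component values. This is a textbook calculation in Python, assuming nominal values and nothing else connected to the RESET pin:

```python
import math

R_OHM = 10_000        # pull-up resistor
C_FARAD = 100e-9      # capacitor on the RESET line

tau_ms = R_OHM * C_FARAD * 1e3

# Fraction of Vcc reached after one time constant: 1 - e^-1.
fraction_at_tau = 1 - math.exp(-1)

# Time to go from 10% to 90% of Vcc: tau * ln(0.9 / 0.1) = tau * ln(9).
rise_10_90_ms = tau_ms * math.log(9)

print(f"time constant:       {tau_ms:.2f} ms")
print(f"level after one tau: {fraction_at_tau:.1%} of Vcc")
print(f"ideal 10%-90% rise:  {rise_10_90_ms:.2f} ms")
```

The ideal 10%/90% figure comes out at roughly 2.2 time constants, so it is a different quantity than the 1 ms RC value and the two should not be compared directly.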

↧

This week's potpourri

May 10, 2016, 5:00 pm

This week will be a hodgepodge of topics which I’ve been working on recently: a hardware interaction which kept me really puzzled for quite some time, a delightful excursion into the deeper innards of the STM32F103 µC, and an exploration into the raw performance of ARM chips and Forth:

As always on this weblog, each article this week is released on successive days, to let me stay - slightly - ahead of the game.

↧

Great ADC/DMA performance

May 11, 2016, 5:00 pm

For the “JEM” JeeLabs Energy Monitor, we’re going to need to put the ADC on the Olimexino’s STM32F103 to some serious work: the goal is to acquire 4 ADC channels at 25 kHz each, so that we can capture a full cycle of the 50 Hz AC mains signal with a resolution of 500 samples, as well as collecting the readings of up to three current transformers.

Since AC mains voltage is being sampled via the negative peaks of the incoming 9V AC supply, we really only get half cycles, with flat segments in between. To be able to reconstruct a full cycle, we need to capture at least 3 segments: in the worst case, two flat ones with only one complete negative cycle. This requires a data sampling window of at least 30 ms.

As described earlier, we’re going to aim for the following setup:

single ADC, acquiring 4 channels every 40 µs
for each channel, two buffers of 800 samples
this gives an acquisition time of 32 ms per buffer

The STM32F103 has a very capable ADC subsystem, as seen in this diagram from the datasheet:

To distill some other relevant info from the datasheet for our use case:

the ADC can take up to 1 million samples per second
it’s slightly less when running at 72 MHz (max ≈ 850 ksps)
there’s a “SCAN” mode to read 1..16 specific ADC channels in rapid succession
the ADC can be triggered to run from a hardware timer, set to 25 kHz in this case

So this means we’re getting one new ADC reading every 10 µs on average. There is one catch: in scan mode, the ADC can only be used in combination with DMA, which makes sense since these data rates would completely overwhelm the CPU if handled through interrupts.

A benefit of using a hardware timer + DMA is that the ADC acquisition timing will be rock solid.

That DMA controller itself is an equally sophisticated part of the µC chip, by the way:

Note that both diagrams include hardware which is not on the “low end” STM32F103RB used on the Olimexino-STM32 board, which has only one ADC and one DMA unit.

It takes quite some reading in the (1137-page!) reference manual for the STM32F1xx chips, to figure out all the settings needed to implement the above acquisition mode. Then again, once that’s done, the code is remarkably short.

Here’s the basic DMA-based acquisition cycle to keep the ADC permanently running:

: adc1-dma ( addr count pin rate -- )  \ continuous DMA-based conversion
  3 +timer        \ set the ADC trigger rate using timer 3
  +adc  adc drop  \ perform one conversion to set up the ADC
  2dup 0 fill     \ clear sampling buffer

    0 bit RCC-AHBENR bis!  \ DMA1EN clock enable
      2/ DMA1-CNDTR1 !     \ 2-byte entries
          DMA1-CMAR1 !     \ write to address passed as input
  ADC1-DR DMA1-CPAR1 !     \ read from ADC1

                0   \ register settings for CCR1 of DMA1:
  %01 10 lshift or  \ MSIZE = 16 bits
   %01 8 lshift or  \ PSIZE = 16 bits
          7 bit or  \ MINC
          5 bit or  \ CIRC
                    \ DIR = from peripheral to mem
          0 bit or  \ EN
      DMA1-CCR1 !

                 0   \ ADC1 triggers on timer 3 and feeds DMA1:
          20 bit or  \ EXTTRIG
  %100 17 lshift or  \ timer 3 TRGO event
           8 bit or  \ DMA
           0 bit or  \ ADON
        ADC1-CR2 ! ;

It’s not so important at this stage how this works, just what it does:

a buffer + length is passed in, where the DMA unit will deposit all its readings
the DMA unit is set up to fill this buffer in circular mode, going on forever
the ADC is set up to acquire data on every timeout of timer 3 at a specified rate
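To see what actually ends up in the two control registers, the bit fields from the code above can be combined on a PC. The Python sketch below simply mirrors the shifts and bit numbers used in the Forth source (the `bit` helper is defined here to match the Forth word of the same name):

```python
def bit(n):
    """Return a value with only bit n set, like the Forth word of the same name."""
    return 1 << n


# DMA1 channel 1 configuration, field by field as in the Forth code.
dma_ccr1 = (
    (0b01 << 10)   # MSIZE = 16 bits
    | (0b01 << 8)  # PSIZE = 16 bits
    | bit(7)       # MINC: increment the memory address after each transfer
    | bit(5)       # CIRC: circular mode
    | bit(0)       # EN: enable the channel
)

# ADC1 control register 2, field by field as in the Forth code.
adc_cr2 = (
    bit(20)          # EXTTRIG: convert on an external trigger
    | (0b100 << 17)  # trigger source: timer 3 TRGO event
    | bit(8)         # DMA: request a DMA transfer per conversion
    | bit(0)         # ADON: ADC on
)

print(f"DMA1-CCR1 = 0x{dma_ccr1:04X}")
print(f"ADC1-CR2  = 0x{adc_cr2:06X}")
```

Building the values up from named fields, as the Forth code does, is a lot easier to check against the reference manual than a bare hex constant would be.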

This was created in an earlier experiment, titled Reading ADC samples via DMA to implement an oscilloscope. That was for a single channel, whereas here we need four. Luckily, we can keep that DMA code as is and modify the ADC settings on the fly to switch to 4-channel scan mode:

: quad-adc ( -- )  \ configure ADC and DMA for quad-channel continuous sampling
  +adc  6 us  adc-calib 
  adata #abytes VAC-IN arate-clk adc1-dma
  VAC-IN adc#                 \ channel 0
  CT1    adc#  5 lshift or    \ channel 1
  CT2    adc# 10 lshift or    \ channel 2
  CT3    adc# 15 lshift or    \ channel 3
              ADC1-SQR3 !     \ set up the ADC scan channels
  3 20 lshift ADC1-SQR1 !     \ four scan channels
         8 bit ADC1-CR1 bis!  \ enable SCAN mode
;

The above code depends on a number of constants, defined as follows:

                               4 constant #adcs
                             800 constant #asamples
                               2 constant #abuffers
#adcs #asamples * #abuffers * 2* constant #abytes
                              40 constant arate-us
                   arate-us 72 * constant arate-clk

It also needs this definition of a 12.8 KB buffer to store all acquired data in:

#abytes buffer: adata

Note that the timer and DMA settings have not changed: timer 3 will fire once every 40 µs and trigger a burst of four ADC conversions, one for each channel. Each completed conversion then triggers a DMA transfer, filling up the circular buffer four times faster than before.

All this code looks complex, and of course in a way it is indeed - this is a complex use case for the ADC + DMA hardware contained in the µC, after all! But in actual use it couldn’t be simpler:

quad-adc

That’s it. Now - magically - the adata buffer will be continuously filled with ADC samples from all four channels, without the CPU doing any work at all. It’s all happening in the background, and perhaps most surprising of all: the current drawn for all this extra activity is only 2 mA!

The processing overhead is negligible: one 16-bit read and one 16-bit write by the DMA unit - once every 10 µs on average. Since both ADC and SRAM are on the fast internal bus, this will occupy that internal data bus less than 0.3% of the time.

We do need to be careful with timing and synchronise our processing to avoid DMA changing values while we’re still using them. This is solved by inspecting a few status bits in the DMA controller: there is one bit for when the buffer has been filled halfway and another bit when the buffer is full and the DMA unit starts over from the beginning. These happen 40 µs x 800 samples = 32 ms apart, so we can simply poll this in the main loop of our application. There is not even a need to introduce interrupts - 32 ms is a very long time for a µC running at 72 MHz.

At the halfway point, we have 32 ms to process the 1st buffer. At the end point, we have another 32 ms to process the 2nd buffer. And so on. This is the circular equivalent of double buffering.
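The half-full / full handshake can be modelled in a few lines. The following Python sketch is a toy simulation, not a description of the real DMA registers: the class `CircularDma`, its two flags, and the `poll` function are all made up to show the order in which the two halves become safe to read.

```python
class CircularDma:
    """Toy model of a DMA channel filling a circular buffer (not real hardware)."""

    def __init__(self, size):
        assert size % 2 == 0
        self.buf = [0] * size
        self.pos = 0            # next entry the "DMA" will write
        self.half_flag = False  # set when the first half has just been filled
        self.full_flag = False  # set when the second half has just been filled

    def write(self, value):
        self.buf[self.pos] = value
        self.pos += 1
        if self.pos == len(self.buf) // 2:
            self.half_flag = True
        elif self.pos == len(self.buf):
            self.full_flag = True
            self.pos = 0        # wrap around: circular mode


def poll(dma):
    """Return a copy of the half that is safe to read, or None if none is ready."""
    half = len(dma.buf) // 2
    if dma.half_flag:
        dma.half_flag = False
        return dma.buf[:half]
    if dma.full_flag:
        dma.full_flag = False
        return dma.buf[half:]
    return None


dma = CircularDma(8)
blocks = []
for sample in range(16):        # two full passes through the buffer
    dma.write(sample)
    block = poll(dma)           # the "main loop" checks the flags
    if block is not None:
        blocks.append(block)

print(blocks)
```

While one half is being processed, the simulated DMA is busy writing to the other half, which is why the processing only has to finish before the next flag comes up.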

Another subtle issue is that we can no longer use the ADC in polled mode. To read out the LiPo voltage for example, we need to somehow make the ADC read out an extra channel, without interfering with the above high-speed acquisition cycle. As it happens, the designers at STM thought of that too, and came up with the concept of “injected data channels”: it’s possible to make the ADC acquire 1..4 extra channels, and have it place the results in separate registers.

Using this mechanism, we could specify that we want to read PB0 as well for example (once!), and then simply wait for the ADC scan to pick that request up after it has taken care of all the regular channels. This will allow reading out a few other analog pins, with at most 40..50 µs delay - the worst-case time needed by the ADC to start again and process our “injected” request.

As you can see, modern ARM µCs are a lot more than just a CPU-with-some-memory!

↧

Some µC speed measurements

May 12, 2016, 5:00 pm

Not long ago, Ken Boak very generously donated one of his assembled PCB designs to JeeLabs:

This is a break-out board for the STM32F746VG, an ARM Cortex M7 CPU with floating point and a whopping 1 MB flash + 320 KB RAM, all in a 100-pin SMD package.

Lots of I/O hardware, including USB and Ethernet, lots of analog I/O with three ADCs capable of millions of samples per second each, and a dual DAC. Lots of UART/I2C/SPI too, of course.

But the most interesting aspect of this chip, versus the lowly STM32F103 chip used in the HyTiny and Olimexino, is perhaps its speed: the STM32F7 series can run at up to 216 MHz, three times as fast as the F103. On first thought, it might seem that this would translate to “simply” running three times as many instructions in the same amount of time. Not so:

This is what the different columns represent:

µs/10k = microseconds to run 10,000 iterations of the loop
clk/loop = processor clock cycles per single loop iteration
iter/µs = iterations per µs (the same as: million iterations per second)
speedup = performance increase of F746 @ 216 MHz over F103 @ 72 MHz
efficiency = performance increase specific to Cortex M7 vs Cortex M3

That last column is the most interesting one: it compares the measured performance of some simple loops in Mecrisp Forth while dividing out the clock rate. So an empty loop runs about 4 times faster than could be explained by the clock speed difference alone.
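The relation between these columns can be written down as a small host-side Python sketch. The function names are invented here, and the timings in the example call are made up for illustration - they are not taken from the measurements above.

```python
def clk_per_loop(us_per_10k, mhz):
    """Clock cycles per iteration, from the time taken by 10,000 iterations."""
    return us_per_10k * mhz / 10000

def speedup_and_efficiency(us_f103, us_f746, mhz_f103=72, mhz_f746=216):
    """Return (speedup, efficiency) for two timings of the same loop.

    speedup    = how much faster the F746 finishes the loop
    efficiency = speedup with the clock ratio (216 / 72 = 3) divided out
    """
    speedup = us_f103 / us_f746
    efficiency = speedup / (mhz_f746 / mhz_f103)
    return speedup, efficiency

# hypothetical timings, in µs per 10,000 iterations
print(speedup_and_efficiency(1200, 100))
```

An efficiency of 1 would mean the gain is fully explained by the higher clock rate. Anything above 1 has to come from the core itself.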

The most likely explanation is a better cache, a better processing pipeline, or a better lookahead optimiser - or more likely: a mix of all of these. Getting to the bottom of this would require much more investigation - for now, the point was simply to show how advances in µC technology can lead to more-than-linear performance increases.

The code used for the above timing results was as follows (running from RAM):

10000 buffer: buf

: j0 micros 10000 0 do                         loop micros swap - . ;
: j1 micros 10000 0 do nop                     loop micros swap - . ;
: j2 micros 10000 0 do 1 i buf + c!            loop micros swap - . ;
: j3 micros 10000 0 do     buf   c@       drop loop micros swap - . ;
: j4 micros 10000 0 do   i buf + c@       drop loop micros swap - . ;
: j5 micros 10000 0 do   i buf + c@ dup + drop loop micros swap - . ;
: j6 micros 10000 0 do   i buf + c@ dup * drop loop micros swap - . ;
: j7 micros 10000 0 do   i buf + c@ dup / drop loop micros swap - . ;
: jn j0 j1 j2 j3 j4 j5 j6 j7 ;

It’s not a very comprehensive timing suite - just a quick set of explorations which came to mind. Let’s not even try to suggest that this would be representative in any way or for any purpose.

One aspect stands out, though: the amazing speed of this code. It can be typed into the console interactively, yet the resulting performance levels are orders of magnitude higher than other interactive languages, which tend to be interpreted (especially in such a constrained µC context).

The range of power consumption modes is equally impressive, from drawing a few dozen mA when the F103 runs at 72 MHz and about 150 mA when the F746 runs at 216 MHz, to just a few microamps when entering standby mode. Computers have come a long way since the PDP-8!

P.S. - Here is a different kind of performance comparison: running 1,000,000,000 iterations of an empty loop takes about 26 s on an STM32F746 @ 216 MHz, 7 s on a Core i7 @ 2.8 GHz, using Qemu in a Linux VM (via qemu-arm-static), and 1 s on an Odroid C1+ @ 1.7 GHz. Those last two both use the Linux ARM build (all these tests were done with “Mecrisp 2.2.5 RA”).

↧

Working on JEM

May 17, 2016, 5:00 pm

The JeeLabs Energy Monitor prototype is progressing nicely - once I figured out that I had my numbering of the Arduino analog pins 0..5 reversed… doh!

Here is what’s on the menu for this week:

It turns out that this little STM32F103 µC ARM chip in the Olimexino board I’m using has plenty of power to perform an amazing number of tasks in parallel: acquiring four ADC channels at 25 kHz each, keeping track of the exact timing of three pulse counters, driving an OLED display, sending out packets over an RFM69 link, and more…

↧

Tracking pulses w/ interrupts

May 17, 2016, 5:00 pm

There are three pulse counters for measuring power at JeeLabs - one for solar PV production and two for the kitchen stove and the rest, respectively:

These generate 2000 pulses per kWh, that’s one pulse per 0.5 Wh, and are optically isolated. Reading them out is super simple: add a 1 kΩ series resistor and power them from 3.3V .. 5V. The result is a series of clean “1” pulses, each 100 ms long (there’s no contact bounce).

At the maximum rated current of 16A each, which corresponds to 3680 Watt for a nominal 230 Vac feed, we get about 2 pulses per second. With room to detect surges at least 5 times as high.

The main trick is to measure the time between these pulses fairly accurately, as this provides a measure of the actual current consumption. With 1s between pulses, we know the power is 1800 W, and with 10s between pulses, it’ll be 180 W - averaged out over those periods, that is.
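As a quick check of these figures, here is the conversion from pulse interval to average power, written as a small Python sketch for the receiving end. The function name is made up for illustration - only the 2000 pulses/kWh constant comes from the meters described above.

```python
PULSES_PER_KWH = 2000   # from the pulse counters described above

def watts_from_interval(seconds_between_pulses):
    """Average power over the interval between two successive pulses."""
    wh_per_pulse = 1000 / PULSES_PER_KWH            # 0.5 Wh per pulse
    return wh_per_pulse * 3600 / seconds_between_pulses

print(watts_from_interval(1))     # 1 s between pulses
print(watts_from_interval(10))    # 10 s between pulses
print(3680 / watts_from_interval(1))   # pulses per second at 16 A / 230 V
```

With millisecond timestamps, as produced by the SysTick counter, the interval is simply the difference between two successive timestamps divided by 1000.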

It’s easy to measure time in a µC, especially on a millisecond scale. There’s a SysTick counter in the ARM µC, which is set up to run at 1000 Hz, i.e. one tick per millisecond. See this code.

So all we need to do is set up three “external interrupts” to trigger on the rising edge, and then count and timestamp each event:

0 0 2variable pulses1  \ last millis and pulse count #1
0 0 2variable pulses2  \ last millis and pulse count #2
0 0 2variable pulses3  \ last millis and pulse count #3

: ext3-tick ( -- )  \ interrupt handler for EXTI3
  3 bit EXTI-PR !  \ clear interrupt
  millis pulses1 1 over +! cell+ ! ;

: ext4-tick ( -- )  \ interrupt handler for EXTI4
  4 bit EXTI-PR !  \ clear interrupt
  millis pulses2 1 over +! cell+ ! ;

: ext5-tick ( -- )  \ interrupt handler for EXTI9_5
  5 bit EXTI-PR !  \ clear interrupt
  millis pulses3 1 over +! cell+ ! ;

: count-pulses ( -- )  \ set up and start the external interrupts
       ['] ext3-tick irq-exti3 !     \ install interrupt handler EXTI 3
       ['] ext4-tick irq-exti4 !     \ install interrupt handler EXTI 4
       ['] ext5-tick irq-exti5 !     \ install interrupt handler EXTI 5-9

               9 bit NVIC-EN0R bis!  \ enable EXTI3 interrupt 9
  %0010 12 lshift AFIO-EXTICR1 bis!  \ select P<C>3
                3 bit EXTI-IMR bis!  \ enable PC<3>
               3 bit EXTI-RTSR bis!  \ trigger on PC<3> rising edge

              10 bit NVIC-EN0R bis!  \ enable EXTI4 interrupt 10
            %0010 AFIO-EXTICR2 bis!  \ select P<C>4
                4 bit EXTI-IMR bis!  \ enable PC<4>
               4 bit EXTI-RTSR bis!  \ trigger on PC<4> rising edge

              23 bit NVIC-EN0R bis!  \ enable EXTI9_5 interrupt 23
   %0010 4 lshift AFIO-EXTICR2 bis!  \ select P<C>5
                5 bit EXTI-IMR bis!  \ enable PC<5>
               5 bit EXTI-RTSR bis!  \ trigger on PC<5> rising edge
;

This code is only so long because of the repetition. It’s all fairly straightforward, once you go through the reference manual and find all the register settings.

Once count-pulses has been called, we end up with three variables of 2 words each, containing automatically-updating pulse counts and the last millisecond timestamp.

And because this is based on interrupts and running in the background, we still have Mecrisp’s interactive command loop to peek and poke around, and look at these variables:

pulses1 2@ . . 10 8819  ok.
pulses2 2@ . . 9 12928  ok.
pulses3 2@ . . 29 17537  ok.

Pulse counter #1 has pulsed 10 times since started, the last one being 8819 milliseconds since the last µC reset. It’s all working like a charm and it doesn’t involve any code or attention to keep this running: all we need to do is pick up these values when we want to report them. Onwards!

↧

Frequency aliasing in ADCs

May 18, 2016, 5:00 pm

This is a pure sine wave, captured by the ADC + DMA code, as described previously:

The plot above consists of 800 samples, sampled 40 µs apart, i.e. at 25 kHz - for a total of 32 ms. A quick calculation would seem to indicate that we’re seeing 1.6 cycles of a 50 Hz sine wave.

Except that it’s not… the incoming signal used here was a 24,950 Hz sine wave!

There is no way to tell what the frequency of a sampled signal is without further information. The reason for this is aliasing, an important aspect in any situation where continuous signals are sampled - as in the case of an ADC. We’ll get the same result with 25050/49950/50050/… Hz.

The math behind all this is locked up inside the Nyquist-Shannon sampling theorem, but the intuition for this phenomenon is actually quite easy to pick up.

Here is a high-frequency pure sine wave, sampled at a - too low - rate (SVG from Wikipedia):

Each successive sample is picking up a slightly earlier piece of the sine wave. Unfortunately, when you drop the real signal (the red line) and look at only the sampled value (the black dots), it all ends up suggesting an aliased sine wave of a much lower frequency (the blue line). Yet this last waveform is totally fake - it is not present in the original signal!

When sampling at frequency X, signals of frequency Y, X-Y, X+Y, 2*X-Y, 2*X+Y, … all look the same - no matter how fast that ADC circuit is, or how clean and noise-free that input signal is.

The fact that the 24,950 Hz capture above looked so much like a 50 Hz signal is in fact a tribute to the accuracy of the sampling obtained by running our ADC off a hardware timer. If you think about it: any jitter in the timing of the sampling interval would lead to a highly distorted sample in the case of 24,950 Hz, since the real input signal varies much more quickly than with 50 Hz.

When X is 25,000 Hz, any frequency from 12,501 Hz to 24,999 Hz will “flip over” and alias back into our sampled data as if they were signals of 12,499 Hz to 1 Hz, respectively. And the same will happen over and over again for any frequency above X.
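This folding rule is easy to try out. Here is a minimal Python sketch of it, for a pure sine wave and an ideal sampler - the function name is made up for illustration:

```python
def alias_frequency(signal_hz, sample_hz):
    """Apparent frequency of a pure sine wave after sampling at sample_hz."""
    folded = signal_hz % sample_hz            # X+Y, 2*X+Y, ... all map to Y
    return min(folded, sample_hz - folded)    # X-Y "flips over" to Y

for hz in (50, 24950, 25050, 49950, 50050, 12501, 24999):
    print(hz, "->", alias_frequency(hz, 25000))
```

The first five inputs all come out as 50 Hz, while 12,501 Hz and 24,999 Hz come out as 12,499 Hz and 1 Hz, as described above.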

This also points to a solution: if we filter out high frequencies before sampling, then all is well, as long as we keep all frequencies > X/2 out of the ADC. This X/2 limit is called the Nyquist frequency.

Ideally, we’d need a perfect low pass, which passes all signals under X/2 as is, and suppresses everything above. In the real world, such a filter does not exist, but we can choose a filter which starts filtering at X/4 or even X/10, and rely on the roll-off of a simple RC filter to do the job. With a sample and hold capacitor on the input of an ADC, a well-chosen resistor can also work.

There is also another approach to avoid aliasing: assume that we know that frequencies > Z are not present in our signal. Perhaps the properties of the input circuit are such that they already limit the frequency response. Then we could oversample at a higher rate 2*Z, perform some digital filtering, and then throw away the extra samples to end up with exactly the data rate we need. This last step is called decimation. This doesn’t change the fact that sampling should never see frequencies above half the sampling rate - we’ve merely moved part of the work over to DSP.

For the JeeLabs Energy Monitor prototype, the plan is to sample at 25 kHz - so we have to block all signals above 12.5 kHz. What this means for the actual circuit still needs to be determined…

↧

Simple variable packet data

May 19, 2016, 5:00 pm

Until now, most of the wireless sensor nodes here at JeeLabs have been using a simple “map C/C++ struct as binary” approach as payload format. The advantage of this is that it simplifies the code in C (once you wrap your mind around how structs and binary data are stored in memory, that is) - but it’s also a bit C-specific, compiler-dependent, and not truly flexible.

With Forth now taking over both sides of the wireless RF link, it’s time to revisit this approach.

Here is a new design, which can efficiently package and send a number of numeric values in a generic form, and which is also easy to decode at the receiving end. This format and design is not language-specific. In fact, it has been the basis of several commercial and open-source products since 1991. It’s also used in the p1scanner code in a previous release of HouseMon.

The main idea is to transform integers into variable-length byte sequences. A packet is then simply a series of such values, starting with a packet format ID.

This particular encoding represents small positive integers more compactly than larger ones. And it’ll work with arbitrarily large integers (the current Forth implementation can deal with up to 64-bit values). Negative integers can be handled in several ways, to be described further on.

The following example is for 32-bit integers, the most common size on ARM µC’s, since that’s the size of a native int and of a value on the Forth stack.

Here is how a 32-bit integer is converted to a sequence of 1..5 bytes:

As you can see, bits are grouped 7 at a time, with the high bit set only in the last byte. So all a decoder has to do is to advance through input bytes until it finds one with the high bit set, while accumulating and shifting the result 7 bits at a time.

Here’s the trick: leading zeros are skipped. Encoded byte sequences never start with a 0-byte:

If the entire value is 0, then this emits a single byte with only the high bit set, i.e. 0x80.
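To make the rules concrete, here is a minimal encoder sketch in Python (the actual JEM code is in Forth, and the function name here is made up). It emits the most significant 7-bit group first, skips leading zero groups, and sets the high bit only on the last byte:

```python
def encode_varint(value):
    """Encode a non-negative integer as a variable-length byte sequence."""
    if value < 0:
        raise ValueError("negative values need one of the conventions below")
    out = [0x80 | (value & 0x7F)]    # last byte: lowest 7 bits + high bit
    value >>= 7
    while value:                     # stop as soon as only zeros remain
        out.insert(0, value & 0x7F)  # earlier bytes: high bit clear
        value >>= 7
    return bytes(out)

for n in (0, 1, 127, 128, 296, 0xFFFFFFFF):
    print(n, encode_varint(n).hex())
```

Values up to 127 fit in a single byte, and the largest 32-bit value needs 5 bytes.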

For negative values, with all the high bits set to one, we have several options:

just ignore the issue, and accept that negative values will always be encoded as 5 bytes
add an offset to make it positive (and subtract it after decoding) - probably the simplest solution, but it requires knowledge in the decoder about which values to adjust and how
convert the value N to abs(N) * 2 + sign(N), with sign(N) taken as 1 for negative N and 0 otherwise - and convert it back in the decoder
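The last option could look like this in Python. This is only a sketch: the function names are invented, and it assumes the sign ends up in the lowest bit (1 for negative, 0 otherwise), which is what makes the mapping reversible.

```python
def fold_sign(n):
    """Map a signed integer to a non-negative one, sign in the lowest bit."""
    return abs(n) * 2 + (1 if n < 0 else 0)

def unfold_sign(u):
    """Inverse of fold_sign."""
    magnitude = u >> 1
    return -magnitude if u & 1 else magnitude

for n in (0, 1, -1, 2, -2, 1000, -1000):
    print(n, fold_sign(n), unfold_sign(fold_sign(n)))
```

Small negative values now stay small after folding, so they also encode as short byte sequences instead of always taking 5 bytes.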

This approach leads to a compact binary data packet for simple integer values. By adding the convention that the first integer (a single byte, i.e. 0..127) represents the type of packet, we can play various tricks to make things more compact for certain cases.

Here is an actual encoded packet sent out by the JeeLabs Energy Monitor:

8102A8078E808895

There are 6 bytes with the high bit set, therefore 6 values:

0x81 = 1
0x02A8 = 2 * 128 + 40 (i.e. 0xA8-0x80) = 296
0x078E = 7 * 128 + 14 (i.e. 0x8E-0x80) = 910
0x80 = 0
0x88 = 8
0x95 = 21
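The same decoding can be done in a few lines on the receiving end. Here is a minimal Python sketch (the function name is made up), which accumulates 7 bits at a time and closes off a value whenever it sees a byte with the high bit set:

```python
def decode_varints(data):
    """Decode a packet into a list of non-negative integers."""
    values, acc = [], 0
    for byte in data:
        acc = (acc << 7) | (byte & 0x7F)   # shift in the next 7 bits
        if byte & 0x80:                    # high bit set: value is complete
            values.append(acc)
            acc = 0
    return values

print(decode_varints(bytes.fromhex("8102A8078E808895")))
```

This prints [1, 296, 910, 0, 8, 21], matching the manual breakdown above. Note that this sketch gives no special meaning to a 0-byte at the start of a value.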

One potentially very useful property of this encoding is that the 0-byte never appears as the first byte in these variable-byte encodings. Which means that the 0-byte can be inserted as special “escape mark” for all sorts of purposes. It could be followed by an alternate representation for a small string for example, or even be the start of a much richer “separator” convention to allow encoding complete JSON data structures. Note that 0-bytes can appear inside byte sequences when some of the intermediate 7 bit groups are all zero.

And since only the last byte of each encoded byte sequence has its high bit set, we could also traverse multiple values in reverse order, if needed.

For now, the special 0-bytes are not used. They leave the door open for future enhancements.

↧

Tying up several loose ends

May 24, 2016, 5:00 pm

This week is yet another mixed bag of topics, all related to either the JeeLabs Energy Monitor, or taking the STM32F103 platform further with Mecrisp Forth.

It might not look like much from afar, but there’s substantial progress under the hood:

Here’s the setup I’ve been using during the development of the USB driver:

That’s two HyTiny’s “in series”: the one on the left is the target board, with USB hooked up - as well as the 6-pin power/swd/serial header connected to the second HyTiny. That one is then used as Black Magic Probe for serial debugging and to perform a full reflash when the target board falls off a cliff.

Which has been fairly frequent, as I try to make the board jump through some hoops!

↧

Parsing P1 smart meter info

May 24, 2016, 5:00 pm

The smart meter at JeeLabs looks like this:

It’s a Landis & Gyr E350, which monitors all power coming into the house and going out (when solar PV production exceeds local consumption). There’s an RJ12 jack on the bottom right, with serial data coming out at 9,600 baud (newer units send at 115,200 baud).

Every 10 seconds, a “telegram” of information is sent out, which looks something like this:

/XMX5XMXABCE000046099

0-0:96.1.1(30313337323430332020202020202020)
1-0:1.8.1(00003.540*kWh)
1-0:1.8.2(00011.199*kWh)
1-0:2.8.1(00000.000*kWh)
1-0:2.8.2(00004.667*kWh)
0-0:96.14.0(0002)
1-0:1.7.0(0000.35*kW)
1-0:2.7.0(0000.00*kW)
0-0:17.0.0(999*A)
0-0:96.3.10(1)
0-0:96.13.1()
0-0:96.13.0()
0-1:96.1.0(3131323838323030303336383037303132)
0-1:24.1.0(03)
0-1:24.3.0(121129160000)(00)(60)(1)(0-1:24.2.0)(m3)
(00014.684)
0-1:24.4.0(2)
!

This unit has been in operation since the end of 2012, with a JeeNode attached to pick up P1 data and send out wireless RFM12 packets, using the same variable format described in a recent article:

OK 18 129 1 83 111 232 1 47 58 201 1 55 1 142 3 26 45 233 130 144 [...]

The sketch used to extract data from P1 packets is called p1scanner and can be found on GitHub.

Here is an essentially equivalent re-implementation in Mecrisp Forth:

       8 constant p1#
p1# cells buffer: p1.buf
       0 variable p1.type
       0 variable p1.value

: p1clear p1.buf p1# cells 0 fill ;
: p1save ( pos -- ) cells p1.buf +  p1.value @ swap ! ;
: p1dump cr p1# 0 do i cells p1.buf + @ . loop ;

: p1select ( type -- )  \ these values are for a Landis & Gyr E350 meter:
  case
      181 of 0 p1save endof  \ cumulative electricity consumption, normal
      182 of 1 p1save endof  \ cumulative electricity consumption, low
      281 of 2 p1save endof  \ cumulative electricity production normal
      282 of 3 p1save endof  \ cumulative electricity production low
    96140 of 4 p1save endof  \ tariff
      170 of 5 p1save endof  \ actual consumption
      270 of 6 p1save endof  \ actual production
     2420 of 7 p1save endof  \ cumulative gas consumption
  endcase ;

: p1char ( c -- )
  case
    [char] / of p1clear endof
    [char] : of 0 p1.type ! 0 p1.value ! endof
    [char] ( of p1.type @ 0= if p1.value @ p1.type ! then 0 p1.value ! endof
    [char] ) of p1.type @ p1select endof
    [char] ! of p1dump endof
             dup digit if p1.value @ 10 * + p1.value ! then
  endcase
;

: p1test
  begin
    uart-irq-key? if uart-irq-key p1char then
  key? until ;

The p1select word filters specific values in the data and stores them in an array of 8 entries. The p1test word simply listens to the 2nd serial port, and feeds all incoming characters to p1char. This 2nd port uses interrupts with a ring buffer to avoid losing incoming data. So at 9600 baud and with a 128-byte ring buffer, serial processing needs to start within ≈ 120 ms.

The logic of p1char is the same as the p1_scanner() function in the original C++ code. It plays the same tricks to ignore many of the incoming characters, only triggering on a few specific ones, while also identifying and parsing each numeric value.
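For readers who don't speak Forth, here is the same character-driven logic as a Python sketch. It is a line-by-line port of p1char and p1select as shown above, not the original C++ code, and the class and attribute names are made up for illustration:

```python
# type code -> slot number, same table as in p1select
SLOTS = {181: 0, 182: 1, 281: 2, 282: 3, 96140: 4, 170: 5, 270: 6, 2420: 7}

class P1Scanner:
    def __init__(self):
        self.buf = [0] * 8    # p1.buf
        self.type = 0         # p1.type
        self.value = 0        # p1.value
        self.reports = []     # stands in for p1dump's printed output

    def feed(self, ch):
        if ch == "/":                     # start of telegram
            self.buf = [0] * 8
        elif ch == ":":                   # start of an item code
            self.type = 0
            self.value = 0
        elif ch == "(":                   # digits so far were the type code
            if self.type == 0:
                self.type = self.value
            self.value = 0
        elif ch == ")":                   # end of a value: keep it if wanted
            slot = SLOTS.get(self.type)
            if slot is not None:
                self.buf[slot] = self.value
        elif ch == "!":                   # end of telegram
            self.reports.append(list(self.buf))
        elif "0" <= ch <= "9":            # accumulate decimal digits
            self.value = self.value * 10 + int(ch)

scanner = P1Scanner()
for ch in "/X\n1-0:1.8.1(00003.540*kWh)\n1-0:1.7.0(0000.35*kW)\n!":
    scanner.feed(ch)
print(scanner.reports)
```

All other characters (dots, dashes, units, line ends) are simply ignored, which is why "1.8.1" ends up as type 181 and "00003.540" as the value 3540.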

Here is a test run, with p1test reporting on Mecrisp’s serial UART1 console, while the above test packet was manually pasted three times into a second terminal session tied to UART2:

p1test 
3540 11199 0 4667 2 35 0 14684 
3540 11199 0 4667 2 35 0 14684 
3540 11199 0 4667 2 35 0 14684

As you can see, all the important values have been properly isolated and parsed, ready to be sent out in the reporting section of the JeeLabs Energy Monitor’s code.

The actual hookup will need some more testing. The interface is described in an old P1 revisited weblog post, but it’s not clear yet whether this will also work at 3.3V and perhaps even without the inverting transistor stage. Some more experimentation is needed…

↧

The need for multitasking

May 25, 2016, 5:00 pm

With an increasing number of sensing and reporting activities taking place on the JeeLabs Energy Monitor (JEM) prototype, things are starting to become a bit more complicated.

How can we deal with such a multitude of tasks, each with their own timing requirements?

The traditional (or perhaps one should say: modern?) answer to this is to include a Real-Time Operating System, which has built-in task switching and can offer some hard guarantees on how quickly a task can be triggered to run on specific external events.

Forth, on the other hand, has very early on (and that means decades ago!) implemented a very low-key “cooperative” form of multi-tasking, whereby tasks voluntarily pass control to other tasks in a round-robin fashion.

There is much to be said in favour of this approach, over the “pre-emptive” style, which can stop and start tasks in very abrupt ways. The benefit of pre-emption is that it’s better at guaranteeing a specific maximum response time. The drawback is that it massively complicates the code, to avoid getting interrupted in troublesome ways, such as in the middle of incrementing a counter.

One advantage of cooperative vs. pre-emptive is that processing becomes deterministic again. This makes it far easier to reason about what is happening, and in what order.

With the cooperative approach, all tasks must be written in such a way that they periodically relinquish control. The longer some (any!) task waits to do so, the longer the worst-case delay will be when servicing pending requests.

In the case of JEM, all the hard timing requirements are relatively lenient, i.e. in the order of several milliseconds. As long as a task never spends more than a few ms before passing control to the next task, we’ll be fine.

The key trick is to handle all really strict timing demands with interrupts or DMA:

1) acquiring 4 ADC channels @ 25 kHz: this is handled by DMA, with buffers large enough to allow servicing within 10..20 ms, and requests coming in about once every 32 ms
2) counting pulses on 3 pins: this is handled using external interrupts, which perform all critical timing and counting tasks - there are no strict requirements for further processing
3) reading and parsing serial data from the smart meter’s P1 port at 9600 baud - this uses interrupts with a 128-byte buffer, which can hold over 100 ms of incoming data
4) parsing the DCF77 radio pulse stream - these come in at 1 pulse per second, each pulse must be timed to distinguish between 0.1 and 0.2 second wide - this has not yet been implemented, but can probably be done with a hardware timer interrupt once every 10 ms
5) sending out wireless packets every few seconds - this is so relatively slow that it can easily be done in a main loop, while keeping track of elapsed time

None of these tasks take much processing time. But there is one activity missing: 6) processing the four acquired ADC signals to detect the zero crossings, and to calculate voltage times current for each of the three Current Transformers. It should be relatively easy to relinquish control at least once per millisecond, even when some calculations might take much longer than that.

The Mecrisp multitasker can be found here. It’s written in Forth and supports dynamically adding and removing tasks, as well as waking up tasks from interrupt handlers. It’s very lightweight and works by having tasks call the built-in “pause” word once in a while.

Multi-tasking needs one stack per task (eh, two in Forth: a data and a return stack). These stacks must be sized for the worst case, i.e. maximum stack use (including interrupts). Allocating a task and its stack(s) for each item in the above list would require quite a lot of RAM space, but we can in fact do it all with just two tasks: 1) the command interpreter, and 2) everything else.

All we need is a way to “frequently enough” check a few cases, and trigger some activity when processing is required. Each of these cases can be dealt with sequentially. There’s no need to interrupt workflows in the middle of what they’re doing, and resume them later.

So what we can do is keep one task for Mecrisp’s command-line interpreter, and perform all the real work in a single second task. This way, we can continue to interactively type in commands, while all the main JEM activity continues in a separate task - i.e. in the background, essentially.

Here’s a general outline of how that second task in JEM could be structured:

a main loop, which processes ADC data when a new buffer has been collected
this main loop needs to call a “chores” word at least once every millisecond or so
this chores word is set up to go through (i.e. call) several different, ehm… chores
important: each chore must leave the data and return stacks unaffected, once it is done
each chore checks for some condition, i.e. time to send an RF packet, or time to report the pulse counts, or new P1 data has arrived, etc.
when needed, each of these chores can do some processing, as long as it takes no more than say one millisecond (send an RF packet, parse P1 data, etc)
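The outline above can be sketched in a few lines. This is Python rather than Forth, purely to show the structure - all names are made up, each chore is reduced to "run this action when its period has elapsed", and the clock is passed in so the loop can be tried out with a fake one:

```python
import time

class Chore:
    """Run `action` whenever at least `period_s` seconds have elapsed."""
    def __init__(self, period_s, action):
        self.period_s = period_s
        self.action = action
        self.next_due = 0.0

    def poll(self, now):
        if now >= self.next_due:          # cheap check, usually false
            self.next_due = now + self.period_s
            self.action()                 # must be short: ~1 ms at most

def run_chores(chores, now):
    for chore in chores:                  # strictly sequential, no switching
        chore.poll(now)

def main_loop(chores, adc_buffer_ready, process_adc,
              clock=time.monotonic, rounds=1000):
    for _ in range(rounds):
        if adc_buffer_ready():            # e.g. DMA half/full flag was set
            process_adc()
        run_chores(chores, clock())

# try it with a fake clock which advances 1 s per round
log = []
ticks = iter(range(10))
main_loop([Chore(3, lambda: log.append("send RF packet"))],
          adc_buffer_ready=lambda: False,
          process_adc=lambda: None,
          clock=lambda: next(ticks), rounds=10)
print(log)
```

In the real system, process_adc itself would also have to call the chores once in a while whenever its calculations take longer than a millisecond or so.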

This approach is considerably simpler than switching between multiple independent tasks. There is merely a main loop, which branches off to do a few other things once in a while.

The reason this approach should be good enough here, is that we’ve been careful to do all time-critical work in interrupts or via DMA. It’ll be ok if some chores take a few ms once in a while.

There’s a major convenience with the above design w.r.t. development: it allows us to continue entering Forth commands at any time, using the Mecrisp UART1 console. This includes peeking and poking in a running system, but also restarting or even re-flashing the JEM board.

↧

Using a buffered serial console

May 26, 2016, 5:00 pm

Mecrisp Forth comes with a serial-port command line interface. This makes both tinkering and uploading new code a breeze, but it’s nevertheless a fairly limited setup:

no input buffering: if characters come in while the code is busy, they can get lost
no output buffering: sending any text to the console will block until all data is sent
the greeting sent to USART1 cannot be changed or redirected (at least in Mecrisp 2.2.7)
even if the console is reconfigured later on, a reset will still revert to USART1

It’s very easy to redirect console I/O, using a built-in mechanism to re-vector 4 essential words:

key?            ( -- Flag ) Checks if a key is waiting
key             ( -- Char ) Waits for and fetches the pressed key
emit?           ( -- Flag ) Ready to send a character ?
emit            ( Char -- ) Emits a character.

This can be done by assigning new handlers to these corresponding 4 variables:

hook-key?       ( -- a-addr ) Hooks for redirecting
hook-key        ( -- a-addr )   terminal IO
hook-emit?      ( -- a-addr )     on the fly
hook-emit       ( -- a-addr )

After reset, those variables are set as follows to use USART1 in polled mode:

' serial-key?  hook-key?  !
' serial-key   hook-key   !
' serial-emit? hook-emit? !
' serial-emit  hook-emit  !

If we want to change to an interrupt-based USART2 driver, for which an implementation has been created here and here, all we need to do is include those files and add this init code:

compiletoflash

: init ( -- )
  init  1000 ms  key? if eraseflash then  \ safety escape hatch
  +uart-irq
  ['] uart-irq-key? hook-key?  !
  ['] uart-irq-key  hook-key   !
  ['] uart-emit?    hook-emit? !
  ['] uart-emit     hook-emit  !
  cr init ;

This points the input vectors to the interrupt-based driver, and the output vectors to the (polled) driver for USART2. Note the compiletoflash - this code needs to be in flash to survive a reset.

The first line allows recovering from this setup. With “init”, it’s extremely important to prepare for the worst, as this code gets called after every reset. If there is any error in this code, we’ll never get control back! With the extra line, we can hit a key on USART1 to restore Mecrisp to its original state and remove this additional init word.

The above code relies on other code to generate a 1000 ms delay, which is why there needs to be a call to an earlier init inside the above code. In addition, init is called again just before exit, so that the custom greeting gets sent to the new console output device, i.e. USART2.

The above works well: on power-up and reset, the console is automatically adjusted to USART2, with all input stored in a ring buffer, so that incoming data is no longer at risk of being dropped.
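For those not familiar with ring buffers, here is the basic idea as a Python sketch. This is not the actual Forth driver: the class, its layout, and the choice to drop new bytes when full are all assumptions made for illustration. The interrupt handler would call put for each received byte, while the console side calls available and get (the counterparts of key? and key):

```python
class RingBuffer:
    def __init__(self, size=128):
        self.data = bytearray(size)
        self.head = 0    # next write position (interrupt side)
        self.tail = 0    # next read position (console side)

    def put(self, byte):
        """Store one byte, return False if the buffer is full."""
        nxt = (self.head + 1) % len(self.data)
        if nxt == self.tail:
            return False             # full: this byte is dropped
        self.data[self.head] = byte
        self.head = nxt
        return True

    def available(self):
        """Number of bytes waiting to be read."""
        return (self.head - self.tail) % len(self.data)

    def get(self):
        """Fetch the oldest byte, the caller checks available() first."""
        if self.head == self.tail:
            raise IndexError("ring buffer is empty")
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) % len(self.data)
        return byte
```

One slot is always left unused in this variant, to tell "full" apart from "empty" - so a 128-byte array holds at most 127 pending bytes.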

But there’s still a risk: if we enter any of eraseflash, eraseflashfrom, or flashpageerase - then we could lose console access via USART2, since this can wipe out the above init override.

The simplest solution is: don’t do that… i.e. never enter these commands if you want to keep the console functioning as is. It can be inconvenient, but luckily we can still easily erase the last few definitions in flash and replace them with new ones using a cornerstone, defined as follows:

: cornerstone ( "name" -- )  \ define a flash memory cornerstone
  <builds begin here dup flash-pagesize 1- and while 0 h, repeat
  does>   begin dup  dup flash-pagesize 1- and while 2+   repeat  cr
  eraseflashfrom ;

Followed by this re-definition (thanks to Matthias Koch for this neat suggestion):

cornerstone eraseflash

What this does is to create a “reference point” for clearing flash definitions. When called, it will partially clear flash memory and remove all definitions entered after this one. Since this name overrides the earlier definition by appearing later in the dictionary, it effectively hides that older one. So from now on, typing eraseflash will clear flash memory, but keep the USART2 console implementation and the corresponding init definition.
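The two loops in the cornerstone definition do the same thing: step forward in 16-bit units until the address sits on a flash page boundary. Here is that address arithmetic as a Python sketch - the function name is made up, and the 1 KB page size is an assumption (in the Forth code, flash-pagesize supplies the real value):

```python
def next_page_boundary(addr, page_size=1024):
    """Round an even address up to the next multiple of page_size.

    page_size must be a power of two, so that `addr & (page_size - 1)`
    is the offset of addr within its page.
    """
    if addr % 2:
        raise ValueError("address must be halfword-aligned")
    while addr & (page_size - 1):
        addr += 2        # one halfword per step, like `0 h,` and `2+`
    return addr

print(hex(next_page_boundary(0x4002)))   # just past a page boundary
print(hex(next_page_boundary(0x4400)))   # already on a page boundary
```

Since flash can only be erased a page at a time, aligning the cornerstone this way makes sure that erasing from that point on never touches the definitions before it.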

This does not prevent calling the other two erasing words, but those are harder to use and intended for internal reference anyway (such as in the cornerstone definition itself).

What if we want to revert the code and restore a pristine USART1-based Mecrisp setup? Well… given that all previously-defined words are still present in the dictionary, that’s also possible:

enter the “words” command and look for the address of the original eraseflash word
then enter “<address> execute” to run it - or, alternatively…
enter “$4000|$5000 eraseflashfrom” (the address depends on the Mecrisp build)

To summarise: with the above tricks, we can make Mecrisp (semi-) permanently use a different console I/O channel (serial or anything else, really), yet still regain control and restore the original polled USART1 implementation when absolutely needed.

The only risk is when we mess up and install an incorrect init word: in the worst case, we lose console access for good and can’t recover anymore. At that point, there will be no other recourse than to re-flash the entire µC memory by other means (see this summary for some options).

Note that all of the above could also have been used to turn USART1 into an interrupt-driven & buffered console. USART2 was used as example here because it’s easier during development to switch between two separate interfaces.

↧