Deep Dives for CSV
Last year I got the chance to start working on the computer system validation (CSV) of a new piece of manual equipment for electropolishing and passivation for a customer of ours in the medical device business. This sounded easy and straightforward, especially since the planned software was considered standard off-the-shelf software. Only minimal adaptations due to specific local requirements were expected. The base software has been developed over the course of more than fifteen years and runs in about 90 fully automated installations in various industries around the world.
The installation is split into two almost independent equipment parts. The first, smaller part consists of nine ponds for secondary passivation processes of just a few minutes' duration. The second, larger part consists of thirteen ponds for electropolishing and/or primary passivation processes of 20 minutes' duration and more. In both parts of the installation the chemical ponds and the dryer ponds are redundant; the different cleaning ponds in both installation parts exist only once.
The parts to be processed arrive in batches as they were produced. They are manually put on racks in defined maximum group sizes. The racks are labeled with unique numbers and carry RFID chips for electronic recording. Articles are loaded at charging stations, of which the larger part of the installation has four and the smaller part originally had one.
This manual charging step is one reason why the equipment is called a “manual equipment”. The other reason is that transportation of racks from pond to pond according to the recipe is also done manually. Moreover, movement of racks within the non-chemical ponds is done manually as well, because trained and experienced operators move the racks more effectively than any robot. Only rack movements in the chemical ponds are done by motors, for safety reasons.
In order to guide the racks safely on their journey through the pond sequence given by the recipe, each pond is equipped with a three-color panel, which is also used for data display. In some cases the panels carry soft buttons for simple local commands like “start processing” or “open cover”. Last but not least, the charging stations have black-and-white panels for information display and for simple functions like “start processing of rack”, “finished processing of rack” etc.
The administrative master of the equipment is a Windows PC running the HMI software and serving, as the name says, as the interface between the operators and the equipment. It actually fulfills a variety of tasks. Processing orders are recorded here using a barcode reader. The required recipe is downloaded from the controlled quality environment and transferred to the PLC for execution. Charging stations are reserved for processing orders and assigned to them.
After a processing order has been started on the HMI, its order number is displayed on the panels of the charging stations reserved for it. The operator reads the RFID of a rack allowed for the processing order at the RFID reader of one of these charging stations, which registers the rack for the processing order. After loading the rack with articles, the operator releases it by pressing the respective soft button on the panel. The PLC now shows the number of the first processing pond on the panel of the charging station and makes the panel at that pond blink green.

The operator verifies processing order and rack number at the processing pond and registers the rack there by reading the RFID chip. The PLC switches the panel to blinking red in case the operator is at the wrong pond. Otherwise the PLC switches the panel to constant orange to signal that processing can now start or is started automatically. For the chemical ponds and the dryer ponds the rack is placed in the respective rack holder at the pond and the operator starts the process by pressing the soft button “Start” on the panel of the pond. In the cleaning ponds the operator dunks the rack into the pond and moves it manually.

When the foreseen processing time has been reached, the PLC signals the end of processing by making the panel at the pond blink orange. The operator reads the RFID chip again and deregisters the rack from the pond. With that, the panel at the next processing pond starts blinking green; at the last processing pond the panel goes to constant green, or to red in case one of the critical parameters for the pond is out of tolerance. For each pond the PLC collects the time of registration, the time of deregistration and the associated values of the controlled parameters (temperature, density, conductivity etc. at the time of deregistration).
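The registration/deregistration logic described above can be sketched as a small state machine. This is a minimal illustration, not the vendor's code; the class, method and pond names are my own assumptions.

```python
from enum import Enum

class Panel(Enum):
    GREEN_BLINK = "green blinking"    # this pond is next in the recipe
    RED_BLINK = "red blinking"        # rack read at the wrong pond
    ORANGE = "orange constant"        # processing may start / is running
    ORANGE_BLINK = "orange blinking"  # foreseen processing time reached

class RackJourney:
    """Tracks one rack along its recipe pond sequence (illustrative only)."""

    def __init__(self, recipe):
        self.recipe = recipe  # ordered list of pond names from the recipe
        self.step = 0         # index of the pond the rack must visit next

    def next_pond(self):
        return self.recipe[self.step] if self.step < len(self.recipe) else None

    def register(self, pond):
        # RFID read on arrival: correct pond -> orange, wrong pond -> red blinking
        return Panel.ORANGE if pond == self.next_pond() else Panel.RED_BLINK

    def deregister(self, pond):
        # RFID read after processing: advance to the next recipe step;
        # the returned signal is what the *next* pond's panel should show
        if pond != self.next_pond():
            raise ValueError(f"rack is not registered at pond {pond}")
        self.step += 1
        return Panel.GREEN_BLINK if self.next_pond() else None
```

Walking a rack through a hypothetical three-pond recipe then mirrors the panel behavior: a read at the wrong pond yields red blinking, a correct read yields orange, and each deregistration lights the next pond green until the sequence is exhausted.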
After leaving the last processing pond, the rack is brought back to a charging station reserved for the processing order. After the articles have been unloaded manually from the rack, the operator deregisters the rack from the processing order. At this point the PLC transfers the rack file containing all relevant processing information to the HMI, where its content is loaded into a local Oracle database for subsequent reporting.
During processing of the racks in the ponds the PLC monitors critical parameters such as temperature, density and conductivity, where applicable. If any of these critical parameters violates the tolerance defined for it, the PLC generates a corresponding alarm message (deviation), signals it at the panel and transmits it to the HMI, where it is collected for later inclusion in the processing protocol of the processing order.
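The tolerance check itself is conceptually simple. Here is a minimal sketch, assuming a per-pond tolerance table; the parameter names and limits are made up for illustration, the real ones come from the recipe.

```python
from dataclasses import dataclass

@dataclass
class Tolerance:
    low: float
    high: float

# Hypothetical tolerance bands for one pond; real limits come from
# the recipe in the controlled quality environment.
TOLERANCES = {
    "temperature_C": Tolerance(45.0, 55.0),
    "density_g_cm3": Tolerance(1.20, 1.30),
}

def check_parameters(readings):
    """Return a deviation message for every reading outside its band."""
    deviations = []
    for name, value in readings.items():
        tol = TOLERANCES.get(name)
        if tol is not None and not (tol.low <= value <= tol.high):
            deviations.append(f"{name}={value} outside [{tol.low}, {tol.high}]")
    return deviations
```

Each message produced here corresponds to one deviation the PLC would signal at the panel and hand over to the HMI.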
Once all racks needed for a processing order have been processed, the operator closes the processing order in the HMI. At this point the processing protocol is generated, showing all racks used for the processing order, the number of articles on each rack, processing times, critical processing parameters and, especially, any deviation which occurred during processing of a rack in a pond.
As it turned out, the equipment is a bit more complex and complicated than originally anticipated.
When we reached a certain depth in the user requirement specification (URS), a classic misunderstanding between the software provider and the customer surfaced. The customer had clearly articulated that it must be possible to process up to four processing orders in parallel in the larger equipment part; the vendor therefore provided four charging stations. However, the customer had not explicitly stated that the same capability was expected for the smaller equipment part as well. The customer simply assumed this was given. The vendor, on the other hand, deduced from the single charging station in the smaller equipment part that never more than one processing order would be processed there in parallel. The precisely formulated URS made the misunderstanding transparent. In the end the vendor was able to resolve it by throwing in a bit more hardware and installing three additional charging stations for the smaller equipment part.
The software vendor pointed out several times that its application architecture makes the PLC largely independent of the HMI. In contrast to competitors' solutions, the HMI no longer needs to be present after a processing order has been started; all the processing and management is done by the PLC. If the HMI hangs or is unavailable for some time, the PLC manages everything nicely and safely. This looks appealing and everybody likes it. However, when we started working on the user requirements for the reporting aspects, we dug deeper into the mechanisms of handling the deviation messages. As these are critical from a validation point of view, we needed to understand in detail how things worked. This digging revealed that the alarm messages were handled solely by the alarm system and not covered by the rack file. So deviations are detected by the PLC, communicated to the HMI and then, as far as the PLC is concerned, forgotten. But what happens if the HMI is out of operation when a deviation occurs? It turned out that deviations could be lost under certain (rare) circumstances if the HMI is unavailable for some time. With a PC running Windows as operating system there is always a chance that this happens. The solution to this problem was finally the introduction of a heartbeat signal sent out by the HMI and received by the PLC. A heartbeat failure then triggers corrective measures in the PLC. One of the unwanted side effects is that the PLC is no longer independent of the HMI. But we are now sure that deviations will not be lost.
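The PLC-side part of such a heartbeat mechanism can be sketched as a watchdog. This is a toy model under my own assumptions; the class name and the timeout value are not the vendor's.

```python
import time

class HeartbeatMonitor:
    """PLC-side watchdog for the HMI heartbeat. The name and the
    5-second timeout are assumptions for illustration only."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self.last_beat = time.monotonic()

    def beat(self):
        # Called for every heartbeat telegram received from the HMI.
        self.last_beat = time.monotonic()

    def hmi_alive(self, now=None):
        # If the heartbeat is overdue, the PLC must assume the HMI is
        # down and take corrective measures, e.g. buffer deviations
        # instead of firing them off and forgetting them.
        if now is None:
            now = time.monotonic()
        return (now - self.last_beat) <= self.timeout_s
```

The key design point is visible in `hmi_alive`: the PLC never assumes delivery succeeded; it decides based on the age of the last heartbeat, which is exactly what ties it back to the HMI and removes the previous independence.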
Another tricky detail is the fact that the PC has its own time and the PLC has its own time as well. Both systems run their own hardware clock, and as clocks in the real world they bear the risk of running out of sync. This is no problem in principle. For the alarm messages, however, it turned out to be a potential problem. The reason is simply that alarm situations are communicated from the PLC to the HMI via data structures and single bits being set and cleared following a well-thought-out scheme. What was not so well thought out was the assumption that both clocks are sufficiently in sync. The timestamps of the deviations are therefore not communicated “by design”. For the HMI, the deviation happens at the time when the HMI recognizes the bit being set by the PLC. So, on top of the heartbeat mechanism, a regular and tight synchronization of both clocks is now enforced.
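A toy model shows why this matters. With assumed (hypothetical) drift rates of tens of parts per million, two free-running clocks diverge by several seconds per day, which then ends up directly in the recorded deviation timestamps; periodic synchronization caps the divergence.

```python
class DriftingClock:
    """Simple model of a hardware clock with a constant drift rate.
    Drift rates below are illustrative, not measured values."""

    def __init__(self, drift_ppm):
        self.drift_ppm = drift_ppm
        self.offset_s = 0.0  # accumulated offset against true time

    def advance(self, true_seconds):
        # Running for true_seconds accumulates drift proportionally.
        self.offset_s += true_seconds * self.drift_ppm / 1_000_000

def sync(clock, reference):
    """The enforced periodic synchronization: snap one clock's offset
    to the reference clock's offset (names are illustrative)."""
    clock.offset_s = reference.offset_s
```

For example, a PC clock gaining 50 ppm and a PLC clock losing 20 ppm drift apart by 70 ppm of elapsed time, i.e. about 6 seconds over a day; syncing once per hour would keep the divergence around a quarter of a second.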
When we started testing the application with respect to human error we found that many situations were not covered. In the beginning the software vendor asked “Why should an operator do this?”. Of course operators don't make mistakes deliberately. But like all of us they are human, and we humans have a clear tendency to err. With a fully automated system most of such erroneous situations simply don't happen: a robot usually executes all orders literally. Humans fortunately don't. Very often this prevents damage, but sometimes it causes difficulties. This is why developers of software must take human error into account. And this makes software for human use much more complicated than software for machines.
Without deep diving, much of this would not have been detected and corrected.
As it turns out, CSV can significantly drive improvement. Deep diving can be difficult, but one learns a lot. And this often is great fun!