Vladimir Prus

Desktops and Startups

2016-05-31T21:59:00.000+03:00

I spent a good part of 2015 working on a desktop app for a startup, not a typical combination these days. If you're building a mobile app, there are multiple companies offering platforms for every task you need, and a lot of recommendations on which platform to choose. On desktop, you have to build or pick everything yourself. In this post, I'll share some advice.

UI framework

What does one use to create cross-platform desktop applications these days? Let’s first look at standard options at each platform.

On Windows, there are two modern technologies:

XAML, .NET and C#. A good stack if you like C#. In our case, we have a shell extension too, which per Microsoft guidelines should not be written in managed code, but doing that one part in C++ is not very hard.
XAML, Windows Runtime and C++/CX. This is a newer technology, apparently coming from the part of Microsoft that prefers C++. The C++/CX language is native, with no CLR runtime involved, and is pretty close to standard C++, so it is attractive if you're mostly a C++ engineer. On the downside, you're limited to Windows 8.1 and later. Also, the application will be a Metro-style one; while there were announcements that desktop apps would be possible soon, that has not happened yet. A shell extension or tray icon will likely be a big problem.

On OSX, there’s only one native stack, although you have the choice between Objective-C (not a language I want to write core logic in, ever) and Swift (not a language I want to write core logic in, yet).

Two technologies attempt to be fully cross-platform:

Qt and C++ (using either Qt Widgets or Qt Quick for UI). Works, in theory, on all Windows and OSX versions in existence, with Qt Widgets extremely stable and Qt Quick maturing.
Xamarin and C# (using Xamarin.Forms and XAML). Also a nice stack, and compiles to native code, but requires Windows 8.1 or later.

Given the above, what do you do if you definitely want to reuse the core logic layer? Since we wanted to support Windows versions prior to 8.1, this rules out C#, and we actually already had this layer written in C++ with Qt. And while putting a XAML UI on Windows and an Objective-C UI on OSX on top of a C++ core would be possible, that would add a fair amount of incidental complexity. Therefore, I went with the Qt option, and since the UI was not going to be extremely complicated, decided to give Qt Quick a try.

Of course, this was a tradeoff between development speed and native look. I originally produced the Windows version, and creating the first OSX version then took about a day, which is clearly good at a startup. On top of that, any interface changes were immediately available for both platforms, just a rebuild away. The UI did not look exactly native on OSX, but close enough for most people, and on Windows, there’s already enough style variety that nobody had any concerns.

On the negative side, it turned out that Qt Quick is not as mature as I had hoped. I have covered this in detail in the previous post.

Conclusion: Qt with Qt Quick is fine if you have little UI and are already using C++. Xamarin might be a viable option as well.

Installer

We wanted the installation and update to be as transparent as possible, with no questions asked. On OSX, using DMG for initial installation is the standard solution, and autoupdate can be handled by a library called Sparkle.

On Windows, Microsoft documents two options. Windows Installer (MSI) is the standard technology, but it does not support autoupdate, and any custom solution has to display prompt dialogs. ClickOnce is a newer technology for .NET applications, supporting installation into a per-user location without touching any system directory, as well as transparent updates. Sadly, it does not appear to support native C++ applications, only C++ applications built for the CLR. I ended up writing a custom installer from scratch.

The installer downloads an update manifest (in the Sparkle format), then downloads an archive for the most recent version, checks its signature, unpacks it to a subdirectory of the user's AppData folder, and runs the binary from there. Auto-update works much the same: detect that the update manifest has changed, and start the installer again. When the new version of the application starts, it uses IPC to ask any previous version to shut down.
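The "find the most recent version in the manifest" step can be sketched roughly as follows. This is a minimal Python illustration, not the installer's actual code (which is not shown in this post): the sample manifest, its URLs and version numbers are made up, and the dotted-number comparison is a simplifying assumption that real version schemes may not satisfy.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Sparkle-specific attributes such as sparkle:version.
SPARKLE_NS = "http://www.andymatuschak.org/xml-namespaces/sparkle"

# Hypothetical manifest: URLs and versions are invented for illustration.
SAMPLE_APPCAST = """<rss version="2.0"
     xmlns:sparkle="http://www.andymatuschak.org/xml-namespaces/sparkle">
  <channel>
    <title>Example App</title>
    <item>
      <title>Version 1.2.0</title>
      <enclosure url="https://example.com/app-1.2.0.zip"
                 sparkle:version="1.2.0" length="20971520"
                 type="application/octet-stream"/>
    </item>
    <item>
      <title>Version 1.10.0</title>
      <enclosure url="https://example.com/app-1.10.0.zip"
                 sparkle:version="1.10.0" length="20971520"
                 type="application/octet-stream"/>
    </item>
  </channel>
</rss>
"""


def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_release(appcast_xml):
    """Return (version, url) of the newest enclosure in the manifest."""
    root = ET.fromstring(appcast_xml)
    releases = []
    for enclosure in root.iter("enclosure"):
        version = enclosure.get("{%s}version" % SPARKLE_NS)
        url = enclosure.get("url")
        if version and url:
            releases.append((version_key(version), version, url))
    if not releases:
        raise ValueError("manifest contains no usable enclosures")
    _, version, url = max(releases)
    return version, url


def needs_update(installed_version, appcast_xml):
    """True when the manifest advertises something newer than what is installed."""
    newest, _ = latest_release(appcast_xml)
    return version_key(newest) > version_key(installed_version)
```

Comparing versions as tuples of integers matters here: a plain string comparison would rank "1.2.0" above "1.10.0". Downloading, signature checking and unpacking are deliberately left out of the sketch.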

This approach works surprisingly well: most installation attempts succeed, and most auto-update attempts do indeed update. It is interesting to note how diverse environments are around the world. For example, downloading 20 megabytes is almost instant for me, but it takes minutes, and sometimes several tries, for users in other locations. Adopting binary deltas, like Chrome does, sounds like a good idea.

Conclusion: on Windows, a custom installer is a viable option for a native binary that does not modify the system, and on OSX, Sparkle works out of the box.

Signing

I first ran into Windows binary signing at Mentor, when we were building our P2 installer and, at one point, the IT-mandated antivirus software started deleting our own binaries. It would only stop when we got a code signing certificate. This time, a couple of years later, we had as many as three obstacles. First, Chrome would flag our installer as risky on download. Second, Windows SmartScreen would refuse to run it. Finally, the antivirus software would randomly wake up and break things. Naturally, we had to get a code signing certificate too.

The code signing certificate involves organization validation. Not much of a problem for a large company whose address can be validated by Google Maps. But a small startup with no landline, no office number on the door, and no utility bills is deemed suspect by the registrars. It took a while for our CEO to build an appropriate chain of trust.

I would say the whole system is quite pointless. Surely, organization validation makes it possible to pass an address and a phone number to the police, in case good guys become bad overnight. However, it does absolutely nothing to protect against well-meaning but incompetent guys, for example those who leak signing keys, build on virus-infected machines or do not secure auto-update. And it does nothing to protect against really bad guys, who surely can fake addresses and phone numbers.

Conclusion: if you plan to publish Windows applications, get a code signing certificate as soon as possible.

Crash Reporting

Crash reporting has been part of Windows for a while, but by default, only Microsoft gets to look at crashes. One can apply for a developer account with permission to look at one's own crashes, but the instructions start with ‘get extended validation certificate from one of these 3 CAs’. Extended validation is even messier than organization validation, and Microsoft could well not approve us anyway, so I looked at other options.

Chrome includes a library called Breakpad, and while it has no server side, Mozilla fills the void with Socorro. The setup could be easier: in particular, there are no official Ubuntu packages, one has to use a specific deployment system, and it requires the ELK stack. However, after a few hours I could get the first test crash reported, and was going to start using it for real when I came across DrDump.

That service provides a library to handle crashes, including UI, a cloud backend to store crashes, and a web app to review them. On a crash, a minidump is sent automatically and the user is prompted to submit a full dump, unless one was previously collected for the same stack trace. Overall it took about an hour to integrate and play with, and it has worked quite satisfactorily since then. It was surprising, when I asked about pricing, to hear that the service is free, reportedly because with matching crash stack traces, the disk storage required for full dumps is quite low. I doubt it would still be free for an app with a million users each hitting unique bugs, but it remains free for us.

Conclusion: DrDump is a fine solution for an early stage startup. Breakpad and Socorro will also be fine when you have time for devops.

Analytics

The mechanics of collecting analytics are easy. We use Mixpanel for mobile analytics and wanted to use it for desktop apps as well. On mobile, there’s an SDK that sends events, collects system properties and manages the event queue. There’s no such option for desktop, but there is a REST API, which is sufficient. Adding basic system details like OS version is easy, and we did not bother implementing an event queue. Surely some events do get lost, but it is not a big practical problem.
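To give an idea of how little is needed without an SDK, here is a rough Python sketch that only builds the tracking request and does not send it. The endpoint and the base64-encoded `data` parameter follow my recollection of Mixpanel's HTTP tracking API at the time, so check the current documentation before relying on them; the `os` and `os_version` property names and the `decode_payload` helper are hypothetical and exist only for this example.

```python
import base64
import json
import platform
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: Mixpanel's HTTP tracking endpoint as recalled; verify in current docs.
TRACK_ENDPOINT = "https://api.mixpanel.com/track/"


def build_track_url(token, distinct_id, event, properties=None):
    """Build a GET URL that reports one event, with basic system details attached."""
    payload = {
        "event": event,
        "properties": {
            "token": token,
            "distinct_id": distinct_id,
            "time": int(time.time()),
            # Hypothetical property names for the system details.
            "os": platform.system(),
            "os_version": platform.release(),
        },
    }
    payload["properties"].update(properties or {})
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return TRACK_ENDPOINT + "?" + urlencode({"data": data})


def decode_payload(url):
    """Inverse of build_track_url, handy for checking what would be sent."""
    query = parse_qs(urlparse(url).query)
    return json.loads(base64.b64decode(query["data"][0]).decode("utf-8"))


url = build_track_url("YOUR_TOKEN", "user-42", "app_started", {"app_version": "1.0"})
print(decode_payload(url)["event"])
```

Sending is then a single HTTP GET of the resulting URL. With no local queue, an event is simply lost if that request fails, which matches the tradeoff described above.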

With the data in, Mixpanel makes it very easy to show a chart of event counts, filtered and segmented as you wish. Whether it is useful is not clear.

No chart comes with any statistical analysis. If you want to check whether Windows 7 users really have a lower conversion rate, you need to do the math using other tools.
Box plot, the standard way to look at a relation between a continuous and a categorical variable, is nowhere to be seen.
You can only look at 90 days of data.

In essence, given that drawing basic charts is frictionless while statistical analysis demands external tools, Mixpanel lures you into making simplistic analysis. In the end, I have just exported everything and looked at data in R.
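As an illustration of "doing the math", the Windows 7 question can be framed as a two-proportion z-test. The sketch below is in Python rather than R, uses only the standard library, and the counts are invented; it shows the kind of check a chart alone does not give you, not the analysis actually run for this project.

```python
import math


def two_proportion_z_test(conv_a, total_a, conv_b, total_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Uses the pooled estimate of the common rate,
    which is the usual choice under the null hypothesis of equal rates.
    """
    rate_a = conv_a / total_a
    rate_b = conv_b / total_b
    pooled = (conv_a + conv_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 40 of 500 Windows 7 users converted, 130 of 1000 others did.
z, p = two_proportion_z_test(40, 500, 130, 1000)
print("z = %.2f, p = %.4f" % (z, p))
```

A small p-value suggests the difference is unlikely to be noise. With small groups or rates near zero, an exact test from a statistical package is a safer choice than this normal approximation.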

Conclusion: you likely don’t need any analytics services. It’s easy to send data into a database, aggregate it, and analyze using a decent statistical package.
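A minimal sketch of the "send data into a database and aggregate it" idea, assuming a single events table. The table layout, the event names and the choice of SQLite are all assumptions made for this example.

```python
import sqlite3


def create_store():
    """Create an in-memory event store; a real one would live on a server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id TEXT, os TEXT, name TEXT, ts INTEGER)"
    )
    return conn


def record(conn, user_id, os_name, name, ts):
    """Store a single event."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)", (user_id, os_name, name, ts)
    )


def conversion_by_os(conn, start_event, goal_event):
    """Per OS: users who fired start_event, and how many of them fired goal_event."""
    rows = conn.execute(
        """
        SELECT s.os,
               COUNT(DISTINCT s.user_id) AS started,
               COUNT(DISTINCT g.user_id) AS converted
        FROM events AS s
        LEFT JOIN events AS g
               ON g.user_id = s.user_id AND g.name = ?
        WHERE s.name = ?
        GROUP BY s.os
        ORDER BY s.os
        """,
        (goal_event, start_event),
    )
    return [tuple(row) for row in rows]


conn = create_store()
record(conn, "u1", "Windows 7", "install", 1)
record(conn, "u1", "Windows 7", "signup", 2)
record(conn, "u2", "Windows 7", "install", 3)
record(conn, "u3", "Windows 10", "install", 4)
record(conn, "u3", "Windows 10", "signup", 5)
print(conversion_by_os(conn, "install", "signup"))
```

The per-OS counts produced this way are exactly the input a statistical package needs, and unlike a hosted service, nothing limits how far back the data goes.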

Summing it up

While there's no comprehensive platform that helps to quickly launch a desktop application, there are enough pieces of technology that can be put together. The most important lessons I've learned are:

Pick a cross-platform UI framework; you probably don’t have time to write separate code for different operating systems. I found Qt to work OK, but you might want to also consider Xamarin.
Get a code signing certificate early.
Do your own analytics. Decide what will be most important when you launch, figure out what statistical tools you’d need, and create events to support that.

Qt Quick on Desktop

2016-05-19T11:08:00.000+03:00

I have worked with Qt quite a bit over the years, but it was only in 2015 that I had a chance to do substantial work with Qt Quick. I wanted to share some impressions, specifically for desktop applications.

Summary

Here is an advance summary of the key points, starting with advantages:

Qt Quick is now a mature way to build desktop applications that either use standard-looking desktop controls, or have a relatively simple custom UI.
Data binding is quite pleasant in all ways. However, it is limited to binding UI properties to expressions over model properties. Dynamically changing the UI structure is possible, but is rather convoluted.
The animation system is solid, and support for GL effects is much more convenient than writing GL directly.
As a pleasant surprise, it has a state machine built in, though with some quirks.

But not everything is perfect:

The set of standard controls and styles could be larger. If you wish to achieve Metro design or Material design on desktop, you might need to use third-party extensions or do it yourself.
Styling mechanisms (in Qt Quick Controls 1) are quite limited, having neither inheritance nor attributes, and Qt Quick Controls 2 has something else entirely.
There are two different layout mechanisms, each with its own quirks.
As of Qt 5.5, High DPI support involved doing the math yourself. This might have improved since.
OpenGL is required, and especially on Windows, the set of possible GL configurations is large, the documentation is imperfect, and there are "interesting" differences in behaviour.

What is Qt Quick?

Let's clarify some terminology first:

QML is a language that defines a tree of objects, along with some property bindings and executable code. It uses a custom language for the tree proper, and JavaScript for expressions and functions.
Qt Quick is a set of basic visual components, and an engine to render them.
Qt Quick Controls is a set of standard UI controls and layouts.

These are pretty much always used together, so I'll use "Qt Quick" throughout regardless of what layer I really talk about.

QML and Qt Quick

In order to illustrate how a Qt Quick UI is put together, I'll use a simple part of the registration UI, shown below.

As you start typing a phone number, a decorative line under the input fields turns into a progress bar; as you submit the form, the progress bar becomes an indefinite one; and when an error is returned, the hint below the input fields becomes an error display.

The progress bar is a custom component that I won't discuss in detail, but once written, it can be easily used:

CustomInput {
    id: phoneNumber
}
CustomPercentageLine {
    percentage: model.validPhoneLength == -1 ? 1 : phoneNumber.text.length/model.validPhoneLength
    animated: model.working
}

Here, model is an instance of a C++ class that we've injected into QML. The binding expression checks whether the phone number has a fixed length, and if so, computes the progress bar percentage. And if the model is busy validating the phone number, so that its working property is true, the animated property of the view is also updated, causing the progress bar to show the indefinite state. Similarly, this is how the hint and error display is implemented:

CustomLabel {
    text: model.error ? model.error : "We'll confirm your number by sending a one-time SMS"
}

Data binding is certainly good, and QML offers certain syntactic convenience, as we can use JavaScript expressions with no escaping - unlike XML-based templating engines. This gets particularly important for larger expressions, like the one below:

CustomLabel {
    text: {
        if (model.error) {  return model.error; }
        if (model.state === "resendingPin") {
            return "Enter the previous code or wait for a new code to be sent";
        } else {
            return "Enter the code";
        }
    }
    color: model.error ? "#ce4844" : "#999ba4"
}

The animated property of my custom progress bar is used to indicate indefinite progress, which is shown by three rectangles moving across the blue progress line. Making the rectangles move is easy with the animation framework. For example, here is the animation definition for the first rectangle:

NumberAnimation {
    id: animation1
    target: rectangle1
    property: "x"
    duration: 2000
    easing.type: Easing.OutCubic
    loops: Animation.Infinite
    from: -6
    to: parent.width
}

Animation for the second and third rectangles is similar, but they should start with a delay, so we need to use a separate timer item:

Timer {
    id: timer2
    interval: 500
    onTriggered: animation2.start()
    repeat: false
}

As soon as the animated property of progress bar is set to true, we start animating the first rectangle, as well as the timers that will start animating two others:

animation1.start();
timer2.start();
timer3.start();

It would be more convenient if animations supported delayed start without auxiliary timers, but with the exception of that, the mechanism is OK.

For managing larger UI changes, the state machine framework comes in handy. For example, upon startup we show a splash screen and attempt to connect to the backend server and check for authorization, showing the login screen only if necessary. If we can't connect quickly, we'll show a separate screen with a progress bar. Here's the relevant code:

DSM.State {
    id: initial
    DSM.TimeoutTransition {
        targetState: waiting
        timeout: 10000
    }
    DSM.SignalTransition {
        signal: model.connectedChanged
        guard: model.connected && !model.isAuthenticated()
        targetState: needPhone
    }
}

It illustrates two features. First, it's possible to transition to a different state simply after a timeout. Second, it's possible to transition to a different state on a signal, if a particular condition is met. However, the second transition also shows one of the largest inconveniences: you can't easily transition when a particular expression becomes true. I'd much rather not bother thinking about what signals are emitted, and write this:

DSM.SignalTransition {
    guard: model.connected && !model.isAuthenticated()
    targetState: needPhone
}

After all, tracking changes to expressions already works for properties. There is a workaround to get this effect with auxiliary items, but it ought to be a standard feature. Still, having state machines as a standard feature greatly simplifies UI logic.
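To make the wished-for behaviour concrete, here is a toy model of it in Python. It is not Qt API and says nothing about how Qt implements state machines: `Model` and `GuardedMachine` are hypothetical classes that only illustrate a transition firing whenever its guard expression becomes true, without naming a signal.

```python
class Model:
    """Toy observable model: every attribute assignment notifies listeners."""

    def __init__(self, **values):
        # Bypass our own __setattr__ while setting up internal storage.
        object.__setattr__(self, "_listeners", [])
        object.__setattr__(self, "_values", dict(values))

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. model properties.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._values[name] = value
        for listener in list(self._listeners):
            listener()

    def subscribe(self, listener):
        self._listeners.append(listener)


class GuardedMachine:
    """State machine whose transitions are triggered by guards alone."""

    def __init__(self, model, initial):
        self.state = initial
        self._transitions = []  # (source state, guard callable, target state)
        model.subscribe(self._reevaluate)

    def add_transition(self, source, guard, target):
        self._transitions.append((source, guard, target))

    def _reevaluate(self):
        # Re-check guards of the current state after any model change.
        for source, guard, target in self._transitions:
            if source == self.state and guard():
                self.state = target
                return


model = Model(connected=False, authenticated=False)
machine = GuardedMachine(model, "initial")
machine.add_transition(
    "initial", lambda: model.connected and not model.authenticated, "needPhone"
)
model.connected = True
print(machine.state)  # prints "needPhone"
```

Re-evaluating every guard on every change is crude; a binding engine that already tracks which properties an expression reads could do the same thing far more selectively.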

Controls Styling

I'll start the critical part of my post with controls and styling. The screenshots below show the controls gallery example with the two styles available to a Windows desktop application in Qt 5.5.

You might wonder what is the difference between Desktop and Base styles. The Desktop style, which is the default, actually uses Qt Widgets style engine to render everything, so it looks exactly the same as the standard Qt Widgets. It's good if you want to take advantage of Qt Quick without changing the look, but creating custom styles in C++ certainly does not look very attractive.

One can easily switch to the Base style, implemented entirely in QML, but it is slightly less polished, and has quite a lot of hardcoded styling, such as:

Rectangle {
    radius: TextSingleton.implicitHeight * 0.16
    border.color: control.activeFocus ? "#47b" : "#999"
}

The literal 0.16 is repeated in 8 places over several files in the Base style definitions, while #47b is found at 14 locations - not what I'd call solid engineering, compared to, say, most CSS frameworks or the Android style system, where you can make consistent changes with a few attribute declarations. There are various third-party solutions that offer modern styles for Qt, including Ubuntu Components and Papyros, but these are not actually styles -- you need to use their own buttons and inputs and other components, you can't just restyle your existing code.

The set of controls is quite basic too. For example, there is not even an implementation of a filtered list view.
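Counts like these are easy to reproduce for any set of QML style files. The sketch below is one hypothetical way to do it in Python: it counts quoted hex colour literals, with the regular expression and the `.qml` filter being my own choices. It says nothing about where the Base style files live in a given Qt installation.

```python
import os
import re
from collections import Counter

# Quoted hex colours such as "#47b" or "#ce4844".
COLOR_LITERAL = re.compile(r'"#[0-9a-fA-F]{3,8}"')


def count_color_literals(texts):
    """Count how often each hex colour literal appears across QML sources."""
    counts = Counter()
    for text in texts:
        for match in COLOR_LITERAL.findall(text):
            counts[match.strip('"').lower()] += 1
    return counts


def read_qml_files(root):
    """Yield the contents of every .qml file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".qml"):
                path = os.path.join(dirpath, filename)
                with open(path, encoding="utf-8") as source:
                    yield source.read()


sample = ['border.color: control.activeFocus ? "#47b" : "#999"', 'color: "#47B"']
print(count_color_literals(sample).most_common())
```

Running `count_color_literals(read_qml_files(path))` over a style directory gives a quick list of candidates for shared style constants.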

Layouts

Qt Quick offers two approaches to layout. Anchor-based layout allows you to position an item relative to another item - for example, you can fill the parent and add a margin, or you can put an item to the right of another item. It's easy, but not very adaptive.

There is also dynamic layout, where the desired size of items and the available geometry are used to place items on screen. That's what I used, and it mostly works, except for an annoying inability to set margins - as the margins feature in anchor layout is actually specific to anchor layout. I ended up writing a custom component that specifically adds a margin around content.

Not invented here

Both the styling and layout issues above make me think that Qt Quick is being too original. It's a declarative UI language, so looking at CSS and trying to be more conceptually similar would have helped. Sure, CSS is not perfect, but a lot of people know how to use it to style everything from text to buttons to toolbars, and even create random geometric shapes. Common behaviours like borders, margins, and shadows are trivial to accomplish. Android also has declarative UI with what I find a better style system and layout mechanisms. XAML is also a popular solution. It would be great if Qt Quick was at least conceptually similar to one of these.

Also, given that QML is heavily based on JavaScript, one would imagine JavaScript modules would work. Maybe not the fancy async module systems, but just standard CommonJS modules. They don't work, with no progress since 2011, as you can see in the issue.

Overall, it would be great if Qt Quick were more aligned with other technologies. I have no idea whether it was possible given actual development timelines.

Quality of implementation

There are a couple of issues that were important, but not fundamental.

As of Qt 5.5, we found that High DPI support is not quite good. Every time you specify a position or size, it's in pixels, so for a good look on a High DPI system, one needs to have a utility function to scale pixels according to physical resolution. This might be fixed in Qt 5.6, but I haven't had a chance to try.
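The arithmetic behind such a utility function is small. Here it is as a Python sketch rather than the QML/C++ helper used in the app, which is not shown in this post. The 96 DPI baseline is an assumption (it is the conventional "100%" density on Windows), and the function names are hypothetical.

```python
# Assumption: sizes in the UI were designed for a 96 DPI screen.
BASELINE_DPI = 96.0


def scale_factor(physical_dpi):
    """How much larger than the design baseline the current screen density is."""
    return physical_dpi / BASELINE_DPI


def dp(pixels, physical_dpi):
    """Scale a design-time pixel value to the actual screen density."""
    return int(round(pixels * scale_factor(physical_dpi)))


print(dp(10, 96), dp(10, 144), dp(10, 192))
```

Every hardcoded width, height, margin and font size then has to go through `dp()`, which is exactly the "doing the math yourself" burden described above.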

Qt Quick uses OpenGL for all rendering. Of course, it creates lots of possibilities, but also configuration problems. Say, on Windows you might end up with 5 different GL implementations, not counting different video card vendors. We ended up with quite a few crashes from users that had 'GL' in the backtrace and could not be reproduced. We also had fun getting things to work under VirtualBox, at one point even reverting a particular Windows update for things to start working. This should eventually improve, but beware for now.

Wrapping Up

I found Qt Quick to be a pleasant framework for developing a desktop app with a relatively simple interface. There were some limitations and issues, but they probably will eventually be fixed.

The biggest concern is that the set of controls and styles is quite basic, and that the styling mechanism is not very flexible. Furthermore, the new direction is Qt Quick Controls 2, which is both focused on mobile devices and uses a different styling approach, so it is unlikely to bring improvements on desktop. If I were to use Qt Quick on a larger project, I would probably use a third-party controls library.

Qt remains the best framework for desktop applications using C++. Whether it's the best desktop framework overall, given that Xamarin was recently open-sourced, remains to be seen.

iPhone app clearance

2015-06-26T22:48:00.002+03:00

I moved from an iPhone to an Android phone recently, and among the dozens of iOS apps I've tried over a couple of years, some deserve to be mentioned. I include links where appropriate for convenience, but have no affiliation with any of the developers.

Productivity

DayOne is a very nice journal application. I've used it to record what I've done each day, and then review entries weekly and copy them into a different application. It served that purpose very well - the UI is clean, there's a reminder feature, and swipe navigation between entries. It also supports sync via Dropbox, so I could make entries on a phone and review them on a tablet.

DocScan is an app to take a shot of a document with the camera and convert it into a PDF, as if produced by a flatbed scanner. Its primary benefit is perspective correction that works very well. The files can obviously be sent by email, or added to cloud storage, though I found the number of required clicks a tad large. There's also some automatic conversion to black-and-white, but it never produced results I liked, so I always disabled it.

Fitness

Pretty much every fitness app is about using the accelerometer and GPS to track steps and runs, which is quite boring, but I've come across two different kinds of apps.

Runtastic Push Ups is a push-ups trainer and tracker. You put it on the floor, and it uses the proximity sensor to count your push-ups. Or, you can press the screen with your nose. That sounds like a bizarre idea, but I found it fun in practice. They also make two other apps - Pull Ups and Squats - that use the accelerometer to track something other than steps.

There are also apps that use the flash and built-in camera to measure heart rate. Cardiio does exactly that, while Azumio Stress Check determines your stress level using heart rate variability. The latter can be quite entertaining.

Meditation

In this saturated app category, I've picked Deep Relax and Self. The former has about 40 different sounds you can mix, and set an arbitrary timer. The latter is beautifully minimalist, maybe too simple for me.

Lifetracking

I've tried a crazy number of those, and only a few were still installed after 5 minutes. Those that survived were Step Journal, Charge, TnS, Lumen Trails and rTracker, but none of them ended up actively used. In particular, rTracker, which is widely praised, turned out to be an ugly app that could not draw sensible charts.

Games

This mini-review is concluded by the single game app I found installed, called Music Tiles, where one clicks on black tiles that scroll from the top, producing sounds, and trying to go as far as possible. That's surprisingly addictive, especially on long-haul flights.

Last Nine Years

2015-06-01T22:29:00.000+03:00

Exactly nine years ago today, on June 1 2006, I joined CodeSourcery as the first Eclipse engineer. I had zero Eclipse and Java experience, but knowing KDevelop and GDB was deemed sufficient. Looks like I did passably well, and for my part, I’m happy to have played a small part in a huge change to open-source embedded development tools.

Every current GDB tutorial for embedded development says to just load your binary to the target. It was my first big project, in 2006, to make it work, since GDB knew nothing about flash memory. I ended up teaching it about memory maps, translating memory writes into flash erase and programming operations, throwing together support for some ColdFire chip, and finally adding a single checkbox in the UI.

GDB non-stop mode was entirely done by CodeSourcery. In this mode, each thread can be independently stopped, and examined, while others are running. There I’ve contributed to asynchronous processing of commands and reworking breakpoint machinery. We’ve made GDB handle breakpoints in constructors and function templates, implemented tracepoints, and different flavours of OS awareness. I was also part of initial prototyping for Python scripting.

On the Eclipse side, we made just as many changes, but had less luck submitting them upstream, so describing them is similar to a research paper - it tells what’s possible, but you’re on your own if you want an implementation. Still, we’ve made Eclipse scan for hardware debug devices automatically, modified the project wizard to include debug settings and create projects you can immediately debug, implemented an IDE editor for hardware board descriptions, and modified the register view to effectively deal with thousands of memory-mapped registers. Among that, I did manage to create and submit a new Eclipse CDT view - OS Resources - that shows tables of different objects on the debugged system.

Between Eclipse and GDB, there’s a small interface called GDB/MI. It also saw significant changes, becoming less stateful, adding new notifications (so that Eclipse views don’t have to explicitly pull the data on each stop), and improving variable access methods.

In November 2010, CodeSourcery was acquired by Mentor Graphics and our product went on to become Sourcery CodeBench, the decision based in part on progress made by open-source tools in the previous years. Understandably, a lot of work after that went into integration with other products - including Mentor’s hardware debug devices, profiling tools and Mentor Embedded Linux. Personally, I went on to lead the IDE team, learning how to run a fully distributed team across 12 time zones. We were less active in open source for a while, but gradually returned, and one of the biggest recent contributions is a product installer based on Eclipse P2 that we announced in 2014.

And then technology went full circle. The most recent open-source contributions from CodeSourcery team are patches for LLDB-MI, a bridge between LLDB and Eclipse.

In 2006, I joined CodeSourcery in part because at the previous position, there was no longer anything to learn. Over the years, I worked with the best people in each area: Daniel Jacobowitz and Pedro Alves on GDB, Carlos O’Donell on Glibc and GCC, Mikhail Khodjaiants on Eclipse and of course Mark Mitchell, the CEO who wrote a C++ frontend once. It was a great experience. Now, it's time to learn something new. See you there!

Branding Eclipse Products

2015-03-30T15:50:00.000+03:00

Last year we worked on a new Eclipse-based IDE, in particular creating product branding from scratch. Despite visual editors and several existing online tutorials, that still proved confusing, so I've decided to document what we've learned.

In this post, we'll review branding of a simple product. It has:

One functional plugin, and one functional feature including that one plugin
One product plugin and one product feature including the product plugin and functional feature
Product configuration

To make things simpler:

I use artwork we used in our products.
There's no localization of strings.
The welcome/intro screen is almost neglected, since we use custom HTML for that, and it might be the best approach for any new product.

Of course, I assume you know what a plugin is and what a feature is.

Functional features

A functional feature, along with its contained plugins, implements some useful behavior that you can potentially install into any Eclipse-based product. For example, the EGit feature is equally useful for Java and C++ IDEs. For proprietary products, it is often tempting to just mix everything together, but creating separate features is often beneficial. The first example is exactly such a standalone feature, see sources.

The important files in the plugin are META-INF/MANIFEST.MF, about.ini and about.html. For the feature, feature.xml and about.html are important. In particular, feature.xml relays most of the branding to a 'branding plugin', which, in our case, is the lonely functional plugin.

Starting with that source, you can import it into your PDE, click on feature.xml, export feature into a directory, install it into a second Eclipse installation, and then "Help->About" dialog will look like this:

In the row of icons that represent feature providers, there is our "crystal ball" logo. The icon is defined by the branding plugin, via the featureImage attribute in about.ini. Should you have multiple features with the same icon (pixel-wise), they will be merged in this dialog. Clicking "Installation Details" gives us in-depth information:

This dialog shows root P2 installable units in the Eclipse instance. The 'name' column is taken from 'label' attribute of feature.xml, and the description below comes from the 'description' element in feature.xml. The same description is shown when installing the feature. We can also use the "Properties" button to see more details from feature.xml, like copyright and license:

The above is fairly reasonable. The features tab, however, brings some surprises:

The "Feature Name" column is coming from branding plugin, the Bundle-Name attribute of MANIFEST.MF. The description below is composed from 'label' attribute in feature.xml and 'aboutText' attribute in plugin's about.ini. The icon is also coming from about.ini - and if you specify icon attribute in feature.xml, it is ignored. Finally, the "License" button opens about.html file in the feature directory - which is generally different from license attribute in feature.xml. Clicking the "Plug-in Details" button shows branding information for plugins, which is rather simple:

The "Provider" and "Plug-in Name" fields correspond to Bundle-Vendor and Bundle-Name in MANIFEST.MF. The "Legal Info" button opens about.html in plugin's root directory. As an aside, I'm not sure why it's called "License" for features and "Legal Info" for plugins and "License Agreement" for installable units.

Products

Products put together a set of functional features that make sense for a particular audience, and add particular overall branding. Physically, a product consists of a product feature and a product plugin, organized the same way as a functional feature and plugin. The example source for that is here, which you can again import into PDE, export into a P2 repository, and install into a separate Eclipse instance, and then run it with the "-product com.codesourcery.seed.product.product" command-line option.

The key element of product branding is this extension in plugin.xml:

<product name="Example Eclipse Product" application="org.eclipse.ui.ide.workbench">
<property name="appName" value="Example Eclipse Product"/>
<property name="windowImages" value="images/csl16.png,images/csl32.png,images/csl48.png"/>
<property name="aboutImage" value="images/IDE_about.png"/>
<property name="aboutText" value="About text for the example product."/>
...
</product>

The first two properties define the outside appearance of the product - its name, shown in the window title, and its icon, shown in the taskbar, launcher, or window switcher, depending on your OS. The other two properties affect the about dialog box, making it look like below:

Now it actually looks like a custom product! The installation details in this case are almost the same, except that it has two features, one depending on the other:

There are several other properties in product definition that are related to welcome screen, but as I've said, we replace it completely, so I'm not going to describe it. The example source code has some definitions if you're interested.

Launcher and Product Build

The product feature we've built can be exported from PDE (or built with Maven, if you wish), and installed into Eclipse, but we usually want to build a complete product that can be immediately run. We need product configuration (.product file) for that, and it's covered in detail elsewhere. As far as branding goes, we only need two details:

Showing custom splash screen on startup
Starting our product

The product configuration specifies them in a fairly direct way - the product is specified as attributes of the top-level 'product' element, and the splash screen becomes a command-line option to the launcher. In the exported product directory, two files control this behaviour. First, the eclipse.ini file in the root directory contains '-showsplash com.codesourcery.seed.product' for the splash screen. Second, 'configuration/config.ini' specifies the product to run. That almost completes our product branding.

Almost, because while the product extension point can specify the window icon and similar properties, the .product file can specify those too. When you do a product export in PDE, the properties from the .product file are copied into the product extension point, so unless you duplicate them, you get a product with no window icons. This problem is accounted for in the final version. We don't have this problem in practice, since we built the final product from the command line, and so the splash screen and product id are the only branding we need in the '.product' file.

Do it yourself

I have put together a seed Eclipse product over at GitHub, and you are free to use it if you are creating a new product. I would suggest these tips:

Use high-resolution artwork, preferably created from vector originals, and keep those originals.
Having license in every about.html and every feature.xml is awkward. Either automate it, or refer to documentation for license terms.
Use the same label for each functional feature and its branding plugin.
If you can get HTML support working on your target systems, use custom HTML instead of the default welcome screen.

Hope this helps!

Acknowledgements

Dmitry Kozlov has worked with me on this, while Sourcery Services allowed me to take time to summarize our experience.

Lean Analytics

2015-01-27T04:00:00.000+03:00

Last year, I often needed to display and analyze timestamped events, such as product evaluations, issue tracker activity or credit card expenses. After trying a few approaches, I've ended up writing a JavaScript library called Lean Analytics. It's based on dc.js, crossfilter.js and D3.js, and looks like this:

The easiest way to understand it is to just play with the demo or take a look at the demo source code. Below I'll explain what it is, when you'd want to use it, and when not.

Overview

The primary goal was to just visually show the trends in already collected, but rather dry, data. The amount of data is fairly small, dimensions are few, and there's no need to extract hidden correlations between dozens of values, nor is there a need for dedicated analysts to tweak the charts on a full-time basis. Rather, I wanted it to be extra easy to chart a new type of data, to store nothing in the cloud, and to embed the charts in existing web apps.

The library itself is bundled into a single JavaScript file, plus you need to include 3 CSS files. You also need to write code to define where to get data, what metrics to show, and how to group your entries - all of which is straightforward. For that, you get a lot of fine-tuned visuals:

Chart showing main metric (such as transaction amount) aggregated per week, as well as derived metric (such as trendline). There are also dropdowns to select desired metrics.
Compact linear charts showing distribution of the chosen metric over categories.
Tabular view of the data.
Filtering of main chart by category values in real time in your browser. The filters are even stored as part of URL, so you can share links easily.
Buttons to select time ranges.
Automatic progress and error reporting for loading data.

The charts are meant to replace a div in your host HTML document, and they use Bootstrap for styling, so probably will work just fine inside your internal webapps.

Alternatives

DC.js is the foundation for Lean Analytics, and together with crossfilter, does all the hard stuff of filtering data in real time in your browser. It can be used to create way more interesting visualizations, but it requires a considerable amount of code to configure all the details - way more than I was comfortable with.

Several libraries are implementing charts on top of D3, such as NVD3 and C3. Sadly, those are not integrated with crossfilter, and are somewhat in a state of flux.

Google Charts is very solid as far as charting goes, but does not support any crossfiltering either.

Kibana is a full-blown dashboard solution, on top of ElasticSearch. It's certainly great for serious data analysis, but it is not trivial to set up, and is not embeddable in webapps.

Mixpanel is fairly nice, but it's a cloud service, and I did not want, or could not, put data in the cloud.

Zenobase, finally, is a very nice solution specific to life tracking, to answer questions like "how is my blood pressure correlated with weight". It is inspiring in some ways, but is also a cloud service, and too specific to life tracking to be directly useful.

Conclusion

If you want to chart timestamped events with numeric values that are naturally aggregated over weeks, and you want to filter data by categories in real time, and the amount of data is not very large, give Lean Analytics a try.

Calendars and Timezones

2014-12-23T19:12:00.001+03:00

It's boring to talk about calendars and time zones, but apparently major software companies still get this wrong, and a fair number of people end up very confused. For the latest example, in summer 2014, the time zone in Moscow, Russia was UTC+4, and it was supposed to stay this way all year. Then the government decided to switch to UTC+3 in autumn anyway, and I woke up with my iPhone showing the wrong time.

iOS: hardcoded time zone

Since iOS ships time zone data as part of the OS, it still thought Moscow is UTC+4, so the clock was one hour ahead. That was easy to fix: I changed the iPhone time zone to a nearby UTC+3 one. And then, the calendar events started to randomly misbehave, showing one hour off, in different directions.

Google Calendar: works just fine

Suppose I create an event in Google Calendar, via web, at 18:00 Moscow (UTC+3). The invitation email to guests has a total of 4 MIME parts:

HTML part describing the event, with buttons to accept or decline
text part with a plain-text version of same
attachment with the application/ics content type, and invite.ics name
invisible part with text/calendar content type, and same content as invite.ics

Applications that understand invites will look at one of the last two parts, and see this:

DTSTART:20141210T150000Z
DTEND:20141210T160000Z

That is, the invite specifies the event time as 15:00 UTC, with no time zone information, which is the correct time.
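To make the arithmetic concrete, here is a minimal sketch that parses the UTC value from the invite and converts it to a fixed offset. The function name `parse_ics_utc` and the two fixed-offset zones are assumptions for illustration; this is not how any of the calendar applications discussed here are implemented.

```python
from datetime import datetime, timedelta, timezone


def parse_ics_utc(value: str) -> datetime:
    """Parse an iCalendar UTC date-time such as '20141210T150000Z'."""
    if not value.endswith("Z"):
        raise ValueError("expected a UTC value ending in 'Z'")
    naive = datetime.strptime(value[:-1], "%Y%m%dT%H%M%S")
    return naive.replace(tzinfo=timezone.utc)


# Assumption: fixed offsets stand in for real time zone database entries.
CORRECT_OFFSET = timezone(timedelta(hours=3))   # Moscow after the autumn 2014 change
OUTDATED_OFFSET = timezone(timedelta(hours=4))  # Moscow according to stale data

start = parse_ics_utc("20141210T150000Z")
local_start = start.astimezone(CORRECT_OFFSET)
stale_start = start.astimezone(OUTDATED_OFFSET)

print(local_start.strftime("%H:%M"))  # 18:00
print(stale_start.strftime("%H:%M"))  # 19:00
```

Because the invite carries an absolute instant, the only thing that can go wrong on the receiving side is the offset used for display: the same instant shows as 19:00 when the outdated UTC+4 offset is applied.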

iOS: using time zone name

When I access my Google Calendar from iPhone, I see the same event at 19:00. Apparently, when accessing Google Calendar via the Exchange protocol, iOS receives the time zone of the event as "Moscow", checks its outdated database, and decides the time is 18:00 UTC+4 - one hour later than it should be.

Exchange: totally confused

Like iOS, our corporate Exchange did not know about recent changes, and still thought Moscow is UTC+4, so I switched it to a time zone called "(UTC+03:00) Kaliningrad, Minsk". When I create an event at 18:00, the invitation email has 2 MIME parts:

text part that briefly says "When: Saturday, December 06, 2014 19:00-20:00. (UTC+03:00) Minsk", which looks right
invisible part with text/calendar content type

That calendar data is just the opposite to what Google does, since it names time zone:

DTSTART;TZID=Kaliningrad Standard Time:20141205T180000
DTEND;TZID=Kaliningrad Standard Time:20141205T190000

Not only that, but the name of timezone is different from the text part, and the timezone itself is also defined inside the content, as UTC+2, so that makes for 18:00 UTC+2 - one hour earlier than it should be. I don't have a good theory how it can be that broken.

Solution

What I ended up doing, and what you should do in a similar situation, is find a timezone with the right offset and DST rules, but far away geographically and politically. I ended up using Madagascar in iOS and Nairobi in Exchange. Then, you should make sure that every single calendar system you use is switched to that timezone:

In Google Calendar, under Calendar Settings, modify Country and then Time Zone.
On iOS, modify Settings → General, Date & Time.
On iOS, also modify Settings → Mail, Contacts, Calendars → Time Zone Support.
In Exchange, or rather Outlook Web App, modify Settings → Regional → Current Time Zone. You can also go to Settings → Calendar and click "change your work week to the current time zone", though that does not matter much.

Psychology of Software Development

2014-10-28T19:07:00.000+03:00

In 1987, Tom DeMarco and Timothy Lister wrote Peopleware, and it goes like this:

The major problems of our work are not so much technological as sociological in nature.

25 years later, it still looks true. I have mostly worked around open source, where I do recall big public clashes, like gcc versus egcs, or glibc versus eglibc, caused by social reasons. The ongoing systemd saga is mostly social. Everywhere, design discussions become personal ones, with no way to decently get out. Regardless of technology, good maintainers often have particular character traits. Now that's just me, and the world of open-source and cute nonsense hacks. Do real psychology researchers have anything to say about real professionals?

Already in 1975, Flow and Intrinsic Motivation began to question classic ways to run a business. Since then, psychologists found many pieces of the puzzle, showing that sociology and psychology are indeed important. Managers often don't care. Waterfall was sometimes replaced with agile, with its "Individuals and interactions over processes and tools", but that soon became a new cargo cult. In the original cargo cult (Surely You're Joking, Mr. Feynman!), natives sat with fake wooden headphones, waiting for airplanes with goods to arrive. These days, we sit with actual headphones, hold all the right meetings and wait for the team to "self-organize" and for the breakthrough to happen.

Let me try to give an index of current psychology findings that apply to software engineering, so that if you want to be a better software developer, or engineering manager, you can have an easy starting point. The material here follows the SECR 2014 talk I recently gave, but is edited to be more hands-off and is in English.

The puzzle pieces

We like to think that evolution meant to create humans all the time, but it's not a smooth directed process. Rather, it's a random competition between every two different genomes - including between viruses and animals, animals and humans, humans and other humans, and even humans of different genders - with no ultimate winner (The Red Queen). Multiple behavior strategies were formed in humans as the result, with emotions to activate them, and all of us have a custom mix of them to use. Each strategy is good enough, statistically, to stay in human genes, but no single one is guaranteed to work for any particular human. Worse, these strategies were created to work in prehistoric environments, which no longer exist (The Moral Animal). No wonder that we're driven by emotions, and nudges, and often make wrong decisions.

We know that all kinds of emotions are important to work, such as motivation (Switch), happiness (The Happiness Advantage) as well as overall ability to deal with emotions in oneself and others (Emotional Intelligence). Doing the right things is easier with the right habits (The Power of Habit) and with helpful nudges (Nudge), but at times, willpower needs to be involved (The Willpower Instinct).

Behind the surface, our brains are nothing but neural networks, and those are not perfect. They can quickly learn, and then respond very quickly, basically automatically. The clusters of strong neuron connections created by learning, called attractors, get stronger each time they are used, and we get progressively efficient. But that's only when learning has a proper feedback loop - fast, positive and consistent. Otherwise, we form false attractors - that fire fast, and get stronger with each use, but produce wrong results. We're particularly prone to create false attractors when feedback is slow, and probabilistic.

For thinking, false attractors are responsible for thinking biases (Thinking Fast and Slow), such as coming up with unrealistic estimates, siding with others' unrealistic estimates, picking the first design that comes to mind without checking, and trying hard to avoid past mistakes that are unlikely to happen again. For emotions, seriously false attractors require the attention of a psychotherapist, while in the common case, we can get excited or upset just because we were in a similar situation before, not because the response makes sense today (General Theory of Love). At the intersection of thinking and emotions, we often get excited about rewards, and regret it later. One classic example is the quarter-end rush to sign deals - and then figure out how to get them done. This is one case where introverts can be extremely helpful at work, since they don't get excited about rewards all that much (Quiet). We do have a capacity for slow, more formal and symbolic thinking, but it's much harder, and the need to use it often can only be pointed out by others.

The picture

There is no final picture, and many pieces don't fit. Many are broken, despite passing statistical tests, because most published research findings are false. Many are discolored, having so small an effect that it's not clear where to put them. The sampling bias is caused by using mostly students for experiments. The measurement bias is here - say, how would anybody measure a programmer's performance? I would hope nobody counted lines of code written or tickets closed. And many psychological theories are just not good scientific theories, if we use the definition that Stephen Hawking gives in A Brief History of Time:

good theory [...] must accurately describe a large class of observations on the basis of a model that contains only a few arbitrary elements, and it must make definite predictions about the results of future observations

Surely physics cannot predict the future of entire universe, but it is fairly good at predicting where a rocket would go if an engine is fired. Psychology, in contrast, has a large set of observations but struggles with useful predictions or interventions. Influential books are particularly at fault, explaining how flow, and happiness and emotions are important, but failing to mention that finding flow in gardening, and walking around smiling might not be great strategy for a person or society. Evolutionary psychology (The Moral Animal) holds some promise here, having small foundation, and trying to predict everything, but it's too early to judge its success.

The corners

Not having a complete picture is unfortunate, but it looks like the corners are filled in - we know a few important areas where neglect is detrimental.

Emotions. Emotions matter, and while Real Professional can do unpleasant things for a while, without emotions, habits and nudges, he will likely give up. I'd conjecture that often, team opinion and motivation about possible new features is more important than the estimates of market value.
Trust. Is everybody in the same boat, heading in the same direction, sharing the same prizes, and actually rowing when needed?
Feedback. For automatic thinking to work well, we need feedback that is positive, timely, and consistent. That does not happen by itself in complex areas, say quality, or architecture design, and needs to be an explicit part of the process - be it a review of key architecture decisions a few months down the road, or having a low defect count celebrated.
Slow thinking. Automatic thinking will eventually make mistakes, and then start repeating them. If you have an orderly, predictable Scrum process, you can easily become too complacent. There must be time to make complex technical decisions, and time to review direction from the basic principles.

Do these corners receive systematic attention on your team? Are they among process metrics? Probably not, and probably some of them need improvement. Lacking full picture, we don't know how much effect any changes will have. Maybe, things are already not too bad in ad-hoc way. Maybe, some are totally broken. Anyway, we're software engineers, we're good at setting up processes, and if we keep these four corners in mind, we can make our processes work together with our psychology, and not against it.

Text Baselines in HTML

2014-07-31T11:55:00.001+04:00

I usually write HTML as part of some quick hack, often personal, so I take every shortcut I can. But one thing I cannot stand is misaligned text - where two text elements appear next to each other horizontally, but their text baselines are not perfectly aligned. This is a much-discussed topic, but somehow most of the solutions that professional designers discuss don't work for me.

For example, take a blog post called Setting Type on the Web to a Baseline Grid. It's fairly detailed about global structure of the page, which is indeed nice, and it comes with an example, and when you look closely, you see this clear misalignment:

Incidentally, aligning baselines of two text elements with different font size is what I wanted to do yesterday, so let's see how to do it, starting with a case that works just fine. Here are the essential parts of HTML:

<body>
    <div>
        <span class="big">Hello</span>
        <span class="small">World</span>
    </div>
</body>

The way it looks, with boxes added for exposition, is this:

Each of the span elements creates a box with text. The height of each box is equal to the CSS font-size property - which is 36px for the first box and 24px for the second. They are then laid out using an inline formatting context that by default aligns the baselines of each box - so the smaller box is moved down. If you examine example 1 you'll see that the second span element has an offsetTop property of 11px - which is exactly what is needed to align baselines, helpfully computed by the browser.

If we were to compute that magic value of 11 ourselves, we'd be in trouble. The CSS font-size property effectively only tells the height of the text box. Although the size of actual letters (x-height and cap height) is proportional to box height, the proportion depends on the font. The position of the text baseline in the text box also depends on the font, so we have nothing to work from. For a very detailed discussion, see Point Size and the Em Square: Not What People Think. There are libraries that try to compute font metrics, such as Font.js, but they do it by drawing font on the canvas and then looking at pixels - rather roundabout way.

If it's hard to compute the right offset for the smaller box, maybe we can make the smaller box have the same height? There is the line-height CSS property indeed, which can be larger than font-size, but it adds the extra height equally at the top and the bottom. In our case, it will add (36-24)/2=6 pixels at the top, very different from the desired 11, as shown by example 2 and the below screenshot.
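The line-height arithmetic can be written down as a tiny sketch. The function name `half_leading` is made up for illustration, and the 11px figure is simply the offsetTop value reported by the browser in the first example.

```python
def half_leading(line_height_px: float, font_size_px: float) -> float:
    """Extra space that line-height adds above (and equally below) the text box."""
    return (line_height_px - font_size_px) / 2


# Give the 24px span the same 36px height as its bigger neighbour.
top_offset = half_leading(36, 24)
needed_offset = 11  # offset the browser computed for baseline alignment

print(top_offset)                  # 6.0
print(needed_offset - top_offset)  # 5.0
```

The 5px gap that remains is the part that depends on where each font places its baseline, which CSS does not expose.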

The bottom line is that when text elements with different font sizes are inside a single inline formatting context, they are perfectly aligned by default, and replicating such alignment manually is hard. But what if I want alignment outside an inline context - for example if I want to send the second text element to the right?

Floats

One way to send an element to the right is "float: right" property, but example 3 looks rather bad.

This is not a browser bug - the CSS spec is clear that when an element is floated to the right in an inline context, it will be aligned to the top, and this behavior cannot be changed. As explained above, shifting the element manually is rather impractical. What we can do though, is to add a strut - a text element with the large font size, serving only to force the alignment. Here's example 4 and a screenshot:

The final observation is that if an inline box has no content, the browser adds a true strut - of zero width. So, here's example 5 - with all borders removed, just perfectly aligned text.

Using positioning

Instead of floats, one can use either "position: relative" or "position: absolute" on the second span. Using absolute position is fairly easy - with "right: 0" it will be moved to the right border of the container (example 6) but also move to the top - just like happened with float. Strut can be used to fix this, likewise. When using relative positioning, we don't need to do anything about baseline - everything remains aligned by default. However, we need to figure out relative horizontal shift - not too hard, but requires JavaScript (example 7).

Using text alignment

I have intentionally used a simple example here, but if displaying two words on different sides of a page is exactly what you need, you can just give width to both span elements, so that they cover entire width, and then make the second element right-aligned.

Conclusions

It is unfortunate that CSS neither supports baseline alignment across different blocks, nor exposes font metrics in a usable way. If you know exactly what fonts are used, you can compute offsets by hand. In other cases, use struts or relative positioning.

Rectangles with SVG

2014-07-09T19:58:00.001+04:00

One would think drawing a rectangle using SVG is easy - it's a basic shape, and there are numerous tutorials with straightforward instructions, such as:

<rect x="10" y="10" width="100" height="100" stroke="black" stroke-width="1" fill="white"/>

What is rarely explained is the exact meaning of the coordinates, and dimensions.

SVG operates in a pure mathematical coordinate system. Coordinates are real numbers and the lines of the shapes are infinitely thin. Width and height of a rectangle are distances between these infinitely thin lines. The stroke is applied alongside these mathematical lines, with equal width on each side. So, the above rectangle is actually 101 units wide - with 0.5 unit of stroke applied to the left of the left line, then 100 units between the left and right lines, and 0.5 unit of stroke to the right of the right line. And there's a total of 1 unit of stroke inside the rectangle, so the empty inside of the rectangle is 99 units wide.
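As a small sketch of this geometry along one axis, the helper below computes the outer and inner extents of a stroked rectangle. The name `rect_extents` is hypothetical; it only restates the centered-stroke rule described above.

```python
def rect_extents(x: float, width: float, stroke_width: float):
    """Return (outer, inner) horizontal extents of a stroked rectangle.

    The stroke is centered on the mathematical outline, so half of it
    falls outside the outline and half inside.
    """
    half = stroke_width / 2
    outer = (x - half, x + width + half)
    inner = (x + half, x + width - half)
    return outer, inner


outer, inner = rect_extents(x=10, width=100, stroke_width=1)
print(outer, outer[1] - outer[0])  # (9.5, 110.5) 101.0
print(inner, inner[1] - inner[0])  # (10.5, 109.5) 99.0
```

The same calculation applies to the vertical axis with y and height.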

Coordinates don't need to be integer, and it can even be more logical to have fractional coordinates. The picture below shows a rectangle with extra content that is scaled up, in SVG, for exposition. The grid lines correspond to integer coordinates in the unscaled world. The y coordinate of the rectangle is 2.5. With 0.5 of stroke added on both sides, it nicely occupies the space between y=2 and y=3.

But when we try to display SVG using the default coordinate system, where one unit is a pixel, things break down. There, integer values of coordinates fall between the pixels, and the browsers cannot quite agree about how to apply 0.5px of stroke on both sides of a line. The below picture shows rendering of rectangles by Chrome and Firefox, using both whole-pixel and half-pixel coordinates.

Chromium does a logical thing. With integer coordinates, it applies 0.5px of stroke on both sides of the line by making each pixel half-saturated - so we get a line that is twice as wide and half as dark. With half-pixel coordinates the mathematical lines go through the middle of pixels, so 0.5px of stroke on both sides results in a single pixel painted with the stroke color. What Firefox does, I don't know. If you look carefully, vertical and horizontal lines are painted differently, and the top line is painted differently from the bottom line. Both with integer and half-pixel coordinates most of the lines are two-pixel, with different saturation of pixels. This looks like explicit programming, but I miss the logic. Of all of these I prefer Chromium rendering with half-pixel coordinates.
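The behavior described for Chromium matches a simple area-coverage model, sketched below for one axis. This is an idealized illustration under the assumption that pixel i covers the interval [i, i+1); it is not the actual rasterizer of any browser, and `stroke_coverage` is a made-up name.

```python
import math


def stroke_coverage(line_pos: float, stroke_width: float = 1.0) -> dict:
    """Map each touched pixel index to the fraction of it covered by the stroke."""
    lo = line_pos - stroke_width / 2
    hi = line_pos + stroke_width / 2
    coverage = {}
    for pixel in range(math.floor(lo), math.ceil(hi)):
        # Overlap between the stroke [lo, hi] and the pixel [pixel, pixel + 1].
        overlap = min(hi, pixel + 1) - max(lo, pixel)
        if overlap > 0:
            coverage[pixel] = overlap
    return coverage


print(stroke_coverage(10))    # {9: 0.5, 10: 0.5}
print(stroke_coverage(10.5))  # {10: 1.0}
```

An integer coordinate spreads the 1px stroke over two half-covered pixels, while a half-pixel coordinate fills exactly one pixel.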

Of course, drawing a rectangle was not my goal. Rather, I was about to draw program control flow graphs using JointJS and Dagre. Overall, these libraries are rather good - I could quickly produce a reasonable prototype and the JointJS code was easy to read and extend. However, it also effectively forces integer coordinates. There is one place in the code where it did rounding unintentionally, and another place where coordinates are rounded explicitly. I will certainly work around this locally, but it would be great if people building on top of SVG paid attention to this issue.

If you want to follow along, here is the first example and second example.

IRC visualization using D3

2014-04-21T14:00:00.000+04:00

In order to learn D3, I've created some basic visualization of IRC messages, with pictures like below. GitHub has the sources and demo.

I started with an internal IRC channel, and thought there might be 2 or 3 groups of people mostly talking to each other. The created graph was a perfectly round cloud. Some surprises in the center, but that's it. The histograms showed that some people are very chatty, and the pairwise histogram of messages was even more telling - 1300 pairs that only exchanged a few messages, and 60 very active paths. Neither showing just the top 20 users nor excluding the top 20 users revealed any grouping. Then I made each node in the graph be pulled toward only one other node - the top recipient of its messages - and got the above picture. There are some groups visible, and the grouping is indeed around functional areas.
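The "top recipient" reduction is easy to sketch outside D3. The snippet below assumes the log has already been turned into (sender, recipient) pairs; the nicknames are invented and `top_recipients` is a hypothetical helper, not code from the demo.

```python
from collections import Counter, defaultdict


def top_recipients(messages):
    """For each sender, pick the recipient they addressed most often."""
    counts = defaultdict(Counter)
    for sender, recipient in messages:
        counts[sender][recipient] += 1
    return {
        sender: per_recipient.most_common(1)[0][0]
        for sender, per_recipient in counts.items()
    }


# Invented sample data: (sender, recipient) pairs.
messages = [
    ("ann", "bob"), ("ann", "bob"), ("ann", "cat"),
    ("bob", "ann"),
    ("cat", "bob"), ("cat", "bob"), ("cat", "ann"),
]

links = top_recipients(messages)
print(links)  # {'ann': 'bob', 'bob': 'ann', 'cat': 'bob'}
```

Each resulting pair becomes the single link that pulls a node in the layout, which replaces the dense all-pairs cloud with at most one edge per person.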

The data in the demo is artificial, but I've tried to recreate the same distribution, and the results are fairly close. Trying on logs for some other channels, mostly open-source, was a failure. The histograms are similar - few people with a large number of messages - but the graphs had no grouping at all.

D3 itself is a fairly nice framework, as many of its examples show already. Nice data binding mechanism, animation, and a lot of utilities. Still, it's a construction kit. There are no charts one can use out of the box, and even things like margins are copy-pasted between all examples. It was also required to fine-tune everything to the data, from chart dimensions to margins to parameters of the graph layout algorithm. If one tries another data set, the axes of histograms break down, and the nodes of the graph disappear to the sides of the screen.

There are libraries that try to create reusable charts on top of D3. C3 appears to have nice standard charts, but completely hides all of D3's power. NVD3 is quite promising, but has no documentation and is apparently in the middle of extensive rewrite. DC is meant to be a frontend to crossfilter, but not so useful with arbitrary data - and in another context, I just could not get it to create charts I want.

I will likely use D3 in future, though will need a personal set of chart helpers first.

Eclipse P2 Product Installer

2014-03-20T12:19:00.000+04:00

Last year, the Sourcery CodeBench team implemented a new installer, using Eclipse P2. This week, at EclipseCon 2014, two of the key developers gave a talk about it, and announced that all of the source code is available under the Eclipse Public License, at https://github.com/MentorEmbedded/p2-installer.

Before, we used a certain commercial installer technology, and it was not much fun. There were annoying bugs (like randomly creating corrupted installers), and there were fundamental issues. It did not know anything about Eclipse components, so it could only install everything. And it used an XML-based description, which worked nicely for simple projects, but became a nightmare to create or generate. We decided it's best to start from square one.

The P2-based installer has everything you would expect. There are pages to select components, to review licenses, and to select the installation path. It can create shortcuts, change environment variables and supports uninstall. All components to install are provided in a P2 repository, and you can use bundled P2 repositories as well as remote ones. So, for example, you can include core functionality in the installer users download, and provide optional components via HTTP, and the installer supports selecting optional components. It is also extensible using Java, both via P2 touchpoints and using installer modules that contribute custom pages. There's also a utility to pack everything together in an executable file you can download and run.

We hope it will be useful for anybody building products based on Eclipse platform. Kudos go to Mark Bozeman, Mike Wrighton and Richard Memory who worked hard on this. I am also thankful to management of Mentor Graphics' Embedded Software Division, who supported releasing this project into open source. Enjoy!

STL Visualization

2009-06-06T21:24:00.006+04:00

As of today, KDevelop can nicely display std::vector. I'll probably omit the obvious snapshot, and will point to a mailing list post with instructions for trying it. Instead, I'll tell the story of this feature.

For its entire history, GDB did not have any official way to display types from the C++ Standard Library in a sensible way. Several third-party scripts appeared, written in GDB's internal scripting language. However, they were fairly limited. You had to explicitly run those scripts, and all you got was text output without structure, making robust IDE integration impossible. Also, GDB's scripting language is itself unpleasant, and does not even have access to internal data structures and functions. It was clear that we need a way to write pretty-printers using real scripting language, with full access to GDB data structures, and proper integration with frontend interface.

The first prototype of Python-based pretty printing was written by myself during a free hack slot at a CodeSourcery company meeting. It took maybe 4 hours, if not less, and could display std::string as a string automatically. Some 4 hours more led to the first public prototype. This version could automatically display std::vector as "[1,2]". The second prototype could finally display elements of std::vector as children, like one would expect in a variables tree of a frontend, and even report when new elements are added to the vector. However, this version took a couple of days of work, exposed a mere 4 functions from GDB to Python, and was a mess internally. It was clearly already outside the "quick hack" range.

Those prototypes would never turn into anything, were it not for Tom Tromey and Thiago Bauermann, who started a project to add complete Python scripting to GDB. This is much more ambitious than just pretty-printing. In particular, it includes defining new commands in Python, with full access to GDB internals. You can read more details in a post series by Tom.

Pretty-printing became a part of that large effort, and was greatly improved. One of the most notable changes was incremental fetch of children. According to the C++ standard, an object does not exist until its constructor has exited. However, gcc debug info just lists all local variables in a block. A naive pretty-printer, when invoked on such a variable, would likely go into an uncharted part of memory trying to fetch all children, and never return. To fix this, the Python pretty-printers were designed to use incremental fetch, using Python iterators, and the GDB MI interface was also adjusted to be more incremental (yes, it's a trend). Beyond that, we spent at least 3 weeks iterating on finer details. The GDB patch was finally checked in on Sep 15, and the KDevelop4 patch shortly after.
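The idea of incremental fetch can be sketched in plain Python. The printer below follows the general shape of a GDB pretty-printer (a class with to_string, display_hint and a children method returning an iterator), but it runs outside GDB: `FakeVector` and `read_element` are hypothetical stand-ins for a debugger value and a memory read, and the `islice` call stands in for a frontend asking for a limited range of children.

```python
from itertools import islice


class FakeVector:
    """Stand-in for a debugger value; not part of GDB."""

    def __init__(self, size, read_element):
        self.size = size                  # may be garbage before construction
        self.read_element = read_element  # simulates reading target memory


class VectorPrinter:
    """Printer whose children are produced lazily, one at a time."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        return "vector of length %d" % self.val.size

    def display_hint(self):
        return "array"

    def children(self):
        # A generator: each element is read only when the consumer asks for it.
        for index in range(self.val.size):
            yield "[%d]" % index, self.val.read_element(index)


reads = []


def read_element(index):
    reads.append(index)
    return index * 10


# An uninitialized vector might report an absurd size.
garbage = FakeVector(size=10**12, read_element=read_element)
first_three = list(islice(VectorPrinter(garbage).children(), 3))

print(first_three)  # [('[0]', 0), ('[1]', 10), ('[2]', 20)]
print(reads)        # [0, 1, 2]
```

Even with a bogus size of 10**12, only the three requested elements are read, which is the property that keeps a frontend responsive on not-yet-constructed variables.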

This is still early implementation, and might have bugs, but now it's out for everybody to try.

Linking 101

2009-06-01T14:41:00.004+04:00

Recently, I see more and more people having trouble with link-time errors—as if such an error is the worst kind of luck and cannot be fixed by mere mortals. There are many possible reasons, including Java as default language in universities, and alarming spread of header-only-philia, but that's for another post. Here, I want to give a simple diagnostic procedure for link-time errors.
Let's lay some groundwork first. If your job is programming in C++, you need to know what the -I and -L options do, and how they are different. Also, given a full path to a library file (with .a or .so or .lib extension), you should be able to link to that file, in two different ways. If you don't know any of the above already, all hope is lost, and you might want to consider other occupations. Otherwise, let's look at the diagnosis steps for the most common error: 'undefined symbol'.

First, understand where the missing symbol is supposedly defined. An educated guess is usually fine. For example, a symbol named boost::system::foobar is most likely contained in the Boost.System library (and it's surprising how many folks fail to guess so). Then, find how you are supposed to link to that logical component, using documentation for the component or the corresponding Linux package. For example, you might decide to add -lboost_system to the linker command line.

Second, make sure that the physical library file being used is the right one, and that the linker is not picking a different version of the library from a directory you don't expect. On Linux, you can use the -t flag for the GNU linker (or use -Wl,-t on the gcc command line). This will print full paths for every library used, including those specified with the -lfoo syntax. For static linking, this will also tell which object files from the static libraries were used. If you get an error when running the application, you can use the LD_DEBUG environment variable. If you set that variable to help prior to running your program, you'll get a list of possible values. The most handy value in our case is files.
On Windows, the /VERBOSE:LIB option to the Visual Studio linker will produce comparable diagnostics.

Third, if you seem to link to the right library, there are three further possibilities. First, maybe the library actually should not include the symbol. This can happen if you use wrong headers during the compilation, and can be debugged by passing the -save-temps option to gcc and checking the generated .ii file. Second, the symbol might be almost there, but slightly different: using a different calling convention (on Windows), or wchar_t mode (also on Windows), or somewhat different parameter types, or a different namespace. In that case, you'll have to make sure the compilation options of the application match the library's requirements. Finally, it could be that the library actually lacks the symbol due to a library bug, and you have to complain to the maintainer. To distinguish those cases, you need to manually examine the list of library symbols. With gcc, the 'nm' command will do for static libraries, while 'readelf' can be used on shared libraries (Unix only). On Windows, dumpbin.exe /symbols /out:symbols.txt somelib.lib can be used.

That's it for the common case. Below, I list some relatively common specific problems. The list does not claim to be complete, so if you know some other cases, drop me a line.

Static linking. For static linking, the order of libraries on the command line matters, so if you don't see the linker grabbing the object file with your symbol, you might want to either reorder the libraries or use the --start-group option. See ld documentation for details and note that the performance cost of the --start-group option might not be a concern these days.

References to vtable. The GNU C++ compiler sometimes reports unresolved reference to 'vtable for SomeClass'. This generally is a poor way to say that the first non-inline virtual method of SomeClass is not defined. See GCC FAQ.
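A minimal sketch of that situation, using hypothetical class names (Shape, Triangle) that are not from any real library. With GCC, the vtable is normally emitted alongside the definition of the first non-inline virtual member function, so a missing definition shows up as a missing vtable.

```cpp
#include <string>

// Hypothetical class for illustration.
class Shape {
public:
    virtual ~Shape() {}                       // inline, so it does not count
    virtual std::string name() const;         // first non-inline virtual method
    virtual int sides() const { return 0; }   // inline as well
};

// GCC emits 'vtable for Shape' together with this definition.
// If this definition is commented out, the link step typically fails with
//   undefined reference to `vtable for Shape'
// instead of a message that names Shape::name().
std::string Shape::name() const { return "shape"; }

class Triangle : public Shape {
public:
    std::string name() const override { return "triangle"; }
    int sides() const override { return 3; }
};

// Calls through the base class, so the vtables are really used.
std::string describe(const Shape& s) {
    return s.name() + ":" + std::to_string(s.sides());
}
```

Calling describe(Triangle()) returns "triangle:3". The practical takeaway is to look for the first declared-but-undefined virtual method when the linker mentions a vtable.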

Windows DLLs. On Windows, if an application wants to use a function in a DLL, then both the DLL and the application should record this intention, using the __declspec(dllexport) and __declspec(dllimport) pair. If either party does not do so, the linker complains. With mingw, a typical error is undefined reference to `_imp___WHATEVER'. It means that the library is static, whereas the application wants to use a shared library.
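One common way to keep the two sides consistent is a single macro in the library header. The sketch below is an assumption about how such a header might look; the names MYLIB_API, MYLIB_BUILDING, MYLIB_STATIC and mylib_add are all hypothetical.

```cpp
// This single file plays the role of the library itself. In a real build
// the macro would normally be passed on the command line when compiling
// the library, and left undefined when compiling the application.
#define MYLIB_BUILDING 1

#if defined(_WIN32) && !defined(MYLIB_STATIC)
#  if defined(MYLIB_BUILDING)
#    define MYLIB_API __declspec(dllexport)   // compiling the DLL
#  else
#    define MYLIB_API __declspec(dllimport)   // compiling a user of the DLL
#  endif
#else
#  define MYLIB_API                           // static library, or not Windows
#endif

// Declaration, as it would appear in the public header.
MYLIB_API int mylib_add(int a, int b);

// Definition, as it would appear in the library source.
int mylib_add(int a, int b) { return a + b; }
```

With this layout, the `_imp___` error described above would usually mean the application was compiled without MYLIB_STATIC while being linked against the static build of the library.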

Windows import libraries. On Windows, it's not possible to directly link to a DLL. Instead, an import library is created and used, typically by passing the /IMPLIB option to the linker. If the linker does not report any errors, but does not produce an import library either, it's a sure sign that you have not exported any function from the DLL, and have to check the logic that adds __declspec(dllexport).

64-bit compilation. When building 64-bit applications with GCC, you can get an error that says something about "relocation R_X86_64_32" and suggests the -fPIC option. The issue here is that 64-bit applications should include only code compiled with -fPIC, and if you link against any static libraries, those libraries should also be compiled with -fPIC. On Windows with Visual Studio, if you try to use 32-bit libraries when building a 64-bit application, you won't see any warnings, just undefined references. If you look at the symbols in the library, and see exactly the symbols that are reported as undefined, a 32/64 mismatch is the most likely reason.

KDevelop error display

2009-05-20T20:22:00.004+04:00

For quite a while I wanted KDevelop to display compilation errors directly inside the editor, as opposed to a separate window you have to click in. It works now, as shown below. This was implemented by Ivan Ruchkin, a student at Moscow State University, who will be defending a term paper about various KDevelop-related work tomorrow. The patches will be posted to appropriate mailing lists right after that.

Variable tooltips

2008-04-19T19:42:00.005+04:00

The most voted-for feature request for KDevelop3 debugger was variable tooltips. I don't think KDevelop3 will ever get them, but KDevelop4 will, as shown below.

Debugger stories: pending breakpoints

2007-12-21T21:47:00.000+03:00

KDevelop 3.5 has a subtle bug. Sometimes, when you step over a function call, you don't stop on the next line. Instead, the application is resumed until it hits a breakpoint, or exits. This bug, in fact, is a consequence of how breakpoints in shared libraries are implemented.

Suppose you've just started a debugger, and try to set a breakpoint on a function in a shared library. The library itself might not be loaded yet, in which case GDB cannot find the address of the symbol to set the low-level breakpoint. To handle this case, starting with version 6.1, GDB supports pending breakpoints. Such breakpoints don't correspond to any address in the program; they only keep the specified breakpoint location as a string. Whenever a new shared library is loaded, GDB tries to re-parse the breakpoint location, and if that succeeds, creates an ordinary breakpoint.

Now, this does not work when using the MI interface, for a couple of reasons:

When a pending breakpoint is resolved, it is deleted, and a new one is created. And GDB fails to inform the MI frontend about this.
It's actually not possible to create a pending breakpoint using MI at all.

Because of these issues (and partly for historic reasons) KDevelop 3.5 simulates pending breakpoints. GDB is asked to stop whenever a shared library is loaded, and when that happens, KDevelop tries to reinsert breakpoints. This works pretty well, except for the bug I mentioned in the beginning. Suppose you're stepping over a function call (this uses the "next" command on GDB level). The function opens some shared library, at which point GDB stops and KDevelop tries to reinsert breakpoints. After that KDevelop would like to continue the "next" operation, but it's already aborted by GDB. All we can do is continue the program.

But it's no longer the case today. As I wrote earlier, GDB was recently modified so that a breakpoint can correspond to several addresses, such as those of template instantiations. A breakpoint is re-evaluated each time a shared library is loaded or unloaded, and locations are added to the breakpoint and removed as appropriate, but it remains the same breakpoint. The nice side effect is that pending breakpoints are now just breakpoints with zero locations, that are reevaluated just like other breakpoints, and don't ever change their number.

In addition to that, I wrote patches to add pending breakpoint support to MI -- which mainly involved getting rid of two parallel breakpoint-setting code paths -- one for MI and one for CLI. Thanks to review by Joel Brobecker and Daniel Jacobowitz, those patches went into GDB CVS earlier this month. KDevelop 3.5 SVN was modified to automatically detect and use this GDB feature. So, if you're willing to build CVS HEAD of gdb and KDevelop from the KDE 3.5 branch, you can finally have breakpoints in shared libraries just working.
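The model described above can be sketched as a toy data structure. This is not GDB's actual code: the Library and Breakpoint types and the reevaluate function are hypothetical, and a "library" here is just a map from symbol name to address.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model: a loaded library maps symbol names to addresses.
using Library = std::map<std::string, std::uintptr_t>;

struct Breakpoint {
    int number;                              // never changes after creation
    std::string location;                    // kept as text, e.g. "foo"
    std::vector<std::uintptr_t> addresses;   // empty means "pending"

    bool pending() const { return addresses.empty(); }

    // Re-resolve the textual location against the libraries loaded right
    // now. Called after every library load or unload.
    void reevaluate(const std::vector<Library>& loaded) {
        addresses.clear();
        for (const Library& lib : loaded) {
            auto it = lib.find(location);
            if (it != lib.end())
                addresses.push_back(it->second);
        }
    }
};
```

The point of the design is that a frontend only ever tracks one breakpoint number; whether that breakpoint currently has zero, one or several locations is a detail that changes underneath it.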

This was probably my last KDevelop 3.5 commit. KDevelop 4 is ahead.

Breakpoints in constructors

2007-11-26T21:30:00.000+03:00

Presently, no release of GDB properly handles breakpoints in constructors. This summer, I've worked on fixing that, and while it took longer than expected, it was eventually done, just in time for the Sourcery G++ Fall release. The patches were also submitted for GDB FSF, missed the window for 6.7, but will be present in the 6.8 release.

The underlying problem with breakpoints in constructors was that gcc generates two distinct function bodies for a constructor. One is a regular one that constructs the entire object, including all bases. Another one constructs everything except for virtual base classes. As it happens, gcc emits both constructors even for classes that have no virtual bases at all. GDB was not prepared that a given function name or source line corresponds to several addresses in program, so it picks one. And usually it picked the wrong one.

Constructor is the most common case, but is not the only one. If you set a breakpoint in a function template, you can have multiple template instantiations that correspond to a source line. An inline function can be inlined in multiple places, and lead to exactly the same problem.

The solution, obviously, is to teach GDB that a breakpoint can correspond to several addresses, and then create multiple-location breakpoints when needed. Now, whenever a user creates a breakpoint that resolves to a source line, GDB traverses line tables for all modules, and if it finds another address for the same line, that address is added to breakpoint. For a template or inline function, you can end up with quite a lot of locations, so you can review list of locations, and disable the unwanted ones.
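A small sketch of why one source line can map to several addresses. The function template twice and the two helper functions are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// Each instantiation of this template is a separate function in the
// binary, yet all of them come from the same source lines.
template <typename T>
T twice(T value) {
    return value + value;   // a breakpoint here applies to every instantiation
}

// Force two instantiations and report their addresses as integers.
std::uintptr_t address_of_int_version() {
    int (*fn)(int) = &twice<int>;
    return reinterpret_cast<std::uintptr_t>(fn);
}

std::uintptr_t address_of_double_version() {
    double (*fn)(double) = &twice<double>;
    return reinterpret_cast<std::uintptr_t>(fn);
}
```

The two helpers return different values, which is exactly the situation a single-address breakpoint cannot represent.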

The nicest bit of this is interaction with shared libraries. Say, you've set a breakpoint inside function template. If you load a new shared library, and it contains an instantiation of that function template, a new location will be added to the breakpoint, transparently. If a library is unloaded, the location will become 'pending', until you load the library back.

The side effect of this work was a serious improvement in the way breakpoints in shared libraries work, but that's a topic for another post.

Debugger Stories: Stack widget

2007-02-01T00:28:00.000+03:00

In KDevelop 3.4, the stack widget was not changed much. I can remember just two changes—one that is apparent and one that is subtle.

The apparent change is that we actually parse gdb output, and show it in a readable way, while in KDevelop 3.3 the stack frame formatting was entirely at the mercy of gdb's "backtrace" command.

The subtle change is at the bottom of the screenshot—that "(click to get more frame)" thing. When a program stops, KDevelop fetches very few frames from gdb. If you click on that last item, then another chunk of frames will be fetched.

This behaviour is needed for two reasons. First, if your program is stuck in infinite recursion, and you try to interrupt it from KDevelop, in KDevelop 3.3 you're out of luck. As soon as the program is interrupted, KDevelop asks gdb for the list of all frames. Since your program is in infinite recursion, the number of frames is very large, and gdb is not a very speedy stack-walker. So, you get to wait 5 mins for the stack to be shown. With incremental display, in a few clicks you'll see what function went astray.

The second reason is embarrassing. Even without infinite recursion, getting the list of frames from gdb takes a lot of time. Something like half-a-second for getting 30 frames is not unheard of. Ideally, we'd fix gdb, but since we need incremental fetch anyway, fetching a sufficiently small number of frames initially greatly improves responsiveness.
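The incremental fetch can be sketched roughly as follows. This is an illustration only, not KDevelop's code: StackView and FrameSource are hypothetical names, and FrameSource stands in for "ask the debugger for frames in the range [from, to)".

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback that returns frames with indices in [from, to),
// or fewer if the stack ends earlier.
using FrameSource =
    std::function<std::vector<std::string>(std::size_t from, std::size_t to)>;

class StackView {
public:
    StackView(FrameSource source, std::size_t chunk)
        : source_(std::move(source)), chunk_(chunk) {}

    // Fetch only the next chunk; meant to be called when the program stops
    // and again on each click on the "more frames" item.
    // Returns true if a full chunk arrived, meaning more frames may exist.
    bool fetch_more() {
        std::vector<std::string> next =
            source_(frames_.size(), frames_.size() + chunk_);
        frames_.insert(frames_.end(), next.begin(), next.end());
        return next.size() == chunk_;
    }

    const std::vector<std::string>& frames() const { return frames_; }

private:
    FrameSource source_;
    std::size_t chunk_;
    std::vector<std::string> frames_;
};
```

The cost of each stop is then bounded by the chunk size rather than by the depth of the stack, which is what matters in the infinite recursion case.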

Debugger stories: Watchpoints

2006-05-31T13:12:00.000+04:00

One of my favourite debugger changes in KDevelop 3.4 is proper watchpoint handling. Before explaining it, some introduction is due.

Say you're debugging and see that the 'foo' field of 'pointer_to_some_data' is completely bogus. You are sure that it was valid some time ago, like when its containing object was constructed, so the question is where the corruption happened. That's exactly what watchpoints are for. You set a breakpoint at code where 'foo' is known to be valid, and then ask the debugger to stop whenever the value of 'foo' changes. The debugger in turn writes the address of 'foo' to a special processor register, and the processor will call back the operating system, and then the debugger, when 'foo' changes.

Except that GDB does not work this way by default. If you say:

watch pointer_to_some_data->foo

there are two interpretations. First is to stop when memory location referred to by pointer_to_some_data->foo is modified. Second is to stop when the value of the expression pointer_to_some_data->foo changes, which can happen also if pointer_to_some_data changes. Obviously, when debugging memory corruption, you care about memory address, and pointer_to_some_data is just a way to specify the memory address. Alas, by default GDB uses the second interpretation, so to set watchpoint on address you should use:

print &(pointer_to_some_data->foo)
watch *$

But the problem is not just that you'll get false hits when pointer_to_some_data changes. The thing is that if that variable is a local one, or a function parameter, then GDB will immediately remove watchpoint when you exit the containing function. So, for KDevelop user it will be like that: you pick a local variable in a variables widget, you expand it, right-click on some member, select "Toggle watchpoint", and continue. The watchpoint you've just added immediately goes away.

KDevelop 3.4 solves this problem in a radical way. All watchpoints are address watchpoints. For any expression you enter, its address is computed and the watchpoint is set on that address. An expression without an address (an rvalue) can't be watched, and you'll get an error message if you try to add a watchpoint for an rvalue. Additionally, when the application exits, all watchpoints are disabled, because data addresses can well be different on the next run. When the user decides to enable a watchpoint, the address of the expression is evaluated again, and a new watchpoint is set to that address.
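The difference between the two interpretations can be seen in a small program. The Data and Observation types and the repoint function below are hypothetical; they only mirror the names used in the text.

```cpp
// Hypothetical type with the field from the example.
struct Data {
    int foo;
};

struct Observation {
    bool expression_changed;  // what watching the expression reacts to
    bool location_changed;    // what watching the saved address reacts to
};

// Re-point the pointer at another object without writing to the first one.
Observation repoint(Data*& pointer_to_some_data, Data& other) {
    int* watched_address = &pointer_to_some_data->foo;  // address taken up front
    int old_expression = pointer_to_some_data->foo;
    int old_memory = *watched_address;

    pointer_to_some_data = &other;   // no write to the original 'foo'

    return Observation{pointer_to_some_data->foo != old_expression,
                       *watched_address != old_memory};
}
```

If the two objects hold different values of foo, the expression reports a change even though the original memory was never touched. For hunting memory corruption that is a false hit, which is why an address watchpoint is the useful one.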

Hopefully this will make watchpoints more usable for the ordinary programmer.

Introducing MI branch

2006-05-02T11:17:00.000+04:00

For recent months, I was working on an internal reorganization of KDevelop debugger, informally known as "MI branch". Now that it mostly works, it's time to describe the goals and results.

The original goal was to use a different interface with GDB, called "MI", that's specifically meant for frontends. In MI mode, gdb output can be easily parsed into DOM-like structure, and examined in a nice C++ way, something like:

(*last_stop_result)["value"]["old"].literal()

Before, KDevelop was parsing GDB output intended for humans, and could in some cases misinterpret it. Like thinking that application is running, when it's actually stopped. This unreliability was the primary reason for switching to MI.
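To make the DOM-like access concrete, here is a minimal sketch of a structure that supports the expression above. The Value class is hypothetical and is not KDevelop's actual parser type; it only assumes that a parsed record is either a literal string or a set of named children.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical DOM-like node: a literal string or a tuple of named children.
class Value {
public:
    Value() = default;
    explicit Value(std::string text)
        : text_(std::move(text)), is_literal_(true) {}

    // Attach a named child, e.g. old="1" inside value={...}.
    void set(const std::string& name, Value child) {
        children_[name] = std::make_shared<Value>(std::move(child));
    }

    // Look up a child by name; throws if the field is absent.
    const Value& operator[](const std::string& name) const {
        auto it = children_.find(name);
        if (it == children_.end())
            throw std::out_of_range("no field: " + name);
        return *it->second;
    }

    // Text of a literal node; throws if called on a tuple.
    const std::string& literal() const {
        if (!is_literal_)
            throw std::logic_error("not a literal");
        return text_;
    }

private:
    std::string text_;
    bool is_literal_ = false;
    std::map<std::string, std::shared_ptr<Value>> children_;
};
```

A record containing value={old="1",new="3"} would then be queried as result["value"]["old"].literal(), and a missing field fails loudly instead of being silently misread.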

But MI is not a silver bullet. Both John Birch (the original author of the debugger part) and I had reservations about the maturity of MI, which eventually proved true. I'll talk about this later, but basically, using MI does not automatically make the debugger better, or faster, or anything, contrary to what many think. However, since using a different protocol is a big code change already, I've sneaked in a number of architectural and GUI changes, hopefully for the better.

So in the end MI branch had two goals:

Fix all glitches. Do you know that in some cases KDevelop 3.3 shows only half of local variables? Or that setting a watchpoint in a natural way is rarely what you want? Or that if a program is stuck in infinite recursion, KDevelop will take minutes to show the stack? Each issue is not very significant in itself, but together they make the user experience uncomfortable.
Clean up internal architecture. The original architecture was a bit too centralized and adding new features required a lot of work. And since many cool features come to mind, it had better be fixed quickly.

The "MI branch" itself is already merged to KDevelop 3.4 branch. In future posts I'll describe all changes the debugger has compared to 3.3 release. Stay tuned.

Non-constant size

2006-04-27T13:48:00.000+04:00

Quite some time ago, when I was learning STL, all information sources stressed the importance of learning the complexity guarantees that methods of various containers make. One specific subtle thing is that the std::list<>::size() method runs in linear time, not in constant time. It was explicitly designed that way for a reason, described in the STL FAQ, but what matters to an ordinary programmer is that testing lists for emptiness should be done with the empty method, not by comparing size to zero, otherwise it's easy to kill performance.

Today, I ran into another case where non-const-time size() matters. I was testing KDevelop on some testcase, and noticed that getting the list of stack frames from gdb takes a lot of time. I've added some profiling code, and found that a single command takes 200ms to execute. Adding profiling code to gdb revealed that gdb itself takes some 70ms. Of course, that's not ideal, but an even larger fraction of time was apparently spent in KDevelop, ehm, parsing the response.

So I've quickly put up a testcase that repeatedly parses a specific response, and ran it under callgrind. Ten minutes later I've got a profile with strlen on top. It turned out that the parsing code was using QCString and calling its length method at least once for each token, and for certain tokens -- once for each character. The length, in turn, just calls strlen. Since the input string was 20K in size, most of the runtime was spent measuring the size of that string.

Another unexpected behaviour was found in the QCString::mid function. Internally, it also calls length, and mid was called once for each token.

After uses of non-const-time methods were reduced to a minimum, the parsing time for my test case decreased 40x. Not so bad, I think. The only problem is that the time spent in gdb is still too high for a GUI, and that won't be that easy to fix.
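The pattern behind the slowdown can be reproduced with plain C strings. The two functions below are hypothetical stand-ins for the parsing loop, not the actual KDevelop code; they assume the length is recomputed with strlen on each iteration unless the compiler happens to hoist the call.

```cpp
#include <cstddef>
#include <cstring>

// Counts spaces, asking for the length on every iteration. Each strlen
// call walks the whole string, so the loop is quadratic in its length.
std::size_t count_spaces_slow(const char* text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < std::strlen(text); ++i)
        if (text[i] == ' ') ++count;
    return count;
}

// Same result, but the length is measured once and reused.
std::size_t count_spaces_fast(const char* text) {
    const std::size_t length = std::strlen(text);
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i)
        if (text[i] == ' ') ++count;
    return count;
}
```

Both functions return the same count; only the number of passes over the input differs, which is what a 20K input makes visible.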

Printf debugging 2006

2006-04-06T13:57:00.001+04:00

One of the oldest methods of debugging is "printf debugging" -- putting various print statements in the code and then staring at the output. That's useful not only if you can't use a debugger. If the program does not crash, but produces wrong results after long computation, it's hard to figure where exactly the problem lies. In that case, printing intermediate data can be a very efficient method.

The only problem is that after adding print statements the program must be recompiled, and after debugging print statements must be removed. But it's possible to make gdb into printing machine using so called "breakpoint commands". Each breakpoint can have a list of commands that will be executed when breakpoint is hit. The commands can include printing and "continue". Here's a simplified example of gdb script I've used recently:

break main.cpp:1353
commands
   print ('lvk::nm_model::NM'*)this
   printf "Entering 'run', proc %d\n", $->processor_number
   continue
end
run

After putting this to a file "script", gdb can be run as:

gdb -batch -x script name_of_program > log

producing logs of variable values at certain points of the program.

Starting with version 3.3.0, similar functionality is available in KDevelop. Just click on the "Tracing" column in breakpoints window, select variables to print and click OK.

More screenshots here and here. This is a beginning; future KDevelop versions will allow specifying custom commands for breakpoints.

Unlucky numbers: 48, 58 and 388

2006-04-04T18:23:00.000+04:00

One day I've got a remarkable bug report:

We're calling the function from your library, and after 48 successfull calls, it crashes. Can you look into this?

Initially, I was curious how they counted that '48'. It turned out that the application was repeatedly doing the same action, calling my code along the way, and counting the number of repetitions, so '48' was the exact number, and the thing consistently crashed after 48 calls.

The bug report did not include any calling code, so I've asked for the code to be sent, and went home, and while on a bus decided that it's either a fixed size buffer in the calling code, or a resource leak, like a file descriptor leak.

And sure thing, next morning I looked at the only place where I used plain FILE* in order to use a Bison-based parser, and there was a missing fclose call. Feeling rather smart, I've sent the fixed version back.

After 30 minutes a new bug report arrived saying that the code fails after 58 calls. And this time, the bug did not reproduce for me. After several tries I found out that for me, the unlucky number is actually 388, so I need to wait a bit to reproduce the bug.
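One way to make such a leak hard to write is to tie fclose to scope exit. The sketch below is not the original library code: FileCloser, FilePtr and parse_once are hypothetical names, and a temporary file stands in for the parser input.

```cpp
#include <cstdio>
#include <memory>

// Deleter that closes the stream when the owning pointer goes away.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical stand-in for "open the input, run the parser, return".
// The stream is closed on every exit path, including early returns.
bool parse_once() {
    FilePtr input(std::tmpfile());
    if (!input)
        return false;                     // nothing was opened, nothing to close
    std::fputs("42\n", input.get());
    std::rewind(input.get());
    int value = 0;
    return std::fscanf(input.get(), "%d", &value) == 1 && value == 42;
}
```

Calling parse_once in a loop far beyond the usual per-process descriptor limit keeps working, because each iteration releases its descriptor before the next one starts.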

This was a resource leak too, though a subtle one. The library was calling an external tool, and modified the PATH environment variable to make sure the tool is found. As a result, the length of the PATH variable steadily increased, and finally some OS limit would be reached. After that the value of PATH becomes completely bogus and the external tool won't be found.
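A fix along these lines is to make the PATH update idempotent. The sketch below assumes a POSIX system (setenv is not part of the Windows C runtime), and the function name ensure_in_path is hypothetical.

```cpp
#include <cstdlib>
#include <stdlib.h>   // setenv (POSIX)
#include <string>

// Prepends 'dir' to PATH unless PATH already starts with it, so repeated
// calls do not make the variable grow.
void ensure_in_path(const std::string& dir) {
    const char* current = std::getenv("PATH");
    std::string path = current ? current : "";
    if (path == dir || path.compare(0, dir.size() + 1, dir + ":") == 0)
        return;                                   // already first in PATH
    std::string updated = path.empty() ? dir : dir + ":" + path;
    setenv("PATH", updated.c_str(), 1);           // 1 = overwrite existing value
}
```

After the first call, any number of further calls with the same directory leaves PATH unchanged, so the number of calls no longer matters.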

I'm really glad we have Valgrind so at least memory leaks don't require any magic to debug.

Not my bug

2005-12-17T10:12:00.000+03:00

Often most debugging time is spent on pretty uninteresting bugs.

For example, for most of last Friday I was staring into a debugger backed by a MIPS simulator, trying to figure out why exactly one test out of 10 decided to crash on a MIPS board. Let me first show you the code that was found guilty in the end:


static void itoa_really(unsigned long long value, unsigned int base)
{
    static char digits[] = {'0', '1', '2', '3', '4', '5',
                '6', '7', '8', '9', 'a', 'b', 'c', 
                'd', 'e', 'f'};

    if (!value)
        return;
    else {
        long rem;
        itoa_really(value / base, base);
        rem = value % base;
        target_putchar(digits[rem]);
    }
}

This is extremely basic code, but yes, it did break everything.

Initially, the crash happened on a real MIPS board, on a pretty complex test. It was reduced to a test printing just two 64-bit values, and crash depended on the printed values. Say, printing 0 worked, but printing 0x11223344556677 crashed. It was not possible to debug the problem on the board, and the few floating licences for simulator were in use, so I tried to solve the problem by poking around. At one time I've added an extra line of code before call to "target_putchar", and the line was:


rem = 3;

Strangely, the crash was gone.

At this point I thought that maybe there's something really wrong with 64 on 32 division, and 'rem' can become wrong, so I replaced that "rem = 3" line with:


if (rem < 0) { rem = 0; }
if (rem > 15) { rem = 15; }

The crash appeared again.

Another try was:


if (rem < 3) { rem = 3; }
if (rem > 3) { rem = 3; }

and the crash was still there.

Finally, with:


if (rem <= 3) { rem = 3; }
if (rem > 3) { rem = 3; }

everything worked. Of course, all values were printed as "3333.....333", but it did not crash. Completely baffled, I decided the simulator was the only help and eventually we grabbed a license.

A couple of hours later, I found the one-character fix (literally). The linker config file specified a mere 1KB of stack size, and when printing too large values, itoa_really recursed too deep, overflowing the stack, and overwriting a bunch of program variables. Changing to 4KB of stack set it all right. And what's even more funny, I never ever touched that linker config file, and have no idea what kind of programs are supposed to work on such a stack space diet.
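Enlarging the stack was the actual fix. As an aside, the recursion itself can be avoided, which keeps stack use constant regardless of the value printed. The function below is a hypothetical variant, not the code from the test suite: it returns the digits as a string instead of calling target_putchar, and assumes a base between 2 and 16.

```cpp
#include <string>

// Iterative variant: digits are produced from the least significant end
// into a fixed buffer, so no recursion is needed.
std::string utoa_iterative(unsigned long long value, unsigned int base) {
    static const char digits[] = "0123456789abcdef";
    char buffer[64];                 // enough for 64 binary digits
    int pos = 64;
    while (value != 0) {
        buffer[--pos] = digits[value % base];
        value /= base;
    }
    // Empty for 0, matching the original, which prints nothing for 0.
    return std::string(buffer + pos, buffer + 64);
}
```

On a target with a 1KB stack, a fixed 64-byte buffer is a predictable cost, whereas the recursive version pays one stack frame per digit.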