(This page isn't on the nav-bar yet because it isn't ready for prime time.)
Because I posted this page to the CFRG list and have made changes since then, I will track those changes in this changelog for the short term.
Ed448-Goldilocks and its cousin Ed480-Ridinghood were designed to optimize the security vs performance tradeoff on popular platforms. This comparison shows that these curves succeed at their goal, as they are on the lower-right (fast and large) side of the graphs. The graphs here use log-log axes, both so that all the data fits and so that the power-law relationship between curve order and speed is evident.
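As a rough illustration of why log-log axes make a power law evident: if cost follows y = c * x^k, then log y = log c + k * log x, which is a straight line with slope k. The sketch below fits such a line by ordinary least squares. The function name `fit_power_law` and the data points are assumptions made up for this example; the points are synthetic and are not the measurements behind the graphs.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**k, done as a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                      # slope of the log-log line = exponent
    c = math.exp(mean_y - k * mean_x)  # intercept of the log-log line = log(c)
    return c, k

# Synthetic points lying exactly on y = 2 * x**2.5 (hypothetical, not benchmark data).
bits = [256, 384, 512]
cycles = [2.0 * b ** 2.5 for b in bits]

c, k = fit_power_law(bits, cycles)
print(f"fitted exponent k = {k:.3f}, prefactor c = {c:.3f}")
```

With real measurements the points would scatter around the line, and the fitted exponent would summarize how quickly cost grows with curve size.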
The following image shows performance of various curves on Intel Sandy Bridge. It compares OpenSSL in green, the Microsoft NUMS curves in blue, Curve25519 in yellow, and Ed448-Goldilocks in red. Of the Microsoft NUMS curves, the fit line only includes the proposed ed-{256,384,512}-mers. The Microsoft NUMS data and the OpenSSL data do not include point compression or decompression, except for mont-*-mers.
Estimates for Ed480-Ridinghood are shown in red next to Goldilocks. The arithmetic code for Ridinghood is exactly the same as for Goldilocks but with different shift values, so the expected cost ratio is about the ratio of field operations, i.e. 478/446, or roughly 1.07.
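The scaling behind such an estimate can be sketched as follows. This is a minimal illustration that assumes cost is simply proportional to the 478/446 ratio quoted in the text. The function name `estimate_ridinghood` and the input value are hypothetical, and the input is a placeholder rather than a measured figure.

```python
# Figures taken from the 478/446 ratio quoted in the text.
RIDINGHOOD_FIELD_OPS = 478
GOLDILOCKS_FIELD_OPS = 446

def estimate_ridinghood(goldilocks_cost):
    """Scale a Goldilocks cost (cycles, time, ...) by the 478/446 ratio.

    Assumes cost is proportional to the number of field operations and that
    each field operation costs the same on both curves.
    """
    return goldilocks_cost * RIDINGHOOD_FIELD_OPS / GOLDILOCKS_FIELD_OPS

ratio = RIDINGHOOD_FIELD_OPS / GOLDILOCKS_FIELD_OPS
print(f"expected slowdown factor: {ratio:.4f}")

# Placeholder input, not a benchmark result.
placeholder_cycles = 100_000
print(f"estimated Ridinghood cost: {estimate_ridinghood(placeholder_cycles):.0f}")
```

Under this assumption Ridinghood comes out roughly 7% slower than Goldilocks for the same operation.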
This graph is slightly unfair in favor of the NUMS curves, because they are not performing point compression; and against the NIST curves, which are not as heavily optimized as the others; and against Curve25519, which was measured using the less forgiving SUPERCOP benchmarking platform. I don't have my Sandy Bridge test machine anymore, so I don't have SUPERCOP numbers for Ed448-Goldilocks on Sandy Bridge. Now that it's submitted to SUPERCOP, such numbers should become available by early October 2014. To my knowledge, Curve41417 has not been implemented and benchmarked on x86-64.
The OpenSSL curves in particular don't get a fair shake here, because the OpenSSL on my Mac test machine has terrible compilation settings and almost no optimization. For comparison, the results in teal are from the Gueron-Krasnov paper. From top to bottom, they show a newer and better-optimized default OpenSSL; OpenSSL with Käsper and Langley's NIST-p256 patch; and OpenSSL with Gueron and Krasnov's NIST-p256 patch.
The next graph shows the situation on a few different 32-bit ARM processors. I tested on a Trim-Slice with NVIDIA Tegra 2 (Cortex-A9, no NEON) and a BeagleBone Black with Sitara AM3358 (Cortex-A8, NEON). On the BeagleBone Black platform I used SUPERCOP to measure Ed448-Goldilocks, but on the Trim-Slice Tegra 2 it didn't compile properly (the older clang on that machine has crasher bugs). I also wasn't able to get Curve25519 to work in SUPERCOP on Tegra 2, so I reported the slower Curve25519-donna results and the faster WM 8850 (Cortex-A9, NEON) results from bench.cr.yp.to.
The larger dots are the A8 with NEON, and the smaller ones are A9 without NEON.
To my knowledge, the Microsoft NUMS curves do not yet have ARM implementations; and Curve41417 has only been benchmarked on Cortex A8+NEON.