I ran across this documentation page of pytransform3d, which claims:

There are two different quaternion conventions: Hamilton’s convention defines ijk = -1 and the JPL convention (from NASA’s Jet Propulsion Laboratory, JPL) defines ijk = 1. We use Hamilton’s convention.

Knowing that different definitions exist is nothing new (mostly the order of the components differs), but what is this ijk = 1 definition? This is the first time I have heard of it.

Then I dug into the reference it cites.

Only then did I realize that the problem is not just about the order of the components, but about something more fundamental. So I am writing down this summary for future reference.

$(q_0, q_1, q_2, q_3)$ or $(q_1, q_2, q_3, q_4)$?

The answer is it doesn’t matter that much. This is not a mathematical or fundamental difference.

Equations can be easily converted, and code can be easily modified.
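To illustrate how mechanical the conversion is, here is a minimal sketch. It assumes that in the $(q_0, q_1, q_2, q_3)$ layout the scalar part is $q_0$, and that in the $(q_1, q_2, q_3, q_4)$ layout the scalar part is $q_4$; a given reference may order things differently, so check it first. The function names are made up for this note.

```python
def scalar_first_to_last(q):
    """Reorder (q0, q1, q2, q3), scalar q0, into (q1, q2, q3, q4), scalar q4.

    Assumption: the scalar part sits first in the input layout.
    """
    w, x, y, z = q
    return (x, y, z, w)


def scalar_last_to_first(q):
    """Inverse reordering: scalar-last (x, y, z, w) back to scalar-first."""
    x, y, z, w = q
    return (w, x, y, z)


identity_scalar_first = (1.0, 0.0, 0.0, 0.0)
identity_scalar_last = scalar_first_to_last(identity_scalar_first)
print(identity_scalar_last)
```

Only the storage order changes here; the multiplication rule is untouched.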

$ij = k$ or $ij = -k$

This is about math!

  1. Harold L. Hallock, Gary Welter, David G. Simpson, and Christopher Rouff, ACS without an attitude, London: Springer, 2017.
  • (p.16) Alternatively, one could follow a different convention with quaternion multiplication. Many authors prefer a convention that, although not expressed as such, essentially redefines Hamilton’s hyper-complex commutation relations (Eq. 1.5b above) into $ij = -k$, $jk = -i$, $ki = -j$
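A small numerical check helps me see the two conventions side by side. This is only a sketch: quaternions are stored scalar-first as `(w, x, y, z)`, and the names `hamilton_product` and `flipped_product` are mine, not taken from any of the references. The second product flips the sign of the cross-product term, which is the same as the Hamilton product with its operands swapped.

```python
def hamilton_product(p, q):
    """Hamilton product of scalar-first quaternions (w, x, y, z); gives ij = k."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def flipped_product(p, q):
    """Product with the cross-product sign flipped; gives ij = -k.

    Implemented as the Hamilton product with the operands swapped.
    """
    return hamilton_product(q, p)


I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

ij_hamilton = hamilton_product(I, J)                         # k
ij_flipped = flipped_product(I, J)                           # -k
ijk_hamilton = hamilton_product(hamilton_product(I, J), K)   # ijk = -1
ijk_flipped = flipped_product(flipped_product(I, J), K)      # ijk = +1

print(ij_hamilton, ij_flipped, ijk_hamilton, ijk_flipped)
```

With the first product $ijk = -1$, and with the second $ijk = 1$, matching the two definitions quoted from the pytransform3d page.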

The quaternion representation is one of the best characterizations, and this chapter will focus on this representation. The presentation in this chapter follows the style of [99, 205, 219].

Which one is used in references?

Will keep updating as I read more references…

Using $ij = k$ and $(q_0, q_1, q_2, q_3)$

  1. Yaguang Yang, Spacecraft Modeling, Attitude Determination, and Control: Quaternion-based Approach, Boca Raton, FL: CRC Press, 2019.

Using $ij = k$ and $(q_1, q_2, q_3, q_4)$

  1. Harold L. Hallock, Gary Welter, David G. Simpson, and Christopher Rouff, ACS without an attitude, London: Springer, 2017.

Using $ij = -k$ and $(q_1, q_2, q_3, q_4)$

I still have not figured out why this is equivalent to redefining $ij = -k$.

  1. F. Landis Markley, and John L. Crassidis, Fundamentals of Spacecraft Attitude Determination and Control, New York, NY: Springer New York, 2014.

  2. Malcolm D. Shuster, “The nature of the quaternion”, The Journal of the Astronautical Sciences, vol. 56, Sep. 2008, pp. 359–373.

  3. Hanspeter Schaub, and John L. Junkins, Analytical Mechanics of Space Systems (Second Edition), Reston, VA: American Institute of Aeronautics and Astronautics, 2009.
    (p.107) It seems to default to the convention whose composition order matches that of rotation matrices, i.e. $ij = -k$.



Reddit: [Discussion] Confidence Intervals for Forecasting

TF blog: Regression with Probabilistic Layers in TensorFlow Probability

  • Note that in this example we are training both P(w) and Q(w). This training corresponds to using Empirical Bayes or Type-II Maximum Likelihood. We used this method so that we wouldn’t need to specify the location of the prior for the slope and intercept parameters, which can be tough to get right if we do not have prior knowledge about the problem. Moreover, if you set the priors very far from their true values, then the posterior may be unduly affected by this choice. A caveat of using Type-II Maximum Likelihood is that you lose some of the regularization benefits over the weights. If you wanted to do a proper Bayesian treatment of uncertainty (if you had some prior knowledge, or a more sophisticated prior), you could use a non-trainable prior (see Appendix B).

RG: Is it possible to get confidence intervals in LSTM forecasting?

  • use simulations of multiple predictions to then calculate the prediction intervals
  • predict the parameters of a predefined distribution
  • predict forecast quantiles directly: Amazon’s MQ-RNN forecaster uses this approach (check this)
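Two of the ideas above can be sketched in a few lines of plain Python. The first function turns a set of simulated forecasts for one time step into an empirical prediction interval; the second is the pinball (quantile) loss that is commonly minimized when a model predicts quantiles directly. Both function names and the toy numbers are my own; this is not the code of any specific forecaster.

```python
def empirical_interval(samples, coverage=0.9):
    """Prediction interval from simulated forecasts of a single time step.

    Uses linear interpolation between order statistics.
    """
    ordered = sorted(samples)
    alpha = (1.0 - coverage) / 2.0

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        frac = pos - lower
        return ordered[lower] * (1.0 - frac) + ordered[upper] * frac

    return quantile(alpha), quantile(1.0 - alpha)


def pinball_loss(y_true, y_pred, q):
    """Quantile loss for one observation and target quantile q in (0, 1)."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)


# Toy data: 101 simulated forecasts with values 0, 1, ..., 100.
low, high = empirical_interval(list(range(101)), coverage=0.9)
print(low, high)

# For q = 0.9, under-predicting costs more than over-predicting.
print(pinball_loss(10.0, 8.0, 0.9), pinball_loss(10.0, 12.0, 0.9))
```

The asymmetry of the pinball loss is what pushes the predicted value toward the requested quantile rather than toward the mean.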

Prediction interval around LSTM time series forecast


An essential component of Gaussian processes: the ins and outs of that widely used kernel

Pontryagin duality


Latent variable

Latent variable model

An Introduction to Latent Variable Models

Stochastic (partial) differential equations and Gaussian processes, Simo Särkkä, Aalto University, Finland

Gaussian Process Regression using Spectral Mixture Kernel in GPflow

Other interests

Official documents


Probabilistic Programming & Bayesian Methods for Hackers (Version 0.1)

PyMC3 is a Python library for programming Bayesian analysis [3]. It is a fast, well-maintained library. The only unfortunate part is that its documentation is lacking in certain areas, especially those that bridge the gap between beginner and hacker. One of this book’s main goals is to solve that problem, and also to demonstrate why PyMC3 is so cool.

We assign them to PyMC3’s stochastic variables, so-called because they are treated by the back end as random number generators.

I use CNNs for time series prediction, not for image tasks.

Learning Materials

  • How to Develop 1D Convolutional Neural Network Models for Human Activity Recognition
    • time series classification
    • two 1D CNN layers, followed by a dropout layer for regularization, then a pooling layer. Why this arrangement?
      • It is common to define CNN layers in groups of two in order to give the model a good chance of learning features from the input data. Why is that?
      • CNNs learn very quickly, so the dropout layer is intended to help slow down the learning process
      • The pooling layer … consolidating them to only the most essential elements.
    • After the CNN and pooling, the learned features are flattened to one long vector
    • a standard configuration of 64 parallel feature maps and a kernel size of 3 (Where does this “standard” configuration come from?)
    • a multi-headed model, where each head of the model reads the input time steps using a different sized kernel.
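To get a feel for what the layer stack described above does to the data, here is a rough shape trace in plain Python. It only does the arithmetic and builds no model. The window length of 128 time steps, the pool size of 2, and the absence of padding are my assumptions for illustration; the 64 feature maps and kernel size 3 come from the notes above.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding."""
    return (length - kernel_size) // stride + 1


def pool1d_output_length(length, pool_size):
    """Output length of non-overlapping 1D pooling."""
    return length // pool_size


def flattened_size(timesteps, filters=64, kernel_size=3, pool_size=2):
    """Length of the flattened feature vector after conv, conv, dropout, pool."""
    length = conv1d_output_length(timesteps, kernel_size)  # first conv layer
    length = conv1d_output_length(length, kernel_size)     # second conv layer
    # Dropout does not change the shape, so there is nothing to compute for it.
    length = pool1d_output_length(length, pool_size)       # pooling layer
    return length * filters                                 # flatten


# Assumed input window of 128 time steps: 128 -> 126 -> 124 -> 62 steps.
print(flattened_size(128))
```

For a multi-headed model, the same trace can be repeated per head with a different `kernel_size`, and the flattened sizes add up when the heads are concatenated.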

to read


Stacked with RNN

an effective approach might be to combine CNNs and RNNs in this way: first we use convolution and pooling layers to reduce the dimensionality of the input. This would give us a rather compressed representation of the original input with higher-level features. (from here)


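Fourier analysis and the like represent an arbitrary function using a complete set of basis functions. Wavelet analysis and Taylor expansion feel like the same idea, just with different basis functions.

So, is there a branch of mathematics that studies expanding an arbitrary function in a set of basis functions that is non-orthogonal, incomplete, and highly redundant?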


Change the Content Root under Project Structure, so that the working directory (pwd) is the same when I run a script and when I execute a selection in the console.

  • Otherwise, the two may have different working directories.
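A quick way to check whether the two really agree is to run the same snippet both ways and compare the output. This is a small sketch using only the standard library; the function name is made up for this note.

```python
import os
import sys


def working_directory_report():
    """Return the current working directory and the first module search path."""
    return {
        "cwd": os.getcwd(),
        "sys_path_0": sys.path[0] if sys.path else None,
    }


print(working_directory_report())
```

If the `cwd` values differ between the two ways of running, relative file paths will resolve differently in each.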