
A Tutorial on Python's Hidden Markov Model Library hmmlearn



Python HMMLearn Tutorial (edited by 毛片物语)

hmmlearn implements Hidden Markov Models (HMMs). The HMM is a generative probabilistic model in which a sequence of observable variables \(\mathbf{X}\) is generated by a sequence of internal hidden states \(\mathbf{Z}\). The hidden states cannot be observed directly. The transitions between hidden states are assumed to have the form of a (first-order) Markov chain. They can be specified by the start probability vector \(\boldsymbol{\pi}\) and a transition probability matrix \(\mathbf{A}\). The emission probability of an observable can be any distribution with parameters \(\boldsymbol{\theta}\) conditioned on the current hidden state. The HMM is completely determined by \(\boldsymbol{\pi}\), \(\mathbf{A}\) and \(\boldsymbol{\theta}\).

There are three fundamental problems for HMMs:

1. Given the model parameters and observed data, estimate the optimal sequence of hidden states.
2. Given the model parameters and observed data, calculate the likelihood of the data.
3. Given just the observed data, estimate the model parameters.

The first and the second problem can be solved by the dynamic programming algorithms known as the Viterbi algorithm and the Forward-Backward algorithm, respectively. The last one can be solved by an iterative Expectation-Maximization (EM) algorithm, known as the Baum-Welch algorithm.

References:

[Rabiner89] Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE 77.2, pp. 257-286, 1989.
[Bilmes98] Jeff A. Bilmes, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models", 1998.

Available models

hmm.GaussianHMM     Hidden Markov Model with Gaussian emissions.
hmm.GMMHMM          Hidden Markov Model with Gaussian mixture emissions.
hmm.MultinomialHMM  Hidden Markov Model with multinomial (discrete) emissions.

Read on for details on how to implement an HMM with a custom emission probability.

Building HMM and generating samples

You can build an HMM instance by passing the parameters described above to the constructor. Then, you can generate samples from the HMM by calling sample.

>>> import numpy as np
>>> from hmmlearn import hmm
>>> np.random.seed(42)
>>> model = hmm.GaussianHMM(n_components=3, covariance_type="full")
>>> model.startprob_ = np.array([0.6, 0.3, 0.1])
>>> model.transmat_ = np.array([[0.7, 0.2, 0.1],
...                             [0.3, 0.5, 0.2],
...                             [0.3, 0.3, 0.4]])
>>> model.means_ = np.array([[0.0, 0.0], [3.0, -3.0], [5.0, 10.0]])
>>> model.covars_ = np.tile(np.identity(2), (3, 1, 1))
>>> X, Z = model.sample(100)

The transition probability matrix need not be ergodic. For instance, a left-right HMM can be defined as follows:

>>> lr = hmm.GaussianHMM(n_components=3, covariance_type="diag",
...                      init_params="cm", params="cmt")
>>> lr.startprob_ = np.array([1.0, 0.0, 0.0])
>>> lr.transmat_ = np.array([[0.5, 0.5, 0.0],
...                          [0.0, 0.5, 0.5],
...                          [0.0, 0.0, 1.0]])

If any of the required parameters are missing, sample will raise an exception:

>>> model = hmm.GaussianHMM(n_components=3)
>>> X, Z = model.sample(100)
Traceback (most recent call last):
    ...
sklearn.utils.validation.NotFittedError: This GaussianHMM instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
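The three problems listed above map directly onto the estimator API: Viterbi decoding is exposed through predict (and decode), the data likelihood through score, and parameter estimation through fit, all of which appear later in this tutorial. The short, self-contained sketch below illustrates the first two on data sampled from a known model; the variable name gen is chosen just for this sketch, and decode also returns the log-probability of the best state path.

import numpy as np
from hmmlearn import hmm

np.random.seed(42)

# The fully specified 3-state, 2D Gaussian HMM from the example above.
gen = hmm.GaussianHMM(n_components=3, covariance_type="full")
gen.startprob_ = np.array([0.6, 0.3, 0.1])
gen.transmat_ = np.array([[0.7, 0.2, 0.1],
                          [0.3, 0.5, 0.2],
                          [0.3, 0.3, 0.4]])
gen.means_ = np.array([[0.0, 0.0], [3.0, -3.0], [5.0, 10.0]])
gen.covars_ = np.tile(np.identity(2), (3, 1, 1))

X, Z = gen.sample(100)

# Problem 2: log-likelihood of the observations (Forward algorithm).
print("log-likelihood:", gen.score(X))

# Problem 1: most likely hidden state sequence (Viterbi algorithm).
logprob, states = gen.decode(X, algorithm="viterbi")
print("fraction of states recovered:", np.mean(states == Z))

Because the data is decoded with the same model that generated it, the recovered states use the same labels as Z; a refit model would generally label the states in a different order.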
Fixing parameters

Each HMM parameter has a character code which can be used to customize its initialization and estimation. Since the EM algorithm needs a starting point, each parameter is assigned a value prior to training, either at random or computed from the data. It is possible to hook into this process and provide a starting point explicitly. To do so:

1. ensure that the character code for the parameter is missing from init_params, and
2. set the parameter to the desired value.

For example, consider an HMM with an explicitly initialized transition probability matrix:

>>> model = hmm.GaussianHMM(n_components=3, n_iter=100, init_params="mcs")
>>> model.transmat_ = np.array([[0.7, 0.2, 0.1],
...                             [0.3, 0.5, 0.2],
...                             [0.3, 0.3, 0.4]])

A similar trick applies to parameter estimation. If you want to fix some parameter at a specific value, remove the corresponding character from params and set the parameter value before training.

Examples:

- Sampling from HMM

Training HMM parameters and inferring the hidden states

You can train an HMM by calling the fit method. The input is a matrix of concatenated sequences of observations (aka samples) along with the lengths of the sequences (see "Working with multiple sequences" below). Note that, since the EM algorithm is a gradient-based optimization method, it will generally get stuck in local optima. You should in general run fit with several different initializations and select the model with the highest score. The score of a model can be calculated with the score method.

The inferred optimal hidden states can be obtained by calling the predict method. The predict method accepts a decoder algorithm; currently the Viterbi algorithm ("viterbi") and maximum a posteriori estimation ("map") are supported. This time, the input is a single sequence of observed values. Note that the states in remodel will have a different order than those in the generating model.

>>> remodel = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=100)
>>> remodel.fit(X)
GaussianHMM(algorithm='viterbi',...
>>> Z2 = remodel.predict(X)

Monitoring convergence

The number of EM iterations is upper bounded by the n_iter parameter. Training proceeds until n_iter steps have been performed or the change in score is lower than the specified threshold tol. Note that, depending on the data, the EM algorithm may or may not achieve convergence in the given number of steps. You can use the monitor_ attribute to diagnose convergence:

>>> remodel.monitor_
ConvergenceMonitor(history=[...], iter=12, n_iter=100, tol=0.01, verbose=False)
>>> remodel.monitor_.converged
True

Working with multiple sequences

All of the examples so far used a single sequence of observations. The input format in the case of multiple sequences is a bit involved and is best understood by example. Consider two 1D sequences:

>>> X1 = [[0.5], [1.0], [-1.0], [0.42], [0.24]]
>>> X2 = [[2.4], [4.2], [0.5], [-0.24]]

To pass both sequences to fit or predict, first concatenate them into a single array and then compute an array of sequence lengths:

>>> X = np.concatenate([X1, X2])
>>> lengths = [len(X1), len(X2)]

Finally, just call the desired method with X and lengths:

>>> hmm.GaussianHMM(n_components=3).fit(X, lengths)
GaussianHMM(algorithm='viterbi', ...
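Because predict and score accept the same lengths argument as fit, several sequences can be decoded in one call and the resulting state sequence split back per sequence. Below is a minimal sketch using the toy data X1 and X2 from above; n_components=2 is chosen here only so the nine observations can support the fit, and np.split with np.cumsum(lengths) recovers one state array per input sequence.

import numpy as np
from hmmlearn import hmm

X1 = [[0.5], [1.0], [-1.0], [0.42], [0.24]]
X2 = [[2.4], [4.2], [0.5], [-0.24]]
X = np.concatenate([X1, X2])
lengths = [len(X1), len(X2)]

# Fit on both sequences, then decode them with a single call.
model = hmm.GaussianHMM(n_components=2, n_iter=100).fit(X, lengths)
states = model.predict(X, lengths)

# Split the concatenated state sequence back into one array per sequence.
Z1, Z2 = np.split(states, np.cumsum(lengths)[:-1])
print(Z1, Z2)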
Examples:

- Gaussian HMM of stock data

Saving and loading HMM

After training, an HMM can easily be persisted for future use with the standard pickle module or its more efficient replacement in the joblib package:

>>> from sklearn.externals import joblib
>>> joblib.dump(remodel, "filename.pkl")
["filename.pkl"]
>>> joblib.load("filename.pkl")
GaussianHMM(algorithm='viterbi',...

Implementing HMMs with custom emission probabilities

If you want to implement another emission probability (e.g. Poisson), you have to subclass _BaseHMM and override the following methods (a rough sketch of such a subclass is given after the sampling example below):

base._BaseHMM._init(X, lengths)                       Initializes model parameters prior to fitting.
base._BaseHMM._check()                                Validates model parameters prior to fitting.
base._BaseHMM._generate_sample_from_state(state)      Generates a random sample from a given component.
base._BaseHMM._compute_log_likelihood(X)              Computes per-component log probability under the model.
base._BaseHMM._initialize_sufficient_statistics()     Initializes sufficient statistics required for the M-step.
base._BaseHMM._accumulate_sufficient_statistics(...)  Updates sufficient statistics from a given sample.
base._BaseHMM._do_mstep(stats)                        Performs the M-step of the EM algorithm.

Sampling from HMM

This script shows how to sample points from a Hidden Markov Model (HMM): we use a 4-component model with specified means and covariances. The plot shows the sequence of generated observations with the transitions between them. We can see that, as specified by our transition matrix, there are no transitions between component 1 and 3.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from hmmlearn import hmm

# Prepare parameters for a 4-component HMM
# Initial population probability
startprob = np.array([0.6, 0.3, 0.1, 0.0])
# The transition matrix, note that there are no transitions possible
# between component 1 and 3
transmat = np.array([[0.7, 0.2, 0.0, 0.1],
                     [0.3, 0.5, 0.2, 0.0],
                     [0.0, 0.3, 0.5, 0.2],
                     [0.2, 0.0, 0.2, 0.6]])
# The means of each component
means = np.array([[0.0, 0.0],
                  [0.0, 11.0],
                  [9.0, 10.0],
                  [11.0, -1.0]])
# The covariance of each component
covars = .5 * np.tile(np.identity(2), (4, 1, 1))

# Build an HMM instance and set parameters
model = hmm.GaussianHMM(n_components=4, covariance_type="full")

# Instead of fitting it from the data, we directly set the estimated
# parameters: the means and covariance of the components
model.startprob_ = startprob
model.transmat_ = transmat
model.means_ = means
model.covars_ = covars

# Generate samples
X, Z = model.sample(500)

# Plot the sampled data
plt.plot(X[:, 0], X[:, 1], ".-", label="observations", ms=6,
         mfc="orange", alpha=0.7)

# Indicate the component numbers
for i, m in enumerate(means):
    plt.text(m[0], m[1], 'Component %i' % (i + 1),
             size=17, horizontalalignment='center',
             bbox=dict(alpha=.7, facecolor='w'))
plt.legend(loc='best')
plt.show()

Total running time of the script: (0 minutes 0.528 seconds)
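To make the method table above concrete, here is a rough sketch of an HMM with Poisson emissions obtained by subclassing _BaseHMM. This is not part of hmmlearn: the attribute name lambdas_, the parameter code 'l', and the initialization strategy are invented for this sketch, and the private method signatures are assumed to be the ones listed in the custom-emissions section. The data is expected to be an (n_samples, 1) array of non-negative integer counts.

import numpy as np
from scipy.stats import poisson
from sklearn.utils import check_random_state

from hmmlearn.base import _BaseHMM


class PoissonHMM(_BaseHMM):
    """Sketch of an HMM with 1D Poisson emissions.

    The per-state rates are stored in ``lambdas_`` and controlled by
    the made-up parameter code 'l' in ``params`` / ``init_params``.
    """

    def _init(self, X, lengths=None):
        super(PoissonHMM, self)._init(X, lengths)
        if 'l' in self.init_params:
            # Spread the initial rates around the overall sample mean.
            self.lambdas_ = X.mean() * (0.5 + np.arange(self.n_components))

    def _check(self):
        super(PoissonHMM, self)._check()
        self.lambdas_ = np.asarray(self.lambdas_)
        if self.lambdas_.shape != (self.n_components,):
            raise ValueError("lambdas_ must have one rate per component")

    def _generate_sample_from_state(self, state, random_state=None):
        rng = check_random_state(random_state)
        return np.array([rng.poisson(self.lambdas_[state])])

    def _compute_log_likelihood(self, X):
        # Shape (n_samples, n_components): log P(x_t | state i).
        # X must hold integer-valued counts for the Poisson pmf to be defined.
        return poisson.logpmf(X, self.lambdas_)

    def _initialize_sufficient_statistics(self):
        stats = super(PoissonHMM, self)._initialize_sufficient_statistics()
        stats['post'] = np.zeros(self.n_components)
        stats['obs'] = np.zeros(self.n_components)
        return stats

    def _accumulate_sufficient_statistics(self, stats, X, framelogprob,
                                          posteriors, fwdlattice, bwdlattice):
        super(PoissonHMM, self)._accumulate_sufficient_statistics(
            stats, X, framelogprob, posteriors, fwdlattice, bwdlattice)
        if 'l' in self.params:
            stats['post'] += posteriors.sum(axis=0)
            stats['obs'] += posteriors.T.dot(X[:, 0])

    def _do_mstep(self, stats):
        super(PoissonHMM, self)._do_mstep(stats)
        if 'l' in self.params:
            # M-step for Poisson rates: posterior-weighted mean of the counts.
            self.lambdas_ = stats['obs'] / np.maximum(stats['post'], 1e-10)

With this in place, fitting and decoding work exactly as for the built-in models, e.g. PoissonHMM(n_components=2, n_iter=50).fit(counts) followed by predict(counts), where counts is a column vector of observed counts.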
Gaussian HMM of stock data

This script shows how to use a Gaussian HMM on stock price data from Yahoo! finance. For more information on how to visualize stock prices with matplotlib, please refer to date_demo1.py of matplotlib.

from __future__ import print_function

import datetime

import numpy as np
from matplotlib import cm, pyplot as plt
from matplotlib.dates import YearLocator, MonthLocator

try:
    from matplotlib.finance import quotes_historical_yahoo_ochl
except ImportError:
    # For Matplotlib prior to 1.5.
    from matplotlib.finance import (
        quotes_historical_yahoo as quotes_historical_yahoo_ochl
    )

from hmmlearn.hmm import GaussianHMM

print(__doc__)

# Get quotes from Yahoo! finance
quotes = quotes_historical_yahoo_ochl(
    "INTC", datetime.date(1995, 1, 1), datetime.date(2012, 1, 6))

# Unpack quotes
dates = np.array([q[0] for q in quotes], dtype=int)
close_v = np.array([q[2] for q in quotes])
volume = np.array([q[5] for q in quotes])[1:]

# Take diff of close value. Note that this makes
# ``len(diff) = len(close_t) - 1``, therefore, other quantities also
# need to be shifted by 1.
diff = np.diff(close_v)
dates = dates[1:]
close_v = close_v[1:]

# Pack diff and volume for training.
X = np.column_stack([diff, volume])

# Run Gaussian HMM
print("fitting to HMM and decoding ...", end="")

# Make an HMM instance and execute fit
model = GaussianHMM(n_components=4, covariance_type="diag", n_iter=1000).fit(X)

# Predict the optimal sequence of internal hidden states
hidden_states = model.predict(X)

print("done")

Out:

fitting to HMM and decoding ...done

# Print trained parameters and plot
print("Transition matrix")
print(model.transmat_)
print()

print("Means and vars of each hidden state")
for i in range(model.n_components):
    print("{0}th hidden state".format(i))
    print("mean = ", model.means_[i])
    print("var = ", np.diag(model.covars_[i]))
    print()

fig, axs = plt.subplots(model.n_components, sharex=True, sharey=True)
colours = cm.rainbow(np.linspace(0, 1, model.n_components))
for i, (ax, colour) in enumerate(zip(axs, colours)):
    # Use fancy indexing to plot data in each state.
    mask = hidden_states == i
    ax.plot_date(dates[mask], close_v[mask], ".-", c=colour)
    ax.set_title("{0}th hidden state".format(i))

    # Format the ticks.
    ax.xaxis.set_major_locator(YearLocator())
    ax.xaxis.set_minor_locator(MonthLocator())

    ax.grid(True)

plt.show()

Out:

Transition matrix
[[  9.79217702e-01   3.55338063e-15   2.72110180e-03   1.80611963e-02]
 [  1.21602143e-12   7.73505488e-01   1.85141936e-01   4.13525763e-02]
 [  3.25253603e-03   1.12652335e-01   8.83404334e-01   6.90794633e-04]
 [  1.18928464e-01   4.20116465e-01   1.91329669e-18   4.60955072e-01]]

Means and vars of each hidden state
0th hidden state
mean =  [  2.40689227e-02   4.97390967e+07]
var =  [  7.42026137e-01   2.49469027e+14]

1th hidden state
mean =  [  2.19283454e-02   8.82098779e+07]
var =  [  1.26266869e-01   5.64899722e+14]

2th hidden state
mean =  [  7.93313395e-03   5.43199848e+07]
var =  [  5.34313422e-02   1.54645172e+14]

3th hidden state
mean =  [ -3.64907452e-01   1.53097324e+08]
var =  [  2.72118688e+00   5.88892979e+15]

Total running time of the script: (0 minutes 2.205 seconds)
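As a final sanity check, the score method and the monitor_ attribute described earlier in this tutorial also apply to the stock model. The two lines below are meant to be appended to the script above; the actual numbers depend on the downloaded quotes.

# Log-likelihood of the training data and EM convergence status
# (see "Monitoring convergence" above).
print("log-likelihood:", model.score(X))
print("converged:", model.monitor_.converged)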