Implementing Time Series in Python

Updated: 2023-02-28 14:42:40  Author: 機器學習Zero

This article walks through time series handling in Python, covering the datetime module and pandas, with worked code examples for each step.

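1. The datetime module

1.1 datetime objects

A datetime.datetime object (referred to below as a datetime object) stores a date and a time with microsecond resolution. A datetime.timedelta represents the difference between two datetime objects.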

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
# IPython/Jupyter magic; omit this line in a plain Python script
%matplotlib inline
now = datetime.now()  # now is a datetime.datetime object

now

Output:

datetime.datetime(2019, 10, 11, 15, 33, 5, 701305)

now.year,now.month,now.day

Output:

(2019, 10, 11)

delta = datetime.now() - datetime(2019, 1, 1)  # delta is a datetime.timedelta object

datetime.now() + timedelta(12)  # timedelta(12) means 12 days

Output:

datetime.datetime(2023, 3, 10, 22, 13, 25, 3470)
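Because the outputs above depend on when the code is run, a reproducible variant may help. The following is a minimal sketch with fixed dates; the names start, end and later are illustrative and not part of the original article.

```python
from datetime import datetime, timedelta

# Fixed dates (illustrative) so the result does not depend on the clock
start = datetime(2019, 1, 1)
end = datetime(2019, 10, 11, 15, 30)

delta = end - start                  # subtraction yields a timedelta
print(delta.days, delta.seconds)     # whole days and the leftover seconds
print(delta.total_seconds())         # the full difference in seconds

# A timedelta can be built from several units and added to a datetime
later = start + timedelta(days=12, hours=6)
print(later)
```

Note that timedelta stores only days, seconds and microseconds internally, so delta.seconds is the remainder within the last day, not the total.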

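1.2 Converting between strings and datetime

(1) A datetime object or a pandas Timestamp can be formatted as a string with str or with datetime.strftime (which takes a format string); datetime.strptime converts a string to a datetime.

stamp = datetime(2011,1,3)
stamp.strftime('%Y-%m-%d')  # str(stamp) also works, but yields '2011-01-03 00:00:00'

Output:

'2011-01-03'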

datetime.strptime('2019-10-01','%Y-%m-%d')

Output:

datetime.datetime(2019, 10, 1, 0, 0)
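As a small sketch of how the two functions pair up, the example below formats a datetime and parses it back with the same format string, then shows what happens when the format does not match. The sample values are illustrative.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 9, 30)
fmt = '%Y-%m-%d %H:%M'

text = stamp.strftime(fmt)               # datetime -> string
parsed = datetime.strptime(text, fmt)    # string -> datetime, same format
print(text, parsed == stamp)

# strptime raises ValueError when the string does not match the format
try:
    datetime.strptime('2011/01/03', '%Y-%m-%d')
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print('format mismatch:', exc)
```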

(2) For common date formats, the parser.parse function from dateutil can be used (it does not support Chinese date strings).

from dateutil.parser import parse
parse('2019-10-01')  # returns a datetime.datetime object

Output:

datetime.datetime(2019, 10, 1, 0, 0)
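One thing to watch with parse is ambiguous day/month order. The sketch below, using illustrative date strings, shows the default month-first reading and the dayfirst option.

```python
from datetime import datetime
from dateutil.parser import parse

# Ambiguous numeric dates are read month-first by default
us_style = parse('6/12/2011')

# dayfirst=True switches to the day-first reading
eu_style = parse('6/12/2011', dayfirst=True)

# Free-form English dates are also understood
named = parse('Jan 31, 1997 10:45 PM')

print(us_style, eu_style, named)
```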

(3) pandas' to_datetime function parses many different date representations.

import pandas as pd
datestrs = ['7/6/2019','8/6/2019']
dates = pd.to_datetime(datestrs)  # converts the list of strings to a DatetimeIndex of Timestamps

type(dates)

Output:

pandas.core.indexes.datetimes.DatetimeIndex

dates[0]

Output:

Timestamp('2019-07-06 00:00:00')
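When the input format is known, passing it explicitly avoids any guessing about day/month order. The following sketch also shows how missing or invalid entries become NaT; the 'not a date' string is an illustrative bad value.

```python
import pandas as pd

datestrs = ['7/6/2019', '8/6/2019']

# An explicit format removes ambiguity (month/day/year here)
dates = pd.to_datetime(datestrs, format='%m/%d/%Y')

# None is converted to NaT (Not a Time), the missing value for timestamps
with_missing = pd.to_datetime(datestrs + [None], format='%m/%d/%Y')

# errors='coerce' turns unparseable strings into NaT instead of raising
coerced = pd.to_datetime(['7/6/2019', 'not a date'],
                         format='%m/%d/%Y', errors='coerce')

print(dates)
print(with_missing.isna())
print(coerced)
```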

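2. Time series basics

The most basic time series type in pandas is a Series indexed by timestamps (usually given as Python strings or datetime objects).
A period represents a span of time, such as a number of days, months, quarters, or years.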

from datetime import datetime

dates = [datetime(2019,1,1),datetime(2019,1,2),datetime(2019,1,5),datetime(2019,1,10),datetime(2019,2,10),datetime(2019,10,1)]

ts = pd.Series(np.random.randn(6), index=dates)  # ts is now a time series; the datetime objects are stored in a DatetimeIndex

ts

Output:

2019-01-01 1.175755
2019-01-02 -0.520842
2019-01-05 -0.678080
2019-01-10 0.195213
2019-02-10 2.201572
2019-10-01 0.115911
dtype: float64
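A timestamp-indexed Series can be selected by timestamp, by partial date string, or by a date-string slice. The sketch below uses the same dates as above but integer values (an assumption, so the results are reproducible).

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2019, 1, 1), datetime(2019, 1, 2), datetime(2019, 1, 5),
         datetime(2019, 1, 10), datetime(2019, 2, 10), datetime(2019, 10, 1)]
ts = pd.Series(np.arange(6), index=dates)

# Exact lookup with a Timestamp
one_day = ts.loc[pd.Timestamp('2019-01-05')]

# Partial string indexing: everything in January 2019
january = ts.loc['2019-01']

# Slicing by date strings includes both endpoints
window = ts.loc['2019-01-02':'2019-01-10']

print(one_day)
print(january)
print(window)
```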

dates = pd.DatetimeIndex(['2019/01/01','2019/01/02','2019/01/02','2019/5/01','3/15/2019'])  # several observations share the same timestamp
dup_ts = pd.Series(np.arange(5),index = dates)

dup_ts

Output:

2019-01-01 0
2019-01-02 1
2019-01-02 2
2019-05-01 3
2019-03-15 4
dtype: int32

dup_ts.groupby(level = 0).count()

Output:

2019-01-01 1
2019-01-02 2
2019-03-15 1
2019-05-01 1
dtype: int64
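Beyond counting, duplicate timestamps can be detected and collapsed in other ways. This is a minimal sketch with a smaller, illustrative index; aggregating with mean or keeping the last observation are just two possible choices.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-02', '2019-03-15'])
dup_ts = pd.Series([0, 1, 2, 3], index=idx)

# is_unique tells whether any timestamp repeats
print(dup_ts.index.is_unique)

# Aggregate the observations that share a timestamp
per_day_mean = dup_ts.groupby(level=0).mean()

# Or keep only the last observation for each timestamp
last_per_day = dup_ts[~dup_ts.index.duplicated(keep='last')]

print(per_day_mean)
print(last_per_day)
```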

pd.date_range generates a DatetimeIndex of a specified length.

pd.date_range('2019/01/01','2019/2/1')  # daily timestamps are generated by default

Output:

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10', '2019-01-11', '2019-01-12',
               '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
               '2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20',
               '2019-01-21', '2019-01-22', '2019-01-23', '2019-01-24',
               '2019-01-25', '2019-01-26', '2019-01-27', '2019-01-28',
               '2019-01-29', '2019-01-30', '2019-01-31', '2019-02-01'],
              dtype='datetime64[ns]', freq='D')

pd.date_range('2010/01/01',periods = 30)  # pass a start or end date plus the number of periods to generate

Output:

DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
               '2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
               '2010-01-09', '2010-01-10', '2010-01-11', '2010-01-12',
               '2010-01-13', '2010-01-14', '2010-01-15', '2010-01-16',
               '2010-01-17', '2010-01-18', '2010-01-19', '2010-01-20',
               '2010-01-21', '2010-01-22', '2010-01-23', '2010-01-24',
               '2010-01-25', '2010-01-26', '2010-01-27', '2010-01-28',
               '2010-01-29', '2010-01-30'],
              dtype='datetime64[ns]', freq='D')

pd.date_range('2010/01/01','2010/12/1',freq = 'BM')
# BM (business month end) yields an index made of the last business day of each month; pandas 2.2 and later spell this alias 'BME'

Output:

DatetimeIndex(['2010-01-29', '2010-02-26', '2010-03-31', '2010-04-30',
               '2010-05-31', '2010-06-30', '2010-07-30', '2010-08-31',
               '2010-09-30', '2010-10-29', '2010-11-30'],
              dtype='datetime64[ns]', freq='BM')

pd.Series(np.arange(13),index = pd.date_range('2010/01/01','2010/1/3',freq = '4h'))

Output:

2010-01-01 00:00:00 0
2010-01-01 04:00:00 1
2010-01-01 08:00:00 2
2010-01-01 12:00:00 3
2010-01-01 16:00:00 4
2010-01-01 20:00:00 5
2010-01-02 00:00:00 6
2010-01-02 04:00:00 7
2010-01-02 08:00:00 8
2010-01-02 12:00:00 9
2010-01-02 16:00:00 10
2010-01-02 20:00:00 11
2010-01-03 00:00:00 12
Freq: 4H, dtype: int32
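The date_range variants above can be summarized in one short sketch. Passing an offset object (pd.offsets.BMonthEnd) instead of a string alias is a choice made here to sidestep alias renames between pandas versions; it is not what the original code uses.

```python
import pandas as pd

# An end date plus a number of periods counts backwards from the end
by_end = pd.date_range(end='2010-01-30', periods=30)

# A multiple of a base frequency: one timestamp every 4 hours
four_hourly = pd.date_range('2010-01-01', '2010-01-03', freq='4h')

# Last business day of each month, given as an offset object
month_ends = pd.date_range('2010-01-01', '2010-12-01',
                           freq=pd.offsets.BMonthEnd())

print(by_end[0], len(four_hourly), month_ends[0])
```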

period_range creates a regular range of periods.

pd.Series(np.arange(10),index = pd.period_range('2019/1/1','2019/10/01',freq='M'))

Output:

2019-01 0
2019-02 1
2019-03 2
2019-04 3
2019-05 4
2019-06 5
2019-07 6
2019-08 7
2019-09 8
2019-10 9
Freq: M, dtype: int32
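A Period supports simple arithmetic, and a period-indexed Series can be converted to timestamps. The sketch below illustrates both; the variable names are illustrative.

```python
import pandas as pd

# A single monthly period; adding an integer shifts it by that many months
p = pd.Period('2019-01', freq='M')
next_month = p + 1

# A period-indexed Series, as in the example above
months = pd.period_range('2019-01', '2019-10', freq='M')
ts = pd.Series(range(len(months)), index=months)

# to_timestamp converts periods to timestamps (start of each period by default)
stamped = ts.to_timestamp()

print(next_month, p.start_time)
print(stamped.index[0], stamped.index[-1])
```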

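3. Resampling and frequency conversion

Resampling is the process of converting a time series from one frequency to another.

Downsampling: aggregating higher-frequency data to a lower frequency
Upsampling: converting lower-frequency data to a higher frequency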

rng = pd.date_range('2019/01/01',periods = 100,freq='D')
ts = pd.Series(np.random.randn(len(rng)),index=rng)

ts.resample('M').mean()

Output:

2019-01-31 0.011565
2019-02-28 -0.185584
2019-03-31 -0.323621
2019-04-30 0.043687
Freq: M, dtype: float64

ts.resample('M',kind='period').mean()

Output:

2019-01 0.011565
2019-02 -0.185584
2019-03 -0.323621
2019-04 0.043687
Freq: M, dtype: float64

rng = pd.date_range('2019/01/01',periods = 12,freq='T')
ts = pd.Series(np.random.randn(len(rng)),index=rng)
ts.resample('5min').sum()

Output:

2019-01-01 00:00:00 1.625143
2019-01-01 00:05:00 2.588045
2019-01-01 00:10:00 2.447725
Freq: 5T, dtype: float64
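The examples above are all downsampling with random data. The following sketch uses small deterministic series (an assumption, to make the numbers checkable) and adds an upsampling case, where new timestamps are filled forward from the previous observation.

```python
import pandas as pd

# Downsampling: 12 one-minute values aggregated into 5-minute sums
rng = pd.date_range('2019-01-01', periods=12, freq='min')
minute_ts = pd.Series(range(12), index=rng)
five_min = minute_ts.resample('5min').sum()

# Upsampling: daily values expanded to 12-hour steps; ffill copies the
# most recent observation into the newly created timestamps
daily = pd.Series([1.0, 2.0],
                  index=pd.date_range('2019-01-01', periods=2, freq='D'))
half_daily = daily.resample('12h').ffill()

print(five_min)
print(half_daily)
```

Forward filling is only one way to populate the new slots; interpolation or leaving them as NaN may be more appropriate depending on the data.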

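In finance, a common way to aggregate a time series is OHLC resampling, which computes four values for each bin:

Open: the first value (opening price)
High: the maximum value
Low: the minimum value
Close: the last value (closing price)

Output (such as that produced by ts.resample('5min').ohlc()):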

	open	high	low	close
2019-01-01 00:00:00	-0.345952	1.120258	-0.345952	1.120258
2019-01-01 00:05:00	-0.106197	2.448439	-1.014186	-1.014186
2019-01-01 00:10:00	1.445036	1.445036	1.002688	1.002688
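The table above comes from random data, so it cannot be reproduced exactly. A minimal sketch with fixed, made-up values shows how the four columns are derived from each 5-minute bin.

```python
import pandas as pd

# Twelve one-minute observations with made-up values
rng = pd.date_range('2019-01-01', periods=12, freq='min')
ts = pd.Series([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], index=rng)

# ohlc() returns one row per bin with open/high/low/close columns
bars = ts.resample('5min').ohlc()
print(bars)
```

The first bin covers the values 3, 1, 4, 1, 5, so its open is 3, high 5, low 1 and close 5.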

Another way to downsample is to use pandas' groupby method.

rng = pd.date_range('2019/1/1',periods = 100,freq='D')
ts = pd.Series(np.arange(len(rng)), index = rng)

ts.resample('M').mean()

Output:

2019-01-31 15.0
2019-02-28 44.5
2019-03-31 74.0
2019-04-30 94.5
Freq: M, dtype: float64

ts.groupby(lambda x:x.month).mean()

Output:

1 15.0
2 44.5
3 74.0
4 94.5
dtype: float64
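The two approaches agree here only because the data stays within one year. Grouping by month number alone pools the same month from different years; the sketch below, using a longer illustrative range, groups by year and month together to avoid that.

```python
import numpy as np
import pandas as pd

# 400 days starting 2019-01-01, so January and February occur in two years
rng = pd.date_range('2019-01-01', periods=400, freq='D')
ts = pd.Series(np.arange(len(rng)), index=rng)

# Month number only: January 2019 and January 2020 land in the same group
by_month = ts.groupby(ts.index.month).mean()

# Year and month together keep the calendar months separate
by_year_month = ts.groupby([ts.index.year, ts.index.month]).mean()

print(len(by_month), len(by_year_month))
print(by_year_month.loc[(2019, 1)])
```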

4. Time series visualization

This section loads the stock.csv file, which has the following format:

	AA	AAPL	GE	IBM	JNJ	MSFT	PEP	SPX	XOM
1990/2/1 0:00	4.98	7.86	2.87	16.79	4.27	0.51	6.04	328.79	6.12
1990/2/2 0:00	5.04	8	2.87	16.89	4.37	0.51	6.09	330.92	6.24
1990/2/5 0:00	5.07	8.18	2.87	17.32	4.34	0.51	6.05	331.85	6.25
1990/2/6 0:00	5.01	8.12	2.88	17.56	4.32	0.51	6.15	329.66	6.23
1990/2/7 0:00	5.04	7.77	2.91	17.93	4.38	0.51	6.17	333.75	6.33

close_px_all = pd.read_csv('datasets/stock.csv', parse_dates=True, index_col=0)
close_px = close_px_all[['AAPL','MSFT','XOM']]
close_px.plot()  # price history of AAPL, MSFT and XOM

close_px.resample('B').ffill().plot()  # prices after forward-filling onto business days

close_px.AAPL.loc['2011-01':'2011-03'].plot()  # Apple's daily price from January to March 2011
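For readers without stock.csv, the effect of resample('B').ffill() can be seen on a tiny synthetic frame. The prices below are invented for illustration and are not taken from the file.

```python
import pandas as pd

# Made-up prices on Monday, Wednesday and Friday only
idx = pd.to_datetime(['2019-01-07', '2019-01-09', '2019-01-11'])
prices = pd.DataFrame({'AAPL': [10.0, 11.0, 12.0],
                       'MSFT': [20.0, 21.0, 22.0]}, index=idx)

# 'B' builds a business-day index; ffill carries the last known price
# forward into the missing Tuesday and Thursday
filled = prices.resample('B').ffill()
print(filled)
```

Calling filled.plot() afterwards would draw one line per column, in the same way as the calls above.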

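5. Window functions

5.1 Moving window functions

A moving window function computes a statistic over a sliding window (optionally with exponentially decaying weights); this also covers functions without a fixed window length, such as the exponentially weighted moving average. Like other statistical functions, moving window functions automatically exclude missing values.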

close_px.AAPL.plot()
close_px.AAPL.rolling(250).mean().plot()   # 250-day moving average

close_px.rolling(250).mean().plot(logy=True)   # 250-day moving average on a log scale

close_px.AAPL.rolling(250,min_periods=10).std().plot()  # rolling standard deviation
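The role of min_periods is easier to see on a short series. In this sketch (values chosen for illustration), the default requires a full window before producing a result, while min_periods=1 starts reporting from the first observation.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: min_periods equals the window size, so the first two are NaN
strict = s.rolling(3).mean()

# min_periods=1: a value is produced as soon as one observation exists
relaxed = s.rolling(3, min_periods=1).mean()

print(strict)
print(relaxed)
```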

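5.2 Exponentially weighted functions

Exponentially weighted functions: a decay factor is defined so that more recent observations get larger weights. The decay factor is commonly specified as a span, which makes the result comparable to a simple moving window function whose window size equals the span.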

appl_px = close_px.AAPL['2005':'2009']
ma60 = appl_px.rolling(60,min_periods=50).mean()  # 60-day moving average
ewma60 = appl_px.ewm(span = 60).mean()           # 60-day exponentially weighted moving average

appl_px.plot()
ma60.plot(style='g--')
ewma60.plot(style='r--')  # adapts to changes faster than the plain moving average
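In pandas, a span is converted to a smoothing factor alpha = 2 / (span + 1). The sketch below checks this against a hand-written recursion. It passes adjust=False so that the simple recursive form applies; this differs from the default adjust=True used in the code above and is an assumption made for clarity.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
span = 3
alpha = 2 / (span + 1)          # span 3 gives alpha 0.5

# With adjust=False: y[0] = x[0], y[t] = alpha*x[t] + (1-alpha)*y[t-1]
ewma = s.ewm(span=span, adjust=False).mean()

# The same recursion written out by hand
manual = [s.iloc[0]]
for x in s.iloc[1:]:
    manual.append(alpha * x + (1 - alpha) * manual[-1])

print(ewma.tolist(), manual)
```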

5.3 Binary moving window functions

Some statistics, such as correlation and covariance, operate on two time series, for example the correlation of a stock with a benchmark index such as the S&P 500.

aapl_rets = close_px_all.AAPL['1992':].pct_change()
spx_rets = close_px_all.SPX.pct_change()
corr = aapl_rets.rolling(125,min_periods=100).corr(spx_rets)  # rolling 6-month (125 trading days) correlation of AAPL returns with the S&P 500
corr.plot()

all_rets = close_px_all[['AAPL','MSFT','XOM']].loc['2003':].pct_change()
corr = all_rets.rolling(125,min_periods=100).corr(spx_rets)  # rolling 6-month correlation of the three stocks' returns with the S&P 500
corr.plot()
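The behaviour of rolling correlation can be checked on synthetic series where the answer is known in advance. In this sketch (all values invented), b is an increasing linear function of a and c is its negation, so the rolling correlations should be close to 1 and -1.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
b = 2 * a + 1      # perfectly positively related to a
c = -a             # perfectly negatively related to a

# Series against Series: one correlation per window position
corr_ab = a.rolling(3).corr(b)

# DataFrame against Series: each column is correlated with the Series
frame = pd.DataFrame({'b': b, 'c': c})
corr_frame = frame.rolling(3).corr(a)

print(corr_ab)
print(corr_frame)
```

The first two positions are NaN because a full window of three observations is not yet available.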
