pandas Reindexing and Relabeling: Worked Examples

Updated: 2024-04-23 14:38:49  Author: 名本無名

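When processing data with pandas, you sometimes need to reindex an object so that it conforms to a new set of labels. This article walks through examples of reindexing and relabeling in pandas and may serve as a useful reference.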

In [205]: s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])

In [206]: s
Out[206]: 
a    1.695148
b    1.328614
c    1.234686
d   -0.385845
e   -1.326508
dtype: float64

In [207]: s.reindex(["e", "b", "f", "d"])
Out[207]: 
e   -1.326508
b    1.328614
f         NaN
d   -0.385845
dtype: float64

Here the label f was not contained in the Series, so it appears as NaN in the result.
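
As a minimal sketch of the behavior above, the snippet below uses made-up values (not the random numbers from the session) so the result is deterministic. It also shows the optional fill_value parameter of reindex(), which replaces the default NaN for labels that were absent. The names s_demo, with_nan and with_zero are illustrative only.

```python
import pandas as pd

# Illustrative data, not the random values from the session above.
s_demo = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# "f" is not in the original index, so it comes back as NaN by default.
with_nan = s_demo.reindex(["c", "f", "a"])

# fill_value supplies a replacement for labels that were missing.
with_zero = s_demo.reindex(["c", "f", "a"], fill_value=0.0)

print(with_nan)
print(with_zero)
```

reindex() returns a new object in both cases; s_demo itself is left unchanged.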

With a DataFrame, you can reindex the index and the columns simultaneously:

In [208]: df
Out[208]: 
        one       two     three
a  1.394981  1.772517       NaN
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
d       NaN  0.279344 -0.613172

In [209]: df.reindex(index=["c", "f", "b"], columns=["three", "two", "one"])
Out[209]: 
      three       two       one
c  1.227435  1.478369  0.695246
f       NaN       NaN       NaN
b -0.050390  1.912123  0.343054

You can also use the axis parameter:

In [210]: df.reindex(["c", "f", "b"], axis="index")
Out[210]: 
        one       two     three
c  0.695246  1.478369  1.227435
f       NaN       NaN       NaN
b  0.343054  1.912123 -0.050390

Note that the Index objects holding the axis labels can be shared between objects.

So if we have a Series and a DataFrame, the following can be done:

In [211]: rs = s.reindex(df.index)

In [212]: rs
Out[212]: 
a    1.695148
b    1.328614
c    1.234686
d   -0.385845
dtype: float64

In [213]: rs.index is df.index
Out[213]: True

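This means that the reindexed Series's index is the same Python object as the DataFrame's index.

DataFrame.reindex() also supports an "axis-style" calling convention, where you pass a single labels argument and specify the axis it applies to: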

In [214]: df.reindex(["c", "f", "b"], axis="index")
Out[214]: 
        one       two     three
c  0.695246  1.478369  1.227435
f       NaN       NaN       NaN
b  0.343054  1.912123 -0.050390

In [215]: df.reindex(["three", "two", "one"], axis="columns")
Out[215]: 
      three       two       one
a       NaN  1.772517  1.394981
b -0.050390  1.912123  0.343054
c  1.227435  1.478369  0.695246
d -0.613172  0.279344       NaN

The sections on MultiIndex and advanced indexing show an even more concise way of reindexing.

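Note: in performance-sensitive code, operations on pre-aligned data are faster. For example, adding two unaligned DataFrames triggers a reindexing step internally, and these repeated implicit calls can slow the code down. Aligning explicitly once, up front, avoids paying that cost on every operation.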
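
The following sketch illustrates the point with two small, made-up DataFrames (left and right are hypothetical names). It only demonstrates that implicit alignment and an explicit one-off reindex give the same result; it does not measure speed.

```python
import pandas as pd

left = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"x": [10.0, 20.0, 30.0]}, index=["b", "c", "d"])

# Implicit alignment: the result is indexed by the union of both indexes.
implicit = left + right

# Explicit alignment done once; later operations can reuse the aligned frames.
union = left.index.union(right.index)
left_aligned = left.reindex(union)
right_aligned = right.reindex(union)
explicit = left_aligned + right_aligned

print(implicit.equals(explicit))
```

If the same pair of objects takes part in many operations, reusing left_aligned and right_aligned means the alignment work is done only once.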

1 Reindexing to align with another object

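You may sometimes want to take an object and reindex its axes so that they are labeled the same as another object's.

The syntax for this is straightforward but verbose, so the reindex_like() method is available as a shortcut: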

In [216]: df
Out[216]: 
        one       two
a  0.509335  0.840612
b  0.086555  0.523010
c  0.588121  0.351784
d  0.121684  0.027703

In [217]: df2
Out[217]: 
        one       two
a  0.700215  0.112092
b  0.098034  0.791992
d  0.251426  0.770680

In [218]: df.reindex_like(df2)
Out[218]: 
        one       two
a  0.509335  0.840612
b  0.086555  0.523010
d  0.121684  0.027703
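
To make the "verbose" form concrete, here is a small sketch with made-up data (df_a and df_b are hypothetical stand-ins for df and df2). It compares reindex_like() with the equivalent explicit reindex() call.

```python
import pandas as pd

df_a = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 4.0], "two": [5.0, 6.0, 7.0, 8.0]},
    index=["a", "b", "c", "d"],
)
df_b = pd.DataFrame(
    {"one": [0.1, 0.2, 0.3], "two": [0.4, 0.5, 0.6]},
    index=["a", "b", "d"],
)

# Shortcut: take both the index and the columns from df_b.
short_form = df_a.reindex_like(df_b)

# Verbose equivalent: pass both axes explicitly.
long_form = df_a.reindex(index=df_b.index, columns=df_b.columns)

print(short_form.equals(long_form))
```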

2 Aligning objects with align

The align() method is the fastest way to align two objects simultaneously. It supports a join argument (related to joining and merging):

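join='outer': take the union of the two indexes (the default)
join='left': use the calling object's index
join='right': use the passed object's index
join='inner': take the intersection of the two indexes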
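
The four options above can be compared side by side with a short sketch. The Series s_left and s_right and the dictionary result_index are illustrative names with made-up data.

```python
import pandas as pd

s_left = pd.Series([1, 2, 3], index=["a", "b", "c"])
s_right = pd.Series([20, 30, 40], index=["b", "c", "d"])

result_index = {}
for how in ["outer", "left", "right", "inner"]:
    a, b = s_left.align(s_right, join=how)
    # Both returned objects carry the same index after alignment.
    assert a.index.equals(b.index)
    result_index[how] = list(a.index)

for how, labels in result_index.items():
    print(how, labels)
```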

For Series, it returns a tuple containing both reindexed Series:

In [219]: s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])

In [220]: s1 = s[:4]

In [221]: s2 = s[1:]

In [222]: s1.align(s2)
Out[222]: 
(a   -0.186646
 b   -1.692424
 c   -0.303893
 d   -1.425662
 e         NaN
 dtype: float64,
 a         NaN
 b   -1.692424
 c   -0.303893
 d   -1.425662
 e    1.114285
 dtype: float64)

In [223]: s1.align(s2, join="inner")
Out[223]: 
(b   -1.692424
 c   -0.303893
 d   -1.425662
 dtype: float64,
 b   -1.692424
 c   -0.303893
 d   -1.425662
 dtype: float64)

In [224]: s1.align(s2, join="left")
Out[224]: 
(a   -0.186646
 b   -1.692424
 c   -0.303893
 d   -1.425662
 dtype: float64,
 a         NaN
 b   -1.692424
 c   -0.303893
 d   -1.425662
 dtype: float64)

For DataFrames, the join method is applied to both the index and the columns by default:

In [225]: df.align(df2, join="inner")
Out[225]: 
(        one       two
 a  1.394981  1.772517
 b  0.343054  1.912123
 c  0.695246  1.478369,
         one       two
 a  1.394981  1.772517
 b  0.343054  1.912123
 c  0.695246  1.478369)

You can also pass an axis option to align on the specified axis only:

In [226]: df.align(df2, join="inner", axis=0)
Out[226]: 
(        one       two     three
 a  1.394981  1.772517       NaN
 b  0.343054  1.912123 -0.050390
 c  0.695246  1.478369  1.227435,
         one       two
 a  1.394981  1.772517
 b  0.343054  1.912123
 c  0.695246  1.478369)

If you pass a Series to DataFrame.align(), you can use the axis argument to choose whether the two objects are aligned on the DataFrame's index or on its columns:

In [227]: df.align(df2.iloc[0], axis=1)
Out[227]: 
(        one     three       two
 a  1.394981       NaN  1.772517
 b  0.343054 -0.050390  1.912123
 c  0.695246  1.227435  1.478369
 d       NaN -0.613172  0.279344,
 one      1.394981
 three         NaN
 two      1.772517
 Name: a, dtype: float64)

3 Filling while reindexing

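reindex() takes an optional parameter method, which selects a filling method for the missing values that reindexing produces: ffill (or pad) fills forward, bfill (or backfill) fills backward, and nearest fills from the nearest index value.

Here is a simple example: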

In [228]: rng = pd.date_range("1/3/2000", periods=8)

In [229]: ts = pd.Series(np.random.randn(8), index=rng)

In [230]: ts2 = ts.iloc[[0, 3, 6]]

In [231]: ts
Out[231]: 
2000-01-03    0.183051
2000-01-04    0.400528
2000-01-05   -0.015083
2000-01-06    2.395489
2000-01-07    1.414806
2000-01-08    0.118428
2000-01-09    0.733639
2000-01-10   -0.936077
Freq: D, dtype: float64

In [232]: ts2
Out[232]: 
2000-01-03    0.183051
2000-01-06    2.395489
2000-01-09    0.733639
Freq: 3D, dtype: float64

In [233]: ts2.reindex(ts.index)
Out[233]: 
2000-01-03    0.183051
2000-01-04         NaN
2000-01-05         NaN
2000-01-06    2.395489
2000-01-07         NaN
2000-01-08         NaN
2000-01-09    0.733639
2000-01-10         NaN
Freq: D, dtype: float64

In [234]: ts2.reindex(ts.index, method="ffill")
Out[234]: 
2000-01-03    0.183051
2000-01-04    0.183051
2000-01-05    0.183051
2000-01-06    2.395489
2000-01-07    2.395489
2000-01-08    2.395489
2000-01-09    0.733639
2000-01-10    0.733639
Freq: D, dtype: float64

In [235]: ts2.reindex(ts.index, method="bfill")
Out[235]: 
2000-01-03    0.183051
2000-01-04    2.395489
2000-01-05    2.395489
2000-01-06    2.395489
2000-01-07    0.733639
2000-01-08    0.733639
2000-01-09    0.733639
2000-01-10         NaN
Freq: D, dtype: float64

In [236]: ts2.reindex(ts.index, method="nearest")
Out[236]: 
2000-01-03    0.183051
2000-01-04    0.183051
2000-01-05    2.395489
2000-01-06    2.395489
2000-01-07    2.395489
2000-01-08    0.733639
2000-01-09    0.733639
2000-01-10    0.733639
Freq: D, dtype: float64

These methods require that the index be ordered, either increasing or decreasing.

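Note that the same result can be achieved with ffill(), bfill(), or interpolate() (except for method="nearest"). The older fillna(method="ffill") spelling is deprecated in recent pandas releases.

In [237]: ts2.reindex(ts.index).ffill()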
Out[237]: 
2000-01-03    0.183051
2000-01-04    0.183051
2000-01-05    0.183051
2000-01-06    2.395489
2000-01-07    2.395489
2000-01-08    2.395489
2000-01-09    0.733639
2000-01-10    0.733639
Freq: D, dtype: float64

reindex() raises a ValueError if the index is not monotonically increasing or decreasing. fillna() and interpolate() do not perform any checks on the order of the index.
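
A minimal sketch of that check, using a deliberately unsorted integer index (the names unsorted, raised and filled are illustrative). Sorting first is one possible way to make the fill well defined; it is shown here as an assumption, not as the only remedy.

```python
import pandas as pd

# The index 3, 1, 2 is neither increasing nor decreasing.
unsorted = pd.Series([1.0, 2.0, 3.0], index=[3, 1, 2])

try:
    unsorted.reindex([1, 2, 3, 4], method="ffill")
    raised = False
except ValueError as exc:
    raised = True
    print("reindex refused:", exc)

# After sorting the index, forward filling works as usual.
filled = unsorted.sort_index().reindex([1, 2, 3, 4], method="ffill")
print(filled)
```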

4 Limits on filling while reindexing

The limit and tolerance arguments provide additional control over filling while reindexing.

limit specifies the maximum number of consecutive matches:

In [238]: ts2.reindex(ts.index, method="ffill", limit=1)
Out[238]: 
2000-01-03    0.183051
2000-01-04    0.183051
2000-01-05         NaN
2000-01-06    2.395489
2000-01-07    2.395489
2000-01-08         NaN
2000-01-09    0.733639
2000-01-10    0.733639
Freq: D, dtype: float64

In contrast, tolerance specifies the maximum distance between the index and the indexer values:

In [239]: ts2.reindex(ts.index, method="ffill", tolerance="1 day")
Out[239]: 
2000-01-03    0.183051
2000-01-04    0.183051
2000-01-05         NaN
2000-01-06    2.395489
2000-01-07    2.395489
2000-01-08         NaN
2000-01-09    0.733639
2000-01-10    0.733639
Freq: D, dtype: float64

Note: when used on a DatetimeIndex, TimedeltaIndex, or PeriodIndex, tolerance is coerced into a Timedelta if possible, so you can specify tolerance with an appropriate string.
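
As a sketch of that coercion, the snippet below passes the tolerance once as a string and once as an explicit pd.Timedelta and compares the results. The data is made up (full and sparse are hypothetical names) so that the outcome is deterministic.

```python
import pandas as pd

days = pd.date_range("2000-01-03", periods=8)
full = pd.Series(range(8), index=days, dtype="float64")
sparse = full.iloc[[0, 3, 6]]  # keep every third day

# A string tolerance and the equivalent Timedelta should behave the same.
by_string = sparse.reindex(days, method="ffill", tolerance="1 day")
by_timedelta = sparse.reindex(
    days, method="ffill", tolerance=pd.Timedelta(days=1)
)

print(by_string.equals(by_timedelta))
print(by_string)
```

Days that lie more than one day after the last available observation stay NaN in both results.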

5 Dropping labels from an axis

A method closely related to reindex is drop(), which removes a set of labels from an axis:

In [240]: df
Out[240]: 
        one       two     three
a  1.394981  1.772517       NaN
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
d       NaN  0.279344 -0.613172

In [241]: df.drop(["a", "d"], axis=0)
Out[241]: 
        one       two     three
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435

In [242]: df.drop(["one"], axis=1)
Out[242]: 
        two     three
a  1.772517       NaN
b  1.912123 -0.050390
c  1.478369  1.227435
d  0.279344 -0.613172

Note that the following also works, but it is less obvious and less clean:

In [243]: df.reindex(df.index.difference(["a", "d"]))
Out[243]: 
        one       two     three
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
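
One practical difference between the two forms shows up when a label to remove does not exist. The sketch below uses a small made-up frame and a label "z" that is absent; the names frame, dropped and via_reindex are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"one": [1, 2, 3]}, index=["a", "b", "c"])

# drop() raises KeyError when a label to remove is not present...
try:
    frame.drop(["a", "z"])
    drop_raised = False
except KeyError:
    drop_raised = True

# ...unless errors="ignore" is passed.
dropped = frame.drop(["a", "z"], errors="ignore")

# The reindex/difference form silently skips labels that are missing.
via_reindex = frame.reindex(frame.index.difference(["a", "z"]))

print(drop_raised)
print(dropped.equals(via_reindex))
```

Keep in mind that Index.difference() returns a sorted result, so the reindex form can also reorder rows when the original index is not sorted.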

6 Renaming and mapping labels

The rename() method lets you relabel an axis based on a mapping (a dict or Series) or an arbitrary function:

In [244]: s
Out[244]: 
a   -0.186646
b   -1.692424
c   -0.303893
d   -1.425662
e    1.114285
dtype: float64

In [245]: s.rename(str.upper)
Out[245]: 
A   -0.186646
B   -1.692424
C   -0.303893
D   -1.425662
E    1.114285
dtype: float64

If you pass a function, it must return a value when called with any of the labels (and must produce a set of unique values).

A dict or Series can also be used:

In [246]: df.rename(
   .....:     columns={"one": "foo", "two": "bar"},
   .....:     index={"a": "apple", "b": "banana", "d": "durian"},
   .....: )
   .....: 
Out[246]: 
             foo       bar     three
apple   1.394981  1.772517       NaN
banana  0.343054  1.912123 -0.050390
c       0.695246  1.478369  1.227435
durian       NaN  0.279344 -0.613172

If the mapping doesn't include a given index or column label, that label isn't renamed.

Note that extra labels in the mapping don't raise an error.
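
A small sketch of both statements, with a made-up frame and a mapping key "missing" that matches no column. It also shows the errors="raise" option of DataFrame.rename(), which switches the lenient default to a strict check.

```python
import pandas as pd

frame = pd.DataFrame({"one": [1, 2], "two": [3, 4]})

# "two" is not in the mapping, so it keeps its name;
# "missing" is not a column, so by default it is simply ignored.
renamed = frame.rename(columns={"one": "foo", "missing": "bar"})
print(list(renamed.columns))

# With errors="raise", unknown labels in the mapping raise KeyError instead.
try:
    frame.rename(columns={"one": "foo", "missing": "bar"}, errors="raise")
    rename_raised = False
except KeyError:
    rename_raised = True
print(rename_raised)
```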

rename() also supports the "axis-style" calling convention, where you pass a single mapper and the axis to apply it to:

In [247]: df.rename({"one": "foo", "two": "bar"}, axis="columns")
Out[247]: 
        foo       bar     three
a  1.394981  1.772517       NaN
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
d       NaN  0.279344 -0.613172

In [248]: df.rename({"a": "apple", "b": "banana", "d": "durian"}, axis="index")
Out[248]: 
             one       two     three
apple   1.394981  1.772517       NaN
banana  0.343054  1.912123 -0.050390
c       0.695246  1.478369  1.227435
durian       NaN  0.279344 -0.613172

rename() also provides an inplace parameter, which defaults to False, in which case a renamed copy is returned.

Pass inplace=True to rename the data in place.
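
The difference between the two modes can be sketched as follows, using a made-up one-column frame (frame, renamed_copy and result are illustrative names).

```python
import pandas as pd

frame = pd.DataFrame({"one": [1, 2]})

# Default (inplace=False): a new object is returned, the original is untouched.
renamed_copy = frame.rename(columns={"one": "foo"})
columns_before = list(frame.columns)

# inplace=True: the original is modified and the call returns None.
result = frame.rename(columns={"one": "foo"}, inplace=True)
columns_after = list(frame.columns)

print(columns_before, columns_after, result)
```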

rename() also accepts a scalar or list-like for altering the Series.name attribute:

In [249]: s.rename("scalar-name")
Out[249]: 
a   -0.186646
b   -1.692424
c   -0.303893
d   -1.425662
e    1.114285
Name: scalar-name, dtype: float64

The DataFrame.rename_axis() and Series.rename_axis() methods let you change specific names of a MultiIndex (as opposed to its labels):

In [250]: df = pd.DataFrame(
   .....:     {"x": [1, 2, 3, 4, 5, 6], "y": [10, 20, 30, 40, 50, 60]},
   .....:     index=pd.MultiIndex.from_product(
   .....:         [["a", "b", "c"], [1, 2]], names=["let", "num"]
   .....:     ),
   .....: )
   .....: 

In [251]: df
Out[251]: 
         x   y
let num       
a   1    1  10
    2    2  20
b   1    3  30
    2    4  40
c   1    5  50
    2    6  60

In [252]: df.rename_axis(index={"let": "abc"})
Out[252]: 
         x   y
abc num       
a   1    1  10
    2    2  20
b   1    3  30
    2    4  40
c   1    5  50
    2    6  60

In [253]: df.rename_axis(index=str.upper)
Out[253]: 
         x   y
LET NUM       
a   1    1  10
    2    2  20
b   1    3  30
    2    4  40
c   1    5  50
    2    6  60
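
To underline the distinction between names and labels, the sketch below contrasts rename_axis() with rename() on a smaller, made-up MultiIndex frame (frame, renamed_names and renamed_labels are illustrative names).

```python
import pandas as pd

mi = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["let", "num"])
frame = pd.DataFrame({"x": [1, 2, 3, 4]}, index=mi)

# rename_axis changes the level names and leaves the labels alone.
renamed_names = frame.rename_axis(index={"let": "abc"})
print(list(renamed_names.index.names))

# rename changes the labels and leaves the level names alone.
renamed_labels = frame.rename(index={"a": "A"})
print(list(renamed_labels.index.names))
print(list(renamed_labels.index.get_level_values("let")))
```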
