
Pandas Tips: Creating Test Data

Updated: 2023-07-04 09:14:42  Author: databook

When learning pandas, trying out its many powerful functions often means spending a lot of time building test data first. This article introduces several ways to create test data quickly, for anyone who needs them.

1. General methods

There are two common ways to create test data:

Write out the data for every row and column directly
Generate a random two-dimensional array with numpy

1.1. Creating data directly

This approach has been used many times in earlier videos. Writing the data out by hand is tedious, but every value is under your control, both its type and its content.

import pandas as pd

# Exam scores: one column per subject, one row per student.
df = pd.DataFrame(
    {
        "Math": [100, 88, 94, 76, 84],
        "Chinese": [98, 80, 86, 76, 90],
        "English": [95, 91, 86, 95, 83],
    },
    index=["Xiao Hong", "Xiao Ming", "Xiao Wang", "Xiao Li", "Xiao Zhang"],
)
df

1.2. Random two-dimensional arrays

This approach relies on the numpy library: generate a random two-dimensional array with numpy, then convert it into a pandas DataFrame.

For example, the following creates random data with 3 rows and 4 columns:

import numpy as np
pd.DataFrame(np.random.rand(3, 4))

The data above is random, so each run produces a different result.
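When a test needs the same "random" values on every run, a seeded generator is one option. The following is a minimal sketch, assuming numpy's `default_rng` is available (numpy 1.17 or later); the seed value 42 and the names `df_a` and `df_b` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded Generator makes the "random" data repeatable across runs.
# The seed 42 is an arbitrary choice for this sketch.
rng = np.random.default_rng(42)
df_a = pd.DataFrame(rng.random((3, 4)))

# Re-creating the generator with the same seed yields the same values.
rng = np.random.default_rng(42)
df_b = pd.DataFrame(rng.random((3, 4)))

print(df_a.equals(df_b))
```
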

When creating random data, you can also set the index and the column names.

pd.DataFrame(
    np.random.rand(3, 4),
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3", "col4"],
)

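2. Special techniques

The random approach above only produces floating-point data, and unless you set them by hand the index and column names default to auto-incrementing integers, so the data lacks variety.

The following introduces some random data generators that ship with pandas itself and can produce different kinds of random data. Note that pd.util.testing was deprecated in pandas 1.0 and removed in pandas 2.0, so the calls below apply to older pandas versions.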

2.1. makeDataFrame

The makeDataFrame() method creates a 30x4 dataset of random values.

df = pd.util.testing.makeDataFrame()
print(df.shape)
df.head()

The index consists of random strings.

2.2. makeMissingDataFrame

The makeMissingDataFrame() method creates a random 30x4 dataset that contains missing values; the positions of the missing values are random as well.

df = pd.util.testing.makeMissingDataframe()
print(df.shape)
df.head()
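On pandas versions where pd.util.testing is not available, a roughly similar frame can be assembled by hand. This is only a sketch of the idea, not the library's implementation: the column names A to D, the seed, and the 10% missing ratio are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability

# 30x4 float data with hypothetical column names A-D.
df_missing = pd.DataFrame(rng.standard_normal((30, 4)), columns=list("ABCD"))

# Blank out roughly 10% of the cells at random positions.
# The 10% ratio is an assumption of this sketch.
mask = rng.random(df_missing.shape) < 0.1
df_missing = df_missing.mask(mask)

print(df_missing.shape)
print(df_missing.isna().sum())
```
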

2.3. makeTimeDataFrame

The makeTimeDataFrame() method creates a random 30x4 dataset whose index is a sequence of incrementing dates.

df = pd.util.testing.makeTimeDataFrame()
print(df.shape)
df.head()
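A date-indexed frame of the same shape can also be built with `pd.date_range`, which may help where pd.util.testing is unavailable. The sketch below is a hand-rolled approximation; the start date, the daily frequency, and the column names are assumptions, not values taken from makeTimeDataFrame().

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed

# 30 consecutive calendar days; start date and frequency are assumptions.
dates = pd.date_range("2000-01-01", periods=30, freq="D")

df_time = pd.DataFrame(
    rng.standard_normal((30, 4)),
    index=dates,
    columns=list("ABCD"),  # hypothetical column names
)

print(df_time.shape)
print(df_time.head())
```
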

2.4. makeMixedDataFrame

The makeMixedDataFrame() method creates a 5x4 dataset whose columns have mixed types: strings, dates, and numbers.

df = pd.util.testing.makeMixedDataFrame()
print(df.shape)
df
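For comparison, a small mixed-type frame can be written with ordinary pandas and numpy calls. This is a sketch under stated assumptions: the column names (`count`, `score`, `name`, `day`) and all values are hypothetical and do not reproduce the output of makeMixedDataFrame().

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed

# Four columns of different types; names and values are hypothetical.
df_mixed = pd.DataFrame(
    {
        "count": np.arange(5),                             # integers
        "score": rng.random(5),                            # floats
        "name": ["foo1", "foo2", "foo3", "foo4", "foo5"],  # strings
        "day": pd.date_range("2009-01-01", periods=5),     # dates
    }
)

print(df_mixed.shape)
print(df_mixed.dtypes)
```
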

3. Additional notes

The datasets produced by the methods above are small. If you need a larger dataset, you can generate DataFrames in a loop and then concatenate them.

Because the values are random on every call, there is no need to worry that the concatenated result will be full of duplicate rows.
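The loop-and-concatenate idea can be sketched as follows. `make_chunk` is a hypothetical helper written for this example (it stands in for any of the generators above), and the chunk count of 100 is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed


def make_chunk(n_rows=30):
    """Hypothetical helper: one small random frame with columns A-D."""
    return pd.DataFrame(rng.standard_normal((n_rows, 4)), columns=list("ABCD"))


# Collect the chunks in a list and concatenate once at the end;
# this avoids repeatedly copying a growing frame inside the loop.
chunks = [make_chunk() for _ in range(100)]
big_df = pd.concat(chunks, ignore_index=True)

print(big_df.shape)
```

Passing `ignore_index=True` gives the combined frame a fresh 0..n-1 index instead of repeating each chunk's own index.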

Beyond the methods covered here, pd.util.testing offers other data-creation helpers that are worth exploring and trying out.
