
Designing a Code Statistics Tool in Python

Updated: 2018-04-04 10:36:47  Author: FOOFISH-PYTHON之禪

This article walks through designing a code statistics tool in Python that reports the number of files, code lines, comment lines, and blank lines in a project.

Problem

Design a program that counts the lines of code in a project, reporting the number of files, code lines, comment lines, and blank lines. Keep the design flexible so that projects in different languages can be counted by passing different arguments, for example:

# --type specifies the file type
python counter.py --type python

Output:

files:10
code_lines:200
comments:100
blanks:20

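Analysis

This design problem looks simple but is somewhat tricky in practice. We can break it down: once the lines of a single file are counted correctly, counting a whole directory follows easily. The hardest part is multi-line comments. Taking Python as an example, comment lines come in the following forms:

1. A single-line comment starting with a hash sign

# single-line comment

2. A multi-line comment whose opening and closing delimiters are on the same line

"""this is a multi-line comment"""
'''this is also a multi-line comment'''

3. A multi-line comment whose delimiters are on separate lines

"""
all 3 of these lines are comment lines
"""

We parse the file line by line. Multi-line comments need an extra flag, in_multi_comment, that records whether the current line lies inside a multi-line comment. It defaults to False, is set to True when a multi-line comment starts, and is set back to False at the next multi-line comment delimiter. Every line from the opening delimiter through the closing delimiter counts as a comment line.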

Key points

How to read a file correctly, and the common string methods that apply once the file's contents are handled as strings.
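As a small illustration of the string methods involved, the sketch below applies strip(), startswith() and endswith() to a few raw lines. The helper name describe and the sample lines are made up for this example and are not part of the original program.

```python
def describe(raw):
    """Strip a raw line and report which simple checks it passes."""
    line = raw.strip()  # removes leading/trailing whitespace, including "\n"
    return {
        "line": line,
        "blank": line == "",
        "hash_comment": line.startswith("#"),
        "ends_with_triple_quote": line.endswith('"""'),
    }


# Hypothetical raw lines, as they might come from iterating over a file object
for raw in ["    # a comment\n", "\n", '"""docstring"""\n', "x = 1  \n"]:
    print(describe(raw))
```

Stripping first is what lets an indented comment be recognised with a plain startswith("#") check.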

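Simplified version

We iterate step by step, starting with a simplified program that handles a single Python file and ignores multi-line comments. Anyone new to Python can implement this. The key step is to call strip() on each line after reading it, which removes the whitespace and newline characters at both ends of the string.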

# -*- coding: utf-8 -*-
"""
Counts a .py file that contains only single-line comments.
"""


def parse(path):
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            # Drop surrounding whitespace and the trailing newline
            line = line.strip()
            if line == "":
                blanks += 1
            elif line.startswith("#"):
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))
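To see how this simplified branching treats a docstring, the same logic can be run on an in-memory list of lines instead of a file. This is only a sketch: count_simple and the sample source are hypothetical names introduced for the illustration.

```python
def count_simple(lines):
    """Same branching as the simplified parse(), applied to a list of strings."""
    stats = {"comments": 0, "blanks": 0, "codes": 0}
    for line in lines:
        line = line.strip()
        if line == "":
            stats["blanks"] += 1
        elif line.startswith("#"):
            stats["comments"] += 1
        else:
            stats["codes"] += 1
    return stats


# Hypothetical file contents: a three-line docstring, a comment, a blank, a statement
sample = [
    '"""',
    "module docstring",
    '"""',
    "# a comment",
    "",
    "x = 1",
]
print(count_simple(sample))
# {'comments': 1, 'blanks': 1, 'codes': 4}
```

The three docstring lines are counted as code, which is the gap multi-line comment handling has to close.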

Multi-line comment version

Counting only single-line comments is of limited use. A real code counter has to handle multi-line comments as well.

# -*- coding: utf-8 -*-
"""
Counts a .py file that may contain multi-line comments.
"""


def parse(path):
    in_multi_comment = False  # True while inside a multi-line comment
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Blank lines inside a multi-line comment count as comments
            if line == "" and not in_multi_comment:
                blanks += 1
            # There are 4 kinds of comment line:
            # 1. a single-line comment starting with #
            # 2. a multi-line comment opened and closed on the same line
            # 3. a line between multi-line comment delimiters
            elif line.startswith("#") or \
                    (line.startswith('"""') and line.endswith('"""') and len(line) > 3) or \
                    (line.startswith("'''") and line.endswith("'''") and len(line) > 3) or \
                    (in_multi_comment and not (line.startswith('"""') or line.startswith("'''"))):
                comments += 1
            # 4. the opening or closing line of a multi-line comment
            elif line.startswith('"""') or line.startswith("'''"):
                in_multi_comment = not in_multi_comment
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))

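In case 4 above, the key step when a multi-line comment delimiter is encountered is to negate the in_multi_comment flag rather than simply set it to True or False. The first """ sets it to True; the second """ closes the comment, so negating yields False; a third """ opens a new comment and negates it back to True, and so on.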
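The toggling can be traced in isolation. The sketch below records the value of the flag after each line. trace_flag is a hypothetical helper written for this illustration; it reuses only the delimiter checks from the listing above.

```python
def trace_flag(lines):
    """Return the value of in_multi_comment after each line is processed."""
    in_multi_comment = False
    states = []
    for line in lines:
        line = line.strip()
        # A comment opened and closed on the same line never toggles the flag
        one_line = len(line) > 3 and (
            (line.startswith('"""') and line.endswith('"""')) or
            (line.startswith("'''") and line.endswith("'''"))
        )
        if not one_line and (line.startswith('"""') or line.startswith("'''")):
            in_multi_comment = not in_multi_comment
        states.append(in_multi_comment)
    return states


sample = ['"""', "first comment", '"""', "x = 1", '"""', "second comment", '"""']
print(trace_flag(sample))
# [True, True, False, False, True, True, False]
```

Because the flag is negated rather than assigned, the same branch handles both the opening and the closing delimiter.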

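Does counting another language require writing a new parsing function? Look closely and the 4 comment cases reduce to 4 conditions, because most languages have both single-line and multi-line comments and differ only in the symbols they use.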

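CONF = {"py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
        "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"}}

# The fragment below replaces the if/elif chain inside the per-line loop of parse().
# extension is the file extension without the dot, e.g. "py" or "java";
# line is the current line after strip().
start_comment = CONF.get(extension).get("start_comment")
end_comment = CONF.get(extension).get("end_comment")
cond2 = False
cond3 = False
cond4 = False
# cond2: a multi-line comment opened and closed on the same line
for index, item in enumerate(start_comment):
    cond2 = line.startswith(item) and line.endswith(end_comment[index]) and len(line) > len(item)
    if cond2:
        break
# cond3: the line starts with a closing delimiter
for item in end_comment:
    if line.startswith(item):
        cond3 = True
        break
# cond4: the line starts with an opening or closing delimiter
for item in start_comment + end_comment:
    if line.startswith(item):
        cond4 = True
        break
if line == "" and not in_multi_comment:
    blanks += 1
# There are 4 kinds of comment line:
# 1. a single-line comment
# 2. a multi-line comment opened and closed on the same line
# 3. a line between multi-line comment delimiters
elif line.startswith(CONF.get(extension).get("single")) or cond2 or \
        (in_multi_comment and not cond3):
    comments += 1
# 4. the opening or closing line of a comment that spans several lines
elif cond4:
    in_multi_comment = not in_multi_comment
    comments += 1
else:
    codes += 1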

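A single configuration constant that records each language's single-line and multi-line comment symbols, mapped onto the conditions cond1 through cond4 (cond1 being the single-line check), is all that is needed. The remaining task is to parse multiple files, which os.walk handles.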
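Put together, the configuration-driven conditions might look like the following self-contained sketch. It works on an in-memory list of lines so it can be tried without files. The function name parse_lines, the any()/zip() rewrite of the loops, and the sample Java source are assumptions made for this illustration rather than the author's code.

```python
CONF = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}


def parse_lines(lines, extension):
    """Classify an iterable of lines using the comment symbols of `extension`."""
    conf = CONF[extension]
    start_comment, end_comment = conf["start_comment"], conf["end_comment"]
    in_multi_comment = False
    comments = blanks = codes = 0
    for line in lines:
        line = line.strip()
        # cond1: single-line comment
        cond1 = line.startswith(conf["single"])
        # cond2: multi-line comment opened and closed on the same line
        cond2 = any(
            line.startswith(start) and line.endswith(end) and len(line) > len(start)
            for start, end in zip(start_comment, end_comment)
        )
        # cond3: line starts with a closing delimiter
        cond3 = any(line.startswith(item) for item in end_comment)
        # cond4: line starts with an opening or closing delimiter
        cond4 = any(line.startswith(item) for item in start_comment + end_comment)
        if line == "" and not in_multi_comment:
            blanks += 1
        elif cond1 or cond2 or (in_multi_comment and not cond3):
            comments += 1
        elif cond4:
            in_multi_comment = not in_multi_comment
            comments += 1
        else:
            codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


# Hypothetical Java source used only to exercise the function
java_source = [
    "/*",
    " * block comment",
    " */",
    "// single-line comment",
    "",
    "int x = 1;",
    "/* one-line block */",
]
print(parse_lines(java_source, "java"))
# {'comments': 5, 'blanks': 1, 'codes': 1}
```

Supporting another language under this scheme means adding one entry to CONF, with no change to the branching.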

import os


def counter(path):
    """
    Count either a directory or a single file.
    :param path: path of a directory or a file
    :return: dict with the comments, blanks and codes totals
    """
    if os.path.isdir(path):
        comments, blanks, codes = 0, 0, 0
        # os.walk yields every directory below path, so nested files are included
        for root, dirs, files in os.walk(path):
            for f in files:
                file_path = os.path.join(root, f)
                stats = parse(file_path)
                comments += stats.get("comments")
                blanks += stats.get("blanks")
                codes += stats.get("codes")
        return {"comments": comments, "blanks": blanks, "codes": codes}
    else:
        return parse(path)
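The sample output in the problem statement also lists a files count, which counter() does not return. A minimal sketch of obtaining it with the same os.walk traversal follows, assuming files are selected by extension. count_files is a hypothetical helper, and the temporary directory exists only to make the example runnable.

```python
import os
import tempfile


def count_files(path, extension):
    """Count files below path whose name ends with .extension."""
    total = 0
    for root, dirs, files in os.walk(path):
        total += sum(1 for name in files if name.endswith("." + extension))
    return total


# Build a throwaway tree: two .py files (one nested) and one .txt file
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg"))
    for rel in ["a.py", "notes.txt", os.path.join("pkg", "b.py")]:
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as f:
            f.write("x = 1\n")
    print(count_files(tmp, "py"))  # 2
```

The same extension check could also be used inside counter() so that only files of the requested language are passed to parse().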

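Of course, making this program complete still takes a good deal of work, including command-line parsing and restricting the count to one language according to the given argument.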
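One way the command-line part could be approached is with the standard argparse module. The sketch below is an assumption about how the --type option from the problem statement might be wired up. The positional path argument, the TYPE_TO_EXTENSION mapping, and the wanted helper are all hypothetical.

```python
import argparse

# Hypothetical mapping from the --type value to a file extension
TYPE_TO_EXTENSION = {"python": "py", "java": "java"}


def build_parser():
    """Create the command-line parser for the counter."""
    parser = argparse.ArgumentParser(description="Count code, comment and blank lines.")
    parser.add_argument("path", nargs="?", default=".", help="file or directory to count")
    parser.add_argument("--type", dest="file_type", choices=sorted(TYPE_TO_EXTENSION),
                        default="python", help="language of the files to count")
    return parser


def wanted(filename, extension):
    """True when filename has the given extension (without the dot)."""
    return filename.endswith("." + extension)


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that sys.argv is used
args = build_parser().parse_args(["--type", "java", "src"])
extension = TYPE_TO_EXTENSION[args.file_type]
print(args.path, extension, wanted("Main.java", extension), wanted("counter.py", extension))
```

The resulting extension can then be used both to pick the comment symbols and to filter the files visited during the directory walk.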

Supplement:

A line-count tool implemented in Python

We often want to count the lines of code in a project, but a reasonably full-featured counter is less trivial than it sounds. Here is how to implement a line-count tool in Python.

Approach:

First collect all the files, then count the lines in each file, and finally add the counts together.

Features:

Count the lines in each file;
Count the total number of lines;
Report the running time;
Restrict the count to specified file types, excluding the types you do not want counted;
Recursively count files in a folder and its subfolders;
Exclude blank lines.
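The file-type feature can be sketched as a small predicate that checks a file's extension against an include list and an exclude list. should_count, its default lists, and the "empty whitelist means everything" rule are assumptions made for this illustration.

```python
import os


def should_count(filename, whitelist=("php", "py"), blacklist=("log", "cache")):
    """Decide by extension whether a file is counted."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in blacklist:
        return False
    # An empty whitelist means "count every type that is not excluded"
    return not whitelist or ext in whitelist


for name in ["smtp.php", "countCodeLine.py", "error.log", "README"]:
    print(name, should_count(name))
```

os.path.splitext returns an empty extension for a name such as README, so files without an extension are skipped unless the whitelist is empty.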

# coding=utf-8
import os
import time

basedir = '/root/script'
filelists = []
# File types to count
whitelist = ['php', 'py']


# Walk basedir; os.walk already descends into every subfolder
def getFile(basedir):
    global filelists
    for parent, dirnames, filenames in os.walk(basedir):
        for filename in filenames:
            ext = filename.split('.')[-1]
            # Count only the listed file types, skipping log and cache files
            if ext in whitelist:
                filelists.append(os.path.join(parent, filename))


# Count the lines in one file
def countLine(fname):
    count = 0
    with open(fname) as f:
        for file_line in f:
            if file_line != '' and file_line != '\n':  # skip blank lines
                count += 1
    print(fname + '----', count)
    return count


if __name__ == '__main__':
    # time.clock() was removed in Python 3.8; perf_counter() is used instead
    startTime = time.perf_counter()
    getFile(basedir)
    totalline = 0
    for filelist in filelists:
        totalline = totalline + countLine(filelist)
    print('total lines:', totalline)
    print('Done! Cost Time: %0.2f second' % (time.perf_counter() - startTime))

Result:

[root@pythontab script]# python countCodeLine.py
/root/script/test/gametest.php---- 16
/root/script/smtp.php---- 284
/root/script/gametest.php---- 16
/root/script/countCodeLine.py---- 33
/root/script/sendmail.php---- 17
/root/script/test/gametest.php---- 16
total lines: 382
Done! Cost Time: 0.00 second
[root@pythontab script]#

Only PHP and Python files are counted, which is quite convenient.
