
Extracting XML document content with Java + dom4j.jar

 Updated: 2019-08-30 10:27:12   Author: 靜遠小和尚

This article shares working code for extracting the content of XML documents with Java and dom4j.jar, offered as a reference. The details follow.


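This example extracts content from XML files mainly through several traversal operations. It is not the optimal approach; the main purpose is to practice using those traversal operations.

Contents of the XML document: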

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.3/specification/dtd/nitf-3-3.dtd"> 
<nitf version="-//IPTC//DTD NITF 3.3//EN" change.time="19:30" change.date="June 10, 2005">
<head>
 
<title>An End to Nuclear Testing</title>
 
<meta name="publication_day_of_month" content="7"/> 
<meta name="publication_month" content="7"/> 
<meta name="publication_year" content="1993"/> 
<meta name="publication_day_of_week" content="Wednesday"/> 
<meta name="dsk" content="Editorial Desk"/> 
<meta name="print_page_number" content="14"/> 
<meta name="print_section" content="A"/> 
<meta name="print_column" content="1"/>
<meta name="online_sections" content="Opinion"/>
 
 
<docdata>
<doc-id id-string="619929"/>
<doc.copyright year="1993" holder="The New York Times"/>
<identified-content>
 
<classifier type="descriptor" class="indexing_service">ATOMIC WEAPONS</classifier> 
<classifier type="descriptor" class="indexing_service">NUCLEAR TESTS</classifier> 
<classifier type="descriptor" class="indexing_service">TESTS AND TESTING</classifier> 
<classifier type="descriptor" class="indexing_service">EDITORIALS</classifier> 
<person class="indexing_service">CLINTON, BILL (PRES)</person> 
<classifier type="types_of_material" class="online_producer">Editorial</classifier> 
<classifier type="taxonomic_classifier" class="online_producer">Top/Opinion</classifier> 
<classifier type="taxonomic_classifier" class="online_producer">Top/Opinion/Opinion</classifier> 
<classifier type="taxonomic_classifier" class="online_producer">Top/Opinion/Opinion/Editorials</classifier> 
<classifier type="general_descriptor" class="online_producer">Nuclear Tests</classifier> 
<classifier type="general_descriptor" class="online_producer">Atomic Weapons</classifier> 
<classifier type="general_descriptor" class="online_producer">Tests and Testing</classifier> 
<classifier type="general_descriptor" class="online_producer">Armament, Defense and Military Forces</classifier>
 
</identified-content> 
</docdata> 
<pubdata name="The New York Times" unit-of-measure="word" item-length="390" ex-ref="http://query.nytimes.com/gst/fullpage.html?res=9F0CEFDF1439F934A35754C0A965958260" date.publication="19930707T000000"/>
 
</head>
 
 
<body>
<body.head>
<hedline>
<hl1>An End to Nuclear Testing</hl1>
</hedline>
</body.head>
<body.content>
<block class="lead_paragraph">
 
<p>For nearly half a century, test explosions in the Nevada desert were a reverberating reminder of cold war insecurity. Now the biggest worry is nuclear proliferation, not the Soviet threat. That's why President Clinton has quietly decided to extend the moratorium on tests of nuclear arms for at least 15 months.</p> 
<p>To persuade nuclear have-nots to stay out of the bomb-making business, it makes more sense to halt testing and try to get others to do likewise than to conduct more demonstrations of America's deterrent power.</p>
 
</block>
<block class="full_text">
 
<p>For nearly half a century, test explosions in the Nevada desert were a reverberating reminder of cold war insecurity. Now the biggest worry is nuclear proliferation, not the Soviet threat. That's why President Clinton has quietly decided to extend the moratorium on tests of nuclear arms for at least 15 months.</p>
<p>To persuade nuclear have-nots to stay out of the bomb-making business, it makes more sense to halt testing and try to get others to do likewise than to conduct more demonstrations of America's deterrent power.</p> 
<p>Not that nuclear wannabes will necessarily follow America's lead. Nor will an end to all testing assure an end to bomb-making; states like Pakistan have developed nuclear devices without testing them first.</p>
<p>But calling a halt to U.S. nuclear testing makes it easier for leaders in Russia and France to extend the moratoriums they are now observing and improve the atmosphere for prompt negotiation of a treaty to ban all tests.</p>
<p>That test ban in turn should shore up international support for the 1968 Nonproliferation Treaty, linchpin of efforts to stop the spread of nuclear arms, when it comes up for review in 1995. It will also bolster the backing for tighter controls on exports used in bomb-making.</p>
<p>Mr. Clinton has taken three helpful steps. He has extended the Congressionally mandated moratorium on U.S. tests that was due to expire last week. He has declared that the U.S. will not test unless another nation does so first. And he wants to negotiate a total ban on testing.</p>
<p>But the President also wants the nuclear labs to be prepared for a prompt resumption of warhead safety and reliability tests. This could cost millions of dollars and doesn't make much sense, since in Mr. Clinton's own words, "After a thorough review, my Administration has determined that the nuclear weapons in the United States' arsenal are safe and reliable."</p>
<p>Moreover, preparations for testing can take on a life of their own: 30 years after the Limited Test Ban Treaty put an end to above-ground tests, the U.S. still spends $20 million a year on Safeguard C, a program to keep test sites ready.</p>
<p>American security no longer rests on that sort of eternal nuclear vigilance. Mr. Clinton's moratorium may make America safer than all the tests and preparations for tests that the nuclear labs can dream up.</p>
 
</block>
 
</body.content>
 
</body>
 
</nitf>

Extraction code:

To process multiple files, first walk the directory and collect every file path into a list, then process the paths in that list one by one.

package com.njupt.ymh;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import edu.princeton.cs.algs4.In; // from algs4.jar, which must be on the classpath

/**
 * Returns the list of file paths under a directory.
 * @author 11860
 */
public class SearchFile {

    /**
     * Recursively collects absolute paths under directoryPath.
     * @param directoryPath  directory to scan
     * @param isAddDirectory whether sub-directory paths are also added to the list
     */
    public static List<String> getAllFile(String directoryPath, boolean isAddDirectory) {
        List<String> list = new ArrayList<String>(); // collected file paths
        File baseFile = new File(directoryPath);     // current directory

        if (baseFile.isFile() || !baseFile.exists())
            return list;

        File[] files = baseFile.listFiles(); // children
        if (files == null) // listFiles() returns null on an I/O error or missing permission
            return list;

        for (File file : files) {
            if (file.isDirectory()) {
                if (isAddDirectory)
                    list.add(file.getAbsolutePath()); // full path

                list.addAll(getAllFile(file.getAbsolutePath(), isAddDirectory));
            } else {
                list.add(file.getAbsolutePath());
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> listFile = SearchFile.getAllFile("E:\\huadai", false);
        System.out.println(listFile.size());
        if (listFile.size() < 5)
            return; // the demo below reads the 4th and 5th entries

        File file = new File(listFile.get(3));
        In in = new In(listFile.get(4));
        while (in.hasNextLine()) {
            String readLine = in.readLine().trim(); // read the current line
            System.out.println(readLine);
        }
        System.out.println(file.length());
    }
}
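
As a point of comparison, the same recursive listing could be sketched with the JDK's `java.nio.file.Files.walk`, which avoids the manual recursion. This is only a sketch: the class name `WalkFiles` and the method `listAll` are hypothetical names introduced here, not part of the original code, and the sorting step is an addition to make the result order predictable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WalkFiles {

    /**
     * Collects absolute paths below root (hypothetical helper).
     * Directories are included only when includeDirectories is true.
     */
    public static List<String> listAll(Path root, boolean includeDirectories) throws IOException {
        if (!Files.isDirectory(root)) {
            // Mirrors getAllFile: a plain file or a missing path yields an empty list.
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> !p.equals(root)) // skip the starting directory itself
                    .filter(p -> includeDirectories || !Files.isDirectory(p))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory is passed as the first argument; "." is used otherwise.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> files = listAll(root, false);
        System.out.println(files.size() + " files under " + root.toAbsolutePath());
    }
}
```

The try-with-resources block matters here because the stream returned by `Files.walk` holds open directory handles until it is closed.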

package com.njupt.ymh;

import java.io.File;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public class NewsPaper {
    int doc_id;            // article id
    String doc_title;      // article title
    String lead_paragraph; // lead paragraph
    String full_text;      // full text
    String date;           // article date

    public NewsPaper(String xml) {
        doc_id = -1;           // -1 marks "no id found"
        doc_title = null;
        lead_paragraph = null;
        full_text = null;
        date = null;
        searchValue(xml);
    }
 
    /**
     * Loads the XML file into a Document.
     * @param fileName path of the XML file
     * @return the parsed Document, or null if parsing failed
     */
    private Document load(String fileName) {
        Document document = null;
        SAXReader saxReader = new SAXReader(); // reads the file stream

        try {
            document = saxReader.read(new File(fileName));
        } catch (DocumentException e) {
            e.printStackTrace();
        }

        return document;
    }

    /**
     * Returns the root element of the Document.
     * @param document the parsed Document
     */
    private Element getRootNode(Document document) {
        return document.getRootElement();
    }
 
    /**
     * Extracts the required node values.
     * @param xml path of the XML file
     */
    private void searchValue(String xml) {
        Document document = load(xml);
        if (document == null)
            return; // parsing failed: the fields keep their defaults
        Element root = getRootNode(document); // root node

        // Article date, cut out of the file path. This appears to assume a layout
        // such as E:\huadai\1989\10\07\..., where characters 10-19 are yyyy\MM\dd.
        if (xml.length() >= 20)
            date = xml.substring(10, 20);
        // Article title
        doc_title = root.valueOf("//head/title");

        // Article id (XPath support in dom4j requires jaxen on the classpath)
        List<Node> list_doc_id = document.selectNodes("//doc-id/@id-string");
        for (Node ele : list_doc_id) {
            doc_id = Integer.parseInt(ele.getText());
        }

        // Article content
        for (Iterator<Element> i = root.elementIterator(); i.hasNext();) {
            Element el = i.next(); // head, body

            // Only the body node is processed; strings are compared with equals(), not ==
            if ("body".equals(el.getName())) {
                for (Iterator<Element> body = el.elementIterator(); body.hasNext();) {
                    Element elbody = body.next();

                    if ("body.content".equals(elbody.getName())) {
                        for (Iterator<Element> block = elbody.elementIterator(); block.hasNext();) {
                            Element block_class = block.next();
                            List<Node> paragraphs = block_class.selectNodes("p");

                            // "full_text".equals(...) stays safe when the class attribute is missing
                            if ("full_text".equals(block_class.attributeValue("class"))) { // full_text
                                for (Node text : paragraphs)
                                    full_text = (full_text == null)
                                            ? text.getStringValue()
                                            : full_text + " " + text.getStringValue();
                            } else { // lead_paragraph
                                for (Node lead : paragraphs)
                                    lead_paragraph = (lead_paragraph == null)
                                            ? lead.getStringValue()
                                            : lead_paragraph + " " + lead.getStringValue();
                            }
                        }
                    }
                }
            }
        }
    }
 
    /**
     * Returns the article title.
     */
    public String getTitle() {
        return doc_title;
    }

    /**
     * Returns the article id.
     */
    public int getID() {
        return doc_id;
    }

    /**
     * Returns the article lead paragraph.
     */
    public String getLead() {
        // Articles before 1990-10-22 (id < 394070): drop the first 6 characters
        if (getID() < 394070 && lead_paragraph != null && lead_paragraph.length() > 6)
            return lead_paragraph.substring(6);
        else // 1990-10-22 and later
            return lead_paragraph;
    }

    /**
     * Returns the article body text.
     */
    public String getfull() {
        // Articles before 1990-10-22 (id < 394070): drop the first 6 characters
        if (getID() < 394070 && full_text != null && full_text.length() > 6)
            return full_text.substring(6);
        else
            return full_text;
    }

    /**
     * Returns the article date.
     */
    public String getDate() {
        return date;
    }
 
    /**
     * Checks whether the extracted information is usable.
     * @return true if every field is present and within the length limits
     */
    public boolean isUseful() {
        if (getID() == -1)
            return false;
        if (getDate() == null)
            return false;
        if (getTitle() == null || getTitle().length() >= 255)
            return false;
        if (getLead() == null || getLead().length() >= 65535)
            return false;
        if (getfull() == null || getfull().length() >= 65535)
            return false;

        return !isnum();
    }

    /**
     * Picks out articles that consist of numeric content and start with a special marker.
     * @return true if the article should be discarded
     */
    private boolean isnum() {
        String full = getfull();
        if (full != null && full.length() > 24) {
            if (full.substring(0, 20).contains("*3*** COMPANY REPORT")) { // discard numeric articles
                return true;
            }
        }
        return false;
    }
 
 
    public static void main(String[] args) {
        List<String> listFile = SearchFile.getAllFile("E:\\huadai\\1989\\10", false); // file list
        int count = 0; // articles processed
        int i = 0;     // articles rejected by isUseful()
        for (String string : listFile) {
            NewsPaper newsPaper = new NewsPaper(string);
            count++;
            if (!newsPaper.isUseful()) {
                i++;
                System.out.println(newsPaper.getLead());
            }
        }

        System.out.println(i + " " + count);
    }
}
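
Since the nested iterators are used for practice rather than as the best approach, here is a hedged sketch of how the same fields could be pulled out with XPath expressions alone. To keep it runnable without extra jars it uses the JDK's built-in `javax.xml.xpath` API instead of dom4j, so it is a swapped-in technique, not the original author's method. The class `XPathSketch`, its helpers `parse` and `joinText`, and the shortened `SAMPLE` document are all hypothetical and introduced only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Shortened stand-in for the NITF sample (assumption: DOCTYPE left out so no DTD is fetched).
    public static final String SAMPLE =
          "<nitf><head><title>An End to Nuclear Testing</title>"
        + "<docdata><doc-id id-string=\"619929\"/></docdata></head>"
        + "<body><body.content>"
        + "<block class=\"lead_paragraph\"><p>First.</p></block>"
        + "<block class=\"full_text\"><p>First.</p><p>Second.</p></block>"
        + "</body.content></body></nitf>";

    /** Parses an XML string into a DOM Document. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Joins the text of every node matched by expr, separated by single spaces. */
    public static String joinText(XPath xpath, Document doc, String expr) throws Exception {
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            parts.add(nodes.item(i).getTextContent());
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        XPath xpath = XPathFactory.newInstance().newXPath();

        String title = xpath.evaluate("//head/title", doc);
        String id = xpath.evaluate("//doc-id/@id-string", doc);
        String lead = joinText(xpath, doc, "//block[@class='lead_paragraph']/p");
        String full = joinText(xpath, doc, "//block[@class='full_text']/p");

        System.out.println(id + " | " + title);
        System.out.println("lead: " + lead);
        System.out.println("full: " + full);
    }
}
```

One behavioural difference to keep in mind: this sketch selects the lead paragraph by its `class` value, whereas the traversal code treats every block that is not `full_text` as part of the lead paragraph.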

That is all for this article. We hope it helps with your studies, and we hope you will continue to support 腳本之家.
