Week5

3/26 Week 5

Index

  • Exercise
  • Homework
  • Exercise

    Introduction

    Write a Python web crawler that fetches the publication page of Teacher Wang (王經篤) or Teacher Huang (黃明祥) and writes the data to an output file, e.g. "output.txt".
    Record every step you took in the README.md of a GitHub repository, then submit the GitHub link to TronClass.
    Teacher Wang: http://dns2.asia.edu.tw/~jdwang/PaperList.htm
    Teacher Huang: http://isrc.ccs.asia.edu.tw/www/myjournal/myjournal.htm

    Source Code

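    import bs4
    import requests

    # Assumption: rc holds the raw page bytes, fetched here with requests
    # (Teacher Wang's publication list is used as the example URL)
    rc = requests.get('http://dns2.asia.edu.tw/~jdwang/PaperList.htm').content
    soup = bs4.BeautifulSoup(rc, 'html.parser')
    # Write the text of each <p class="MsoNormal"> on its own line
    with open('output-publication.txt', 'w', encoding='utf8') as f:
        for tagP in soup.find_all('p', 'MsoNormal'):
            t = tagP.text.replace('\t', '').replace('\n', '')
            f.write(t + '\n')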
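
The cleanup step above deletes tabs and newlines outright, which can fuse two words that were separated only by a line break in the HTML source. One possible alternative, shown here as a minimal sketch, is to collapse each run of whitespace into a single space instead. The helper name `normalize_whitespace` and the sample string are hypothetical and are not part of Exercise.py.

```python
def normalize_whitespace(text):
    """Collapse every run of spaces, tabs and newlines into a single space."""
    # str.split() with no arguments splits on any whitespace and drops empties
    return " ".join(text.split())


# Made-up sample: a paragraph whose HTML source wraps in the middle
raw = "A. Author,\n\tSample Title,\n  2020."
cleaned = normalize_whitespace(raw)
print(cleaned)
```

In the loop, `t = normalize_whitespace(tagP.text)` would take the place of the two chained `replace` calls, if this behaviour is preferred.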
    

    Files

  • Exercise.py
  • ExerciseEn.py (Eng. Ver)
  • Output File

  • output-publication.txt

  • Homework

    Introduction

    Write a Python web crawler that fetches the information on all graduation projects of the Asia University (AU) CSIE department for school year 103, and writes the data to an output file.
    https://csie.asia.edu.tw/project/semester-103
    P.S. Create a new GitHub repository for this homework and, once it is finished, submit the repository link to TronClass.

    Source Code

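    import bs4
    import requests

    # Fetch the graduation project list for school year 103
    response = requests.get("https://csie.asia.edu.tw/project/semester-103")
    # Parse the raw bytes; setting response.encoding would only affect response.text
    soup = bs4.BeautifulSoup(response.content, "html.parser")
    # One output line per table row, cells separated by tabs,
    # and a blank line between tables
    with open("output-graduation_projects.txt", "w", encoding="utf8") as f:
        for table in soup.find_all("table"):
            for row in table.find_all("tr"):
                for cell in row.find_all("td"):
                    t = cell.text.replace("\t", "").replace("\n", "")
                    f.write(t + "\t")
                f.write("\n")
            f.write("\n")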
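
The row-and-cell traversal above can be tried offline before pointing it at the live page. The following is a minimal sketch that uses only the standard library's `html.parser` as a stand-in for BeautifulSoup (it is not the approach used in Homework.py) and writes rows with the `csv` module so that no trailing tab is left at the end of each line. The names `TableTextParser` and `rows_to_tsv`, and the sample markup, are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse tabs, newlines and repeated spaces inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample markup standing in for the real project page
sample_html = """
<table>
  <tr><td>Project A</td><td>Advisor
      One</td></tr>
  <tr><td>Project B</td><td>Advisor Two</td></tr>
</table>
"""

parser = TableTextParser()
parser.feed(sample_html)
tsv_text = rows_to_tsv(parser.rows)
print(tsv_text)
```

Keeping the parsing and the serialization in separate functions makes each step easy to check on its own. The sketch is simplified: it ignores `<th>` header cells and nested tables, and it does not reproduce the blank line between tables that the homework code writes.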
    

    Files

  • Homework.py
  • HomeworkEn.py (Eng. Ver)
  • Output File

  • output-graduaction_projects.txt

  • Author: 109021331 CYouLiao