Discussion Aug 9th

Kai Wang edited this page Aug 9, 2022 · 6 revisions

Discussion on Aug 9th

Implementation plan

Capture satellite imagery at 50,000 latitude/longitude locations for 2017, 2019, and 2021.
Run a pretrained model for segmentation/object detection on the satellite imagery.
Train a model that compares the 2017, 2019, and 2021 segmentations of the same location and outputs a single scalar: the probability of illegal factory expansion at that location.
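As a rough sketch of the first step, each latitude/longitude has to be turned into a tile index before an image can be requested. The code below assumes the imagery is served in the standard Web Mercator (XYZ) tiling scheme; the function names, the zoom level, and the request tuple layout are illustrative assumptions, and mapping each year to a specific imagery release is left out.

```python
import math
from typing import Dict, List, Tuple

YEARS = (2017, 2019, 2021)  # the three years named in the plan
MAX_LAT = 85.05112878       # latitude limit of the Web Mercator projection


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_deg = max(-MAX_LAT, min(MAX_LAT, lat_deg))
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def build_requests(
    locations: Dict[str, Tuple[float, float]], zoom: int = 17
) -> List[Tuple[str, int, int, int, int]]:
    """List one (location_id, year, zoom, x, y) request per location and year.

    `zoom=17` is an assumed default, not a value from the discussion.
    """
    requests = []
    for location_id, (lat, lon) in locations.items():
        x, y = latlon_to_tile(lat, lon, zoom)
        for year in YEARS:
            requests.append((location_id, year, zoom, x, y))
    return requests
```

With 50,000 locations and three years this yields 150,000 requests, so the list can be batched or cached before anything is downloaded.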

TODO:

Land-parcel (地號) polygon data that has already been prepared
Pretrained pytorch segmentation model: https://github.com/qubvel/segmentation_models.pytorch#start
Similarity model: train on the new-construction + expansion (新增+擴建) data
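Before a similarity model is trained, a simple untrained baseline can show what "comparing segmentations" might look like. The sketch below assumes each year's segmentation has already been reduced to a binary building mask of the same size; it scores change as 1 minus the intersection-over-union (IoU) of consecutive years. This is a swapped-in heuristic for illustration, not the trained model the plan calls for, and all names are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = building pixel, 0 = background (assumed encoding)


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    intersection = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                intersection += 1
            if pa or pb:
                union += 1
    # Two empty masks are treated as identical.
    return 1.0 if union == 0 else intersection / union


def change_score(masks_by_year: Dict[int, Mask]) -> float:
    """Largest (1 - IoU) between consecutive years; 0.0 means no change."""
    years = sorted(masks_by_year)
    scores = [
        1.0 - mask_iou(masks_by_year[y0], masks_by_year[y1])
        for y0, y1 in zip(years, years[1:])
    ]
    return max(scores) if scores else 0.0
```

A baseline like this gives the trained model something to beat, and its scores could serve as one input feature alongside the raw masks.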

National Land Use Monitoring Integrated Information Network (國土利用監測整合資訊網)

https://landchg.tcd.gov.tw/Module/RWD/Web/Default.aspx

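Land-parcel polygons (地號多邊形)

We can use Ministry of the Interior (內政部) data to first filter out anything that does not lie inside the given land parcel, and then run the comparison.
Use the new-construction + expansion (新增+擴建) data as positive data.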
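The parcel filter can be sketched with a standard ray-casting point-in-polygon test. The code assumes each parcel is a simple polygon given as a list of (x, y) vertices and that each detection is represented by its centroid in the same coordinate system; treating longitude/latitude as planar coordinates is an approximation that is reasonable only for small parcels. The function names and data layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_detections(
    centroids: Sequence[Point], parcel: Sequence[Point]
) -> List[Point]:
    """Keep only detections whose centroid falls inside the parcel polygon."""
    return [c for c in centroids if point_in_polygon(c[0], c[1], parcel)]
```

Points that lie exactly on a parcel boundary are not handled specially here; a geometry library would be the safer choice if boundary cases matter.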

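Crowdsourcing data vs. the Council of Agriculture (農委會) 50,000 data points: differences

The 6,000 data points were generated by comparing the Council of Agriculture's 2017 and 2020 data and taking the differences.
Points from 2017 or earlier have not been verified.
Current result: 100 to 200 illegal factory expansions per 6,000 data points, around 3-5% illegal factory identification.
In principle, all of these points should now be passed to the local district offices for verification, but it is unclear how rigorous that verification is, since there are no related penalty records.
Every suspected-factory point released by the Council of Agriculture carries a "legal" use designation. Checking this requires the Ministry of Economic Affairs (經濟部), and the current data does not include this information.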

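Satellite imagery: infrared and RGB data

The visible-light band is not the most suitable range for determining whether buildings are present.
Is it possible to obtain infrared data? The resolution available through Google Earth Engine is too poor.
National Central University's (中央大學) imagery comes from a French satellite.
Imagery currently in use: https://livingatlas.arcgis.com/wayback/#active=13851&ext=-115.36279,36.02694,-115.23421,36.10104
FORMOSAT (福衛) imagery is not available.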
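One common reason infrared data helps is that vegetation reflects strongly in near-infrared and absorbs red light, while roofs, concrete, and bare soil do not. The sketch below computes the standard normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), assuming the infrared band in question is near-infrared and that both bands are available as reflectance grids. Nothing in these notes says which index or band the government data actually uses; the threshold and function names are illustrative assumptions.

```python
from typing import List

Band = List[List[float]]  # per-pixel reflectance values (assumed layout)


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def non_vegetated_fraction(nir_band: Band, red_band: Band, threshold: float = 0.3) -> float:
    """Share of pixels whose NDVI is below `threshold`.

    `threshold=0.3` is an assumed cut-off for illustration only; a real
    value would have to be tuned on labeled imagery.
    """
    total = below = 0
    for nir_row, red_row in zip(nir_band, red_band):
        for nir, red in zip(nir_row, red_row):
            total += 1
            if ndvi(nir, red) < threshold:
                below += 1
    return 0.0 if total == 0 else below / total
```

A rise in the non-vegetated share between two years at the same location is only a hint of new construction, since cleared fields and bare soil also score low.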

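ML-crawler discussion notes:

The key point is to learn the similarity between two satellite images in order to detect whether there is a building difference (the model cannot tell whether something is legal or whether it is a factory). The hope is that this can later be used to filter out points that do not actually need manual review, because the algorithm already finds the two images too similar. The work is mainly being done by @Tuo Hung @Kai Wang @YAlgorithm.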
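The filtering idea can be sketched as a threshold on a similarity score. The code assumes each image has already been turned into a feature vector by some model (not shown), compares the before and after vectors with cosine similarity, and skips manual review when the pair is similar enough. The threshold value, the feature vectors, and all names are hypothetical; the learned similarity model would replace `cosine_similarity`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (before, after) feature vectors


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def triage(pairs: Dict[str, Pair], threshold: float = 0.95) -> Tuple[List[str], List[str]]:
    """Split point IDs into (needs_review, skipped).

    `threshold=0.95` is an assumed value; it should be chosen so that
    very few points with a real building difference end up skipped.
    """
    needs_review: List[str] = []
    skipped: List[str] = []
    for point_id, (before, after) in pairs.items():
        if cosine_similarity(before, after) >= threshold:
            skipped.append(point_id)
        else:
            needs_review.append(point_id)
    return needs_review, skipped
```

Because a skipped point is never seen by a person, the threshold is best set from labeled pairs by looking at how many true building differences it would hide.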

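https://github.com/Disfactory/ml-crawler/wiki/Discussion-Aug-9th There are a few technical questions we need everyone's help to answer:

How do we get the polygon from a latitude/longitude or a land-parcel number (地號)? @yukai @IU @yellowsoar (I recall we have already done this.)
How were the change points in the Council of Agriculture's original 50,000 records found? Does anyone know which team did it, and with what data and algorithm? We do not want to duplicate work. Also, where do the legal-use categories labeled in the dataset come from? @deeper @chewei
We need to merge the crowdsourcing labels and the Citizen of the Earth (地公) labels for ML training (that is, the points where 地公 staff and interns marked that the images really differ; legality does not matter, only whether there is a building difference). @deeper @peii
Which imagery source is recommended for training? Which band should we train on? (@peii says the government data uses the infrared band.) @Karen Chen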
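For the label-merging question, one possible shape of the merge is sketched below. It assumes both label sets have already been reduced to a boolean per point ID (True means a building difference was marked). Points labeled by only one source keep that label, agreeing labels are kept, and disagreeing points are set aside rather than resolved. That conflict policy, the data layout, and the names are assumptions, not decisions recorded in these notes.

```python
from typing import Dict, List, Tuple


def merge_labels(
    crowd: Dict[str, bool], ngo: Dict[str, bool]
) -> Tuple[Dict[str, bool], List[str]]:
    """Merge two {point_id: has_building_difference} label sets.

    Returns (merged, conflicts). Conflicting points are excluded from
    `merged` so they can be re-checked before being used for training.
    """
    merged: Dict[str, bool] = {}
    conflicts: List[str] = []
    for point_id in sorted(set(crowd) | set(ngo)):
        a = crowd.get(point_id)
        b = ngo.get(point_id)
        if a is None or b is None:
            # Only one source labeled this point: keep that label.
            merged[point_id] = b if a is None else a
        elif a == b:
            merged[point_id] = a
        else:
            conflicts.append(point_id)
    return merged, conflicts
```

Keeping conflicts out of the training set trades some data volume for cleaner labels; the list of conflicts also shows how often the two sources disagree.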

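Links to earlier related documents

Disfactory (大家來找廠) project planning document https://g0v.hackmd.io/@yukaii/Disfactory/%2F5QH2TlaXQR21wMo3UIKZug
Discussion whiteboard https://docs.google.com/presentation/d/1hyak0PdXA3CpxSC82T7ahmp0AbqjEJS2c3AcnhURpnw/edit
Land-parcel identification update: another approach is to get the land-section boundary from Ronny's API and draw it on a layer at the current coordinates. The steps are (1) get the land section from Easymap and (2) get the geometry from https://twland.ronny.tw/ and draw it on the map. However, the Easymap lookup in step 1 currently runs as an async job on the backend, so whether to expose an API for it, whether that would add backend load, and how it would perform all need more thought.

Earlier aerial-photography discussion notes: https://g0v.hackmd.io/1wqed0vfQtW91i9CkM5emA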