From d81feeb0b255bd2260a3ef09f647f0a0043e5cfb Mon Sep 17 00:00:00 2001
From: xishui <42759595+Github3G@users.noreply.github.com>
Date: Thu, 3 Jun 2021 16:47:43 +0800
Subject: [PATCH] Create the original code address and question

---
 the original code address and question | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 the original code address and question

diff --git a/the original code address and question b/the original code address and question
new file mode 100644
index 0000000..92c5057
--- /dev/null
+++ b/the original code address and question
@@ -0,0 +1,15 @@
+https://github.com/irfanICMLL/structure_knowledge_distillation
+
+Q1: When the batch size is small (such as 8 or 4), the performance drops a lot, so the authors developed a new kind of distillation loss that is more effective.
+A1: Channel-wise Distillation for Semantic Segmentation, https://arxiv.org/abs/2011.13256
+
+Q2: Table 2 shows the method performs best when beta = 2x2.
+But in run_train_val.sh, pool-scale is 0.5, so in CriterionPairWiseforWholeFeatAfterPool the patch size ends up being half of the original feature map size, not 2x2.
+A2: This question has not been answered.
+
+Q3: Why is structured knowledge distillation effective, and how can it be used for regression tasks? How should the intermediate feature maps for pair-wise distillation be chosen?
+A3: It considers the correlation among pixels. If the unary part is hard to learn or cannot be trained effectively, employing structured KD helps training. I tend to choose deeper features, because their abstract semantics are more meaningful.
+Besides, their smaller spatial size is more efficient.
+
+Q4: The paper uses average pooling rather than max pooling.
+A4: Max pooling can achieve better performance.
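
For A1, a minimal PyTorch sketch of the channel-wise distillation idea from the cited paper (https://arxiv.org/abs/2011.13256): each channel's spatial activations are turned into a distribution, and the student is pushed toward the teacher's per-channel distributions with a KL term. The function name and temperature value are illustrative assumptions, not the paper's official code.

import torch
import torch.nn.functional as F

def channel_wise_distillation(student_feat, teacher_feat, temperature=4.0):
    # student_feat, teacher_feat: (N, C, H, W) logits or features with matching shapes.
    n, c, h, w = student_feat.shape
    # Flatten each channel's spatial map into a distribution over the H*W locations.
    s = F.log_softmax(student_feat.view(n, c, -1) / temperature, dim=-1)
    t = F.softmax(teacher_feat.view(n, c, -1) / temperature, dim=-1)
    # KL(teacher || student), averaged over batch and channels, scaled by T^2 as usual in KD.
    return F.kl_div(s, t, reduction="sum") * (temperature ** 2) / (n * c)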
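For Q2-Q4, a minimal sketch of the pair-wise (structured) distillation loss being discussed: intermediate features are pooled into patches, and the squared difference between the student's and teacher's patch-to-patch cosine-similarity matrices is penalized. The adaptive pooling to a fixed 2x2 grid and the avg/max switch (Q4) are assumptions for illustration; the repo's CriterionPairWiseforWholeFeatAfterPool instead derives the kernel from pool-scale, which is exactly what Q2 asks about.

import torch
import torch.nn.functional as F

def pairwise_distillation(student_feat, teacher_feat, pool_size=2, use_max_pool=True):
    # student_feat, teacher_feat: (N, C, H, W) intermediate feature maps of the same shape.
    pool = F.adaptive_max_pool2d if use_max_pool else F.adaptive_avg_pool2d
    s = pool(student_feat, pool_size)          # (N, C, P, P); Table 2 favors 2x2 patches
    t = pool(teacher_feat, pool_size)
    n, c, p, _ = s.shape
    s = F.normalize(s.view(n, c, -1), dim=1)   # unit-norm patch embeddings, (N, C, P*P)
    t = F.normalize(t.view(n, c, -1), dim=1)
    # Cosine similarity between every pair of patch embeddings.
    sim_s = torch.bmm(s.transpose(1, 2), s)    # (N, P*P, P*P)
    sim_t = torch.bmm(t.transpose(1, 2), t)
    # Mean squared difference between the two affinity matrices.
    return F.mse_loss(sim_s, sim_t)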