Merge pull request #46 from CybercentreCanada/render_html
Render html
cccs-rs authored Feb 22, 2023
2 parents 2f5c9b7 + 906c4ba commit 87f2cbd
Showing 3 changed files with 14 additions and 1 deletion.
Dockerfile: 4 additions, 0 deletions
@@ -22,6 +22,10 @@ RUN dpkg -i LibreOffice_${LIBRE_BUILD_VERSION}*/DEBS/*.deb && rm -rf LibreOffice
 RUN apt-get install -y libdbus-1-3 libcups2 libsm6 libice6
 RUN ln -n -s /opt/libreoffice${LIBRE_VERSION} /usr/lib/libreoffice
 
+# Install Chrome for headless rendering of HTML documents
+RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
+    apt install -y ./google-chrome-stable_current_amd64.deb && rm -f ./google-chrome-stable_current_amd64.deb
+
 # Switch to assemblyline user
 USER assemblyline
 
document_preview/document_preview.py: 9 additions, 0 deletions
@@ -68,6 +68,15 @@ def render_documents(self, request: Request, max_pages=1):
             eml2image(file_contents, self.working_directory, self.log,
                       load_ext_images=self.service_attributes.docker_config.allow_internet_access,
                       load_images=request.get_param('load_email_images'))
+        # HTML
+        elif request.file_type == "code/html":
+            with tempfile.NamedTemporaryFile(suffix=".html") as tmp_html:
+                tmp_html.write(request.file_contents)
+                tmp_html.flush()
+                with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp_pdf:
+                    subprocess.run(['google-chrome', '--headless', '--no-sandbox', '--hide-scrollbars',
+                                    f'--print-to-pdf={tmp_pdf.name}', tmp_html.name], capture_output=True)
+                    self.pdf_to_images(tmp_pdf.name, max_pages)
 
     def execute(self, request):
         start = time()
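The new HTML branch runs Chrome with `capture_output=True` but discards the result, so a failed render would only surface later, when `pdf_to_images` opens the still-empty temporary PDF. Below is a minimal sketch of the same headless-Chrome call with explicit failure checks. It is not the service's code. The helper names `build_chrome_print_cmd` and `html_to_pdf_bytes`, the 60-second timeout, and the use of a single temporary directory instead of two `NamedTemporaryFile` handles are illustrative assumptions. Only the Chrome flags are taken from the diff.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_chrome_print_cmd(html_path: str, pdf_path: str, chrome: str = "google-chrome") -> List[str]:
    """Build the headless-Chrome command line, using the same flags as the diff."""
    return [chrome, "--headless", "--no-sandbox", "--hide-scrollbars",
            f"--print-to-pdf={pdf_path}", html_path]


def html_to_pdf_bytes(html: bytes, chrome: str = "google-chrome", timeout: float = 60.0) -> bytes:
    """Render HTML bytes to PDF bytes, raising RuntimeError instead of failing silently.

    Hypothetical helper: the timeout value and error policy are assumptions.
    """
    with tempfile.TemporaryDirectory() as workdir:
        html_path = Path(workdir, "input.html")
        pdf_path = Path(workdir, "output.pdf")
        html_path.write_bytes(html)
        cmd = build_chrome_print_cmd(str(html_path), str(pdf_path), chrome)
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout)
        except FileNotFoundError as exc:
            # The Chrome binary is not installed or not on PATH.
            raise RuntimeError(f"Chrome binary not found: {chrome}") from exc
        # A non-zero exit code or a missing/empty output file both count as failure.
        if result.returncode != 0 or not pdf_path.exists() or pdf_path.stat().st_size == 0:
            stderr = result.stderr.decode(errors="replace")[:500]
            raise RuntimeError(f"Chrome failed to print PDF: {stderr}")
        # Read the PDF before the temporary directory is deleted.
        return pdf_path.read_bytes()
```

Putting both files inside one `TemporaryDirectory` lets Chrome create the output file itself, rather than write to a path that another handle already holds open. Both files are removed together when the block exits.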
service_manifest.yml: 1 addition, 1 deletion
@@ -2,7 +2,7 @@ name: DocumentPreview
 version: $SERVICE_TAG
 description: Use OCR to detect for signs of malicious behaviour in Office and PDF files
 
-accepts: document/(pdf$|office/.*|email)
+accepts: document/(pdf$|office/.*|email)|code/html
 rejects: empty|metadata/.*|document/office/onenote
 
 stage: CORE
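The manifest change widens `accepts` with a top-level alternative. As a result, `code/html` is matched on its own rather than inside the `document/(...)` group. The sketch below checks which file types the old and new patterns route to the service. It assumes the service framework applies `accepts` and `rejects` with Python's `re.match`, which is anchored only at the start. That matching rule and the `is_routed` helper are assumptions for illustration, not documented Assemblyline behaviour.

```python
import re

# Patterns copied from service_manifest.yml before and after this commit.
OLD_ACCEPTS = r"document/(pdf$|office/.*|email)"
NEW_ACCEPTS = r"document/(pdf$|office/.*|email)|code/html"
REJECTS = r"empty|metadata/.*|document/office/onenote"


def is_routed(file_type: str, accepts: str, rejects: str = REJECTS) -> bool:
    """True if file_type matches `accepts` and does not match `rejects`.

    Assumption: both patterns are applied with re.match (start-anchored only).
    """
    return re.match(accepts, file_type) is not None and re.match(rejects, file_type) is None


for ft in ["code/html", "document/pdf", "document/office/onenote", "code/javascript"]:
    print(ft, is_routed(ft, OLD_ACCEPTS), is_routed(ft, NEW_ACCEPTS))
# code/html False True
# document/pdf True True
# document/office/onenote False False
# code/javascript False False
```

OneNote files still match `office/.*` in `accepts`, but `rejects` excludes them under both versions of the pattern.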