I'm experiencing difficulties in accessing a ScrapyRT service running on specific ports within a Kubernetes pod. My setup includes a Kubernetes cluster with a pod running a Scrapy application, which uses ScrapyRT to listen for incoming requests on designated ports. These requests are intended to trigger spiders on the corresponding ports.
Despite setting up a Kubernetes Service whose selector references the Scrapy pod, no incoming requests ever reach the pod. My understanding of Kubernetes networking is that the Service should be created first and the pod after it, which then allows inter-pod communication and external access through the Service. Is this correct?
Below are the relevant configurations:
scrapy-pod Dockerfile:
# Use Ubuntu as the base image
FROM ubuntu:latest
# Avoid prompts from apt
ENV DEBIAN_FRONTEND=noninteractive
# Update package repository and install Python, pip, and other utilities
RUN apt-get update && \
apt-get install -y curl software-properties-common iputils-ping net-tools dnsutils vim build-essential python3 python3-pip && \
rm -rf /var/lib/apt/lists/*
# Install nvm (Node Version Manager) - EXPRESS
ENV NVM_DIR /usr/local/nvm
ENV NODE_VERSION 16.20.1
RUN mkdir -p $NVM_DIR
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
# Install Node.js and npm - EXPRESS
RUN . "$NVM_DIR/nvm.sh" && nvm install $NODE_VERSION && nvm alias default $NODE_VERSION && nvm use default
# Add Node and npm to path so the commands are available - EXPRESS
ENV NODE_PATH $NVM_DIR/versions/node/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/v$NODE_VERSION/bin:$PATH
# Install Yarn - EXPRESS
RUN npm install --global yarn
# Set the working directory in the container to /usr/src/app
WORKDIR /usr/src/app
# Copy the current directory contents into the container at /usr/src/app
COPY . .
# Install any needed packages specified in requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the start_services.sh script into the container
COPY start_services.sh /start_services.sh
# Make the script executable
RUN chmod +x /start_services.sh
# Install any needed packages specified in package.json using Yarn - EXPRESS
RUN yarn install
# Expose all the necessary ports
EXPOSE 14805 14807 12085 14806 13905 12080 14808 8000
# Define environment variable - EXPRESS
ENV NODE_ENV production
# Run the script when the container starts
CMD ["/start_services.sh"]
start_services.sh:
#!/bin/bash
# Start ScrapyRT instances on different ports
scrapyrt -p 14805 &
scrapyrt -p 14807 &
scrapyrt -p 12085 &
scrapyrt -p 14806 &
scrapyrt -p 13905 &
scrapyrt -p 12080 &
scrapyrt -p 14808 &
# Keep the container running since the ScrapyRT processes are in the background
tail -f /dev/null
> k logs scrapy-deployment-56b9d66858-p59gs -f
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Site starting on 12080
2024-01-09 21:53:27+0000 [-] Site starting on 14808
2024-01-09 21:53:27+0000 [-] Site starting on 14805
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f4cbdf44d60>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fef9b620a00>
2024-01-09 21:53:27+0000 [-] Site starting on 13905
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 14807
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f0892ff4df0>
2024-01-09 21:53:27+0000 [-] Site starting on 14806
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f00d3b99000>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fba9e321180>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f1782514f10>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 12085
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fb2054cd060>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
Issue:
Despite these configurations, no requests appear to reach the Scrapy pod. The kubectl logs output shows the ScrapyRT instances starting successfully on the specified ports, yet requests sent from a separate debug pod (running a Python Jupyter Notebook) succeed against other pods but not against the Scrapy pod.
Question:
How can I successfully connect to the Scrapy pod? What might be preventing the requests from reaching it?
Any insights or suggestions would be greatly appreciated.
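For reference, this is roughly how I test connectivity from the debug pod; the debug pod name, spider name, and target URL below are placeholders:

```shell
# Resolve the Scrapy pod's IP, then call ScrapyRT's /crawl.json endpoint directly.
# Hitting the pod IP bypasses the Service, which helps tell a Service/selector
# problem apart from a process that is not listening on the pod's interface.
POD_IP=$(kubectl get pod scrapy-deployment-56b9d66858-p59gs \
  -o jsonpath='{.status.podIP}')
kubectl exec -it <debug-pod> -- \
  curl "http://${POD_IP}:14805/crawl.json?spider_name=<spider>&url=<start-url>"
```

If the direct pod-IP request also times out, the problem is inside the pod (e.g. the process not listening on the pod's interface) rather than in the Service definition.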
Using 0.0.0.0 makes the Twisted server that runs the ScrapyRT application bind to all network interfaces inside the container, so the service is reachable from outside the container as well, e.g. via the pod IP. If I remember correctly, ScrapyRT binds to localhost by default, which means it is only reachable from within the container itself.
Can you try starting the instances with that bind address, if you're not doing it already?
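Something like this, as an untested sketch of your start_services.sh; `-i` is ScrapyRT's bind-address option:

```shell
#!/bin/bash
# Bind each ScrapyRT instance to all interfaces so it is reachable
# via the pod IP, not just from inside the container.
for port in 14805 14807 12085 14806 13905 12080 14808; do
    scrapyrt -i 0.0.0.0 -p "$port" &
done

# Keep the container in the foreground until the ScrapyRT processes exit.
wait
```

Using `wait` instead of `tail -f /dev/null` also lets the container exit (and Kubernetes restart it) if the ScrapyRT processes die, instead of sitting there looking healthy.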