-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The consistently failing spec #126
Comments
Pretty sure this is a cascading failure in the adapter specs. Seems the most common culprit is the Reel adapter spec. If it fails, the next adapter test fails too. I struggled with the flakiness of these tests for a long time. They pass consistently on my machine every time, but not so much on Travis. |
Can you point me at one of the Reel failures? |
Here's a JRuby test run where the Reel adapter fails. Later, the Rack adapter fails. https://travis-ci.org/seancribbs/webmachine-ruby/jobs/12748305 |
I'm able to unreliably reproduce this with |
That's the spec seed of https://travis-ci.org/seancribbs/webmachine-ruby/jobs/12748303 |
I'm not sure we should bother fixing the root cause of this leak. We could as well just use a fresh random port for each example, and let garbage collection take care of the rest. And when the process exits, they'll get released anyway. Example: https://github.com/lgierth/aeee/blob/master/spec/support/helper.rb#L15-L20 |
Yeah, I previously had the adapter tests do just that (89ce86e). I removed that because I thought I had gotten the servers to play nice and shut down. Seems like web servers generally expect to be at the center of the universe. Maybe the Adapter#shutdown API is just a pipe dream. |
(PR reference was a typo) |
One of today's builds shows a new error from Reel on rbx-19mode: https://travis-ci.org/seancribbs/webmachine-ruby/jobs/14224187#L1622 Could it be this is the actual error? Maybe the error gets swallowed most of the time due to something. It also always seem to be the same three examples that fail, except for the mentioned JRuby failure - where it's all of the examples in that example group. On the other hand, the error could as well just be a different symptom of something network-related. cc @tarcieri |
I just experienced another symptom, the first example hangs indefinitely, and isn't even affected by Note how i send SIGINT in line 6, and the difference in time between the former and latter log timestamps. |
Also note the spec command which uses |
And obviously also the second example, as I sent SIGINT again in line 7... |
This is possibly inside of Celluloid's |
here's an example of random Reel errors I see often: |
@robgleeson that's the same error:
Again this is either a bug in:
|
I would guess http_parser.rb if the same fail occurs on jruby. |
If the problem is Travis running multiple adapters in one build, and granted that that's not a real use case (?), could the adapter be parameterized to an env var and the build split into per-adapter? |
@samwgoldman regarding shutdown vs. random port assignment, I'm having trouble with shutdown and Rack+Mongrel. On a sidenote, I also noticed that only the WEBrick and Hatetepe adapters register shutdown as a SIGINT handler. FWIW, Rack::Handler#start traps SIGINT as well, and it simply exits the process unless the adapter implements shutdown. |
@lgierth Confirm that the shutdown implementation on the Rack adapter only works with Rack+WEBrick. This is definitely a problem. From the land of wishes, I really wish Ruby web servers provided usable shutdown methods. Until they do, it would seem that I guess that leaves us with two options: 1) random port assignment for adapter tests in a single test process and 2) separate test processes for each adapter. I'll also note that adapters can now be extracted into gems which depend on the exported RSpec shared examples. I somewhat snuck the Adapter#shutdown method into the public API, so its removal might require a greater-than-patch-level version bump. |
From what I understand, adding support for PUT will require a major bump as well, so maybe we could combine the two. |
more fails that look random/buggy in nature: |
and my comments mysteriously disappear, anyway the failures I linked to are not related to that error. it looks like reel can't acquire a port in one of its specs. |
@robgleeson in those tests I'm seeing Reel crash due to that error, and a connection refused error because the server isn't running... unless there's a different test failure I'm not seeing |
ah youre right, i didn't see that happen earlier in the trace. im not sure where the other failures are coming from(jruby/MRI). they look different and not related. |
|
build passed this time: https://travis-ci.org/seancribbs/webmachine-ruby/builds/14881037 |
@tarcieri have gotten already freed errors when reading request.body objects on MRI 2.1 in https://github.com/celluloid/reel/blob/master/lib/reel/request/parser.rb#L44, have gotten around it by assigning a instance variable to request.body.to_s |
def is_authorized?(auth = nil)
if request.get?
if request.query['hub.challenge']
validate_challenge
elsif request.query['code']
validate_access_token
else
authenticate_user(auth)
end
elsif request.post?
@request_body = request.body.to_s
Celluloid::Actor[:facebook].validate_update(@request_body, request.headers)
else
false
end
end
def process_post
begin
info MultiJson.load(@request_body)
rescue StandardError => e
error e
raise Webmachine::MalformedRequest, 'Invalid body'
false
end
end when i try to use request.body.to_s in both i get the already freed error, also happens when i try to do @request_body = ''
request.body.each do |body|
@request_body << body
end |
Oh and, the Error happens when you try to access the Resource concurrently |
This one is new: https://gist.github.com/lgierth/8516794 looks like a total lockup. That's 1.9.3 with the |
Ugh, i got a quite strange error with Reel now, once i set RACK_ENV to development it does request.body.to_s somewhere inside Webmachine and stops request.body.each from working then. |
@Asmod4n :( That sounds like bad method protocol interaction unfortunately. Would it make more sense for one of the sides (probably Reel) to use |
It works just fine when RACK_ENV is not defined or set to production oO |
That's really weird, especially considering Reel doesn't use Rack whatsoever! :O (without Reel::Rack anyway) |
Written a repro, happens when reel runs behind a apache 2.2 reverse proxy and $RACK_ENV is set to development |
Yikes. I might be able to give that a try in a bit. Still trying to wrangle the HTTP Gem into shape... |
From running several builds today (....) i believe what might solve some problems with travis is to assign a random port to each adapter, like: require 'socket'
port = TCPServer.new('::', 0).addr[1] |
which problems would that solve? a fix for the random failure(s) would be great |
nearly all failures i am getting today is because of the default port set in the spec. |
yeah, that seemed like the most common one for me too. i think it's worth a shot. |
could a sequential bump on a port also work, for each adapter? i did something like that in #150 but it could probably be revived to use unique ports for each adapter instead. |
maybe create a array with a random port for each adapter and have a counter which one to use for the current run. |
yeah, that works. you might be able to make the tests use Webmachine::Test.port and that just returns the port for the current adapter. |
@lgierth @seancribbs @robgleeson made a new branch to hopefully fix some spec fails, will make a PR when it looks good to you, it also changes how adapters work, so i can implement #153. https://github.com/seancribbs/webmachine-ruby/compare/expose_application |
looks good to me |
I'm having another look at the failing stuff: https://github.com/seancribbs/webmachine-ruby/compare/build-fix This fixes #154 in that it creates a server before each example, instead of before all examples. It also intercept's IO.select and overrides its timeout, otherwise WEBrick.shutdown would take the whole 2 seconds for each example. One really interesting observation is that all the applicable adapters (WEBrick, Reel, Rack) fail on the same example now (still only JRuby), which is
It seems like the FiberEncoder's fibers die prematurely. On that note, we can delete fiber18.rb. |
I just ran the build-fix branch against JRuby 1.7.6 to 1.7.12, and this seems to be a bug introduced with 1.7.11. With all tested versions, there were one to three JRuby exceptions like this one.
Until 1.7.10, these looked this:
The good news is that on JRuby 1.7.10 the build seems to be green! |
I tried to eliminate possible GC/reference issues by modifying TestResource#to_fiber to remember all Fibers.
|
Okay, that didn't fix it, and I was so frustrated that I'm updating here a couple of days later! ;) So I'm getting only greens locally, and only reds on Travis. My bet's this is reliant on either timing, or subtle changes between JVM versions. One way or the other, I'm gonna report this with JRuby, and add JRuby to |
@headius any ideas? |
@lgierth fixed it by currently ignoring jruby and rbx |
This has been mentioned in the other PR, and I remember a couple of random build failures in the past. Can someone point me to a failed build or provide intel? Let's try to come up with a fix. ⚡
The text was updated successfully, but these errors were encountered: