From:  Steve Fink <sfink@mozilla.com>
Date:  19 May 2016 02:14:16 Hong Kong Time
Newsgroup:  news.mozilla.org/mozilla.dev.tech.js-engine.internals
Subject:  componentization and specialization, was Re: Clang-format

NNTP-Posting-Host:  63.245.214.181

Plenty of people have weighed in on the pro-integration side, which I 
largely agree with, but I think nbp's motivation is valid and worth 
looking into as an indication of real problems that are only going to 
get worse if we commit entirely to maximum integration.

On 05/12/2016 05:04 PM, Ehsan Akhgari wrote:
> On 2016-05-12 9:53 AM, Nicolas B. Pierron wrote:
>> The more we empower people for working only on their domain(s) of
>> expertise, the less we would have need for such heroes.  Having persons
>> responsible for the integration would help us on that.
>
> As someone who has worked on many parts of the browser, I cannot
> disagree more.  Bugs just don't tend to align themselves within
> someone's "domain expertise".  I think of bugs which seem to only
> require me to only remember knowledge that I have accrued over the years
> as "easy bugs", and the more I learn about different parts of the
> browser, the more I come across issues that touch on the things that I
> have never learned about before, or things that I haven't learned well
> enough.  Those I call "normal bugs".

But as you say yourself, below, there is a place for specialists. If 
everyone tries to spread themselves across the whole browser, we're as 
surely doomed as if everyone tried to carve out their own corner. Some 
bugs *do* align well within one person's domain of expertise, and some 
of those bugs are nasty enough that you really do want a specialist to 
handle them.

> Also bugs are only one part of our work.  I have found out that the more
> parts of the browser I have learned about, the more I have been able to
> work on "cross-functional" things that have benefited the browser as a
> whole, and nowadays these sorts of stuff are commonplace projects that
> we're working on.  Take e10s as an example, that project doesn't map at
> all to any of the islands that we have created in our organization
> and/or code base.  We need more and more people to be able and willing
> to step out of the little islands, as the whole world has grown much
> smaller now and the islands that maybe we created for good reasons back
> in the day are really just an artifact of the past.

It's an important message, echoed with hard data at 
. But it doesn't negate the need for specialists.

> It's true that a successful project requires both specialists and
> generalists, and none of the above means that all people need to be
> generalists.  It means that we need to see these little islands
> (SpiderMonkey included) as what they are, remnants of the past simple
> days where we'd get away with pretending to have we have a bunch of
> loosely integrated components that we could make a browser out of.
> (Anyone remember the pre-libxul days where we pretended to have
> "libraries" and whatnot? ;-)

This is the part where I cannot fully agree. Sure, we started out with 
too fine of a breakdown, but the conclusion is not to roll everything up 
in a monolithic ball of mud.

Any boundary between components has a cost. Sometimes the cost is worth 
it, sometimes not. If you look at spidermonkey in particular, it seems 
pretty clear to me that keeping it architecturally separate from gecko 
is far better than letting them merge indiscriminately. The places where 
gecko-isms have crept into SM already make me pretty nervous.

Random example from an area I am close to: we have reasons for 
triggering a GC that are reserved for gecko's use, like REFRESH_FRAME 
and PAGE_HIDE. That's not something the JS engine should really be aware 
of. We really ought to be translating those actions into only the 
semantics that are meaningful to scheduling GC. As we say in the docs, 
"Mostly, Gecko uses these to indicate that a significant fraction of the 
scheduled zone's memory is probably reclaimable." We really ought to be 
communicating only in those sorts of terms. (And in practice, we pretty 
much are; those individual enum values are just documentation that gets 
into GC logs and telemetry.)
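
To make that concrete, here is a rough sketch of the kind of translation 
I mean. Everything in it is hypothetical: EmbedderGCReason, GCHint, and 
ToGCHint are made-up names, not the real SpiderMonkey or Gecko API, and 
treating both REFRESH_FRAME and PAGE_HIDE as "probably a lot to reclaim" 
is an assumption for illustration only. The embedder keeps its own 
vocabulary and hands the engine nothing but a scheduling hint plus an 
opaque tag for logs.

```cpp
#include <cstdint>

// Hypothetical embedder-side reasons. The names echo the ones mentioned
// above; this is not the real enum.
enum class EmbedderGCReason : uint8_t { RefreshFrame, PageHide, Other };

// Hypothetical engine-facing hint: only what matters for scheduling a GC.
struct GCHint {
    // True when a significant fraction of the scheduled zone's memory is
    // probably reclaimable.
    bool likelyMuchReclaimable;
    // Opaque tag that the engine would only copy into logs and telemetry.
    uint8_t logTag;
};

// Embedder-side translation. The engine never needs to know what a
// "frame refresh" or a "page hide" is. Which reasons map to "much
// reclaimable" is an assumption made for this sketch.
inline GCHint ToGCHint(EmbedderGCReason reason) {
    const uint8_t tag = static_cast<uint8_t>(reason);
    switch (reason) {
      case EmbedderGCReason::RefreshFrame:
      case EmbedderGCReason::PageHide:
        return GCHint{true, tag};
      case EmbedderGCReason::Other:
        break;
    }
    return GCHint{false, tag};
}
```

The point of the split is that the engine could act on 
likelyMuchReclaimable without ever interpreting logTag.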

Another example would be how the way the DOM and the JIT integrate is 
kept as general as possible. It would be easier to just let them both 
know anything and everything about each other, but it would result in us 
painting ourselves into a corner.

As we have done numerous times in the past. The multiple JSContext 
model, I feel, was based too closely on Gecko's needs, and we're 
architecturally better off making gecko adapt to something simpler. 
XPConnect (thankfully always kept separate from SM proper) continues to 
be difficult to disentangle from places where it's no longer the best 
solution. The CC/GC integration has always been problematic, though by 
its nature it's a hard one to keep separate. It's often hard to see when 
we're doing something mind-numbingly stupid, simply because we've rolled 
too much complexity up together for expedience or short-term performance 
wins.

I know I'm taking your argument to an extreme, making it something of a 
strawman, but I think my point stands that more integration is not 
always better.

I wouldn't go so far as to say that SM should be its own repo -- in 
fact, I'd be dead set against it. But I *do* think that keeping gecko 
and SM as disentangled as possible is a worthwhile goal.

nbp talked about one reason that I'd like to amplify: test failures that 
require the entire system, with huge setup and low visibility into what 
is actually happening. These are always going to be an unfortunate fact 
of life, but I believe we need to be better about reproducing these 
sorts of failures and extracting standalone test cases from them. While 
the ideal would be some sort of record-and-replay system that captures 
communication between gecko and SM, I think there are much more modest 
steps that could be taken to improve determinism and make it faster to 
focus in on the actual problematic pieces of a test failure. Sometimes 
that means coming up with a way to construct an SM-only test case, and 
sometimes it means being able to conclusively determine that the problem 
is either in the test or outside of SM.
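
As a toy illustration of the record-and-replay idea (and nothing more), 
the sketch below logs calls crossing a boundary and then replays the log 
against a handler with no embedder present. All names here 
(BoundaryCall, BoundaryRecorder, Replay) are invented for the example, 
calls are reduced to a name and a string argument, and it only captures 
one direction of traffic. The real gecko/SM interface is vastly wider 
than this.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one call crossing the embedder/engine boundary.
struct BoundaryCall {
    std::string name;      // which entry point was called
    std::string argument;  // serialized argument (an assumption of this sketch)
};

// Wraps a handler, logging every call before forwarding it.
class BoundaryRecorder {
  public:
    using Handler = std::function<std::string(const BoundaryCall&)>;

    explicit BoundaryRecorder(Handler handler) : handler_(std::move(handler)) {}

    // Record the call, then forward it to the wrapped handler.
    std::string call(const std::string& name, const std::string& argument) {
        log_.push_back(BoundaryCall{name, argument});
        return handler_(log_.back());
    }

    const std::vector<BoundaryCall>& log() const { return log_; }

  private:
    Handler handler_;
    std::vector<BoundaryCall> log_;
};

// Feed a recorded log to a handler in order and collect its results.
// Comparing these against the original results shows whether the handler
// alone reproduces the behavior.
inline std::vector<std::string> Replay(const std::vector<BoundaryCall>& log,
                                       const BoundaryRecorder::Handler& handler) {
    std::vector<std::string> results;
    results.reserve(log.size());
    for (const BoundaryCall& c : log) {
        results.push_back(handler(c));
    }
    return results;
}
```

If a replayed log reproduces a failure against the engine alone, that is 
an SM-only test case; if it does not, that is at least a hint that the 
problem lies elsewhere.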

Whether concluding that a failure lies in non-SM gecko should mean 
handing the bug to a different investigator is a separate question, 
having to do with workloads and priorities. Which brings me to the next 
issue: 
letting specialists be specialists. I believe we would benefit from 
having someone around who could (if it turned out to make sense) rewrite 
the whole JIT engine on a somewhat different model. But that is simply 
impossible if that person is constantly getting distracted by test 
failures that require deep investigation of unfamiliar corners of Gecko. 
It's too much to ask of an individual person to be able to be very deep 
and concentrated on one subsystem and at the same time be productive in 
jumping into completely unrelated areas. (Or rather, expect what you 
want, but if you took away half of that job, they'd be far better at the 
other half, and some tasks can only be accomplished by someone working 
at full capacity in a specialized area. It doesn't matter how much extra 
time you give them, it simply *will not happen*. And we restrict our 
possible advances to those doable by generalists or distracted specialists.)

I don't think that requires a whole separate integration team. It does 
mean that the expectations for some people will be different than for 
others. Which I'm sure is already the case, but I think that 
organizationally we should support this rather than discouraging it.

On a related note, I think we're also organizationally weak about taking 
advantage of ideas and new approaches. Specifically, you can come up 
with all the great ideas you want, as long as you're going to 
personally implement them. Anything that would best be done by someone 
else or a collaboration of multiple people requires strong salesmanship, 
which is not always the forte of the person coming up with the idea. But 
I've rambled on too long already, so I'm not going to get into that.

> The current reality how web browsers are made don't afford us treating
> our codebase in that way any more.  To me, discussions around unifying
> the coding style drill into the heart of this problem: we need to move
> on from having small compartments with people painting the walls to
> their heart's content, and treat the browser as the one unified beast
> that it truly is.  Not because it matters how much whitespace we put
> where, but because a unified style paints the walls in a way that urges
> people to stop thinking in terms of small islands, and start thinking in
> terms of Firefox as a whole.

Nothing I particularly disagree with here, although I think of it more 
as minimizing the pointless differences between components that add 
friction, while still maintaining the separability in terms of testing 
and even distribution (in the case of spidermonkey). I want to think of 
Firefox as a whole that is composed of separable parts, though only 
when separating those parts is actually useful.

Anyway, I'll try to list out some specifics that I see as lacking and 
that have potential for improvement.

- too hard for specialists to focus and stay specialized

- too hard to reproduce test failures

- too hard to minimize test failures

- we don't have a clear view of where the gecko architecture should and 
shouldn't have component boundaries

- we learn too little from tests we fix. We need some root cause 
analysis after the fix, especially for bugs in the tests themselves. And 
the results should be disseminated.

- the Mozilla "idea/innovation engine" is powered solely by heroes