Inverting the Web 

We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.

Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO", intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.

@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Universal Resource Locator (or Identifier, as URI).

Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.

My read is that search engines are a necessity born of no intrinsic indexing-and-forwarding capability which would render them unnecessary. THAT still has further issues (mostly around trust)...

@freakazoid ... and reputation.

But a mechanism in which:

1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
4. Search could be distributed.
5. Auditing against false/misleading indexing was supported.
6. Original authorship / first-publication was known

... might disrupt things a tad.
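A rough sketch of items 1 and 2 only (self-indexing and aggregation), assuming each site publishes a plain word-to-document map. The site names, paths, and function names here are hypothetical, and nothing in this sketch addresses the auditing, authorship, or trust items:

```python
from collections import defaultdict

def build_index(site, pages):
    """Self-index one site: map each lowercase word to the (site, path) pairs containing it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in text.lower().split():
            index[word].add((site, path))
    return index

def merge_indexes(*indexes):
    """Aggregate several per-site indexes into one shared index."""
    merged = defaultdict(set)
    for index in indexes:
        for word, postings in index.items():
            merged[word] |= postings
    return merged

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Two hypothetical sites index themselves, then the indexes are merged.
a = build_index("a.example", {"/one": "distributed search index", "/two": "cat pictures"})
b = build_index("b.example", {"/x": "distributed cat herding"})
shared = merge_indexes(a, b)
print(search(shared, "distributed cat"))
```

Anyone holding `shared` can answer queries without asking the original sites, which is also exactly why the trust and auditing items matter: a site could publish postings for words its pages do not contain.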

Somewhat more:
news.ycombinator.com/item?id=2

NB: the reputation bits might build off social / netgraph models.

But yes, I've been thinking on this.

@enkiv2 I know SEARX is: en.wikipedia.org/wiki/Searx

Also YaCy as sean mentioned.

There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.

Being dropped by Firefox BTW.

That provides a query API only, not a distributed index, though.

@freakazoid @drwho

@dredmorbius @enkiv2 @freakazoid YaCy isn't federated, but Searx is, yeah. YaCy is p2p.
@dredmorbius @enkiv2 @freakazoid Also, the initial criticism of the URL system isn't entirely there: the DNS is annoying, but isn't needed for accessing content on the WWW. You can directly navigate to public IP addresses and it works just as well, which allows you to skip the DNS. (You can even get HTTPS certs for IP addresses.)

Still centralized, which is bad, but centralized in a way that you can't really get around in internetworked communications.

@kick HTTP isn't fully DNS-independent. For virtualhosts on the same IP, the webserver distinguishes between content based on the host portion of the HTTP request.

If you request by IP, you'll get only the default / primary host on that IP address.

That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.
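To make the virtual-host point concrete, here is a minimal sketch that only builds the request bytes and sends nothing. The hostnames are hypothetical, and `build_request` is an illustrative helper, not part of any library:

```python
def build_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request naming `host`."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # the server picks the virtual host from this line
        "Connection: close",
        "",
        "",                    # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")

# Both requests could be sent to the same IP address; only the Host line differs.
req_a = build_request("blog.example")
req_b = build_request("shop.example")
print(req_a.decode("ascii"))
```

No DNS lookup is involved in constructing either request, but the server still needs the name to choose which site to serve.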

@enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are now invalid. Dat, IPFS, FreeNet, Tor hidden sites, etc, don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.

@freakazoid Question: is there any inherent reason for a URL to be based on DNS hostnames (or IP addresses)?

Or could an alternate resolution protocol be specified?

If not, what changes would be required?

(I need to read the HTTP spec.)

@kick @enkiv2

@dredmorbius @kick @enkiv2 HTTP URLs don't have any way to specify the lookup mechanism. RFC3986 says the part after the // and optional authentication info followed by @ is a "registered name" or an address. It doesn't say the name has to be resolved via DNS but does say it is up to the local system to decide how to resolve it. So if you just wanted self-certifying names or whatever you can use otherwise unused TLDs the way Tor does with .onion.
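A small sketch of that structure using Python's standard `urllib.parse`. The URL, the credentials, and the `choose_resolver` helper with its return labels are made up for illustration; the suffix check is only one way a local system might decide how to resolve a name:

```python
from urllib.parse import urlsplit

def choose_resolver(url):
    """Pick a lookup mechanism from the host name alone (hypothetical policy)."""
    host = urlsplit(url).hostname or ""
    if host.endswith(".onion"):
        return "tor"
    return "system default"

# The authority is everything between "//" and the path:
# optional userinfo, "@", then the registered name or address, then an optional port.
parts = urlsplit("http://alice:secret@example.onion:8080/index.html")
print(parts.username, parts.password, parts.hostname, parts.port)
print(choose_resolver("http://example.onion/"))
print(choose_resolver("http://example.org/"))
```

The URL itself carries no field naming the resolver; the choice lives entirely in the client's policy, keyed here on the otherwise unused TLD.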

@freakazoid Hrm....

So:

There are alternate URLs, e.g., irc://host/channel
news://newsgroup/

I'm wondering if a standard for an:

http://<address-proto><delim><address> might be specifiable.

Onion achieves this through the onion TLD. But using a reserved character ('@' comes to mind) might allow for an addressing protocol _within_ the HTTP URL itself, to be used....

@kick @enkiv2

@dredmorbius @kick @enkiv2 @ is already reserved for the optional username[:password] portion before the hostname.

@freakazoid @dredmorbius @enkiv2 Is ! still reserved (! may be a DNS thing actually, thinking about it further)?

@kick As of RFC 2396, "!" was unreserved. That RFC is now obsolete. Not sure if status is changed.

tools.ietf.org/html/rfc2396

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid Entirely unrelated because I just remembered this based on @kragen's activity in this thread:

Vaguely shocked that I'm interacting with both of you because I'm pretty sure you two are the people I've (at least kept in memory for long enough) read the words of online consistently for longest. (Since I was like, eight, maybe, on Kragen's part. Not entirely sure about you but less than I've checked canonical.org/~kragen for by a decent margin at least.)

@kick Clue seeks clue.

You're asking good questions and making good suggestions, even where wrong / confused (and I do plenty of both, that's not a criticism).

You're helping me (and I suspect Sean) think through areas I've long been bothered about concerning the Web / Internet. Which I appreciate.

(Kragen may have this all figured out, he's certainly far ahead of me on virtually all of this, and has been for decades.)

@enkiv2 @kragen @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid while I appreciate the vote of confidence, and I did spend a long time figuring out how to build a scalable distributed index, I am at as much of a loss as anyone when it comes to figuring out the social aspect of the problem (SEO spam, ranking, funding).

@kragen I see a lot of this coming down to:

- What is the incremental value of additional information sources? At some point, net of validation costs, this falls below zero.

- Google's PageRank relied on inter-document and -domain relations. Author-based trust hasn't carried as much weight. I believe it needs to.

- Randomisation around ranking should help avoid systemic bias lock-ins.

- Penalties for fraud, with increasing severity and duration for repeats.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid one of the nice things about PageRank is that the Perron–Frobenius theorem guarantees a well-defined result precisely because it has no penalties; penalties can give rise to the Eigenmoses problem, as described in news.ycombinator.com/item?id=2
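A minimal power-iteration sketch of the penalty-free case, assuming the usual damping formulation. The three-page graph and the parameter values are hypothetical, and this is not meant as Google's actual implementation. Because every weight is non-negative and damping makes the transition matrix strictly positive, the iteration settles on a single positive rank vector:

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small positive baseline; there are no negative terms.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks)
```

Allowing a link to subtract rank would put negative entries in the matrix, and the theorem's guarantee of a unique positive solution would no longer apply.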

@dredmorbius @kick @enkiv2 @freakazoid Trump supporters label NPR as "fake news"; Trump opponents label Fox as "fake news". Presumably one side will win and the other will be penalized for linking to fake news, with increasing severity and duration for repeats. There's no particular reason to expect that it will be the correct side. See also: the Crusades, blood libel, babies ripped out of incubators, Lysenkoism. PageRank is immune to that.

@kragen True.

There's objective truth, and there's consensus truth. The two seldom match up.

Old Mr. Free Speech Hisself, John Stuart Mill, wasn't optimistic on the truth's capacity to out.

If it's necessary to set up competing credentialing networks which operate independently (competing churches?), that ... might have to happen.

Motivated irrationality is, unfortunately, A Thing. And can be quite lucrative and rewarding, at least in the short term.

@kick @enkiv2 @freakazoid

@kragen @dredmorbius @kick @enkiv2 @freakazoid
In the absence of any negative feedback, whoever can produce the most positive feedback will win (and when competing on access to information, winning accumulates). Whoever gets an early monopoly has a lot of control over the worldview even after they lose that monopoly...

@enkiv2 Pretty much this.

It's an evolutionary problem, I think, with likely analogues and lessons in biological evolution.

Negative feedbacks are fitness checks?

@enkiv2 Right.

Though my question was, specifically: are negative feedbacks fitness checks? That is, the "selection" process within "variation, inheritance, and selection".

And vice versa: are fitness checks / selection processes negative feedback?

Not sure that they are or aren't. Musing on this.

Within a systems context, yes, negative feedback is required for sustainable function.

@dredmorbius @enkiv2
elimination of options based on failure of fitness checks certainly is a subset of negative feedback. i'm not assuming that the negative feedback in question is non-arbitrary though. it's just that in the absence of any negative feedback, everything goes positive, and whoever has the largest reach cannot be beaten. with negative feedback a powerful actor can be deplatformed by a coalition.
