Inverting the Web 

We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.

Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO", intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.
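For reference, an inverted index maps each term to the set of documents containing it. One common way to distribute one is to partition the term space across peers by hash, so each peer owns the postings for its share of terms. The sketch below is a minimal illustration that assumes a fixed, trusted set of nodes held in one process. `DistributedIndex` and `Node` are hypothetical names, and the sketch does nothing about the SEO and trust problems, which are the hard part.

```python
import hashlib
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; a real indexer would also strip punctuation and stem.
    return text.lower().split()

class Node:
    """One hypothetical peer holding the postings for a slice of the term space."""
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document URLs

class DistributedIndex:
    def __init__(self, num_nodes):
        self.nodes = [Node() for _ in range(num_nodes)]

    def node_for(self, term):
        # Stable hash, so every peer agrees on which node owns a given term.
        digest = hashlib.sha256(term.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def add_document(self, url, text):
        for term in set(tokenize(text)):
            self.node_for(term).postings[term].add(url)

    def search(self, query):
        # AND query: intersect the postings lists fetched from each owning node.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.node_for(terms[0]).postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.node_for(term).postings.get(term, set())
        return result
```

Each query term costs one lookup on the node that owns it, so an AND query touches at most as many peers as it has distinct terms.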

@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Uniform Resource Locator (or Identifier, as in URI).

Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.

My read is that search engines are a necessity born of the lack of any intrinsic indexing-and-forwarding capability, which would render them unnecessary. THAT still has further issues (mostly around trust)...

@freakazoid ... and reputation.

But a mechanism in which:

1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
3. Search could be distributed.
4. Auditing against false/misleading indexing was supported.
5. Original authorship / first-publication was known.

... might disrupt things a tad.
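One way to picture self-indexing, shared and aggregated indexes, auditing, and first-publication tracking working together: each site publishes a small index of its own pages, and aggregators merge those indexes, record when each piece of content was first seen, and spot-check declared terms against the fetched page. Everything here (`SiteIndex`, `Aggregator`, `audit`, the entry layout) is a hypothetical sketch. It assumes timestamps come from the aggregator's own observations, since self-reported dates would be trivially forged.

```python
import hashlib
from dataclasses import dataclass, field

def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass
class SiteIndex:
    """A hypothetical index a site publishes about its own pages."""
    site: str
    entries: dict  # url -> (declared terms, content hash)

@dataclass
class Aggregator:
    postings: dict = field(default_factory=dict)    # term -> set of URLs
    first_seen: dict = field(default_factory=dict)  # content hash -> (timestamp, URL)

    def merge(self, index, timestamp):
        for url, (terms, digest) in index.entries.items():
            for term in terms:
                self.postings.setdefault(term, set()).add(url)
            # Keep the earliest observed claim for identical content (first publication).
            if digest not in self.first_seen or timestamp < self.first_seen[digest][0]:
                self.first_seen[digest] = (timestamp, url)

def audit(index, fetched):
    """Return URLs whose declared terms or content hash don't match the fetched page."""
    bad = []
    for url, (terms, digest) in index.entries.items():
        text = fetched.get(url, "")
        words = set(text.lower().split())
        if content_hash(text) != digest or not set(terms) <= words:
            bad.append(url)
    return bad
```

A copied page claiming unrelated keywords fails the audit, and the earlier observation of the same content keeps the first-publication slot.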

Somewhat more:

NB: the reputation bits might build off social / netgraph models.

But yes, I've been thinking on this.

@enkiv2 I know Searx is:

Also YaCy, as Sean mentioned.

There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.

Being dropped by Firefox BTW.

That provides a query API only, not a distributed index, though.

@freakazoid @drwho

@dredmorbius @enkiv2 @freakazoid YaCy isn't federated, but Searx is, yeah. YaCy is p2p.

@dredmorbius @enkiv2 @freakazoid Also, the initial criticism of the URL system isn't entirely accurate: DNS is annoying, but it isn't needed for accessing content on the WWW. You can navigate directly to public IP addresses and it works just as well, which lets you skip DNS. (You can even get HTTPS certificates for IP addresses.)

Still centralized, which is bad, but centralized in a way that you can't really get around in internetworked communications.

@kick HTTP isn't fully DNS-independent. For virtual hosts on the same IP, the web server distinguishes between sites based on the hostname sent in the HTTP request's Host header.

If you request by IP, you'll get only the default / primary host on that IP address.

That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.
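HTTP/1.1 requires clients to send a `Host` header, and name-based virtual hosting routes on it. A request made to a bare IP carries the IP in that header, so the server falls back to its default site. The sketch below shows that routing decision. The hostnames, the `VHOSTS` table, and the TEST-NET address 203.0.113.7 are all illustrative assumptions.

```python
# Hypothetical virtual hosts sharing one IP address.
VHOSTS = {
    "blog.example.org": "<h1>Blog</h1>",
    "shop.example.org": "<h1>Shop</h1>",
}
DEFAULT_HOST = "blog.example.org"

def parse_host_header(raw_request):
    # Skip the request line, then scan header lines until the blank line.
    for line in raw_request.split("\r\n")[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None

def select_vhost(raw_request):
    host = parse_host_header(raw_request) or DEFAULT_HOST
    # Strip an optional :port suffix (IPv6 literals such as [::1]:80 are not handled here).
    hostname = host.split(":", 1)[0].lower()
    # Unknown names, including bare IP addresses, get the default site.
    return VHOSTS.get(hostname, VHOSTS[DEFAULT_HOST])
```

The same effect can be seen from the command line with something like `curl -H 'Host: shop.example.org' http://203.0.113.7/`. With HTTPS, the hostname also travels in the TLS SNI extension, so TLS is hostname-aware too.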

@enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are now invalid. Dat, IPFS, Freenet, Tor hidden services, etc., don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.
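The systems named above differ in detail; Tor onion addresses, for instance, are derived from a public key rather than from content. The shared idea of a name that survives a change of host can be illustrated with plain content addressing: the name is a hash of the bytes, any host may serve them, and the client verifies what it receives. This is a generic sketch, not the format of Dat, IPFS, or Freenet, and `Swarm` and the `sha256-` prefix are hypothetical.

```python
import hashlib

def content_address(data: bytes) -> str:
    # The name is derived from the bytes themselves, not from where they are hosted.
    return "sha256-" + hashlib.sha256(data).hexdigest()

class Swarm:
    """Hypothetical set of hosts, each holding blobs keyed by content address."""

    def __init__(self):
        self.hosts = {}  # host name -> {address: bytes}

    def publish(self, host, data):
        addr = content_address(data)
        self.hosts.setdefault(host, {})[addr] = data
        return addr

    def fetch(self, addr):
        for store in self.hosts.values():
            data = store.get(addr)
            # Verify before trusting: any host can serve, but only matching bytes are accepted.
            if data is not None and content_address(data) == addr:
                return data
        return None
```

If the first host disappears and the same bytes are republished elsewhere, the address, and every link using it, stays valid.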

@freakazoid Question: is there any inherent reason for a URL to be based on DNS hostnames (or IP addresses)?

Or could an alternate resolution protocol be specified?

If not, what changes would be required?

(I need to read the HTTP spec.)

@kick @enkiv2

@dredmorbius @kick @enkiv2 HTTP URLs don't have any way to specify the lookup mechanism. RFC 3986 says the part after the // and the optional authentication info followed by @ is a "registered name" or an address. It doesn't say the name has to be resolved via DNS, but does say it is up to the local system to decide how to resolve it. So if you just wanted self-certifying names or whatever, you could use otherwise unused TLDs the way Tor does with .onion.
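Because the URL syntax treats the host as an opaque registered name, a client is free to pick a lookup mechanism per name. A minimal sketch, assuming a hypothetical suffix-to-resolver table: the resolver functions are placeholders that return tags rather than performing lookups, and the `.onion` rule only illustrates the idea. In practice Tor clients pass such names unresolved to a SOCKS proxy rather than dispatching like this.

```python
from urllib.parse import urlsplit

def resolve_via_dns(name):
    # Placeholder: a real client would call the system resolver here.
    return ("dns", name)

def resolve_via_tor(name):
    # Placeholder: a real client would hand the name to a Tor proxy instead.
    return ("tor", name)

# Hypothetical mapping from reserved suffixes to lookup mechanisms.
RESOLVERS = {".onion": resolve_via_tor}

def resolve(url):
    host = urlsplit(url).hostname  # lowercased; userinfo and port removed
    if host is None:
        raise ValueError(f"no host in {url!r}")
    for suffix, resolver in RESOLVERS.items():
        if host.endswith(suffix):
            return resolver(host)
    return resolve_via_dns(host)
```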

@freakazoid Hrm....


There are alternate URL schemes, e.g., irc://host/channel

I'm wondering if a standard for a URL of the form:

http://<address-proto><delim><address> might be specifiable.

Onion achieves this through the onion TLD. But using a reserved character ('@' comes to mind) might allow for an addressing protocol _within_ the HTTP URL itself, to be used....

@kick @enkiv2

@dredmorbius @kick @enkiv2 @ is already reserved for the optional username[:password] portion before the hostname.

@freakazoid @dredmorbius @enkiv2 Is ! still reserved (! may be a DNS thing actually, thinking about it further)?
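On the character question: RFC 3986 splits reserved characters into `gen-delims`, which structure the URI (`@` separates the userinfo from the host), and `sub-delims`, which include `!` and are permitted inside a registered name. A `!` is therefore legal URI syntax in the host position, but it is not allowed in a traditional DNS host name, which uses only letters, digits, and hyphens. The sketch below encodes those classes. The helper names are hypothetical, and the host-name check ignores details such as trailing dots and internationalised names.

```python
import re
import string
from urllib.parse import urlsplit

# RFC 3986 sections 2.2 and 2.3 character classes.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def is_reg_name(s):
    """reg-name = *( unreserved / pct-encoded / sub-delims )"""
    i = 0
    while i < len(s):
        if s[i] == "%":
            if not re.fullmatch(r"[0-9A-Fa-f]{2}", s[i + 1:i + 3]):
                return False
            i += 3
        elif s[i] in UNRESERVED or s[i] in SUB_DELIMS:
            i += 1
        else:
            return False
    return True

# Letter-digit-hyphen label, 1 to 63 characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_hostname(s):
    return all(LDH_LABEL.fullmatch(label) for label in s.split("."))
```

`urlsplit` shows the `@` split in action: for `http://alice:secret@example.org/` it reports `username` as `alice`, `password` as `secret`, and `hostname` as `example.org`.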

@dredmorbius @enkiv2 @freakazoid Entirely unrelated because I just remembered this based on @kragen's activity in this thread:

Vaguely shocked that I'm interacting with both of you, because I'm pretty sure you two are the people whose writing I've read online most consistently for the longest (at least of those I remember). (Since I was, like, eight, maybe, in Kragen's case. Not entirely sure about you, but less than that by a decent margin at least.)

@kick Clue seeks clue.

You're asking good questions and making good suggestions, even where wrong / confused (and I do plenty of both, that's not a criticism).

You're helping me (and I suspect Sean) think through areas I've long been bothered about concerning the Web / Internet. Which I appreciate.

(Kragen may have this all figured out; he's certainly far ahead of me on virtually all of this, and has been for decades.)

@enkiv2 @kragen @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid while I appreciate the vote of confidence, and I did spend a long time figuring out how to build a scalable distributed index, I am as much at a loss as anyone when it comes to figuring out the social aspect of the problem (SEO spam, ranking, funding).

@kragen I see a lot of this coming down to:

- What is the incremental value of additional information sources? At some point, net of validation costs, this falls below zero.

- Google's PageRank relied on inter-document and -domain relations. Author-based trust hasn't carried as much weight. I believe it needs to.

- Randomisation around ranking should help avoid systemic bias lock-ins.

- Penalties for fraud, with increasing severity and duration for repeats.

@kick @enkiv2 @freakazoid
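The ranking points above (author-based trust, randomisation, and escalating penalties for fraud) can be combined in a toy scoring function. This is a minimal sketch under invented assumptions: the weights, the 30-day base penalty, the halving rule, and the `Author` and `score` names are all hypothetical choices, not a worked-out design.

```python
import random
from dataclasses import dataclass

DAY = 86400.0

@dataclass
class Author:
    trust: float = 0.5            # hypothetical prior trust in [0, 1]
    offences: int = 0
    penalised_until: float = 0.0  # timestamp (seconds) until which results are suppressed

    def record_fraud(self, now, base_days=30.0):
        self.offences += 1
        # Each repeat offence cuts trust harder and doubles the suppression period.
        self.trust *= 0.5 ** self.offences
        self.penalised_until = now + base_days * DAY * 2 ** (self.offences - 1)

def score(relevance, author, now, rng, jitter=0.1):
    if now < author.penalised_until:
        return 0.0
    # Multiplicative noise, so near-ties reorder between queries instead of locking in.
    noise = rng.uniform(-jitter, jitter)
    return relevance * (0.5 + author.trust) * (1.0 + noise)
```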

@kragen - Some way of vetting new arrivals / entities, such that legitimate newcomers aren't entirely locked out of the system. Effectively letters of recommendation or reference.

@kick @enkiv2 @freakazoid
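The "letters of recommendation" idea can be sketched as vouching with stakes: a newcomer starts with a capped share of their referees' trust, and referees lose some of their own trust if the newcomer is later caught cheating. The `Registry` class, the 20% transfer, the 0.6 cap, and the 50% referee penalty are hypothetical parameters chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Member:
    trust: float
    vouched_for: list = field(default_factory=list)

class Registry:
    """Hypothetical reputation registry where members vouch for newcomers."""

    def __init__(self, transfer=0.2, cap=0.6):
        self.members = {}
        self.transfer = transfer  # share of each referee's trust lent to the newcomer
        self.cap = cap            # newcomers can't start above this, however many vouch

    def add_founder(self, name, trust):
        self.members[name] = Member(trust)

    def admit(self, name, vouchers):
        if not vouchers:
            raise ValueError("a newcomer needs at least one reference")
        start = sum(self.transfer * self.members[v].trust for v in vouchers)
        self.members[name] = Member(min(start, self.cap))
        for v in vouchers:
            self.members[v].vouched_for.append(name)

    def report_fraud(self, name, penalty=0.5):
        # The fraudster loses everything; each referee loses a share of their own trust.
        self.members[name].trust = 0.0
        for member in self.members.values():
            if name in member.vouched_for:
                member.trust *= 1.0 - penalty
```

Putting referees' own standing at risk is what makes a reference mean something; without it, vouching is free and sock puppets can vouch for each other.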

@dredmorbius @kragen @enkiv2 @freakazoid How much privacy are you willing to sacrifice with this?

Taking a single possibility (I listed a few) from something I wrote in reply to a couple of posts up-thread but didn’t send, because I want to hear someone’s opinion on a sub-problem of one of the guesses listed:

Seed with trusted users (i.e., people submitting sites to crawl), rank preferentially by age (time-limited; would eventually wear off), then rank on access by unique users. Given that centralized link aggregators wouldn’t disappear, if someone throws HN in, for example, the links on HN get added into the pool, whichever get clicked on most rise up, eventually get their own ranking, etc.
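One reading of this scheme, sketched below: only trusted seeds (which could include a crawler feeding in an aggregator such as HN) may submit URLs, a new entry gets a freshness boost that decays linearly to zero over a fixed window, and after that it ranks on the number of distinct users who clicked it. The `Pool` class, the two-unit boost, the seven-day window, and the log scaling are all hypothetical choices.

```python
import math
from dataclasses import dataclass, field

DAY = 86400.0

@dataclass
class Entry:
    url: str
    submitted_at: float
    visitors: set = field(default_factory=set)  # distinct user ids who clicked

class Pool:
    def __init__(self, trusted_submitters, boost=2.0, window=7 * DAY):
        self.trusted = set(trusted_submitters)
        self.entries = {}
        self.boost = boost    # extra score a fresh entry starts with
        self.window = window  # seconds until that boost has fully worn off

    def submit(self, submitter, url, now):
        if submitter not in self.trusted:
            raise PermissionError(f"{submitter!r} is not a trusted seed")
        self.entries.setdefault(url, Entry(url, now))

    def record_click(self, url, user_id):
        self.entries[url].visitors.add(user_id)  # repeat clicks by one user count once

    def score(self, entry, now):
        age = now - entry.submitted_at
        freshness = self.boost * max(0.0, 1.0 - age / self.window)
        return math.log1p(len(entry.visitors)) + freshness

    def ranked(self, now):
        return sorted(self.entries.values(), key=lambda e: self.score(e, now), reverse=True)
```

A newly submitted link can briefly outrank an established one, then has to earn its place through distinct visitors once the boost expires.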

This works especially well if using what I sent the e-mail to inquire a little more about: cluster sorting rather than just ranking bare text (this is what Yippy does, for example, and what Blekko used to do), because it promotes niche results better than Google’s model with smaller datasets, and when users have more seamless access to better niches, more sites can earn reputation more easily. Example: try vs. throwing your username into Google. The clustering allows for much more informative/interesting results, I think, especially for inquisitive searching.
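To make "cluster sorting" concrete, here is a deliberately simple stand-in: greedy single-pass clustering of result titles by Jaccard similarity of their word sets, with each cluster labelled by the words its members share beyond the query. This is not how Yippy or Blekko worked; the threshold of 0.3 and all function names are assumptions for illustration.

```python
def terms(text):
    # Bag-of-words term set; a real system would stem and drop stopwords.
    return set(text.lower().split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster(results, threshold=0.3):
    """Each result joins the first cluster whose seed it resembles, else starts a new one."""
    clusters = []  # each item: (seed term set, list of member titles)
    for title in results:
        t = terms(title)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((t, [title]))
    return [members for _, members in clusters]

def label(members, query):
    # Words shared by every member, minus the query itself, name the niche.
    shared = set.intersection(*(terms(m) for m in members))
    return " ".join(sorted(shared - terms(query)))
```

For the query "python", results about snakes and results about programming fall into separate labelled groups, so a user can go straight to the niche they meant.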

Kragen mentioned randomly introducing newcomers (adding noise), but I think it might work better still if noise was added to the searches for at least the beginning of it. A single previously-unclicked link on the first five pages of search results?
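The closing suggestion can be sketched directly: take one never-clicked result from beyond the first five pages and insert it at a random position within those pages. Page size 10 and the function name `with_exploration` are assumptions. In this version, unclicked results already inside the first five pages are left where they are.

```python
import random

def with_exploration(ranked, clicks, rng, page_size=10, pages=5):
    """Promote one previously-unclicked result into the first `pages` pages."""
    window = page_size * pages
    head, tail = list(ranked[:window]), list(ranked[window:])
    unclicked = [url for url in tail if clicks.get(url, 0) == 0]
    if not unclicked:
        return head + tail
    pick = rng.choice(unclicked)
    tail.remove(pick)
    # Random slot, so the promoted link isn't always in the same place.
    head.insert(rng.randrange(len(head) + 1), pick)
    return head + tail
```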

@kick As little as possible.

I've not participated online under my real name (or even vague approximations of it) for a decade or more. That was seeming increasingly unattractive to me already then. And I'd been online for at least two decades by that point.

Of the various dimensions of trust, anti-sock-puppetry is one axis. It's not the only one. It matters a lot in some contexts. Less in others.

Doxxing may be occasionally warranted.

Unmasking is a risk.

@enkiv2 @kragen @freakazoid

@dredmorbius @enkiv2 @kragen @freakazoid Privacy isn't just deanonymizing! You can also track pseudonyms.

@kick Right. My comments were aimed more at qualifying my interest in / preferences for privacy.

I'm finding contemporary society to be very nearly intolerable. And probably ultimately quite dangerous.

@enkiv2 @kragen @freakazoid


@dredmorbius @kick @enkiv2 @freakazoid yeah, although in many ways it's an improvement over Golden Horde society, Ivan the Terrible society, Third Crusade society, Diocletian society, Qin Er Shi society, Battle of the Bulge society, Khmer Rouge society, Holodomor society, People's Temple society, the society that launched the Amistad, etc. We didn't start the fire.


@kragen I'm referring specifically to the surveillance aspects, and the accelerating pace of that, especially over the past two decades or so. Though you can trace the trends back to the 1970s, generally.

Paul Baran was writing of the risks ~1966-1968, which is 52-54 years ago now.

IBM were actively demonstrating the risks 1939-1945.

Herbert Simon was conveniently ignorant of this in 1978, when Zuboff discovered surveillance capitalism in her research.

@kick @enkiv2 @freakazoid

@kragen Of the various drawbacks of the Mongol Hordes, massive mobile technological surveillance was not a prominent aspect.

The Battle of the Bulge and Holodomor societies _did_ benefit from informational organisation. The Khmer Rouge and People's Temple may have, and the capabilities certainly existed.

General capabilities began ~1880, again with Hollerith and the nascent IBM.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid depending on who you were and where you lived, it was easy to end up with very little privacy after the Mongol invasion. The fact that the technologies employed were things like chains and swords rather than punched cards and loyalty scores was cold comfort to the enslaved. But, yes, I meant that the societies were more regrettable overall, not necessarily specifically along the surveillance axis.

@kragen My evolving thought is that privacy is an emergent concept: a force that grows in proportion to the ability to invade personal space and sanctum.

Pretechnical society had busybodies, gossips, eavesdroppers, spies, and assassins.

But if you wanted to listen to or observe someone, you had to put a body in proximity to do it. Plebes in preliterate (or largely preliterate) societies didn't even leave paper trails: a baptismal record, a marriage record, and a will, if you were lucky.

@kick @enkiv2 @freakazoid

@kragen We're at an age where a chat amongst friends, as here, is creating a distributed global written record, doubtless being scraped by academics, corporations, and state and nonstate surveillance systems.

US phone call history records date to the mid-1980s (if not before). Purchase, social, employment, and location records are comprehensive for at least the past decade, if not five or more.

@kick @enkiv2 @freakazoid

If privacy is the ability to define and defend limits on information disclosure, there is precious little left.

The information glut is so immense that even multi-billion-dollar state intelligence apparatuses cannot meaningfully utilise the information preemptively. And yet those same state actors leak and lose their own personnel and intelligence data. Political organisations have their email leaked. Generals and possibly presidents are downed.

@kick @enkiv2 @freakazoid

@kragen The same state actors drop death from the sky based on cellphone metadata and other data traces.

And those are the ones we think of as the good guys.

China, Saudi, Israel, Russia, and who knows who all else, are doing far worse.

And we're only really a decade in to this brave new mobile-data-surveillance world.

@kick @enkiv2 @freakazoid

@dredmorbius @kragen @enkiv2 @freakazoid Bingo, this is exactly what I was thinking when I posted that Cantrill quote.

@kick @enkiv2 @dredmorbius @freakazoid OH. I see now. You weren't referring to what Bryan was doing to Dave or vice versa; you were referring to the fact that we are talking about it a quarter century later. Yeah, it seemed like a good idea at the time. 'course, at the time we were only a few tens of millions of Netizens.

@kragen How many times have you been thankful you went to university in the age of film cameras, and prior to Facebook, Twitter, Snapchat, TikTok, YouTube, Imgur, Reddit, ...?

My Stupid Shit is at best recorded on a single frame of film, or a few fading memories.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid Well, privacy invasion was more typically done by your father, your husband, or your owner in many of these societies, rather than by the secret police. But it was in many cases quite pervasive. Of course when we think about medieval Europe, it's easier to imagine ourselves as monks, knights, or at least yeomen, than as villeins in gross, vagabonds, or women who died in forced childbirth, precisely because of that paper trail.

@kragen @dredmorbius @enkiv2 @freakazoid It's still done by all of those! Now it's just a mixed bag. Think about how much an adolescent risks if a guardian finds one of their social media handles (assuming they're doing anything interesting), for example.

@kick @enkiv2 @dredmorbius @freakazoid Much less so! In rich countries most women do have a room of their own, for example, and very few families will disown their children for premarital sex. Even gay sex is unlikely to result in fatal social sanctions in much of the world. Being kicked out of the house by your parents in your childhood is no longer a near death sentence. And of course many fewer people have owners at all, much less owners who can kill them at will with impunity.

@kragen On the other hand, previously one could travel, even a short distance, though also longer, and put much of that threat behind, starting over with a fresh identity.

That's ... extraordinarily difficult these days. Not unheard of, but it takes far more effort, risks and likelihood of being caught and exposed are much higher, and The System Never Forgets.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid most people couldn't; banishment was tantamount to a death sentence unless there was a recently-genocided frontier nearby. even then, it meant you'd never see anyone you loved ever again. but there were intermediate levels. even if traveling from town to town in many epochs posed a high risk of being robbed, and the near certainty of being raped if you were a young woman, there were other times when it did not; and even if there was a risk, you might return

@kragen @dredmorbius @enkiv2 @freakazoid This is ignoring the recent past pre-network somewhat, isn't it? Before passports were required for international flight, people could more or less do this without consequence. Also, the minimal size of community probably made the prospect of not seeing loved ones again much easier even back in the oldest times.

@kick @enkiv2 @dredmorbius @freakazoid when there was flight but no passports required for international flight, almost everyone either could not afford flight at all, lived in countries like the USSR that granted exit visas sparingly, or both. And international travel was itself riskier; my sister rode her motorcycle from the US to Argentina to visit me a few years ago, prompting the remark from my father that when he was her age, in the late 1970s, nobody would have survived attempting that.

@kragen @enkiv2 @dredmorbius @freakazoid US to Argentina is a lot more difficult a trek than US to Canada ever was, isn't it? My view here may be a bit influenced by all of the people I know who fled the US for Canada before the border was actually maintained in any way other than symbolically, so I figure I could be biased, here.

Agree with most of the rest of that, though.

@kick @enkiv2 @dredmorbius @freakazoid as for the prospect of not seeing loved ones again, I think the truth is rather the contrary: people today are much less close to their families than even a quarter century or half century ago, emotionally speaking. (For many of them that's a blessing, of course, but it still makes moving easier.)

@kragen Since the time periods and regimes we're discussing seem rather vaguely defined:

- When I spoke of modern surveillance society being near intolerable, I'm contrasting it with my own personal experience of the relatively recent past, say, life since 1970.

- More broadly, there's been a recent history of high mobility starting roughly 1800 - 1850 (corresponding largely with industrialisation and motorised sea and land transport), through about 2000.

@kick @enkiv2 @freakazoid

@kragen Travel freedoms weren't complete, but were _extensive_.

Modern passport controls began roughly in WWI.

Ethnic emigration controls existed, though were successively lifted largely ~1920 - 1970 in many areas.

*Internal* migration within nation-states was extensive, e.g., the Great Migration, Westward Migration, Dust Bowl migration, Rust-Belt to Sun-Belt, Brooklyn-to-Miami, California migration ~1930 - 1980, and general rural-to-urban and core->suburb flight.

@kick @enkiv2 @freakazoid

@kragen You also had criss-crossing transatlantic flows: Blacks out of the United States, Jews in, in the early-to-mid 20th century. Much movement throughout British Commonwealth states. Huge movements throughout Europe.

Generally: an ability, from 1800 to 2000, to pick up, move elsewhere, and start over again, throughout a large (and, for that time, expanding) part of the world.

And tracking was ... limited.

Passports and driver's licences: paper-based.

@kick @enkiv2 @freakazoid

@kragen Some banking records and the like.

And the precursors of modern credit bureaux: Dun & Bradstreet traces its roots to the 1800s (the increased mobility made tracking reputations more important). The first modern novel on con-men, as opposed to mere tricksters, Melville's "The Confidence-Man", is set on the high-mobility throughway of its time, the steamboat-traversed Mississippi River. Mobility and distance communications open new avenues of fraud.

@kick @enkiv2 @freakazoid

@kragen But for the average person *with the ability to travel*, one that was *widely* available 1850 - 2000, you could, for the most part, get up, transfer, and leave your past behind.

Not perfectly. But as a real possibility.

That ... seems far less possible now, taking a static read. More troubling is the trend, which looks strongly exponential, suggesting the near future will not resemble a decades-to-centuries distant past much at all.

That's my argument.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid yeah, I mostly agree. Most people couldn't fly, but more than a third of them could travel, and more than half could travel with their families. And there was no real way to track people.

@kragen And yet, as the Chinese noted: Heaven is high and the emperor far away.

The inefficiencies of medieval systems (even highly-evolved bureaucratic ones as in China) left a great deal of latitude.

The lack of *material* wealth, or useful knowledge, imposed strong constraints. But the idea of being watched by unknown eyes, from anywhere on the planet, didn't exist. Your watchers were neighbours, and had profound limitations.

Still a threat, but knowable.

@kick @enkiv2 @freakazoid

@kragen @dredmorbius @enkiv2 @freakazoid It was better in the 1960-80s for the most part, but sometimes I still think of:

[5000 well thought out lines of a single mail response on how Linux wipes the floor with Solaris performance-wise >quoted] Have you ever kissed a girl? - Bryan

So the problem was at least prevalent by ‘96.

@kick @enkiv2 @dredmorbius @freakazoid not sure Dave Miller's privacy was being invaded there? much less in a technologically inescapable way

@kragen @enkiv2 @dredmorbius @freakazoid No, not Miller (I was referring to Bryan, because that post will never, ever be forgotten). I admittedly might have gotten lost (it's 6:00AM here and I haven't slept in two days, so I may have gotten threading messed up), but the connection in my head was -

Ah, yeah, I see what's up: I was thinking of a different thread with a similar set of people in it + @dredmorbius's line "I'm finding contemporary society to be very nearly intolerable. And probably ultimately quite dangerous." + comments RE: previous art of problem-space.

There's something that resembles danger in some manner when you can track everything a person's ever said with a name that can be paired with their home address pretty easily, I think; lack of privacy mixed with full, immutable history (for the bad parts, less so for the good parts) makes things very interesting nowadays.

@kick That danger / risk is an interesting one.

Some people focus on strictly one element -- the State, or Corporations, or Terrorists, or Narcocriminals, or the Criminally Insane, or Griefers, or Stalkers / Exes.

It's kind of all of the above.

In some cases I'm not fully sure that it's simply having civic systems and rule of law which matter more.

But mostly it's the data, the ability to use and misuse it, or simply the presumption that data exist, that enables evil.

@enkiv2 @kragen @freakazoid

@kick I've been kicking around the idea of manifestation vs. latency. Sociologist Robert K. Merton used the terms in context of _functions_, but they're fundamental to information.

Some is manifest: immediately apparent, graspable, understood in totality.

Some is latent: the opposite in every way.

Paired with benefits and risks, it means we value manifest benefit and discount *both* latent risk and benefit. It's a built-in short-termism.

Not by human nature.

@enkiv2 @kragen @freakazoid

@kick That's simply how information works.

So with pervasive, recorded, fungible, manipulable, queryable records on tremendous numbers of people, you don't know what future motives, contexts, norms, values, power structures, etc., will be.

The problem with Google's policy of getting right up to the creepy line is that the creepy line moves.

So does the Surveillance Data Risk Line.

And we don't know what parts will move which way for what people and data.

@enkiv2 @kragen @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid Right. Today recreational marijuana is legal in California; 30 years ago it could end your career in many jobs, and even today it can get you executed in much of Asia. Who's to say what its legality or public perception will be in another 30 years? Similarly for abortion, divorce, adultery, job-hopping, capitalism, or opposition to global pervasive surveillance.

@kragen I think my periodic observations that numerous states within the US *still* don't have a legal minimum age for marriage annoy a fair portion of the Fediverse.

Moral values are profoundly fungible, over time. Sometimes in as little as a few years, but staggeringly so over decades and centuries.

I've reasons for believing we may be entering a period of higher flux in values serving as social identifiers, adopted as moral codes.

@kick @enkiv2 @freakazoid

@kragen Quite possibly in different directions in different locales, and not necessarily in a consistent direction over time, even within given jurisdictions.

Drug (or sex, marriage, possibly business or technical) laws may swing wildly.

Where there's an overload of information, clearly evident, durable signifiers take on signalling significance, especially for group identity and loyalty.

@kick @enkiv2 @freakazoid

@dredmorbius @kragen @kick @enkiv2 The search for the optimal culture is a simulated annealing process and we're entering a "heating up" phase.

@kick Google's watching that line move in all kinds of ways.

Ways that impose $5 billion fines in Europe.

Ways that may be turning public sentiment against it in the US. Quite possibly harder and fiercer than happened to Microsoft in the 2000s.

Google's been openly mocked on Hacker News for at least five years, if not longer. Dang just commented that their A/B practices have him switching his habits. I'd changed mine in 2013, and don't regret it at all.

@enkiv2 @kragen @freakazoid
