1996 Conference: Year of the Electronic Author
  

 

The Web: How Big? How Far?

Sieglinde Schreiner-Linford
European University Institute, Florence, Italy


This outline gives a brief overview of the topics I will try to cover in this session. The list of surveys is by no means complete. To get a fuller and more up-to-date taste of the variety of materials available, redo a search at AltaVista or check Yahoo's collections on web statistics and surveys. There is a lot of very assorted material, and more surveys to fill in than reports of results, though! Some theoretical background material and a list of players in the field can be found at the WWW Consortium.

What is the Internet? What is the WWW? How big are they? Or rather: how can they be measured?

Definitional and practical questions make the operationalization of these concepts for size estimation more difficult than it may appear at first sight. To learn about internet and WWW users, one first has to know who 'qualifies'. How can they be counted, and how can we learn about their characteristics, their expectations and their experiences on the net?

John Quarterman and Smoot Carl-Mitchell, in What is the Internet, Anyway?, distinguish between the different physical networks and computers making up the internet, the underlying protocol (TCP/IP), typical internet services (Gopher, TELNET, FTP, mail, lists, news and the WWW, constituting its own 'subunit'), and the geographical areas connected to the internet. The DNS analysis in the Internet Domain Survey puts the number of hosts at roughly 9,500,000 and the number of domains at 240,000 (worldwide, January 1996). With a different methodological approach (e-mail questionnaires sent to 'many' of the domains representing organizations on the internet, September/October 1995), MIDS estimates a core internet with 16.9 million users of 7.7 million computers (capable of distributing information), a consumer internet with 26.4 million users of 10.1 million computers, and 39 million users of e-mail (concentric circles).

Robot-based estimates of the size of the web are provided by WebWanderer, WebCrawler and Netcraft's survey of web server software usage.
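The mechanics behind such robots can be sketched quite simply. The Python fragment below is only an illustration of the general idea, not the code any of these services actually run; the seed URL and the fetch limit are placeholders. It follows links outward from a starting page and counts the distinct pages and hosts it manages to reach.

    # Minimal sketch of a robot-based size estimate (illustrative only):
    # start from seed pages, follow links breadth-first, count what is reached.
    import re
    import urllib.request
    from urllib.parse import urljoin, urlparse

    SEEDS = ["http://example.org/"]      # hypothetical starting point
    LIMIT = 200                          # stop after this many pages

    seen_pages, seen_hosts, queue = set(), set(), list(SEEDS)
    link_re = re.compile(r'href="(http[^"]+)"', re.IGNORECASE)

    while queue and len(seen_pages) < LIMIT:
        url = queue.pop(0)
        if url in seen_pages:
            continue
        seen_pages.add(url)
        seen_hosts.add(urlparse(url).netloc)
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("latin-1", "ignore")
        except (OSError, ValueError):
            continue                     # unreachable pages are themselves a datum
        for link in link_re.findall(html):
            queue.append(urljoin(url, link))

    print(len(seen_pages), "pages on", len(seen_hosts), "hosts reached")

Real robots differ mainly in scale, politeness rules, and how they extrapolate from the reachable portion of the web to the whole.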

Web-survey respondents - self-selected web-based user surveys

Any web site that is able to process forms can conduct its own survey: put out a form, ask users to fill in some information, then gather and analyze the data. Response counts of some well known (or well publicized) surveys have reached the tens of thousands. Surveys of this kind are relatively easy and cheap to conduct (which is not to say that some of them are not very well constructed and of great scientific value). Their accessibility for (theoretically) every 'publisher' and every 'user' and their interactive quality make good on the web's promise of global participation. They are not geographically limited, and their response numbers allow for detailed analysis of smaller population groups (like webmasters). But whom do they represent? Can these results be extrapolated, to whom, and over what time span?
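To make the put-out-a-form, gather-the-data cycle concrete, here is a minimal sketch in Python; the questions, field names and port number are invented for illustration, and a real survey site would of course be far more elaborate.

    # Minimal sketch of a self-selected web survey: serve a form,
    # record each submission, analyze the log afterwards.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs

    FORM = b"""<form method="post">
    Years on the web: <input name="years">
    Hours per week:   <input name="hours">
    <input type="submit"></form>"""

    class SurveyHandler(BaseHTTPRequestHandler):
        def do_GET(self):                 # hand out the questionnaire
            self.send_response(200)
            self.end_headers()
            self.wfile.write(FORM)

        def do_POST(self):                # record one self-selected response
            length = int(self.headers.get("Content-Length", 0))
            answers = parse_qs(self.rfile.read(length).decode())
            with open("responses.log", "a") as log:
                log.write(repr(answers) + "\n")
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Thank you for participating.")

    HTTPServer(("", 8080), SurveyHandler).serve_forever()

The ease of this setup is exactly why such surveys proliferate, and why the question of whom the self-selected respondents represent remains open.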

An impressive time series of internet surveys has been built up by the GVU (Graphics, Visualization & Usability) Center. They are presently conducting their fifth survey; the fourth (October and November 1995) yielded 23,000 responses from all over the world, with detailed questions about usage patterns and preferences, problems encountered, and specific questionnaires for information providers/HTML authors and web service providers.

The 'two halves' of the internet are described in an interesting attempt by InternetVALS (from February 1995 on) to focus on the psychographic attributes of web users.

Why representative surveys? About random samples, error margins, and statistical inference.

To generate results that can be thought of as 'representative' of a bigger population, randomly selected samples of that population are tested (or, in this case, questioned). Only then can results measured in the sample be assumed to 'represent' the population (with a known error factor).
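That 'known error factor' can be made concrete with the standard margin-of-error formula for a proportion measured in a random sample; the sample size and the measured share below are illustrative figures, not results from any of the surveys discussed here.

    # Margin of error (95% confidence) for a proportion from a random sample.
    from math import sqrt

    n = 1000      # respondents in the random sample (illustrative)
    p = 0.11      # share of the sample found to use the internet (illustrative)

    margin = 1.96 * sqrt(p * (1 - p) / n)
    print(f"{p:.0%} +/- {margin:.1%} at 95% confidence")   # about 11% +/- 1.9%

Self-selected web surveys, however large, offer no comparable error bound, because the probability of any given person ending up in the sample is unknown.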

"Typical web-users" - random-sample based, representative surveys

To be able to apply results of a survey to 'typical web users' within a given, bigger population, a random sample of that population has to be drawn. It has to be big enough to include enough members of the subgroup to allow meaningful analysis. (Say you want to question a thousand web users and you guess that 10% of the population are web users: start off with 10,000 people, plus enough to make up for those who don't want to answer a questionnaire, are never at home, and so on.)
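Spelled out as arithmetic, the back-of-the-envelope calculation looks as follows; the response rate is an assumption added purely for illustration.

    # Sample-size arithmetic from the example above (illustrative figures).
    wanted_web_users = 1000
    incidence = 0.10        # guessed share of web users in the population
    response_rate = 0.60    # assumed share who actually answer (hypothetical)

    contacts_needed = wanted_web_users / incidence / response_rate
    print(f"Contact about {contacts_needed:,.0f} people")   # roughly 16,700

This is why representative surveys of a still-small user group are expensive: most of the people contacted are not in the group of interest at all.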

The Nielsen/CommerceNet survey (August 1995) of the US and Canadian population targeted internet users, on-line service users and non-users (with a web-server-based survey as a control group for the first cell). It estimates internet access at 17% (37 million) of the total population aged 16 and above, actual internet use at 11%, and WWW use at 8% (18 million). This group is (mostly) upscale, professional, educated and male.

The American Internet User Survey, conducted in November and December 1995, placed the number of American internet users (current users of at least one internet application besides e-mail) at 8.4 million adults, and estimates that 7.5 million people have access to the web. It confirms the growth rate of the internet, with half of the respondents recalling that they started using the net during the last year, and comes up with high numbers for home use, especially among new users.

The O'Reilly study ('Defining the Internet Opportunity', May to August 1995), again aiming at internet users and users of online services, comes up with a somewhat lower figure: 5.8 million, plus 3.9 million on-line service users.

These three studies are commercially sponsored and full results can be bought (?, $5,000 and $25,000). They are (at least partially) inspired by commercial interest in the internet or WWW user market.

'Standard' population samples also have some methodological problems. Specifically, they tend to be biased: some population groups are notoriously over- or under-represented (middle-aged or elderly females as compared to young, lower-status males). But these effects are relatively minor and well known.

Site specific usage tracking: What can I know about my users or what can all those sites know about me?

As HTTP is based on a negotiation between a client and a server, some information has to be exchanged to make a dialog possible at all. But just what is this information? That depends on the HTTP specification itself, on extended communication between servers and browsers, like Netscape's cookie facility, and on a growing range of commercially offered logging services. An overview of the multitude of developments in this area was presented at the W3C/MIT workshop on web demography.
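As an illustration of the kind of information a server collects without any extra machinery, the sketch below pulls apart one entry of a standard access log in the common log format; the log line itself is invented, and cookies and commercial logging services build further information on top of records like this.

    # Sketch: what site-specific tracking starts from -- one access-log entry
    # (common log format). The sample line is invented for illustration.
    import re

    LOG_LINE = '192.0.2.7 - - [19/May/1996:14:02:31 +0100] "GET /outline.html HTTP/1.0" 200 5120'

    pattern = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
                         r'"(?P<request>[^"]+)" (?P<status>\d+) (?P<bytes>\d+)')
    hit = pattern.match(LOG_LINE).groupdict()
    print(hit["host"], "requested", hit["request"].split()[1], "at", hit["when"])

Beyond the log, the browser also volunteers headers such as User-Agent and Referer with each request, which is why even a site with no survey at all can say something about who its users are.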


Sieglinde Schreiner-Linford, May 19, 1996, e-mail: sieglind@datacomm.iue.it