Explanation of Web activity statistics report

The web activity statistics reports show activity on SIRIS data bases from the Horizon internet interface. Prior to December 2002, the web interface was WebPAC. During December 2002, we switched to the later epixtech product, Ipac. Subsequently, we moved to another product from the same company (HIP). The information logged by these products is largely similar, so the statistics may be compared across the whole range.

Report description

For each month's report, the date the report was generated is given in the second line, and the inclusive dates represented are given in the third line. These reports are broken down to show activity generated from within the Smithsonian Institution (SI) and activity from outside. Any activity generated by an IP address that begins '160.111' or '172.17' is considered within-SI activity; all else is considered non-SI activity. This criterion correctly identifies most SI and non-SI activity, but some SI units have addresses outside 160.111 or 172.17, so some 'non-SI' activity may actually come from SI units. Percentages of non-SI usage are calculated in each case as

    Percentage non-SI = ((non-SI)/((non-SI)+(SI)))*100
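
The classification rule and the percentage above can be sketched in a few lines of Python. This is an illustration only, not the code that produces the reports. It assumes that "begins '160.111' or '172.17'" means the first two octets of the address (the 160.111.0.0/16 and 172.17.0.0/16 networks), and the names is_si and percent_non_si are hypothetical.

```python
from ipaddress import ip_address, ip_network

# Assumption: the two prefixes are read as whole octets, i.e. as /16 networks.
SI_NETWORKS = (ip_network("160.111.0.0/16"), ip_network("172.17.0.0/16"))


def is_si(address: str) -> bool:
    """Return True if the address falls in one of the within-SI ranges."""
    addr = ip_address(address)
    return any(addr in net for net in SI_NETWORKS)


def percent_non_si(non_si: int, si: int) -> float:
    """Percentage non-SI = non-SI / (non-SI + SI) * 100."""
    total = non_si + si
    # Guard against an empty measurement period (no activity at all).
    return 0.0 if total == 0 else non_si / total * 100
```

Reading the prefixes as whole octets keeps an address such as 172.170.1.1 out of the within-SI group, which a plain string comparison on '172.17' would not.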

Sessions initiated in 2-hr Increments: This section counts the individual sessions initiated within each two-hour segment. The length of time a session lasts is not counted.
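
As a rough sketch of this kind of count, the Python below assigns each session to a two-hour segment by its start time alone. It assumes the segments are aligned to midnight (00:00-02:00, 02:00-04:00, and so on), which the reports do not state explicitly; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def two_hour_slot(start: datetime) -> str:
    """Label the two-hour segment in which a session started."""
    first = (start.hour // 2) * 2  # 0, 2, 4, ..., 22
    return f"{first:02d}:00-{first + 2:02d}:00"


def sessions_by_slot(starts) -> Counter:
    """Count sessions per segment; only the start time matters, not the duration."""
    return Counter(two_hour_slot(s) for s in starts)
```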

Total Numbers, by Catalog, of Individual Sessions: This section records the total number of individual sessions initiated over the measurement period, according to the data base queried. A single session can include many searches.

Search Numbers, by Catalog, by Search Type: This section records the number of searches of each type done in the data bases over the measurement period. Counts prior to January 2004 do not include redirected searches; the report was modified in January 2004 to include them. In reports from January 2004 on, the percentage of a given search type accounted for by redirected searches is displayed in parentheses. In the May through July 2004 reports, percentages of broadcast searches are also displayed, in square brackets. (Broadcast searching was disabled after July 2004.) Both redirected and broadcast searches are included in the total for a particular search type and in the grand totals for each data base. Note also that not all data bases use the same indexes. Total searches over the measurement period are given for each data base.

Here is a scenario to illustrate how these counts are generated. Imagine that a user connects to the Archives catalog and begins variously searching and browsing for ten minutes, clicking the 'Go' button on the search page 50 separate times. This user is counted as one session in the two-hour increment when the session started, and each of the 50 searches is counted under its appropriate search type. This user represents one individual session and 50 of the searches in SIArchives. When the user changes to the Art Inventories catalog and does yet more searching, this is counted as a new session under Art Inventories, and the searches are totalled under the Art Inventories search types.
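
The scenario can be replayed as a small tally in Python. This is a sketch only: the event tuples, the function names, and the search type names 'keyword' and 'browse' are hypothetical, and the split of the 50 searches into 30 and 20 is made up for illustration.

```python
from collections import Counter


def tally(events):
    """Count sessions per catalog and searches per (catalog, search type).

    Each event is ("session", catalog) or ("search", catalog, search_type).
    """
    sessions = Counter()
    searches = Counter()
    for event in events:
        kind, catalog = event[0], event[1]
        if kind == "session":
            sessions[catalog] += 1
        elif kind == "search":
            searches[(catalog, event[2])] += 1
    return sessions, searches


def searches_in(searches, catalog):
    """Total searches recorded for one catalog, across all search types."""
    return sum(n for (cat, _), n in searches.items() if cat == catalog)


# One user: a session in SIArchives with 50 searches, then a switch of catalog.
events = (
    [("session", "SIArchives")]
    + [("search", "SIArchives", "keyword")] * 30
    + [("search", "SIArchives", "browse")] * 20
    + [("session", "Art Inventories")]
    + [("search", "Art Inventories", "keyword")] * 5
)
sessions, searches = tally(events)
```

Here sessions holds one session for each of the two catalogs, and searches_in(searches, "SIArchives") gives the 50 searches from the first part of the visit.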

Full bib direct searches: In September 2006, we began counting searches that were scoped to a single bibliographic record in a data base. This 'full bib direct' searching is enabled by the 'openURL' architecture, which allows a single data base record to be accessed directly. For example, users may link to specific records from their web pages or favorites lists. Using this capability, we have provided site maps of our data bases to Google, which accounts for the huge increase in full bib direct searches since April 2007. In addition, the CSI application links to the HIP data bases via a full bib direct search. Since July 2007, I have broken down the full bib direct counts to show an estimate of the number of hits from outside robots such as Google, as well as the count of redirections from CSI (from inside and outside the SI domain). The count for outside robots is an estimate because we can only guess which IP addresses represent robots based on their activity (any site sending over 2,000 full bib requests in a day is considered a robot).
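
The robot estimate can be sketched as follows, in Python. Only the threshold of 2,000 requests in a day comes from the description above. The input format (one (ip, day) pair per full bib request) and the function names are hypothetical, and the sketch assumes that once an address is flagged, all of its requests in the period count as robot hits; the actual report may handle this differently.

```python
from collections import Counter
from datetime import date

ROBOT_THRESHOLD = 2000  # "over 2,000 full bib requests in a day"


def robot_addresses(requests):
    """Return the addresses that exceeded the threshold on at least one day.

    requests: iterable of (ip, day) pairs, one pair per full bib request.
    """
    per_ip_per_day = Counter(requests)
    return {ip for (ip, _day), n in per_ip_per_day.items() if n > ROBOT_THRESHOLD}


def estimated_robot_hits(requests):
    """Estimate robot hits as all requests from flagged addresses (an assumption)."""
    requests = list(requests)
    robots = robot_addresses(requests)
    return sum(1 for ip, _day in requests if ip in robots)
```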

