Tuesday, January 19, 2010

Data

The word data is the Latin plural of datum, the neuter past participle of dare, "to give", hence "something given". The past participle of "to give" has been used for millennia in the sense of a statement accepted at face value; one of the works of Euclid, circa 300 BC, was the Dedomena (in Latin, Data). In discussions of problems in geometry, mathematics, engineering, and so on, the terms givens and data are used interchangeably. Data is also a representation of a fact, figure, or idea. Such usage is the origin of data as a concept in computer science: data are numbers, words, images, etc., accepted as they stand.

Usage in English

In English, the word datum is still used in the general sense of "something given". In cartography, geography, nuclear magnetic resonance and technical drawing it is often used to refer to a single specific reference datum from which distances to all other data are measured. Any measurement or result is a datum, but "data point" is more common, albeit tautological. Both datums (see usage in datum article) and the originally Latin plural data are used as the plural of datum in English, but data is more commonly treated as a mass noun and used with a verb in the singular form, especially in day-to-day usage. For example: "This is all the data from the experiment." This usage is inconsistent with the rules of Latin grammar and traditional English, which would give "These are all the data from the experiment."

Some British and international academic, scientific and professional style guides require that authors treat data as a plural noun. Other international organizations, such as the IEEE Computer Society, allow its usage as either a mass noun or a plural, based on author preference. The Air Force Flight Test Center, on the other hand, specifically states that the word data is always plural, never singular.

Data is now often treated as a singular mass noun in informal usage, but usage in scientific publications shows a divide between the United States and United Kingdom. In the United States the word data is sometimes used in the singular, though scientists and science writers more often maintain the traditional plural usage. Some major newspapers such as the New York Times use it alternately in the singular or plural. In the New York Times the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared on the same day. In scientific writing data is often treated as a plural, as in "These data do not support the conclusions", but many people now think of data as a singular mass entity like information and use the singular in general usage. British usage now widely accepts treating data as singular in standard English, including everyday newspaper usage, at least in non-scientific use. UK scientific publishing still prefers treating it as a plural. Some UK university style guides recommend using data for both singular and plural use and some recommend treating it only as a singular in connection with computers.

Raw data is a collection of unprocessed numbers, characters, images or other outputs from devices that convert physical quantities into symbols. Such data is typically further processed by a human or input into a computer, stored and processed there, or transmitted (output) to another human or computer (possibly through a data cable). Raw data is a relative term; data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next.
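
The staged view of processing can be illustrated with a minimal Python sketch. The sensor readings and both stage functions are hypothetical, made up only for this illustration: the output of the first stage serves as the "raw data" of the second.

```python
# Hypothetical raw output from a temperature sensor, as text.
raw_readings = ["21.5", "22.0", "bad", "23.5"]


def parse(readings):
    """Stage 1: convert raw strings to floats, dropping unreadable entries."""
    parsed = []
    for item in readings:
        try:
            parsed.append(float(item))
        except ValueError:
            continue  # skip entries that are not numbers
    return parsed


def summarize(values):
    """Stage 2: the parsed values are this stage's raw data."""
    return {"count": len(values), "mean": sum(values) / len(values)}


parsed = parse(raw_readings)
summary = summarize(parsed)
print(parsed)
print(summary)
```

Each stage only sees what the previous stage produced, which is why "raw" is a relative label here.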

Mechanical computing devices are classified according to the means by which they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a datum as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters, typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.
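
As a small illustration of building familiar representations from a binary alphabet, the Python sketch below renders text and a number as strings of "0" and "1". The helper names to_bits and from_bits are chosen for this example only, and UTF-8 is an assumed encoding, one of several that could be used.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and render each byte as eight binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: turn groups of eight binary digits back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


print(to_bits("A"))             # the letter A as one byte: 01000001
print(format(19, "b"))          # the number 19 in binary: 10011
print(from_bits(to_bits("data")))
```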

Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
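
The idea that a program can be ordinary data can be sketched in Python, loosely imitating how Lisp writes programs as nested lists. The evaluate function and the two-operator table are assumptions made for this illustration, not a real Lisp interpreter.

```python
import operator

# Hypothetical operator table for this tiny expression language.
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Interpret a nested list as instructions; numbers evaluate to themselves."""
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result


program = ["+", 1, ["*", 2, 3]]   # a program, stored as a plain list
doubled = ["*", 2, program]       # built by treating the program as data

print(evaluate(program))
print(evaluate(doubled))
```

Because the program is just a list, another piece of code can inspect it, store it, or wrap it in a larger expression before it is ever run.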

Experimental data refers to data generated within the context of a scientific investigation by observation and recording.

Meaning of data, information and knowledge

The terms information and knowledge are frequently used for overlapping concepts. The main difference is in the level of abstraction being considered. Data is the lowest level of abstraction, information is the next level, and knowledge is the highest level of the three. Data on its own carries no meaning. In order for data to become information, it must be interpreted and take on a meaning. For example, the height of Mt. Everest is generally considered "data", a book on Mt. Everest's geological characteristics may be considered "information", and a report containing practical information on the best way to reach Mt. Everest's peak may be considered "knowledge".

Information as a concept bears a diversity of meanings, from everyday usage to technical settings. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.

Beynon-Davies uses the concept of a sign to distinguish between data and information; data are symbols while information occurs when symbols are used to refer to something.

It is people and computers who collect data and impose patterns on it. These patterns are seen as information which can be used to enhance knowledge. These patterns can be interpreted as truth, and are authorized as aesthetic and ethical criteria. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.

Protocol (computing)

In computing, a protocol is a set of rules which is used by computers to communicate with each other across a network. A protocol is a convention or standard that controls or enables the connection, communication, and data transfer between computing endpoints. In its simplest form, a protocol can be defined as the rules governing the syntax, semantics, and synchronization of communication. Protocols may be implemented by hardware, software, or a combination of the two. At the lowest level, a protocol defines the behavior of a hardware connection.

Typical properties

While protocols can vary greatly in purpose and sophistication, most specify one or more of the following properties:

  • Detection of the underlying physical connection (wired or wireless), or the existence of the other endpoint or node
  • Handshaking
  • Negotiation of various connection characteristics
  • How to start and end a message
  • Procedures on formatting a message
  • What to do with corrupted or improperly formatted messages (error correction)
  • How to detect unexpected loss of the connection, and what to do next
  • Termination of the session and/or connection
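
Several of these properties (how a message starts and ends, how it is formatted, and what to do with a corrupted message) can be sketched with a toy framing scheme in Python. The frame layout, the marker bytes and the function names are all hypothetical and invented for this illustration. The sketch only detects corruption and rejects the message; it does not correct it.

```python
import zlib

# Hypothetical frame: START + 2-byte length + payload + 4-byte CRC32 + END
START, END = b"\x02", b"\x03"


def frame(payload: bytes) -> bytes:
    """Wrap a payload with start/end markers, a length field and a checksum."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for 2-byte length field")
    header = len(payload).to_bytes(2, "big")
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return START + header + payload + checksum + END


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the frame is malformed."""
    if len(message) < 8 or message[:1] != START or message[-1:] != END:
        raise ValueError("malformed frame")
    length = int.from_bytes(message[1:3], "big")
    if len(message) != length + 8:
        raise ValueError("length mismatch")
    payload = message[3:3 + length]
    checksum = int.from_bytes(message[3 + length:7 + length], "big")
    if zlib.crc32(payload) != checksum:
        raise ValueError("corrupted payload")
    return payload


print(unframe(frame(b"hello")))
```

A receiver that gets a ValueError here would typically discard the frame and, in a fuller protocol, ask the sender to retransmit.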

Importance

The protocols in human communication are separate rules about appearance, speaking, listening and understanding. All these rules, also called protocols of conversation, represent different layers of communication. They work together to help people successfully communicate. The need for protocols also applies to network devices. Computers have no way of learning protocols, so network engineers have written rules for communication that must be strictly followed for successful host-to-host communication. These rules apply to different layers of sophistication such as which physical connections to use, how hosts listen, how to interrupt, how to say good-bye, and in short how to communicate, what language to use and many others. These rules, or protocols, that work together to ensure successful communication are grouped into what is known as a protocol suite.

The widespread use and expansion of communications protocols is both a prerequisite for the Internet, and a major contributor to its power and success. The pair of Internet Protocol (or IP) and Transmission Control Protocol (or TCP) are the most important of these, and the term TCP/IP refers to a collection (a "protocol suite") of its most used protocols. Most of the Internet's communication protocols are described in the RFC documents of the Internet Engineering Task Force (or IETF).

Object-oriented programming has extended the use of the term to include the programming protocols available for connections and communication between objects.

Generally, only the simplest protocols are used alone. Most protocols, especially in the context of communications or networking, are layered together into protocol stacks where the various tasks listed above are divided among different protocols in the stack.
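
Layering can be illustrated with a short Python sketch in which each layer adds its own header on the way down the stack and removes it on the way up. The layer names and the bracketed text headers are hypothetical; real stacks use binary headers with many more fields.

```python
# Hypothetical three-layer stack, listed outermost first.
STACK = ["link", "network", "transport"]


def wrap(layer: str, data: bytes) -> bytes:
    """Prefix data with a made-up text header naming the layer."""
    return f"[{layer}]".encode("ascii") + data


def unwrap(layer: str, data: bytes) -> bytes:
    """Strip the layer's header, rejecting data that does not carry it."""
    header = f"[{layer}]".encode("ascii")
    if not data.startswith(header):
        raise ValueError(f"expected {layer} header")
    return data[len(header):]


def send(payload: bytes) -> bytes:
    """Pass the payload down the stack; the innermost layer wraps first."""
    for layer in reversed(STACK):
        payload = wrap(layer, payload)
    return payload


def receive(data: bytes) -> bytes:
    """Pass received data up the stack; the outermost header comes off first."""
    for layer in STACK:
        data = unwrap(layer, data)
    return data


wire = send(b"hi")
print(wire)
print(receive(wire))
```

Each layer only needs to understand its own header, which is the division of tasks the stack is meant to provide.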

Whereas the protocol stack denotes a specific combination of protocols that work together, a reference model is a software architecture that lists each layer and the services each should offer. The classic seven-layer reference model is the OSI model, which is used for conceptualizing protocol stacks and peer entities. This reference model also provides an opportunity to teach more general software engineering concepts like information hiding, modularity, and delegation of tasks. This model has endured in spite of the demise of many of its protocols (and protocol stacks) originally sanctioned by the ISO.

Common protocols

  • IP (Internet Protocol)
  • UDP (User Datagram Protocol)
  • TCP (Transmission Control Protocol)
  • DHCP (Dynamic Host Configuration Protocol)
  • HTTP (Hypertext Transfer Protocol)
  • FTP (File Transfer Protocol)
  • Telnet (Telnet Remote Protocol)
  • SSH (Secure Shell Remote Protocol)
  • POP3 (Post Office Protocol 3)
  • SMTP (Simple Mail Transfer Protocol)
  • IMAP (Internet Message Access Protocol)
  • SOAP (Simple Object Access Protocol)
  • PPP (Point-to-Point Protocol)
  • RFB (Remote Framebuffer Protocol)

Protocol testing

In general, protocol testers work by capturing the information exchanged between a Device Under Test (DUT) and a reference device known to operate properly. In the example of a manufacturer producing a new keyboard for a personal computer, the Device Under Test would be the keyboard and the reference device, the PC. The information exchanged between the two devices is governed by rules set out in a technical specification called a "communication protocol". Both the nature of the communication and the actual data exchanged are defined by the specification. Since communication protocols are state-dependent (what should happen next depends on what previously happened), specifications are complex and the documents describing them can be hundreds of pages.
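
State dependence can be made concrete with a small Python sketch that checks a captured message sequence against a transition table. The three-message protocol (CONNECT, DATA, CLOSE) and the check_trace function are hypothetical and far simpler than any real specification.

```python
# Hypothetical protocol: CONNECT, then any number of DATA, then CLOSE.
TRANSITIONS = {
    ("idle", "CONNECT"): "open",
    ("open", "DATA"): "open",
    ("open", "CLOSE"): "idle",
}


def check_trace(messages):
    """Return (index, message) for the first message that is not allowed
    in the current state, or None if the whole trace is valid."""
    state = "idle"
    for index, message in enumerate(messages):
        next_state = TRANSITIONS.get((state, message))
        if next_state is None:
            return (index, message)
        state = next_state
    return None


print(check_trace(["CONNECT", "DATA", "DATA", "CLOSE"]))
print(check_trace(["CONNECT", "CLOSE", "DATA"]))
```

The same DATA message is valid in one position and an error in another, which is what makes specifications of real protocols long and detailed.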

The captured information is decoded from raw digital form into a human-readable format that permits users of the protocol tester to easily review the exchanged information. Protocol testers vary in their abilities to display data in multiple views, automatically detect errors, determine the root causes of errors, generate timing diagrams, etc.
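
The decoding step might look like the following Python sketch, which turns a captured byte string into one readable line. The 4-byte header layout (type, flags, payload length) and the message type codes are assumptions invented for this example and do not describe any real protocol.

```python
import struct

# Hypothetical message type codes.
MESSAGE_TYPES = {1: "CONNECT", 2: "DATA", 3: "CLOSE"}


def decode(raw: bytes) -> str:
    """Render a captured message as a single human-readable line."""
    if len(raw) < 4:
        raise ValueError("message shorter than 4-byte header")
    # "!BBH": network byte order, two unsigned bytes, one unsigned 16-bit int
    msg_type, flags, length = struct.unpack("!BBH", raw[:4])
    name = MESSAGE_TYPES.get(msg_type, f"UNKNOWN({msg_type})")
    payload = raw[4:]
    note = "" if len(payload) == length else " [length mismatch]"
    return f"{name} flags=0x{flags:02x} len={length} payload={payload.hex()}{note}"


print(decode(bytes([2, 0, 0, 3]) + b"abc"))
print(decode(bytes([9, 1, 0, 5]) + b"ab"))
```

The second call shows a simple form of automatic error detection: the decoder flags an unknown type code and a payload whose size disagrees with the header.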

Some protocol testers can also generate traffic and thus act as the reference device. Such testers generate protocol-correct traffic for functional testing, and may also have the ability to deliberately introduce errors to test for the DUT's ability to deal with error conditions.

Protocol testing is an essential step towards commercialization of standards-based products. It helps to ensure that products from different manufacturers will operate together properly ("interoperate") and so satisfy customer expectations.