## ‣ An incremental space to visualize dynamic data sets

PINHO, Roberto Dantas de; OLIVEIRA, Maria Cristina Ferreira de; LOPES, Alneu de Andrade
In Information Visualization, adding and removing data elements can strongly impact the underlying visual space. We have developed an inherently incremental technique (incBoard) that maintains a coherent disposition of elements from a dynamic multidimensional data set on a 2D grid as the set changes. Here, we introduce a novel layout that uses pairwise similarity from grid neighbors, as defined in incBoard, to reposition elements on the visual space, free from constraints imposed by the grid. The board continues to be updated and can be displayed alongside the new space. As similar items are placed together, while dissimilar neighbors are moved apart, it supports users in the identification of clusters and subsets of related elements. Densely populated areas identified in the incSpace can be efficiently explored with the corresponding incBoard visualization, which is not susceptible to occlusion. The solution remains inherently incremental and maintains a coherent disposition of elements, even for fully renewed sets. The algorithm considers relative positions for the initial placement of elements, and raw dissimilarity to fine tune the visualization. It has low computational cost, with complexity depending only on the size of the currently viewed subset...

## ‣ Nursing minimum data set: setting up a model in occupational health (Conjunto de dados mínimos de enfermagem: construindo um modelo em saúde ocupacional)

Silveira, Denise Tolfo; Marin, Heimar de Fatima

## ‣ Multiple imputations for missing data: a simulation with epidemiological data (Uso da imputação múltipla de dados faltantes: uma simulação utilizando dados epidemiológicos)

Nunes, Luciana Neves; Kluck, Mariza Machado; Fachel, Jandyra Maria Guimarães

## ‣ A semi-automatic method for indirect orientation of aerial images using ground control lines extracted from airborne laser scanner data

Dos Santos, Daniel Rodrigues; Tommaselli, Antonio Maria Garcia; Dalmolin, Quintino; Mitishita, Edson Aparecido
This paper presents a method for indirect orientation of aerial images using ground control lines extracted from airborne laser scanner (ALS) data. This data integration strategy has shown good potential in the automation of photogrammetric tasks, including the indirect orientation of images. The most important characteristic of the proposed approach is that the exterior orientation parameters (EOP) of one or more images can be automatically computed with a space resection procedure from data derived from different sensors. The suggested method works as follows. First, straight lines are automatically extracted in the digital aerial image (s) and in the intensity image derived from an ALS data-set (S). Then, correspondence between s and S is automatically determined. A line-based coplanarity model that establishes the relationship between straight lines in the object and in the image space is used to estimate the EOP with iterated extended Kalman filtering (IEKF). Implementation and testing of the method have employed data from different sensors. Experiments were conducted to assess the proposed method, and the results showed that the estimation of the EOP is a function of ALS positional accuracy.

## ‣ Health information system reform in South Africa: developing an essential data set.

Shaw, Vincent
Source: World Health Organization; Publisher: World Health Organization
Health services are increasingly under pressure to develop information systems that are responsive to changing health needs and appropriate to service objectives. Developing an essential data set provides managers with a clearly defined set of indicators for monitoring and evaluating services. This article describes a process that resulted in the creation of an essential data set at district level. This had a significant impact on neighbouring districts and resulted in the development of a regional essential data set, which in turn helped to influence the creation of a provincial and then national essential data set. Four key lessons may be drawn from the process. The development of an essential data set both requires and can contribute to a process that allows the reporting requirements to be adjusted over time in response to changing circumstances. In addition, it contributes to (and requires) the integration of programme reporting requirements into a coherent information system. While the case study describes a bottom-up approach, a top-down consultative process is advocated because it establishes a framework within which information needs can be reviewed. Lastly, the use of surveys can aid efforts to keep the essential elements to a minimum. In conclusion...

## ‣ A large-scale crop protection bioassay data set

Gaulton, Anna; Kale, Namrata; van Westen, Gerard J. P.; Bellis, Louisa J.; Bento, A. Patrícia; Davies, Mark; Hersey, Anne; Papadatos, George; Forster, Mark; Wege, Philip; Overington, John P.
Source: Nature Publishing Group; Publisher: Nature Publishing Group
ChEMBL is a large-scale drug discovery database containing bioactivity information primarily extracted from scientific literature. Due to the medicinal chemistry focus of the journals from which data are extracted, the data are currently of most direct value in the field of human health research. However, many of the scientific use-cases for the current data set are equally applicable in other fields, such as crop protection research: for example, identification of chemical scaffolds active against a particular target or endpoint, the de-convolution of the potential targets of a phenotypic assay, or the potential targets/pathways for safety liabilities. In order to broaden the applicability of the ChEMBL database and allow more widespread use in crop protection research, an extensive data set of bioactivity data of insecticidal, fungicidal and herbicidal compounds and assays was collated and added to the database.

## ‣ A spatially comprehensive, hydrometeorological data set for Mexico, the U.S., and Southern Canada 1950–2013

Livneh, Ben; Bohn, Theodore J.; Pierce, David W.; Munoz-Arriola, Francisco; Nijssen, Bart; Vose, Russell; Cayan, Daniel R.; Brekke, Levi
Source: Nature Publishing Group; Publisher: Nature Publishing Group
Type: Scientific Journal Article
A data set of observed daily precipitation, maximum and minimum temperature, gridded to a 1/16° (~6 km) resolution, is described that spans the entire country of Mexico, the conterminous U.S. (CONUS), and regions of Canada south of 53° N for the period 1950–2013. The dataset improves on previous products in spatial extent, orographic precipitation adjustment over Mexico and parts of Canada, and reduction of transboundary discontinuities. The impacts of adjusting gridded precipitation for orographic effects are quantified by scaling precipitation to an elevation-aware 1981–2010 precipitation climatology in Mexico and Canada. Differences are evaluated in terms of total precipitation as well as by hydrologic quantities simulated with a land surface model. Overall, orographic correction impacts total precipitation by up to 50% in mountainous regions outside CONUS. Hydrologic fluxes show sensitivities of similar magnitude, with discharge more sensitive than evapotranspiration and soil moisture. Because of the consistent gridding methodology, the current product reduces transboundary discontinuities as compared with a commonly used reanalysis product, making it suitable for estimating large-scale hydrometeorologic phenomena.

## ‣ Trade Costs and Development : A New Data Set

Arvis, Jean-François; Shepherd, Ben; Duval, Yann; Utoktham, Chorthip
Source: World Bank, Washington, DC; Publisher: World Bank, Washington, DC
The World Bank and the United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP) jointly prepared a new global data set of bilateral trade costs based on trade and production data. Accessible on the World Bank Open Data Web site, it opens new analytical possibilities for policy makers and researchers working on trade integration. The data stress the importance of supply chains and connectivity constraints in explaining the higher costs and lower levels of trade integration observed in developing countries. To measure trade costs in the developing world over the 1995-2010 period, UNESCAP and the World Bank embarked on a joint data collection exercise. In addition to data on export and import flows, calculation of trade costs using the inverse gravity methodology also requires information on domestic production in each country. Usage can then be calculated as domestic production less total exports. The result of the data collection exercise is a database covering up to 178 countries...
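The two calculation steps mentioned in this abstract can be illustrated with a minimal sketch. The first function follows the abstract directly (domestic production less total exports). The second uses one commonly cited form of the inverse gravity trade cost measure; the abstract does not state the formula, so the expression, the function names, and the default elasticity `sigma=8.0` are all assumptions made for illustration only.

```python
def intranational_trade(production: float, total_exports: float) -> float:
    """Domestic usage, computed as domestic production less total exports."""
    usage = production - total_exports
    if usage <= 0:
        raise ValueError("production must exceed total exports")
    return usage


def bilateral_trade_cost(x_ij: float, x_ji: float,
                         x_ii: float, x_jj: float,
                         sigma: float = 8.0) -> float:
    """Ad valorem equivalent trade cost between countries i and j.

    Assumed form (not given in the abstract):
        tau = (x_ii * x_jj / (x_ij * x_ji)) ** (1 / (2 * (sigma - 1))) - 1
    x_ij, x_ji: bilateral flows; x_ii, x_jj: intranational trade.
    sigma: elasticity of substitution (illustrative default).
    """
    if min(x_ij, x_ji, x_ii, x_jj) <= 0:
        raise ValueError("all trade flows must be positive")
    if sigma <= 1:
        raise ValueError("sigma must be greater than 1")
    ratio = (x_ii * x_jj) / (x_ij * x_ji)
    return ratio ** (1.0 / (2.0 * (sigma - 1.0))) - 1.0


if __name__ == "__main__":
    # Hypothetical figures, in the same currency units.
    x_ii = intranational_trade(production=500.0, total_exports=120.0)
    x_jj = intranational_trade(production=300.0, total_exports=90.0)
    print(bilateral_trade_cost(x_ij=15.0, x_ji=10.0, x_ii=x_ii, x_jj=x_jj))
```

Under this assumed form, the measure is zero when bilateral trade is as large as intranational trade and grows as countries trade more with themselves than with each other.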

## ‣ Economic Growth, Inequality, and Poverty : Findings from a New Data Set

Source: World Bank, Washington, DC; Publisher: World Bank, Washington, DC

## ‣ Conclusions from a NAIVE Bayes Operator Predicting the Medicare 2011 Transaction Data Set

Williams, Nick
Type: Scientific Journal Article
Introduction: The United States Federal Government operates one of the world's largest medical insurance programs, Medicare, to ensure payment for clinical services for the elderly, illegal aliens and those without the ability to pay for their care directly. This paper evaluates the Medicare 2011 Transaction Data Set, which details the transfer of funds from Medicare to private and public clinical care facilities for specific clinical services for the operational year 2011. Methods: Data mining was conducted to establish the relationships between reported and computed transaction values in the data set, to better understand the drivers of Medicare transactions at a programmatic level. Results: The models averaged 88 for average model accuracy and 38 for average Kappa during training. Some reported classes are highly independent from the available data, as their predictability remains stable regardless of redaction of supporting and contradictory evidence. DRG or procedure type appears to be unpredictable from the available financial transaction values. Conclusions: Overlay hypotheses, such as charges being driven by the volume served or DRG being related to charges or payments, are readily false in this analysis despite 28 million Americans being billed through Medicare in 2011 and the program distributing over 70 billion in this transaction set alone. It may be impossible to predict the dependencies and data structures of the payer of last resort without data from payers of first and second resort. Political concerns about Medicare would be better served by focusing on these first and second order payer systems, as what Medicare costs is not dependent on Medicare itself.; Comment: 8 Pages...
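The gap between the reported accuracy and Kappa values is easier to read with the two metrics side by side. The sketch below computes both from a confusion matrix, assuming that "Kappa" here means Cohen's kappa (the abstract does not say which statistic was used). The function name and the example matrix are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def accuracy_and_kappa(confusion: List[List[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts items of true class i predicted as class j.
    """
    k = len(confusion)
    if any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of items on the diagonal (this is accuracy).
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical imbalanced two-class example.
    acc, kappa = accuracy_and_kappa([[85, 5], [7, 3]])
    print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

Because kappa discounts the agreement expected by chance, a classifier on imbalanced classes can score high accuracy while its kappa stays low.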

## ‣ Identifying the largest complete data set from ALFRED

Uduman, Mohamed
Source: Rochester Institute of Technology; Publisher: Rochester Institute of Technology
ALFRED is a central and curated repository for allele frequency data for anthropologically defined human populations. To study and estimate the relationships and similarities between populations, researchers require a large and complete data set. However, the data set within ALFRED is not complete. Specifically, not all the populations in the database have been typed for all the polymorphisms. Mining ALFRED for the largest complete data set is equivalent to the 'Maximal Biclique' problem in graph theory. This problem is proven to be NP-complete, and no known algorithm finds the exact solution in polynomial time. This project describes a heuristic (Largest Maximal Biclique Heuristic, LMBH) which finds the largest complete data set from ALFRED in real time. The program is compared to various other methods, including Wen-Chieh Chang's implementation of the 'maximal biclique' algorithm proposed by Alexe et al. The algorithm efficiently mines ALFRED to extract the largest complete data set, and the results are made available for researchers in uniform data exchange format, through a Web site. Since ALFRED is updated frequently, the LMBH program is set up to mine ALFRED on a regular basis and provide researchers with the most up-to-date, largest complete data set from ALFRED.
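To make the problem concrete, the sketch below shows a simple greedy heuristic for finding a large complete subset in a population-by-polymorphism table. It is not the LMBH described in the abstract, whose details are not given here. It is only an illustrative stand-in, and the function name, the input layout (a dictionary mapping each population to the set of polymorphisms it has been typed for), and the example data are all assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def greedy_complete_subset(
    typed: Dict[str, Iterable[Hashable]]
) -> Tuple[List[str], Set[Hashable]]:
    """Greedy search for a large complete (population x marker) block.

    Repeatedly drops the population typed for the fewest markers and keeps
    the best block seen, measured as populations * shared markers.
    This is a heuristic: it does not guarantee the optimal block.
    """
    remaining = {pop: set(markers) for pop, markers in typed.items()}
    best_area = 0
    best_pops: List[str] = []
    best_markers: Set[Hashable] = set()

    while remaining:
        # Markers typed in every remaining population form a complete block.
        common = set.intersection(*remaining.values())
        area = len(remaining) * len(common)
        if area > best_area:
            best_area = area
            best_pops = sorted(remaining)
            best_markers = common

        # Drop the population with the fewest typed markers (ties by name).
        weakest = min(remaining, key=lambda p: (len(remaining[p]), p))
        del remaining[weakest]

    return best_pops, best_markers


if __name__ == "__main__":
    # Hypothetical population and marker labels.
    example = {
        "PopA": {"m1", "m2", "m3"},
        "PopB": {"m1", "m2", "m3"},
        "PopC": {"m1", "m2"},
        "PopD": {"m4"},
    }
    print(greedy_complete_subset(example))
```

Every population in the returned list has been typed for every returned marker, so the block is complete by construction. Whether it is the largest such block is not guaranteed, which is the trade-off any polynomial-time heuristic for this problem accepts.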

## ‣ Cokriging for optimal mineral resource estimates in mining operations

Minnitt, R.C.A.; Deutsche, C.V.
Source: Journal of the Southern African Institute of Mining and Metallurgy; Publisher: Journal of the Southern African Institute of Mining and Metallurgy
Cokriging uses a sparsely sampled, but accurate and precise primary data-set, together with a more abundant secondary data-set, for example grades in a polymetallic orebody, containing both error and bias, to provide improved results compared to estimation with the primary data alone, as well as filtering the error and mitigating the effects of conditional bias. The method described here may also be applied in polymetallic orebodies and in other cases where the primary and secondary data could be collocated, and one of the data-sets need not be biased, unreliable, etc. An artificially created reference data-set of 512 lognormally distributed precious metal grades sampled at 25x25 m intervals constitutes the primary data-set. A secondary data-set on a 10x10 m grid comprising 3200 samples drawn from the reference data-set includes 30 per cent error and 1.5 multiplicative bias on each measurement. The primary and secondary non-collocated data-sets are statistically described and compared to the reference data-set. Variograms based on the primary data-set are modelled and used in the kriging of 10x10 m blocks using the 25x25 m and 50x50 m data grids for comparison against the results of the cokriged estimation. A linear model of coregionalization (LMC) is established using the primary and secondary data-sets and cokriging using both data-sets is shown to be a significant improvement over kriging with the primary data-set alone. The effects of the error and bias are filtered and removed during the cokriging estimation procedure. Thus cokriging using the more abundant secondary data...

## ‣ Nutritional status of breastfed infants in rural Zambia: comparison of the National Center for Health Statistics growth reference versus the WHO 12-month breastfed pooled data set

Hautvast, J.L.A.; Pandor, A.; Burema, J.; Tolboom, J.J.M.; Chishimba, N.; Monnens, L.A.H.; Staveren, W.A. van
