Monday, March 4, 2019
Data Warehouses & Data Mining
DATA WAREHOUSES & DATA MINING
Term Paper in Management Support Systems (MSS)
Submitted by: Chitransh Naman (A22-JK903, 10900100)
Submitted to: Anita Ma'am, Lecturer

ABSTRACT - A data warehouse is a collection of integrated, subject-oriented, time-variant and non-volatile data in support of management's decision-making process. It has been described as the single point of truth, the corporate memory, and the sole historical record of virtually all transactions that occur in the life of an organization. A fundamental concept of a data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. At Rutgers, for example, these systems include the registrar's data on students (widely known as the SRDB), human resource and payroll databases, course scheduling data, and data on financial aid. In a data warehouse environment, data only comes to have value to end-users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decision-making. For example, an academic unit needs historical information about the instructional output of its different faculty members to gauge whether it is becoming more or less dependent on part-time faculty.

INTRODUCTION - The data warehouse is always a physically separate store of data transformed from the application data found in the operational environment; data entering the data warehouse comes from the operational environment in almost every case. Data warehousing provides architectures and tools for business administrators to systematically organize, understand, and use their data to make strategic decisions.
A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs. Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow the integration of a variety of application systems; they support information processing by providing a solid platform of consolidated historical data for analysis. Data warehousing is a more formalized methodology for techniques that have long been in use. For example, many sales analysis systems and executive information systems (EIS) get their data from summary files rather than from operational transaction files. Using summary files instead of operational data is in essence what data warehousing is all about. Some data warehousing tools neglect the importance of modelling and building a data warehouse and focus only on the storage and retrieval of data. These tools might have strong analytical facilities, but lack the qualities needed to build and maintain a corporate-wide data warehouse; they belong on the PC rather than the host. A corporate-wide (or division-wide) data warehouse needs to be scalable, secure, open and, above all, suitable for publication.
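The summary-file idea described above can be sketched in a few lines of Python. The record layout and field names are illustrative assumptions, not taken from any particular system:

```python
from collections import defaultdict

# Illustrative operational transaction records (field names are assumptions).
transactions = [
    {"date": "2019-03-01", "product": "widget", "qty": 3, "amount": 30.0},
    {"date": "2019-03-01", "product": "gadget", "qty": 1, "amount": 25.0},
    {"date": "2019-03-02", "product": "widget", "qty": 2, "amount": 20.0},
]

def build_summary(rows):
    """Roll detailed transactions up into a per-product summary file --
    the kind of pre-aggregated data an EIS would read instead of the
    raw operational transaction file."""
    summary = defaultdict(lambda: {"qty": 0, "amount": 0.0})
    for r in rows:
        s = summary[r["product"]]
        s["qty"] += r["qty"]
        s["amount"] += r["amount"]
    return dict(summary)

print(build_summary(transactions))
```

An EIS reading this summary never touches the transaction file, which is exactly the separation data warehousing formalizes.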
NEED OF DATA WAREHOUSE -
- Missing data: decision support requires historical data, which operational databases do not typically maintain.
- Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational databases and external sources.
- Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled.

DATA WAREHOUSE ARCHITECTURE -
Components:
OPERATIONAL DATA - the data for the warehouse is supplied from mainframe operational data held in first-generation hierarchical and network databases; departmental data held in proprietary file systems; private data held on workstations and private servers; and external systems such as the Internet, commercially available databases, or databases associated with an organization's suppliers or customers.
OPERATIONAL DATA STORE - a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
LOAD MANAGER - also called the frontend component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse.
WAREHOUSE MANAGER - performs all the operations associated with the management of the data in the warehouse. These include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing up data.
QUERY MANAGER - also called the backend component, it performs all the operations associated with the management of user queries, including directing queries to the appropriate tables and scheduling the execution of queries.
END-USER ACCESS TOOLS - can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.

DATA MART - A data mart is a subset of a data warehouse that supports the requirements of a particular department or business function. The characteristics that differentiate data marts from data warehouses include:
- a data mart focuses only on the requirements of users associated with one department or business function;
- because data marts contain less data than data warehouses, they are more easily understood and navigated;
- data marts do not normally contain detailed operational data, unlike data warehouses.

META DATA - Metadata is about controlling the quality of data entering the data stream. Batch processes can be run to address data degradation or changes to data policy. Metadata policies are enhanced by using metadata repositories.
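The relationship between a warehouse and a data mart, as described above, can be sketched as a simple subset operation. The warehouse rows and the department filter below are illustrative assumptions:

```python
# Hypothetical warehouse fact rows; field names are assumptions.
warehouse = [
    {"dept": "sales",   "month": "2019-01", "amount": 100.0},
    {"dept": "sales",   "month": "2019-02", "amount": 150.0},
    {"dept": "finance", "month": "2019-01", "amount": 900.0},
]

def build_data_mart(rows, dept):
    """Derive a departmental data mart: the subset of warehouse data
    relevant to one business function."""
    return [r for r in rows if r["dept"] == dept]

sales_mart = build_data_mart(warehouse, "sales")
print(len(sales_mart))  # 2
```

Because the mart holds only one department's rows, it is smaller and easier to navigate than the full warehouse, which is exactly the distinction listed above.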
IMPORTANCE OF META DATA -
- Meta-data is data about data. It is used for a variety of purposes, and its management is a critical issue in achieving a fully integrated data warehouse.
- The major purpose of meta-data is to show the pathway back to where the data began, so that warehouse administrators know the history of any item in the warehouse.
- The meta-data associated with data transformation and loading must describe the source data and any changes that were made to it.
- The meta-data associated with data management describes the data as it is stored in the warehouse.
- Meta-data is used by the query manager to generate appropriate queries, and is also associated with the users of queries.
- The major integration issue is how to synchronize the various types of meta-data used throughout the data warehouse. The challenge is to synchronize meta-data between different products from different vendors using different meta-data stores.
- Two major standards cover meta-data and modeling in the areas of data warehousing and component-based development: MDC (Meta Data Coalition) and OMG (Object Management Group).
- A data warehouse requires tools to support the administration and management of such a complex environment. For the various types of meta-data and the day-to-day operations of the data warehouse, these tools must be capable of monitoring data loading from multiple sources, running data quality and integrity checks, managing and updating meta-data, and monitoring database performance to ensure efficient query response times and resource utilization.

DATA WAREHOUSING PROCESSES - The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading.
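The three ETL steps can be sketched end to end. Everything here (the source rows, the field names, the in-memory "warehouse") is an illustrative assumption, not a real ETL tool:

```python
# Source system rows, as an extract might return them (assumed layout).
source_rows = [
    {"id": 1, "name": " Alice ", "city": "ny", "last_modified": "2019-03-02"},
    {"id": 2, "name": "Bob",     "city": "LA", "last_modified": "2019-02-15"},
]

def extract(rows, since):
    """Extraction: pull only rows changed since the last load
    (a simple form of incremental extraction)."""
    return [r for r in rows if r["last_modified"] > since]

def transform(rows):
    """Transformation: reconcile inconsistent representations
    (trim names, normalize city codes) before loading."""
    return [{"id": r["id"],
             "name": r["name"].strip(),
             "city": r["city"].upper()} for r in rows]

def load(warehouse, rows):
    """Loading: append the cleaned rows to the warehouse store."""
    warehouse.extend(rows)

warehouse = []
load(warehouse, transform(extract(source_rows, since="2019-03-01")))
print(warehouse)  # [{'id': 1, 'name': 'Alice', 'city': 'NY'}]
```

In a real pipeline each step would also record meta-data (source, timestamp, transformations applied) so the warehouse administrator can trace any item back to its origin.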
In addition, after the data warehouse (the detailed data) is created, several further data warehousing processes are needed to implement and use it, including data summarization and data warehouse maintenance.

Extraction in the Data Warehouse - Extraction is the operation of pulling data out of a source system for further use in a data warehouse environment; it is the first step of the ETL process. After extraction, data can be transformed and loaded into the data warehouse. The extraction process does not need to involve complex algebraic database operations, such as joins and aggregate functions. Its focus is determining which data needs to be extracted and bringing that data into the data warehouse, specifically into the staging area. The data normally has to be extracted not just once, but several times in a periodic manner, to supply all changed data to the warehouse and keep it up to date. Thus, data extraction is used not only in building the data warehouse but also in maintaining it. Very often, entire documents or tables are extracted from the data sources to the warehouse or staging area, so that the extracted data completely contains the information from the sources. There are two kinds of logical extraction method in data warehousing.
Full Extraction - The data is extracted completely from the data sources. As this extraction reflects all the data currently available in the source, there is no need to keep track of changes to the source since the last successful extraction. The source data is provided as-is, and no additional logical information is necessary on the source side.
Incremental Extraction - At a specific point in time, only the data that has changed since a well-defined event back in history is extracted. The event may be the last time of extraction or a more complex business event, such as the last sale day of a fiscal period.
This change information can be provided either by the source data itself or by a change table, where an additional mechanism keeps track of the changes alongside the originating transactions. In most cases, using the latter method means adding extraction logic to the data source. To remain independent of the data sources, many data warehouses do not use any change-capture technique as part of the extraction process and instead use full extraction. After a full extraction, the entire extracted data set can be compared with the previously extracted data to identify the changed data. Unfortunately, for many source systems, identifying the recently modified data may be difficult or intrusive to the operation of the data source. Change data capture is typically the most challenging technical issue in data extraction.

DATA MINING - Data mining is the process of discovering new correlations, patterns, and trends by digging into (mining) large amounts of data stored in warehouses, using artificial intelligence, statistical, and mathematical techniques. Data mining can also be defined as the process of extracting knowledge hidden in large volumes of raw data, i.e. the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Alternative names for data mining include knowledge discovery in databases (KDD), knowledge extraction, and data/pattern analysis. The importance of accumulating data that reflects your business or scientific activities to achieve competitive advantage is now widely recognized, and powerful systems for collecting data and managing it in large databases are in place in all large and mid-range companies.
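As a toy illustration of the kind of pattern extraction just described, the sketch below counts how often pairs of items co-occur in purchase baskets (the classic beer-and-diapers style of association mining). The baskets are invented example data, not real transactions:

```python
from itertools import combinations
from collections import Counter

# Invented example baskets: one set of items per transaction.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"bread", "milk"},
    {"beer", "chips"},
]

def pair_support(baskets):
    """Count co-occurring item pairs across all baskets; pairs with
    high counts (support) are candidates for association rules."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

support = pair_support(baskets)
print(support[("beer", "diapers")])  # 2
```

Real association mining (e.g. the Apriori algorithm) adds pruning so this counting stays tractable over millions of baskets, but the underlying idea is the same.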
How Data Mining Works - While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:
Classes - Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by offering daily specials.
Clusters - Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
Associations - Data can be mined to identify associations. The beer-and-diapers example is an example of associative mining.
Sequential patterns - Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.

DATA MINING MODELS -
1. Predictive Model
- Prediction: determining how certain attributes will behave in the future
- Regression: mapping a data item to a real-valued prediction variable
- Classification: categorization of data based on combinations of attributes
- Time series analysis: examining values of attributes with respect to time
2.
Descriptive Model
- Clustering: closely related data is grouped together into clusters
- Data summarization: extracting representative information about the database
- Association rules: associations defined between data items to form relationships
- Sequence discovery: used to determine sequential patterns in data based on a time sequence of actions

APPLICATIONS OF DATA WAREHOUSE -
Exploiting Data for Business Decisions - The value of a decision support system depends on its ability to provide the decision-maker with relevant information that can be acted upon at an appropriate time. This means that the information needs to be:
- Applicable. The information must be current, relevant to the field of interest, and at the correct level of detail to highlight any potential issues or benefits.
- Conclusive. The information must be sufficient for the decision-maker to derive actions that will bring benefit to the organisation.
- Timely. The information must be available in a time frame that allows decisions to be effective.
Decision Support through Data Warehousing - One approach to creating a decision support system is to implement a data warehouse, which integrates existing sources of data with accessible data analysis techniques. An organisation's data sources are typically departmental or functional databases that have evolved to serve specific and localised requirements. Integrating such highly focussed resources for decision support at the enterprise level requires the addition of other functional capabilities:
- Fast query handling. Data sources are normally optimised for data storage and processing, not for speed of response to queries.
- Increased data depth. Many business conclusions are based on comparing current data with historical data; data sources are normally focussed on the present and so lack this depth.
- Business language support. The decision-maker will typically have a background in business or management, not in database programming.
It is important that such a person can request information using business language rather than query syntax.

The proliferation of data warehouses is highlighted by the customer loyalty schemes now run by many leading retailers and airlines. These schemes illustrate the potential of the data warehouse for micromarketing and profitability calculations, but there are other applications of equal value, such as:
- Stock control
- Product category management
- Basket analysis
- Fraud analysis
All of these applications offer a direct payback by facilitating the identification of areas that require attention. This payback, particularly in the fields of fraud analysis and stock control, can be of high and immediate value.

APPLICATIONS OF DATA MINING -
- Banking: loan/credit card approval; predict good customers based on past customers.
- Customer relationship management: identify those who are likely to leave for a competitor.
- Targeted marketing: identify likely responders to promotions.
- Fraud detection: in telecommunications and financial transactions, identify fraudulent events from an online stream of events.
- Manufacturing and production: automatically adjust process controls when process parameters change.
- Medicine: predict disease outcomes and the effectiveness of treatments; analyze patient disease histories; find relationships between diseases.
- Molecular/pharmaceutical: identify new drugs.
- Scientific data analysis: identify new galaxies by searching for sub-clusters.
- Web site/store design and promotion: find the affinity of visitors to pages and modify the layout accordingly.

CONCLUSION - What we are seeing is two-fold, depending on the retailer's strategy:
1) Most retailers build data warehouses to target specific markets and customer segments. They are trying to know their customers. It all starts with CDI (customer data integration); by starting with CDI, the retailers can build the data warehouse around the customer.
2) On the other side, there are retailers who have no idea who their customers are, or feel they don't need to:
The world is their customer, and low prices will keep the world loyal. These retailers use their data warehouse to control inventory, schedule, and negotiate with suppliers. The future will bring real-time data warehouse updates, with the ability to give the retailer a minute-to-minute view of what is going on in a retail location and to take action either manually or through a trigger fired by the data warehouse. The future belongs to those who 1) possess knowledge of the customer, and 2) effectively use that knowledge.