World Watch | Foreign Media Forecast Big Data Trends for 2019
Big Data Trends in 2019
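資料觀 | Translated by 黃玉葉
[Editor's note] In 2019, new Big Data concepts and technologies will keep appearing on the market, while older technologies will gradually fade away or be put to new uses. The continued growth of the Internet of Things gives Big Data fresh resources, and new technologies will change not only how business intelligence is gathered but also how business is done...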
The accessibility of data has provided a new generation of technology and has shifted the business focus towards data-driven decision making. Big Data Analytics is now an established part of gathering Business Intelligence. Many businesses, particularly those online, consider Big Data a mainstream practice. These businesses are constantly researching new tools and models to improve their Big Data utilization.
In 2019, some tools and trends will be more popular than others. New Big Data concepts and technologies are constantly appearing on the market, and older technologies fade away, or get used in new ways. The continuous growth of the Internet of Things (IoT) has provided several new resources for Big Data. New technologies change not only how Business Intelligence is gathered, but how business is done.
Streaming the IoT for Machine Learning
There are currently efforts to use the Internet of Things (IoT) to combine Streaming Analytics and Machine Learning. In 2019, we can anticipate significant research on this theme, and possibly a startup or two marketing their services or software.
Typically, Machine Learning uses “stored” data for training, in a “controlled” learning environment. In this new model, streaming data provides useful information from the Internet of Things to offer Machine Learning in real time, in a less controlled environment. A primary goal in this process is to provide more flexible, more appropriate responses to a variety of situations, with a special focus on communicating with humans.
Changing from a training model that uses a controlled environment and limited training data to a much more open training system requires more complex algorithms. Machine Learning then trains the system to predict outcomes with reasonable accuracy. As the primary model adjusts and evolves, models at the edge or in the Cloud will coordinate to match the changes, as needed. Ted Dunning, the Chief Application Architect at MapR, said:
“We will see more and more businesses treat computation in terms of data flows rather than data that is just processed and landed in a database. These data flows capture key business events and mirror business structure. A unified data fabric will be the foundation for building these large-scale flow-based systems.”
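To make the contrast with training on "stored" data concrete, here is a minimal sketch of a model that learns from a stream one reading at a time. It is an illustration only, not a method described in the article: the `OnlineLinearModel` class, the `sensor_stream` generator, and the simulated relationship y = 2x + 1 are all assumptions made for the example.

```python
import random


class OnlineLinearModel:
    """Linear model y ~ w*x + b, updated one reading at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One stochastic-gradient step on the squared error of this reading.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error


def sensor_stream(n, seed=0):
    """Hypothetical IoT feed: yields (x, y) with y = 2x + 1 plus small noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        yield x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)


model = OnlineLinearModel()
for x, y in sensor_stream(5000):
    # Learn from each reading as it arrives; nothing is stored for later.
    model.update(x, y)

print(f"learned w={model.w:.2f}, b={model.b:.2f}")
```

The model never sees the data set as a whole, which is the point of the streaming approach: each reading adjusts the parameters slightly and is then discarded.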
AI Platforms
Big Data as a tool of discovery continues to evolve and mature, with some enterprises reaping significant rewards. A recent advancement is the use of AI (Artificial Intelligence) platforms, which will have a major impact over the next decade. Using AI platforms to process Big Data markedly improves both the gathering of Business Intelligence and efficiency. Anil Kaul, CEO and Co-Founder of Absolutdata, stated:
“We started an email campaign, which I think everybody uses Analytics for, but because we used AI, we created a 51 percent increase in sales. While Analytics can figure out who you should target, AI recommends and generates what campaigns should be run.”
AI platforms will gain in popularity in 2019. AI platforms are frameworks designed to work more efficiently and effectively than more traditional frameworks. When an AI platform is designed well, it will provide faster, more efficient communications with Data Scientists and other staff. This can help reduce costs in several ways—such as by preventing the duplication of efforts, automating basic tasks, and eliminating simple, but time-consuming activities (copying, data processing, and constructing ideal customer profiles).
AIs will also provide Data Governance, making best practices available to Data Scientists and staff. The AI becomes a trusted advisor, and can also help to ensure work is spread more evenly, and completed more quickly. Artificial Intelligence platforms are arranged into five layers of logic:
·The Data & Integration Layer gives access to the data. (Critical, as developers do not hand-code the rules. Instead, the rules are being “learned” by the AI.)
·The Experimentation Layer lets Data Scientists develop, test, and prove their hypotheses.
·The Operations & Deployment Layer supports model governance and deployment. This layer offers tools to manage the deployment of various “containerized” models and components.
·The Intelligence Layer organizes and delivers intelligent services and supports the AI.
·The Experience Layer is designed to interact with users through the use of technologies such as augmented reality, conversational UI, and gesture control.
The Data Curator
In 2019, many organizations will find the position of Data Curator (DC) has become a new necessity. The Data Curator’s role will combine responsibility for managing the organization’s metadata with responsibility for Data Protection, Data Governance, and Data Quality. Data Curators not only manage and maintain data, but may also be involved in determining best practices for working with that data. Data Curators are often responsible for presentations, with the data shown visually in the form of a dashboard, chart, or slideshow.
The Data Curator regularly interacts with researchers, and also schedules educational workshops. The DC communicates with other curators to collaborate and coordinate, when appropriate. (Good communication skills are a plus.) Tomer Shiran, co-founder and CEO of Dremio, said:
“The Data Curator is responsible for understanding the types of analysis that need to be performed by different groups across the organization, what datasets are well suited for this work, and the steps involved in taking the data from its raw state to the shape and form needed for the job a data consumer will perform. The data curator uses systems such as self-service data platforms to accelerate the end-to-end process of providing data consumers access to essential datasets without making endless copies of data.”
Politics and GDPR
The European Union’s General Data Protection Regulation (GDPR) went into effect on May 25, 2018. While GDPR is focused on Europe, some organizations, in an effort to simplify their business and promote good customer relations, have stated they will provide the same privacy protections for all their customers, regardless of where they live. This approach, however, is not the general position taken by businesses and organizations outside of Europe. Many corporations have chosen to revamp their consent procedures and data handling processes, and to hire new staff, all in an effort to maximize the private data they “can” gather.
Businesses relying on “assumed consent” for all processing operations can no longer make this assumption when doing business with Europeans. Businesses have had to implement new procedures for notices and receiving consent, and many are currently trying to plan for what’s next, while simultaneously struggling with problems in the present.
Several organizations have assigned GDPR responsibilities to their Chief Security Officers. (The CSO should be responsible for having these changes made.) Though GDPR fines can be quite large (as high as 20 million Euros or four percent of annual global turnover, whichever is higher), many businesses, especially in the United States, are still not prepared.
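The fine ceiling quoted above is the larger of two numbers, which a few lines of arithmetic make explicit. The sketch below uses only the two figures stated in the text; the function name `gdpr_max_fine_eur` and the sample turnover values are assumptions made for illustration, and the result is an upper bound, not a prediction of any actual fine.

```python
FLAT_CAP_EUR = 20_000_000        # 20 million Euros, as quoted in the text
TURNOVER_SHARE_PERCENT = 4       # four percent of annual global turnover


def gdpr_max_fine_eur(annual_global_turnover_eur):
    """Upper bound on a fine: whichever of the two ceilings is higher."""
    turnover_based = annual_global_turnover_eur * TURNOVER_SHARE_PERCENT / 100
    return max(FLAT_CAP_EUR, turnover_based)


# Hypothetical companies: 100 million and 2 billion Euros in turnover.
print(gdpr_max_fine_eur(100_000_000))    # flat cap applies
print(gdpr_max_fine_eur(2_000_000_000))  # turnover share applies
```

The two ceilings meet at a turnover of 500 million Euros (20 million divided by 0.04), so the percentage only becomes the binding figure for companies larger than that.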
In 2019 the U.S. government could make an effort to imitate the GDPR and hold businesses accountable for how they handle privacy and personal data. In the short term, it would make sense for online businesses to begin implementing new privacy policies or simply make the shift to a GDPR policy format. Making the shift now, and advertising it on the company’s website, has the potential to develop a good relationship with the customer base.
5G Not Likely in 2019
Switching to a 5G (fifth generation) system is expensive and comes with some potential issues. While the expense may not stop 5G implementation in 2019, other problems might.
Though the U.S. Federal Government completely supports the implementation of a 5G system, some communities have passed ordinances halting the installation of 5G infrastructure. It seems likely this will become a standard practice for blocking 5G systems.
An additional factor blocking 5G is a decision by the United States FCC, which eliminated regulations supporting net neutrality. Net neutrality offered internet providers, and their users, a level playing field, and promoted competition. Net neutrality is the concept that internet providers should treat all data, and people, equally, without discrimination and without charging different users different rates based on such things as speed, content, websites, platforms, or applications.
Hybrid Clouds Will Gain in Popularity
Clouds and Hybrid Clouds have been steadily gaining in popularity and will continue to do so. While an organization may want to keep some data secure in its own data storage, the tools and benefits of a hybrid system make it worth the expense. Hybrid Clouds combine an organization’s private Cloud with the rental of a public Cloud, offering the advantages of both. Expect a significant increase in the use of Hybrid Clouds in 2019.
Generally speaking, the applications and data in a Hybrid Cloud can be transferred back and forth between on-premises (private) Clouds and IaaS (public) Clouds, providing more flexibility, deployment options, and tools. A public Cloud, for example, can be used for the high-volume, low-security projects, such as email advertisements, and the on-premises Cloud can be used for more sensitive projects, such as financial reports.
“Cloud Bursting” is a feature of Hybrid Cloud systems. An application runs within the on-premises Cloud until there is a spike in demand (think Christmas shopping online, or filing taxes); the application then “bursts” through into the public Cloud and taps into additional resources.
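The bursting behavior can be sketched as a simple placement rule: fill the on-premises Cloud first and send only the overflow to the public Cloud. This is a simplified illustration, not how any particular Cloud product implements it; the `place_load` function, the capacity of 100 requests, and the hourly demand figures are all hypothetical.

```python
def place_load(demand, on_prem_capacity):
    """Return (on_prem, public) request counts for one time slot."""
    on_prem = min(demand, on_prem_capacity)  # private Cloud is filled first
    burst = demand - on_prem                 # overflow goes to the public Cloud
    return on_prem, burst


ON_PREM_CAPACITY = 100                   # hypothetical requests per hour
hourly_demand = [60, 90, 250, 120, 80]   # hypothetical seasonal spike

for hour, demand in enumerate(hourly_demand):
    on_prem, burst = place_load(demand, ON_PREM_CAPACITY)
    status = "bursting" if burst else "on-premises only"
    print(f"hour {hour}: on-prem={on_prem}, public={burst} ({status})")
```

Public resources are used, and paid for, only during the hours when demand exceeds what the private Cloud can absorb.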
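Note: "Foreign Media Forecast Big Data Trends for 2019" is translated from Dataversity. Compiled by 資料觀 / 黃玉葉; please credit the translator and the source when reprinting.
Editor in charge: 李蘭鬆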